Titan is a scalable graph database that can distribute and query graph data across multiple machines. This presentation provides a general introduction to graph computing and to Titan in particular. It also covers recent developments for Titan 0.9 and TinkerPop 3.

Presents Titan and Faunus at the Gitpro conference held on April 12, 2014.

Slides from the meetup presentation in NYC (March 2014). Covers the current version of Titan and Faunus.

Problem solving in the 21st century increasingly depends on the analysis of complex systems. Developing new drugs, understanding risk in financial networks, searching for answers in knowledge graphs, and personalizing recommendations in social networks all require the analysis of systems composed of interconnected entities that exhibit complex behavior as a whole. Graph computing provides a conceptual model and a practical platform for developing such analyses. This talk presents graph computing as an important component of every developer’s toolbox. We introduce the Aurelius Graph Cluster, an open-source stack that enables graph computing at scale by building on distributed systems like Cassandra, HBase, and Hadoop. This stack addresses challenging problems in graph partitioning, graph query language design, and graph algorithm development with solutions inspired by physics, biology, and neuroscience.

This presentation introduces Titan, Faunus, and scalable graph computing in general. We present a case study of how Pearson builds an education social network on top of Titan, Faunus, and Cassandra to support learning in the 21st century. Titan is an open source distributed graph database built on top of Cassandra that can power real-time applications with thousands of concurrent users over graphs with billions of edges. Faunus is an open source global graph processing engine built on top of Hadoop and compatible with Cassandra that can analyze graphs, compute graph statistics, and execute global traversals. Titan and Faunus are components of the Aurelius Graph Cluster, which enables scalable graph computation and powers applications in social networking, recommendation engines, advertisement optimization, knowledge representation, health care, education, and security.

An introduction to graph databases and graph computing frameworks in general, and an overview of the Aurelius Graph Cluster in particular. Discusses Titan and Faunus and demonstrates how to build a knowledge graph using the cluster. This presentation was given at Data Day Texas in 2013. http://datadaytexas.com/

In this presentation we discuss how graph analysis can add value to your data and how to use open source tools like Titan and Faunus to build scalable graph processing systems. This presentation gives an update on the development status of Titan and Faunus with a preview of what is to come.

The problems we face in the 21st century require efficient analysis of ever more complex systems. This presentation outlines how such problems can be better understood and effectively solved if they are modeled as graphs or networks. We present two tools to help solve such problems at scale: Titan, a real-time distributed graph database based on Apache Cassandra and HBase, and Faunus, a batch analytics framework for graphs based on Apache Hadoop. We discuss their development status as of November 2012 and illustrate an example application for the GitHub coding network.

This poster describes an efficient approach to maintaining multiple views on large, evolving social networks. Abstract: The Social Semantic Web (SSW) refers to the mix of RDF data in web content and social network data associated with those who posted that content. Applications to monitor the SSW are becoming increasingly popular. For instance, marketers want to look for semantic patterns in the content of tweets and Facebook posts relating to their products. Such applications allow multiple users to specify patterns of interest and monitor them in real time as new data is added to the web or to a social network. In this paper, we develop the concept of SSW view servers, from which all of these types of applications can be monitored simultaneously. The patterns of interest are views. We show that a given set of views can be compiled in multiple possible ways to take advantage of common substructures, and define the concept of an optimal merge. We develop a very fast MultiView algorithm that scalably and efficiently maintains multiple subgraph views. We show that our algorithm is correct, study its complexity, and experimentally demonstrate that it can scalably handle updates to hundreds of views on real-world SSW databases with up to 540M edges.

Titan is an open source distributed graph database built on top of Cassandra that can power real-time applications with thousands of concurrent users over graphs with billions of edges. Graphs are a versatile data model for capturing and analyzing rich relational structures. Graphs are an increasingly popular way to represent data in a wide range of domains such as social networking, recommendation engines, advertisement optimization, knowledge representation, health care, education, and security. This presentation discusses Titan's data model, query language, and novel techniques in edge compression, data layout, and vertex-centric indices, which facilitate the representation and processing of Big Graph Data across a Cassandra cluster. We demonstrate Titan's performance in a large-scale benchmark evaluation using Twitter data. Presented at the Cassandra 2012 Summit.

Users querying massive social networks or RDF databases are often not 100% certain about what they are looking for due to the complexity of the query or heterogeneity of the data. In this paper, we propose “probabilistic subgraph” (PS) queries over a graph/network database, which afford users great flexibility in specifying “approximately” what they are looking for. We formally define the probability that a substitution satisfies a PS-query with respect to a graph database. We then present the PMATCH algorithm to answer such queries and prove its correctness. Our experimental evaluation demonstrates that PMATCH is efficient and scales to massive social networks with over a billion edges.

This presentation gives an overview of Probabilistic Soft Logic and introduces some application areas.

Continuous Markov random fields are a general formalism to model joint probability distributions over events with continuous outcomes. We prove that marginal computation for constrained continuous MRFs is #P-hard in general and present a polynomial-time approximation scheme under mild assumptions on the structure of the random field. Moreover, we introduce a sampling algorithm to compute marginal distributions and develop novel techniques to increase its efficiency. Continuous MRFs are a general-purpose probabilistic modeling tool, and we demonstrate how they can be applied to statistical relational learning. On the problem of collective classification, we evaluate our algorithm and show that the standard deviation of marginals serves as a useful measure of confidence.

Slides presenting our work on COSI at the ASONAM 2010 conference. Note: The images used in this presentation are copyrighted by their respective owners, as indicated with each picture. Pictures used are either CC-licensed or fair use. Please notify the author if you feel that your images are unfairly used in this presentation.
