Titan: The Rise of Big Graph Data

Founder at RReduX, Inc.
Jun. 14, 2012

  1. TITAN: THE RISE OF BIG GRAPH DATA. MARKO A. RODRIGUEZ, MATTHIAS BROECHELER. http://THINKAURELIUS.COM
  2. ABSTRACT A graph is a data structure composed of vertices/dots and edges/lines. A graph database is a software system used to persist and process graphs. The common conception in today's database community is that there is a tradeoff between the scale of data and the complexity/interlinking of data. To challenge this understanding, Aurelius has developed Titan under the liberal Apache 2 license. Titan supports both the size of modern data and the modeling power of graphs to usher in the era of Big Graph Data. Novel techniques in edge compression, data layout, and vertex-centric indices that exploit significant orders are used to facilitate the representation and processing of a single atomic graph structure across a multi-machine cluster. To ensure ease of adoption by the graph community, Titan natively implements the TinkerPop 2 Blueprints API. This presentation will review the graph landscape, Titan's techniques for scale by distribution, and a collection of satellite graph technologies to be released by Aurelius in the coming summer months of 2012.
  3. SPEAKER BIOGRAPHIES Dr. Marko A. Rodriguez is the founder of the graph consulting firm Aurelius. He has focused his academic and commercial career on the theoretical and applied aspects of graphs. Marko is a cofounder of TinkerPop and the primary developer of the Gremlin graph traversal language. Dr. Matthias Broecheler has been researching and developing large-scale graph database systems for many years in both academia and in his role as a cofounder of the Aurelius graph consulting firm. He is the primary developer of the distributed graph database Titan. Matthias focuses most of his time and effort on novel OLTP and OLAP graph processing solutions.
  4. SPONSORS Pearson: As the leading education services company, Pearson is serious about evolving how the world learns. We apply our deep education experience and research, invest in innovative technologies, and promote collaboration throughout the education ecosystem. Real change is our commitment and its results are delivered through connecting capabilities to create actionable, scalable solutions that improve access, affordability, and achievement. Aurelius: Aurelius is a team of software engineers and scientists committed to applying graph theory and network science to problems in numerous domains. Aurelius develops the theory and technology whereby graphs can be used to model, understand, predict, and influence the behavior of complex, interrelated social, economic, and physical networks. Jive: Jive is the pioneer and world's leading provider of social business solutions. Our products apply powerful technology that helps people connect, communicate and collaborate to get more work done and solve their biggest business challenges. Millions of users and many of the world's most successful companies rely on Jive day in and day out to get work done, serve their customers and stay ahead of their competitors.
  5. OUTLINE 1. THE GRAPH LANDSCAPE An introduction to graph computing. Graph technologies on the market today. 2. INTRODUCTION TO TITAN Getting up and running with Titan. Titan's techniques for scalability. 3. THE FUTURE OF AURELIUS Satellite technologies and the OLAP story. The graph landscape reprise.
  6. PART 1: THE GRAPH LANDSCAPE MARKO A. RODRIGUEZ
  7. GRAPH
  8. EDGE VERTEX GRAPH
  9. EDGE VERTEX GRAPH G = (V, E) Graph Vertices Edges
  10. G = (V, E) Classic Textbook Graph Structure
  11. A homogeneous set of vertices... V
  12. ...connected by a homogeneous set of edges. E
  13. RESTRICTED MODELING People and follows relationships...
  14. RESTRICTED MODELING People and follows relationships... ...xor webpages and citations.
  15. AN INTEGRATED MODEL IS TYPICALLY DESIRED [Diagram: a single graph whose edges are labeled "references", "createdBy", "follows", and "mentions".]
  16. AN INTEGRATED MODEL IS USEFUL [Same diagram.] Allows for more interesting/novel algorithms (beyond "textbook" graph algorithms). Allows for a universal model of things and their relationships (a single, unified model of a domain of interest).
  17. THE PROPERTY GRAPH G = (V, E, λ) Current Popular Graph Structure * Directed, attributed, edge-labeled graph * Multi-relational graph with key/value pairs on the elements
  18. VERTEX
  19. PROPERTIES name:hercules VERTEX
  20. PROPERTIES KEY VALUE name:hercules VERTEX
  21. name:hercules
  22. name:hercules mother name:alcmene type:human
  23. name:hercules LABEL mother EDGE name:alcmene type:human
  24. name:hercules mother name:alcmene type:human
  25. [Diagram: hercules (name:hercules) with a "mother" edge to alcmene (name:alcmene, type:human) and a "father" edge to jupiter (name:jupiter, type:god).]
  26. IS HERCULES A DEMIGOD? DEMIGOD = HALF HUMAN + HALF GOD [Same diagram.]
  27. [Same diagram.] gremlin> hercules ==>v[0]
  28. [Same diagram.] gremlin> hercules.out('mother','father') ==>v[1] ==>v[2]
  29. DEMIGOD = HALF HUMAN + HALF GOD [Same diagram.] gremlin> hercules.out('mother','father').type ==>human ==>god
  30. DEMIGOD = HALF HUMAN + HALF GOD [Same diagram, with type:demigod added to hercules.] gremlin> hercules.type = 'demigod' ==>demigod
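The Hercules walk-through on slides 25 to 30 can be mirrored outside Gremlin. What follows is a minimal Python sketch, not Titan or Blueprints code: the `PropertyGraph` class and its `add_vertex`, `add_edge`, and `out` methods are hypothetical names chosen for illustration, and the vertex ids follow the `v[0]`, `v[1]`, `v[2]` results shown on the slides.

```python
from collections import defaultdict


class PropertyGraph:
    """Toy in-memory property graph (hypothetical; not the Blueprints API)."""

    def __init__(self):
        self.properties = {}                # vertex id -> {key: value}
        self.out_edges = defaultdict(list)  # vertex id -> [(label, target id)]

    def add_vertex(self, vid, **props):
        self.properties[vid] = dict(props)

    def add_edge(self, source, label, target):
        self.out_edges[source].append((label, target))

    def out(self, vid, *labels):
        """Follow outgoing edges whose label is in `labels` (any label if none given)."""
        return [t for (l, t) in self.out_edges[vid] if not labels or l in labels]


g = PropertyGraph()
g.add_vertex(0, name="hercules")
g.add_vertex(1, name="alcmene", type="human")
g.add_vertex(2, name="jupiter", type="god")
g.add_edge(0, "mother", 1)
g.add_edge(0, "father", 2)

# DEMIGOD = HALF HUMAN + HALF GOD
parent_types = sorted(g.properties[p]["type"] for p in g.out(0, "mother", "father"))
if parent_types == ["god", "human"]:
    g.properties[0]["type"] = "demigod"

print(g.properties[0])
```

Keeping labels on edges and key/value pairs on vertices is what lets one traversal mix structure (who the parents are) with attributes (what type they are).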
  31. COMPUTING: PROCESS, STRUCTURE
  32. COMPUTING: PROCESS (TRAVERSAL), STRUCTURE (GRAPH)
  33. GRAPH-BASED COMPUTING: PROCESS (TRAVERSAL), STRUCTURE (GRAPH)
  34. WHY GRAPH-BASED COMPUTING?
  35. WHY GRAPH-BASED COMPUTING? INTUITIVE MODELING
  36. WHY GRAPH-BASED COMPUTING? INTUITIVE MODELING, EXPRESSIVE QUERYING
  37. WHY GRAPH-BASED COMPUTING? INTUITIVE MODELING, EXPRESSIVE QUERYING, NUMEROUS ANALYSES: Mixing Patterns, Ranking, Inference, Motifs, Path Expressions, Centrality, Scoring, Geodesics
  38. ANALYSES ARE THE EPIPHENOMENA OF TRAVERSAL f( )→
  39. WHAT IS THE SIGNIFICANCE OF GRAPH ANALYSIS?
  40. ANALYSES YIELD INSIGHTS ABOUT THE MODEL. DATA PRODUCTS = DATA-DRIVEN DECISION SUPPORT
  41. RECOMMENDATION People you may know. (SOCIAL GRAPH) Products you might like. (RATINGS GRAPH) Movies you should watch and the friends you should watch them with. (SOCIAL+RATINGS GRAPH)
  42. WHO ELSE MIGHT HERCULES KNOW? [Diagram: a "knows" graph over hercules (0), cerberus (1), nemean (2), hydra (3), pluto (4), neptune (5), and jupiter (6); the traversal results on the following slides imply the edges 0→1, 0→2, 0→3, 1→4, 1→5, 2→5, 3→5, 3→6.]
  43. [Same diagram.] gremlin> hercules ==>v[0]
  44. [Same diagram.] gremlin> hercules.out('knows') ==>v[1] ==>v[2] ==>v[3]
  45. [Same diagram.] gremlin> hercules.out('knows').out('knows') ==>v[4] ==>v[5] ==>v[5] ==>v[6] ==>v[5]
  46. [Same diagram.] gremlin> hercules.out('knows').out('knows').groupCount.cap ==>v[4]=1 ==>v[5]=3 ==>v[6]=1
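The two-step `knows` traversal with `groupCount` on slides 44 to 46 amounts to counting friend-of-friend paths. A minimal Python sketch of the same computation follows; the adjacency dictionary is reconstructed from the traversal results on the slides, and `friends_of_friends` is a hypothetical helper name.

```python
from collections import Counter

# "knows" edges as implied by the traversal results on the slides.
knows = {
    0: [1, 2, 3],  # hercules -> cerberus, nemean, hydra
    1: [4, 5],     # cerberus -> pluto, neptune
    2: [5],        # nemean   -> neptune
    3: [6, 5],     # hydra    -> jupiter, neptune
}


def friends_of_friends(graph, start):
    """Count how many two-step 'knows' paths lead from `start` to each vertex."""
    counts = Counter()
    for friend in graph.get(start, []):
        for fof in graph.get(friend, []):
            counts[fof] += 1
    return counts


counts = friends_of_friends(knows, 0)
best, paths = counts.most_common(1)[0]
print(dict(counts), best, paths)
```

Vertex 5 (neptune) is reached by three distinct paths, which is why the deck concludes that Hercules probably knows Neptune.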
  47. HERCULES PROBABLY KNOWS NEPTUNE [Diagram: the "knows" graph from slide 42, with one additional "knows" edge drawn.]
  48. HERCULES PROBABLY KNOWS NEPTUNE. THIS IS A "TEXTBOOK STYLE" GRAPH. [Same diagram.]
  49. HERCULES PROBABLY KNOWS NEPTUNE ...PROBABLY MORE SO WHEN OTHER TYPES OF EDGES ARE ANALYZED [Same diagram, with "brother" and "father" edges added.]
  50. [Diagram: the "knows" graph from slide 42, with "brother" and "father" edges.]
  51. [Diagram: as on slide 50, with a "likes" edge added.]
  52. [Diagram: as on slide 51, labeled SOCIAL GRAPH.]
  53. [Diagram: the SOCIAL GRAPH plus a new vertex, human flesh (7).]
  54. [Diagram: the SOCIAL GRAPH and human flesh (7), with further "likes" edges.]
  55. [Diagram: as on slide 54, plus a new vertex, tartarus (8).]
  56. [Diagram: as on slide 55, with further "likes" edges and a "dislikes" edge.]
  57. [Diagram: as on slide 56, with the "likes"/"dislikes" region labeled RATINGS GRAPH and the "knows" region labeled SOCIAL GRAPH.]
  58. NEMEAN MIGHT LIKE TARTARUS [Diagram: as on slide 57, with "smellsOf" and "composedOf" edges added and labeled PRODUCT GRAPH, alongside the RATINGS GRAPH and SOCIAL GRAPH.] * Collaborative Filtering + Content-Based Recommendation
  59. PATH FINDING How is this person related to this film? (MOVIE GRAPH) Which authors of this book also wrote a New York Times bestseller? (BOOK GRAPH) Which movies are based on a book by a New York Times bestseller? (MOVIE+BOOK GRAPH)
  60. WHO PLAYED HERCULES IN WHAT MOVIE? [Diagram: hercules (0), jupiter (6), the movie "hercules in new york" (7), casting vertices 8 and 10, arnold schwarzenegger (9), and ernest graves (11); the traversal results on the following slides imply the edges 0 -depictedIn-> 7, 6 -depictedIn-> 7, 7 -hasActor-> 8, 7 -hasActor-> 10, 8 -role-> 0, 10 -role-> 6, 8 -actor-> 9, 10 -actor-> 11.]
  61. WHO PLAYED HERCULES IN WHAT MOVIE? [Same diagram.] gremlin> hercules ==>v[0]
  62. WHO PLAYED HERCULES IN WHAT MOVIE? [Same diagram.] gremlin> hercules.out('depictedIn') ==>v[7]
  63. WHO PLAYED HERCULES IN WHAT MOVIE? [Same diagram, vertex 7 tagged "movie".] gremlin> hercules.out('depictedIn').as('movie') ==>v[7]
  64. WHO PLAYED HERCULES IN WHAT MOVIE? [Same diagram.] gremlin> hercules.out('depictedIn').as('movie').out('hasActor') ==>v[8] ==>v[10]
  65. WHO PLAYED HERCULES IN WHAT MOVIE? [Same diagram.] gremlin> hercules.out('depictedIn').as('movie').out('hasActor').out('role') ==>v[0] ==>v[6]
  66. WHO PLAYED HERCULES IN WHAT MOVIE? [Same diagram.] gremlin> hercules.out('depictedIn').as('movie').out('hasActor').out('role').retain(hercules) ==>v[0]
  67. WHO PLAYED HERCULES IN WHAT MOVIE? [Same diagram.] gremlin> hercules.out('depictedIn').as('movie').out('hasActor').out('role').retain(hercules).back(2) ==>v[8]
  68. WHO PLAYED HERCULES IN WHAT MOVIE? [Same diagram.] gremlin> hercules.out('depictedIn').as('movie').out('hasActor').out('role').retain(hercules).back(2).out('actor') ==>v[9]
  69. WHO PLAYED HERCULES IN WHAT MOVIE? [Same diagram, vertex 9 tagged "star".] gremlin> hercules.out('depictedIn').as('movie').out('hasActor').out('role').retain(hercules).back(2).out('actor').as('star') ==>v[9]
  70. WHO PLAYED HERCULES IN WHAT MOVIE? [Same diagram.] gremlin> hercules.out('depictedIn').as('movie').out('hasActor').out('role').retain(hercules).back(2).out('actor').as('star').select ==>[movie:v[7], star:v[9]]
  71. WHO PLAYED HERCULES IN WHAT MOVIE? [Same diagram.] gremlin> hercules.out('depictedIn').as('movie').out('hasActor').out('role').retain(hercules).back(2).out('actor').as('star').select{it.name} ==>[movie:hercules in new york, star:arnold schwarzenegger]
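The path query built up on slides 60 to 71 can be read as nested loops with a filter in the middle. A minimal Python sketch under stated assumptions follows: the edge list is reconstructed from the traversal results on the slides, the id for ernest graves (11) is read off the diagram layout, and `out` and `who_played` are hypothetical helper names rather than Gremlin steps.

```python
# Edges of the movie graph as implied by the traversal results on the slides.
edges = [
    (0, "depictedIn", 7), (6, "depictedIn", 7),
    (7, "hasActor", 8), (7, "hasActor", 10),
    (8, "role", 0), (10, "role", 6),
    (8, "actor", 9), (10, "actor", 11),
]
names = {0: "hercules", 6: "jupiter", 7: "hercules in new york",
         9: "arnold schwarzenegger", 11: "ernest graves"}


def out(vertex, label):
    """Vertices reachable from `vertex` over outgoing edges with `label`."""
    return [t for (s, l, t) in edges if s == vertex and l == label]


def who_played(character):
    """Return (movie, star) name pairs for the given character vertex."""
    results = []
    for movie in out(character, "depictedIn"):      # plays the part of .as('movie')
        for casting in out(movie, "hasActor"):
            if character in out(casting, "role"):   # .retain(...) followed by .back(2)
                for star in out(casting, "actor"):  # plays the part of .as('star')
                    results.append((names[movie], names[star]))
    return results


print(who_played(0))
```

The `if` check does the work of `retain(hercules).back(2)`: it keeps only the casting vertices whose role is the character in question, then continues from the casting vertex itself.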
  72. [Diagram: the movie graph from slide 60.]
  73. [Diagram: the movie graph from slide 60.]
  74. [Diagram: as on slide 73, plus the book "the arms of hercules" (12) and a further "depictedIn" edge.]
  75. [Diagram: as on slide 74, plus fred saberhagen (13) and a "writtenBy" edge.]
  76. [Diagram: as on slide 75, plus albuquerque (14) and a "livesIn" edge.]
  77. [Diagram: as on slide 76, plus santa fe (15) and a "25-North" edge.]
  78. [Diagram: as on slide 77, plus marko rodriguez (16) and a second "livesIn" edge.]
  79. [Diagram: as on slide 78, plus a "thinksHeIs" edge.]
  80. [Diagram: as on slide 79, with regions labeled TRANSPORTATION GRAPH, BOOK GRAPH, PROFILE GRAPH, and MOVIE GRAPH.]
  81. SOCIAL INFLUENCE Who are the most influential people in java, mathematics, art, surreal art, politics, ...? Which region of the social graph will propagate this advertisement the furthest? Which 3 experts should review this submitted article? Which people should I talk to at the upcoming conference and what topics should I talk to them about? (SOCIAL + COMMUNICATION + EXPERTISE + EVENT GRAPH)
  82. PATTERN IDENTIFICATION This connectivity pattern is a sign of financial fraud. When this motif is found, a red flag will be raised. (TRANSACTION GRAPH) Healthy discourse is typified by a discussion board with a branch factor in this range and a concept clique score in this range. (DISCUSSION GRAPH)
  83. KNOWLEDGE DISCOVERY The terms "ice", "fans", and "stanley cup" are classified as "sports". (WIKIPEDIA GRAPH) Given that all identified birds fly, it can be deduced that all birds fly. If contrary evidence is provided, then this "fact" can be retracted. (EVIDENTIAL LOGIC GRAPH)
  84. WORLD MODEL
  85. WORLD PROCESSES WORLD MODEL
  86. WORLD PROCESSES WORLD MODEL A single world model and various types of traversers moving through that model to solve problems.
  87. GRAPH-BASED COMPUTING: PROCESS (TRAVERSAL), STRUCTURE (GRAPH)
  88. GRAPH COMPUTING ENGINES
  89. MEMORY-BASED GRAPHS [Diagram: Application, Graph Framework.] NetworkX http://networkx.lanl.gov/; iGraph http://igraph.sourceforge.net/; JUNG http://jung.sourceforge.net/
  90. DISK-BASED GRAPHS [Diagram: several Applications, Graph Database.] Neo4j http://neo4j.org/; OrientDB http://orientdb.org; InfiniteGraph http://objectivity.com; DEX http://www.sparsity-technologies.com/dex
  91. CLUSTER-BASED GRAPHS [Diagram: several Applications, Bulk Synchronous Parallel Processing, numbered 1, 2, 3.] Hama http://incubator.apache.org/hama/; Giraph http://incubator.apache.org/giraph/; GoldenOrb http://goldenorbos.org/ * In the same spirit as Google's Pregel
  92. MEMORY-BASED GRAPHS: Graph size is constrained by local machine's RAM. Rich graph algorithm and visualization packages. Oriented towards "textbook-style" graphs. * Based on typical behavior
  93. MEMORY-BASED GRAPHS: Graph size is constrained by local machine's RAM. Rich graph algorithm and visualization packages. Oriented towards "textbook-style" graphs. DISK-BASED GRAPHS: Graph size is constrained by local disk. Optimized for local graph algorithms. Oriented towards property graphs. * Based on typical behavior
  94. MEMORY-BASED GRAPHS: Graph size is constrained by local machine's RAM. Rich graph algorithm and visualization packages. Oriented towards "textbook-style" graphs. DISK-BASED GRAPHS: Graph size is constrained by local disk. Optimized for local graph algorithms. Oriented towards property graphs. CLUSTER-BASED GRAPHS: Graph size is constrained to cluster's total RAM. Optimized for global graph algorithms. Oriented towards "textbook-style" graphs. * Based on typical behavior
  95. TINKERPOP Support for various graph vendors Open source graph product group * Encompassing the various graph computing styles Simple, well-defined products Provides a vendor-agnostic graph framework http://tinkerpop.com * Based on future directions
  96. TINKERPOP Graph Server Graph Algorithms Object-Graph Mapper Traversal Language Dataflow Processing http://tinkerpop.com Generic Graph API http://${project.name}.tinkerpop.com
  97. TINKERPOP INTEGRATION http://tinkerpop.com
  98. AND NOW THERE IS ANOTHER...
  99. TITAN
  100. PART 2: INTRODUCTION TO TITAN MATTHIAS BROECHELER
  101. WhY CREATE TITAN? A number of Aurelius' clients... ...need to represent and process graphs at the 100+ billion edge scale w/ thousands of concurrent transactions. ...need both local graph traversals (OLTP) and batch graph processing (OLAP). ...desire a free, open source distributed graph database.
  102. TITAN's KEY FEATURES Titan provides... ..."infinite size" graphs and "unlimited" users by means of a distributed storage engine. ...real-time local traversals (OLTP) and support for global batch processing via Hadoop (OLAP). ...distribution via the liberal, free, open source Apache2 license.
  103. matthias$
  104. matthias$ wget http://thinkaurelius/titan.zip % Total % Received % Xferd Average Speed Time Time 100 99999 0 99999 0 0 11078 0 --:--:-- 0:01:01 matthias$
  105. matthias$ wget http://thinkaurelius/titan.zip % Total % Received % Xferd Average Speed Time Time 100 99999 0 99999 0 0 11078 0 --:--:-- 0:01:01 matthias$ unzip titan.zip Archive: titan.zip creating: titan/ ... matthias$
  106. matthias$ wget http://thinkaurelius/titan.zip % Total % Received % Xferd Average Speed Time Time 100 99999 0 99999 0 0 11078 0 --:--:-- 0:01:01 matthias$ unzip titan.zip Archive: titan.zip creating: titan/ ... matthias$ cd titan titan$
  107. matthias$ wget http://thinkaurelius/titan.zip % Total % Received % Xferd Average Speed Time Time 100 99999 0 99999 0 0 11078 0 --:--:-- 0:01:01 matthias$ unzip titan.zip Archive: titan.zip creating: titan/ ... matthias$ cd titan titan$ bin/gremlin.sh ,,,/ (o o) -----oOOo-(_)-oOOo----- gremlin>
  108. gremlin> g = TitanFactory.open('/tmp/local-titan') ==>titangraph[local:/tmp/local-titan]
  109. LOCAL MACHINE DEMO gremlin> g = TitanFactory.open('/tmp/local-titan') ==>titangraph[local:/tmp/local-titan]
  110. gremlin> g.createKeyIndex('name',Vertex.class) ==>null gremlin> g.stopTransaction(SUCCESS) ==>null
  111. [Diagram: the Graph of the Gods, a property graph with vertices saturn (type:titan); sky, sea, and tartarus (type:location); jupiter, neptune, and pluto (type:god); hercules (type:demigod); alcmene (type:human); and nemean, hydra, and cerberus (type:monster), connected by "father", "mother", "brother", "lives", "pet", and "battled" (time:1, time:2, time:12) edges.] gremlin> g.loadGraphML('data/graph-of-the-gods.xml') ==>null * The Graph of the Gods is a toy dataset distributed with Titan
  112. [Same diagram.] gremlin> hercules = g.V('name','hercules').next() ==>v[24]
  113. [Same diagram.] gremlin> hercules.out('mother','father') ==>v[44] ==>v[16]
  114. [Same diagram.] gremlin> hercules.out('mother','father').name ==>alcmene ==>jupiter
  115. THAT WAS TITAN LOCAL. NEXT IS TITAN DISTRIBUTED. Broecheler, M., Pugliese, A., Subrahmanian, V.S., "COSI: Cloud Oriented Subgraph Identification in Massive Social Networks," Proceedings of the 2010 International Conference on Advances in Social Networks Analysis and Mining, pp. 248-255, 2010. http://www.knowledgefrominformation.com/2010/08/01/cosi-cloud-oriented-subgraph-identification-in-massive-social-networks/
  116. BACKEND AGNOSTIC -OR-
  117. TITAN DISTRIBUTED VIA CASSANDRA titan$ bin/gremlin.sh ,,,/ (o o) -----oOOo-(_)-oOOo----- gremlin> conf = new BaseConfiguration(); ==>org.apache.commons.configuration.BaseConfiguration@763861e6 gremlin> conf.setProperty("storage.backend","cassandra"); gremlin> conf.setProperty("storage.hostname","77.77.77.77"); gremlin> g = TitanFactory.open(conf); ==>titangraph[cassandra:77.77.77.77] gremlin> * There are numerous graph configurations: https://github.com/thinkaurelius/titan/wiki/Graph-Configuration
  118. INHERITED FEATURES Continuously available with no single point of failure. No write bottlenecks to the graph as there is no master/slave architecture. Built-in replication ensures data is available during machine failure. Caching layer ensures that continuously accessed data is available in memory. Elastic scalability allows for the introduction and removal of machines. Cassandra available at http://cassandra.apache.org/
  119. TITAN DISTRIBUTED VIA HBASE titan$ bin/gremlin.sh ,,,/ (o o) -----oOOo-(_)-oOOo----- gremlin> conf = new BaseConfiguration(); ==>org.apache.commons.configuration.BaseConfiguration@763861e6 gremlin> conf.setProperty("storage.backend","hbase"); gremlin> conf.setProperty("storage.hostname","77.77.77.77"); gremlin> g = TitanFactory.open(conf); ==>titangraph[hbase:77.77.77.77] gremlin> * There are numerous graph configurations: https://github.com/thinkaurelius/titan/wiki/Graph-Configuration
  120. INHERITED FEATURES Strictly consistent reads and writes. Linear scalability with the addition of machines. Base classes for backing Hadoop MapReduce jobs with HBase tables. HDFS-based data replication. Generally good integration with the tools in the Hadoop ecosystem. HBase available at http://hbase.apache.org/
  121. TITAN AND THE CAP THEOREM [Diagram labeled Partitionability, Availability, and Consistency.]
  122. Titan is all about ...
  123. Titan is all about numerous concurrent users...
  124. Titan is all about numerous concurrent users... high availability...
  125. Titan is all about numerous concurrent users... high availability... dynamic scalability...
  126. THE HOW OF TITAN DATA MANAGEMENT EDGE COMPRESSION VERTEX-CENTRIC INDICES
  127. THE HOW OF TITAN DATA MANAGEMENT
  128. DATA MANAGEMENT MAIN DESIGN PRINCIPLES Immutable, Atomic Edges. Optimistic Concurrency Control. Fine-Grained Locking Control. [Diagram: a "battled" edge between hercules and cerberus in three successive versions: (1) no properties, (2) time:12, (3) time:12 and successful:true.]
  129. DATA MANAGEMENT TYPE DEFINITION Datatype Constraints: TitanKey timeKey = g.makeType().name("time").dataType(Integer.class) [Diagram: time:12 contrasted with time:"twelve".] Edge Label Signatures: TitanLabel battled = g.makeType().name("battled").signature(timeKey) [Diagram: a "battled" edge between hercules and cerberus carrying time:12.] Functional Declarations: TitanLabel father = g.makeType().name("father").functional() [Diagram: hercules with "father" edges to jupiter and to mars.] Data management configurations allow Titan to optimize how information is stored/retrieved from disk.
  130. DATA MANAGEMENT TYPE DEFINITION Endogenous Indices: g.createKeyIndex("name",Vertex.class) [Diagram: name:jupiter, name:hercules, name:hermes.] Unique Property Key/Value Pairs: TitanKey status = g.makeType().name("status").unique() [Diagram: name:jupiter and name:neptune, each shown with status:king of the gods.] Data management configurations allow Titan to optimize how information is stored/retrieved from disk.
  131. DATA MANAGEMENT LOCKING SYSTEM Ensures consistency over non-consistent storage backends. [Diagram: two concurrent writes of a "father" edge from hercules, one to jupiter and one to neptune.] 1. Acquire lock at the end of the transaction (the locking mechanism depends on the storage layer's consistency guarantees). 2. Verify original read. 3. Fail transaction if any precondition is violated.
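The three locking steps on slide 131 follow the general shape of optimistic concurrency control. The Python sketch below illustrates that general technique only; it is not Titan's locking implementation. `Store`, `Transaction`, and `ConflictError` are hypothetical names, and per-key version counters are an assumption made to give "verify original read" something concrete to check.

```python
import threading


class ConflictError(Exception):
    """Raised when a value read by a transaction changed before commit."""


class Store:
    """Toy key/value store with per-key versions (hypothetical, not Titan's backend)."""

    def __init__(self):
        self.data = {}                # key -> (version, value)
        self.lock = threading.Lock()  # stands in for a backend-specific lock

    def read(self, key):
        return self.data.get(key, (0, None))


class Transaction:
    def __init__(self, store):
        self.store = store
        self.reads = {}   # key -> version seen at first read
        self.writes = {}  # key -> new value

    def get(self, key):
        version, value = self.store.read(key)
        self.reads.setdefault(key, version)
        return value

    def put(self, key, value):
        self.get(key)  # record the precondition for this key
        self.writes[key] = value

    def commit(self):
        with self.store.lock:                     # 1. acquire lock at the end
            for key, seen in self.reads.items():  # 2. verify original reads
                if self.store.read(key)[0] != seen:
                    raise ConflictError(key)      # 3. fail on a violated precondition
            for key, value in self.writes.items():
                version = self.store.read(key)[0]
                self.store.data[key] = (version + 1, value)


store = Store()
t1, t2 = Transaction(store), Transaction(store)
t1.put(("hercules", "father"), "jupiter")
t2.put(("hercules", "father"), "neptune")
t1.commit()
try:
    t2.commit()
    outcome = "committed"
except ConflictError:
    outcome = "rejected"
print(store.read(("hercules", "father")), outcome)
```

Both transactions proceed without blocking each other; only the second commit pays for the conflict, which matches the slide's example of two competing `father` writes.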
  132. DATA MANAGEMENT ID MANAGEMENT Global ID Pool Maintained by Storage Engine: [0,1,2,3,4,5,6,7,8,9,10,11]
  133. DATA MANAGEMENT ID MANAGEMENT Global ID Pool Maintained by Storage Engine: [0,1,2,3,4,5,6,7,8,9,10,11] Pool Subsets Assigned to Individual Instances: [0,1,2] [3,4,5] [6,7,8] [9,10,11]
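The ID scheme on slides 132 and 133 hands each instance a subset of a global pool so that most ID assignments need no coordination. A minimal Python sketch of block allocation follows; `GlobalIdPool`, `Instance`, `claim_block`, and `new_id` are hypothetical names, and the pool size of 12 with blocks of 3 simply mirrors the numbers on the slide.

```python
class GlobalIdPool:
    """Toy global ID pool that hands out contiguous blocks (hypothetical names)."""

    def __init__(self, size, block_size):
        self.size = size
        self.block_size = block_size
        self.next_start = 0

    def claim_block(self):
        """Reserve the next unused block; the only step that needs coordination."""
        if self.next_start >= self.size:
            raise RuntimeError("ID pool exhausted")
        start = self.next_start
        self.next_start = min(start + self.block_size, self.size)
        return range(start, self.next_start)


class Instance:
    """A database instance that assigns IDs locally from its claimed block."""

    def __init__(self, pool):
        self.pool = pool
        self.block = iter(())

    def new_id(self):
        try:
            return next(self.block)
        except StopIteration:
            # Current block is used up (or none claimed yet): claim a fresh one.
            self.block = iter(self.pool.claim_block())
            return next(self.block)


pool = GlobalIdPool(size=12, block_size=3)
a, b = Instance(pool), Instance(pool)
ids_a = [a.new_id() for _ in range(3)]
ids_b = [b.new_id() for _ in range(3)]
ids_a.append(a.new_id())  # a's first block is exhausted, so it claims another
print(ids_a, ids_b)
```

In a real deployment `claim_block` would be the operation backed by the storage engine; everything else is local to the instance.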
  134. THE HOW OF TITAN EDGE COMPRESSION
  135. EDGE COMPRESSION Natural graphs have a small world, community/cluster property. Community 1 Community 2 High intra-connectivity within a community and low inter-connectivity between communities. Watts, D. J., Strogatz, S. H., "Collective Dynamics of 'Small-World' Networks," Nature 393 (6684), pp. 440–442, 1998.
  136. EDGE COMPRESSION
  137. EDGE COMPRESSION knows 12345678 12345683
  138. EDGE COMPRESSION knows 12345678 12345683
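  139. EDGE COMPRESSION [Diagram: a "knows" edge from vertex 12345678 to vertex 12345683.] Uncompressed: 12345678 | 9 | 12345683 = 24 bytes
  140. EDGE COMPRESSION [Same diagram.] Uncompressed: 12345678 | 9 | 12345683 = 24 bytes. Target stored as an offset from the source: 12345678 | 9 | +5
  141. EDGE COMPRESSION [Same diagram.] Uncompressed: 12345678 | 9 | 12345683 = 24 bytes. Target stored as an offset from the source: 12345678 | 9 | +5. Compressed: 12345678 | 9 | 5 = 7 bytes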
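The slides show the target ID being replaced by a small offset from the source ID, which pays off when neighbors have nearby IDs. The Python sketch below combines that delta with a generic variable-length integer encoding (LEB128 with zigzag for signed deltas). The choice of encoding is an assumption, since the slides do not say which byte format Titan uses, and so is reading the `9` as a label identifier; the byte count here is therefore not meant to reproduce the slide's 7 bytes.

```python
def encode_varint(n):
    """Unsigned LEB128: 7 payload bits per byte, high bit marks continuation."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf, pos):
    """Decode one varint starting at `pos`; return (value, next position)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def zigzag(n):
    """Map signed to unsigned so that small negative deltas stay small."""
    return (n << 1) if n >= 0 else ((-n << 1) - 1)


def unzigzag(z):
    return (z >> 1) if z % 2 == 0 else -((z + 1) >> 1)


def compress_edge(source, label_id, target):
    """Store the source, the label, and the target as a delta from the source."""
    return (encode_varint(source) + encode_varint(label_id)
            + encode_varint(zigzag(target - source)))


def decompress_edge(buf):
    source, pos = decode_varint(buf, 0)
    label_id, pos = decode_varint(buf, pos)
    delta, pos = decode_varint(buf, pos)
    return source, label_id, source + unzigzag(delta)


packed = compress_edge(12345678, 9, 12345683)
print(len(packed), decompress_edge(packed))
```

The saving depends on ID locality: if vertices in the same community receive nearby IDs, most deltas fit in one or two bytes.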
  142. THE HOW OF TITAN VERTEX-CENTRIC INDICES
  143. VERTEX-CENTRIC INDICES THE SUPER NODE PROBLEM Natural, real-world graphs contain vertices of high degree. Even if rare, their degree ensures that they exist on many paths. Traversing a high degree vertex means touching numerous incident edges and potentially touching most of the graph in only a few steps.
  144. VERTEX-CENTRIC INDICES A SUPER NODE SOLUTION A "super node" only exists from the vantage point of classic "textbook style" graphs. In the world of property graphs, intelligent disk-level filtering can interpret a "super node" as a more manageable low-degree vertex. Vertex-centric querying utilizes B-Trees and sort orders for speedy lookup of incident edges with particular qualities.
  145. VERTEX-CENTRIC INDICES PUSHDOWN PREDICATES vertex.query() [Diagram: a vertex with five "likes" edges (stars:5, stars:2, stars:2, stars:3, stars:3) and three "knows" edges.] 8 edges
  146. VERTEX-CENTRIC INDICES PUSHDOWN PREDICATES vertex.query().direction(OUT) [Diagram: the five "likes" edges and two "knows" edges remain.] 7 edges
  147. VERTEX-CENTRIC INDICES PUSHDOWN PREDICATES vertex.query().direction(OUT).labels("likes") [Diagram: the five "likes" edges remain.] 5 edges
  148. VERTEX-CENTRIC INDICES PUSHDOWN PREDICATES vertex.query().direction(OUT).labels("likes").has("stars",5) [Diagram: only the "likes" edge with stars:5 remains.] 1 edge
  149. VERTEX-CENTRIC INDICES PUSHDOWN PREDICATES Query Query.direction(Direction). PREDICATES: Query Query.labels(String... labels); Query Query.has(String, Object, Compare); Query Query.has(String, Object); Query Query.range(String, Object, Object). GETTERS: Iterable<Vertex> Query.vertices(); Iterable<Edge> Query.edges()
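The narrowing from 8 edges to 1 on slides 145 to 148 can be reproduced with a small fluent query object. The Python sketch below is illustrative only: `Edge` and `VertexQuery` are hypothetical classes that borrow the method names `direction`, `labels`, `has`, and `edges` from the slide, and the edge list is built to match the counts shown there.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    label: str
    direction: str  # "OUT" or "IN", relative to the queried vertex
    properties: dict = field(default_factory=dict)


class VertexQuery:
    """Toy fluent query over one vertex's incident edges (hypothetical class)."""

    def __init__(self, edges):
        self._edges = edges
        self._filters = []

    def direction(self, direction):
        self._filters.append(lambda e: e.direction == direction)
        return self

    def labels(self, *labels):
        self._filters.append(lambda e: e.label in labels)
        return self

    def has(self, key, value):
        self._filters.append(lambda e: e.properties.get(key) == value)
        return self

    def edges(self):
        """Apply every predicate during the scan so only matches are returned."""
        return [e for e in self._edges if all(f(e) for f in self._filters)]


incident = (
    [Edge("likes", "OUT", {"stars": s}) for s in (5, 2, 2, 3, 3)]
    + [Edge("knows", "OUT"), Edge("knows", "OUT"), Edge("knows", "IN")]
)

counts = [
    len(VertexQuery(incident).edges()),
    len(VertexQuery(incident).direction("OUT").edges()),
    len(VertexQuery(incident).direction("OUT").labels("likes").edges()),
    len(VertexQuery(incident).direction("OUT").labels("likes").has("stars", 5).edges()),
]
print(counts)
```

This sketch filters in memory; the point of a pushdown predicate is that the same conditions are evaluated at the storage layer, so non-matching edges are never loaded.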
  150. VERTEX-CENTRIC INDICES DISK-LEVEL SORTING/INDEXING [Diagram: a vertex's incident edges: "battled" edges with time:1, time:2, and time:12, and "knows" edges.]
  151. VERTEX-CENTRIC INDICES DISK-LEVEL SORTING/INDEXING [Diagram: the same edges grouped under the labels "battled" and "knows".]
  152. VERTEX-CENTRIC INDICES DISK-LEVEL SORTING/INDEXING [Diagram: the "battled" edges further divided into "battled w/ time 1-5" and "battled w/ time 5-10".] TitanLabel battled = g.makeType().name("battled").primaryKey(time)
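Sorting a vertex's edges by label and then by a chosen key, as slide 152 does with `time`, turns a range condition into two binary searches. The Python sketch below uses a sorted list with `bisect` as a stand-in for the B-Tree mentioned on slide 144; `SortedEdgeList` is a hypothetical class, and the pairing of monsters with times (nemean 1, hydra 2, cerberus 12) is read off the Graph of the Gods diagram and should be treated as an assumption.

```python
import bisect


class SortedEdgeList:
    """Toy per-vertex edge list kept sorted by (label, time) (hypothetical)."""

    def __init__(self):
        self._keys = []     # sorted (label, time) pairs
        self._targets = []  # target vertex for the key at the same position

    def add(self, label, time, target):
        key = (label, time)
        i = bisect.bisect_right(self._keys, key)
        self._keys.insert(i, key)
        self._targets.insert(i, target)

    def range(self, label, start, end):
        """Targets of `label` edges with start <= time < end, via binary search."""
        lo = bisect.bisect_left(self._keys, (label, start))
        hi = bisect.bisect_left(self._keys, (label, end))
        return self._targets[lo:hi]


edges = SortedEdgeList()
edges.add("battled", 12, "cerberus")
edges.add("battled", 1, "nemean")
edges.add("battled", 2, "hydra")
early = edges.range("battled", 1, 5)
middle = edges.range("battled", 5, 10)
print(early, middle)
```

Lookup cost grows with the logarithm of the vertex's degree plus the number of matches, which is how a high-degree vertex can behave like a low-degree one for selective queries.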
  153. VERTEX-CENTRIC INDICES DISK-LEVEL SORTING/INDEXING [Diagram: a vertex's edges by label: brother, father, mother, knows, battled.]
  154. VERTEX-CENTRIC INDICES DISK-LEVEL SORTING/INDEXING [Diagram: a vertex's edges by label: brother, father, mother, knows, battled.]
  155. VERTEX-CENTRIC INDICES DISK-LEVEL SORTING/INDEXING [Diagram: brother, father, and mother collected under "family"; knows and battled separate.] TypeGroup family = TypeGroup.of(2,"family"); TitanLabel father = g.makeType().name("father").group(family).makeEdgeLabel(); TitanLabel mother = g.makeType().name("mother").group(family).makeEdgeLabel(); TitanLabel brother = g.makeType().name("brother").group(family).makeEdgeLabel();
  156. VERTEX-CENTRIC INDICES DISK-LEVEL SORTING/INDEXING [Diagram: brother, father, and mother collected under "family"; knows and battled separate.] vertex.query().group("family")...
  157. THAT IS HOW TITAN WORKS DATA MANAGEMENT EDGE COMPRESSION VERTEX-CENTRIC INDICES
  158. WHAT IF YOU WANTED TO CREATE TWITTER FROM SCRATCH? SIMULATING TWITTER
  159. 3 BILLION EDGES, 100 MILLION VERTICES, 10,000 CONCURRENT USERS, 50 MACHINES, 1 GRAPH DATABASE. COMING JULY 2012
  160. PART 3: THE FUTURE OF AURELIUS MARKO A. RODRIGUEZ MATTHIAS BROECHELER
  161. AURELIUS' GRAPH COMPUTING STORY Titan as the highly scalable, distributed graph database solution. OLTP
  162. AURELIUS' GRAPH COMPUTING STORY Titan as the highly scalable, distributed graph database solution. Titan as the source (and potential sink) for other graph processing solutions. OLTP OLAP
  163. FAUNUS GOD OF HERDS
  164. FAUNUS PATH ALGEBRA FOR HADOOP [Diagram: hercules -battled-> cretan bull <-battled- theseus; $A \cdot A^{\top} \circ n(I)$ derives hercules -ally- theseus.] Derived graphs are single-relational and are typically much smaller than their multi-relational source. Therefore, derived graphs can be subjected to textbook-style graph algorithms in both a meaningful and efficient manner. WHO IS THE MOST CENTRAL ALLY?
  165. FAUNUS PATH ALGEBRA FOR HADOOP $B = A \cdot A^{\top} \circ n(I)$, then $B \cdot B \circ n(I)$ [Diagram: a graph of "ally" edges.] My allies' allies are my allies. $(A \cdot A^{\top})^{2} \circ n(I)$
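The derivation of "ally" edges from "battled" edges can be checked by hand on a three-vertex example. The Python sketch below does so with plain lists; it assumes the expression on the slides is the product of the "battled" adjacency matrix with its transpose, masked entry-wise by n(I), and it reads n(I) as the matrix that is 1 everywhere except on the diagonal. Both readings follow the path algebra of the Rodriguez and Shinavier paper cited on slide 166 rather than anything stated on the slides themselves, and the helper names are hypothetical. Faunus is described as running such operations as Map/Reduce jobs, which this single-process sketch does not attempt.

```python
def matmul(a, b):
    """Ordinary matrix product of two lists-of-lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def hadamard(a, b):
    """Entry-wise product."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def not_identity(n):
    """n(I): 1 everywhere except 0 on the diagonal, used to drop self-loops."""
    return [[0 if i == j else 1 for j in range(n)] for i in range(n)]


vertices = ["hercules", "cretan bull", "theseus"]
# A[i][j] = 1 when vertex i battled vertex j.
A = [
    [0, 1, 0],  # hercules battled the cretan bull
    [0, 0, 0],
    [0, 1, 0],  # theseus battled the cretan bull
]

ally = hadamard(matmul(A, transpose(A)), not_identity(len(A)))
pairs = [(vertices[i], vertices[j])
         for i in range(len(A)) for j in range(len(A)) if ally[i][j]]
print(pairs)
```

Without the n(I) mask, every hero who battled anything would also appear as his own ally, since the diagonal of the product counts each vertex's own battles.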
  166. FAUNUS PATH ALGEBRA FOR HADOOP Used for global graph operations. Implements the multi-relational path algebra as a collection of Map/Reduce operations. Reduces a massive property graph into a smaller, semantically rich single-relational graph. Project codename: TinkerPoop. Support for HadoopGraph and HDFS file formats. Rodriguez, M.A., Shinavier, J., "Exposing Multi-Relational Networks to Single-Relational Network Analysis Algorithms," Journal of Informetrics, 4(1), pp. 29-41, 2009. http://arxiv.org/abs/0806.2274
  167. FULGORA GODDESS OF LIGHTNING
  168. FULGORA AN EFFICIENT IN-MEMORY GRAPH ENGINE Non-transactional, in-memory graph engine. It is not a database. Processes ~90 billion edges in 68 GB of RAM, assuming a small-world topology. Performs complex graph algorithms in memory: global graph analysis and multi-relational graph analysis. Similar in spirit to Twitter's Cassovary: https://github.com/twitter/cassovary
  169. THE AURELIUS OLAP FLOW [Diagram: three stages.] Stores a massive-scale property graph. Generates a large-scale single-relational graph. Analyzes compressed, large-scale single or multi-relational graphs in memory. Arrows: Map/Reduce; Load into RAM on a single machine; Update graph with derived edges; Update element properties with algorithm results; to a stats package.
  170. THE AURELIUS OLAP FLOW [Diagram: three stages.] Stores a massive-scale property graph. Generates a large-scale single-relational graph. Analyzes compressed, large-scale single or multi-relational graphs in memory. Arrows: Map/Reduce; Load into RAM on a single machine; to a stats package. Example updates: a derived "ally" edge between hercules and theseus; ally_centrality:0.0123 on hercules.
  171. THE AURELIUS OLAP FLOW [Diagram: three stages.] Stores a massive-scale property graph. Generates a large-scale single-relational graph. Analyzes compressed, large-scale single or multi-relational graphs in memory. Arrow: to a stats package.
  172. AURELIUS' USE OF BLUEPRINTS Aurelius products use the Blueprints API so any graph product can communicate with any other graph product. The code for graph databases, frameworks, algorithms, and batch processing is written in terms of the Blueprints API. Aurelius encourages developers to use Blueprints/TinkerPop in order to grow a rich ecosystem of interoperable graph technologies.
  173. THE GRAPH LANDSCAPE REPRISE [Chart with axes "Speed of Traversal/Process" and "Size of Graph/Structure".] * Not to scale. Did not want to overlap logos.
  174. NEXT STEPS Make use of and/or contribute to the free, open source Titan product. Learn about applying graph theory and network science. http://thinkaurelius.com http://thinkaurelius.github.com/titan/
  175. THANK YOU
  176. CREDITS PRESENTERS: MARKO A. RODRIGUEZ, MATTHIAS BROECHELER. FINANCIAL SUPPORT: PEARSON EDUCATION, AURELIUS. LOCATION PROVISIONS: JIVE SOFTWARE. MANY THANKS TO: DAN LAROCQUE, TINKERPOP COMMUNITY, STEPHEN MALLETTE, BOBBY NORTON, KETRINA YIM.