Adding Value through graph analysis using Titan and Faunus

In this presentation, we discuss how graph analysis can add value to your data and how to use open-source tools such as Titan and Faunus to build scalable graph processing systems.
The presentation also gives an update on the development status of Titan and Faunus, with a preview of what is to come.

  1. Adding Value Through Graph Analysis
      [Title graphic labels: Knowledge, Information, Data]
      Matthias Broecheler, CTO (@mbroecheler), AURELIUS
      March V, MMXIII (March 5, 2013)
      THINKAURELIUS.COM
  2. Communities of Interest, Finding Influencers, Understanding Behavior
  3. Information Integration, Recommendation, Question Answering
  4. Fraud Detection, Risk Analysis, Market Valuation
  5. [Figure labels: Data, Information, Knowledge, Value]
  6. Data, Information and Knowledge, by example:
      - Data: 2013-03-03 18:52:48:112;12.123.211.192; ACCESS/TRR;http://adserve.domain.com/render.cgi?uid=F32282DA39B&flagtru&xls=trending ; ACTION=CLICK|DELAY=250|x=450|y=632
      - Information: userid:3552 clicked addid:9914, timestamp: 93932342
      - Knowledge: likes(Jane Joe, cute mammals):0.8
  7. The same example, annotated with "Graph Databases & Graph Analysis".
  8. I. Graph Foundation
  9. Graph: Vertex, Property
      [Figure: a property graph of mythological figures. Vertices with name and type properties: Saturn (titan), Jupiter (god), Neptune (god), Pluto (god), Alcmene (god), Hercules (demigod), Cerberus (monster).]
  10. Graph: Edge, Edge Property, Edge Type
      [Figure: the same graph with labeled edges: father (x2), mother, brother (x2), battled (edge property time: 12) and pet.]
  11. Path
      [Figure: the same graph, illustrating a path.]
  12. Degree
      [Figure: the same graph, illustrating vertex degree.]
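The property graph concepts from slides 9 to 12 (vertices with properties, labeled edges with their own properties, paths and degree) can be illustrated with a few lines of plain Python. This is a minimal in-memory sketch, not Titan's API; the edge directions (child to parent for father/mother, Jupiter to his brothers, Hercules to Cerberus, Pluto to his pet) are assumptions, since the slides show only edge labels, and `degree` and `paths` are hypothetical helper names.

```python
# Vertices and their properties, as listed on slide 9.
vertices = {
    "Saturn": {"type": "titan"},
    "Jupiter": {"type": "god"},
    "Neptune": {"type": "god"},
    "Pluto": {"type": "god"},
    "Alcmene": {"type": "god"},
    "Hercules": {"type": "demigod"},
    "Cerberus": {"type": "monster"},
}

# Directed, labeled edges: (out-vertex, label, in-vertex, edge properties).
# The labels come from slide 10; the directions are an assumption.
edges = [
    ("Jupiter", "father", "Saturn", {}),
    ("Hercules", "father", "Jupiter", {}),
    ("Hercules", "mother", "Alcmene", {}),
    ("Jupiter", "brother", "Neptune", {}),
    ("Jupiter", "brother", "Pluto", {}),
    ("Hercules", "battled", "Cerberus", {"time": 12}),
    ("Pluto", "pet", "Cerberus", {}),
]


def degree(vertex):
    """Number of edges incident to `vertex`, counting both directions."""
    return sum((out_v == vertex) + (in_v == vertex) for out_v, _, in_v, _ in edges)


def paths(start, labels):
    """All paths from `start` that follow outgoing edges with the given labels, in order."""
    result = [[start]]
    for label in labels:
        result = [
            path + [in_v]
            for path in result
            for out_v, edge_label, in_v, _ in edges
            if out_v == path[-1] and edge_label == label
        ]
    return result


print(degree("Jupiter"))                        # 4
print(paths("Hercules", ["father", "father"]))  # [['Hercules', 'Jupiter', 'Saturn']]
```

Because every edge touches two vertices, the degrees of all vertices always sum to twice the number of edges, which is a handy sanity check on any edge list.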
  13. Aurelius Graph Cluster (Apache 2)
      - TITAN: stores a massive-scale property graph, allowing real-time traversals and updates
      - FAUNUS: batch processing of large graphs with Hadoop
      - FULGORA: runs global graph algorithms on large, compressed, in-memory graphs
      - Arrows: "Map/Reduce Load" and "Bulk Load Analysis results back into Titan"
  14. II. Titan Graph Database
  15. Titan Features
      - Numerous Concurrent Users
      - Many Short Transactions
        - read/write
      - Real-time Traversals (OLTP)
      - High Availability
      - Dynamic Scalability
      - Variable Consistency Model
        - ACID or eventual consistency
      Real-time Big Graph Data
  16. Storage Backends
      [Figure labels: Partitionability, Consistency, Availability]
  17. $ ./titan-0.2.0/bin/gremlin.sh
               \,,,/
               (o o)
      -----oOOo-(_)-oOOo-----
      gremlin> g = TitanFactory.open('/tmp/titan')
      ==>titangraph[local:/tmp/titan]
      gremlin> v = g.V('name','Hercules').next()
      ==>v[4]
      gremlin> v.out('father').out('brother').name
  18. [Figure: the property graph from slides 9 to 12]
      gremlin> v.out('father').out('brother').name
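Slides 17 and 18 walk from Hercules to his father and then on to that vertex's brothers. The sketch below imitates that chained, lazy pipeline with Python generators. The adjacency data and edge directions are assumptions (the slides show only edge labels), and `Traversal`, `V` and `OUT_EDGES` are hypothetical names, not part of Gremlin or Titan.

```python
# Hypothetical adjacency data: vertex -> list of (edge label, target vertex).
# Edge directions are assumed; the slides show only the labels.
OUT_EDGES = {
    "Hercules": [("father", "Jupiter"), ("mother", "Alcmene"), ("battled", "Cerberus")],
    "Jupiter": [("father", "Saturn"), ("brother", "Neptune"), ("brother", "Pluto")],
    "Pluto": [("pet", "Cerberus")],
}


class Traversal:
    """A tiny, lazy imitation of a Gremlin pipeline over OUT_EDGES."""

    def __init__(self, vertices):
        self._vertices = vertices  # any iterable of vertex names

    def out(self, label):
        # Generator: each vertex is expanded only when results are pulled downstream.
        expanded = (
            target
            for vertex in self._vertices
            for edge_label, target in OUT_EDGES.get(vertex, [])
            if edge_label == label
        )
        return Traversal(expanded)

    def name(self):
        # In this sketch a vertex is identified by its name property.
        return list(self._vertices)


def V(name):
    """Start a traversal at the vertex with the given name."""
    return Traversal(iter([name]))


print(V("Hercules").out("father").out("brother").name())  # ['Neptune', 'Pluto']
```

Each `out` step only wraps the previous iterator, so no edges are read until `name()` pulls results, which mirrors how a pipeline is composed first and evaluated afterwards.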
  19. Vertex-Centric Indices
      - Sort and index edges per vertex by primary key
        - Primary key can be composite
      - Enables efficient focused traversals
        - Only retrieve edges that matter
        - Uses push-down predicates for quick, index-driven retrieval
  20. [Figure: vertex v with battled edges (time: 1, 3, 5, 9), mother and father edges, and two fought edges]
      v.query()
  21. [Figure: the same vertex, now showing only its outgoing edges: battled, mother and father]
      v.query().direction(OUT)
  22. [Figure: only the four battled edges remain]
      v.query().direction(OUT).labels('battled')
  23. [Figure: only the battled edges with time: 1 and time: 3 remain]
      v.query().direction(OUT).labels('battled').has('time',T.lt,5)
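The effect of a vertex-centric index on the query built up in slides 20 to 23 can be sketched by grouping one vertex's edges under a composite key (direction, label) and keeping each group sorted by time, so that the `time < 5` condition becomes a binary search instead of a scan. This is a simplified illustration, not how Titan lays out edges in its storage backend. The class name and neighbor names are hypothetical, and the two fought edges are taken to be incoming because they disappear once the query is restricted to outgoing edges.

```python
import bisect
from collections import defaultdict


class VertexCentricIndex:
    """Edges of a single vertex, grouped by (direction, label) and sorted by a time key."""

    def __init__(self):
        # (direction, label) -> sorted list of (time, neighbor)
        self._groups = defaultdict(list)

    def add_edge(self, direction, label, neighbor, time=0):
        # Edges without a time get a placeholder sort key of 0.
        bisect.insort(self._groups[(direction, label)], (time, neighbor))

    def query(self, direction, label, time_lt):
        """Edges with this direction and label whose time is strictly less than time_lt."""
        group = self._groups.get((direction, label), [])
        # Binary search for the first entry with time >= time_lt; everything before it
        # matches, and edges in other (direction, label) groups are never touched.
        cut = bisect.bisect_left(group, (time_lt,))
        return group[:cut]


# The vertex v from slides 20-23; neighbor names are hypothetical placeholders.
v = VertexCentricIndex()
for t in (1, 3, 5, 9):
    v.add_edge("OUT", "battled", f"foe-{t}", time=t)
v.add_edge("OUT", "mother", "mother-of-v")
v.add_edge("OUT", "father", "father-of-v")
v.add_edge("IN", "fought", "rival-1")
v.add_edge("IN", "fought", "rival-2")

# Analogous to v.query().direction(OUT).labels('battled').has('time',T.lt,5)
print(v.query("OUT", "battled", time_lt=5))  # [(1, 'foe-1'), (3, 'foe-3')]
```

The search key `(time_lt,)` sorts before any `(time_lt, neighbor)` pair, so `bisect_left` lands exactly on the first edge whose time is not below the bound.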
  24. Titan Features
      I. Data Management
      II. Vertex-Centric Indices
  25. Titan Features
      III. Graph Partitioning
      IV. Edge Compression
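Slide 25 lists edge compression, and slide 37 mentions variable-length encoding and prefix compression for Faunus's HDFS graphs. A common way to combine these ideas is to sort a vertex's neighbor ids, store only the gaps between consecutive ids, and write each gap as a variable-length integer, so small gaps take a single byte. The sketch below shows that generic technique (LEB128-style varints over delta-encoded ids); it is an assumption for illustration, not Titan's or Faunus's actual on-disk format, and the ids are hypothetical.

```python
def encode_varint(n):
    """Encode a non-negative int in 7-bit groups, least significant first.

    The high bit of each byte is set when more bytes follow.
    """
    if n < 0:
        raise ValueError("varints here only encode non-negative integers")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varints(data):
    """Decode a concatenation of varints back into a list of ints."""
    values, current, shift = [], 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append(current)
            current, shift = 0, 0
    return values


def compress_ids(ids):
    """Sort neighbor ids, then varint-encode the gap between consecutive ids."""
    out = bytearray()
    previous = 0
    for vid in sorted(ids):
        out += encode_varint(vid - previous)
        previous = vid
    return bytes(out)


def decompress_ids(data):
    """Undo compress_ids: decode the gaps and take their running sum."""
    ids, total = [], 0
    for gap in decode_varints(data):
        total += gap
        ids.append(total)
    return ids


ids = [1000004, 1000000, 1000012, 1000130]  # hypothetical neighbor ids
packed = compress_ids(ids)
print(len(packed))             # 6 (versus 32 bytes as four 64-bit integers)
print(decompress_ids(packed))  # [1000000, 1000004, 1000012, 1000130]
```

Only the first id pays for its full magnitude (3 bytes here); the following gaps of 4, 8 and 118 each fit in one byte, which is why sorting before encoding matters.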
  26. III. TITAN 0.3.0 [-SNAPSHOT]
  27. Titan Embedding
      - Rexster RexPro
        - lightweight Gremlin Server
        - binary protocol
      - Titan Gremlin Engine
      - Embedded Storage Backend
        - in-JVM method calls
      - Native clients
        - Java, Python, Clojure
  28. Graph Indexing
      - Vertex and Edge indexing
      - Pluggable index provider
        - ElasticSearch
        - Lucene
      - Full-text search
      - Numeric range search
      - Geographic search
  29. [Figure: the property graph from slides 9 to 12, extended with age and title properties on the vertices and a location property on the battled edge (time: 12)]
      - Ages shown: 5200, 3300, 4800, 5900, 4900
      - Titles shown: God of the earth and ocean, God of the heaven and skies, Divine hero, God of the underworld, Ugly beast of the underworld
      - location: (38.071,23.745)
  30. [Figure as on slide 29]
      g.query().has('age',Cmp.GREATER_THAN,5000).vertices()
  31. [Figure as on slide 29]
      g.query().has('title',Txt.CONTAINS,'god').vertices()
  32. [Figure as on slide 29]
      g.query().has('age',Cmp.GREATER_THAN,5000).has('title',Txt.CONTAINS,'god').vertices()
  33. [Figure as on slide 29]
      g.query().has('location',Geo.WITHIN,Geoshape.circle(38,23,100)).edges()
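The index queries on slides 30 to 33 filter vertices by a numeric range, by a word in a text property, or by both, and filter edges by distance from a point. The sketch below evaluates the same kinds of predicates with a linear scan over a small in-memory graph, which is exactly the work a pluggable index provider such as ElasticSearch or Lucene is there to avoid. Several things are assumptions: the ages are illustrative placeholders (the slide layout does not make clear which age belongs to which vertex), the word match is taken to be case-insensitive, and the circle radius is taken to be in kilometres with distance computed by the haversine formula. All function names are hypothetical.

```python
import math

# Hypothetical vertices: ages are placeholders, titles are taken from slide 29.
vertices = {
    "Saturn": {"age": 9000},
    "Jupiter": {"age": 6000, "title": "God of the heaven and skies"},
    "Neptune": {"age": 5500, "title": "God of the earth and ocean"},
    "Pluto": {"age": 4500, "title": "God of the underworld"},
    "Hercules": {"age": 30, "title": "Divine hero"},
    "Cerberus": {"age": 2000, "title": "Ugly beast of the underworld"},
    "Alcmene": {"age": 45},
}
edges = [
    {"label": "battled", "out": "Hercules", "in": "Cerberus", "time": 12, "location": (38.071, 23.745)},
    {"label": "pet", "out": "Pluto", "in": "Cerberus"},
]


def age_greater_than(threshold):
    return lambda props: props.get("age", float("-inf")) > threshold


def title_contains(word):
    # Assumed semantics: case-insensitive match against whitespace-separated words.
    return lambda props: word.lower() in props.get("title", "").lower().split()


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = math.sin((lat2 - lat1) / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def within_circle(lat, lon, radius_km):
    return lambda props: "location" in props and haversine_km((lat, lon), props["location"]) <= radius_km


def query_vertices(*predicates):
    """Names of vertices matching every predicate (a linear scan, not an index lookup)."""
    return sorted(name for name, props in vertices.items() if all(p(props) for p in predicates))


def query_edges(*predicates):
    return [e for e in edges if all(p(e) for p in predicates)]


print(query_vertices(age_greater_than(5000)))                         # ['Jupiter', 'Neptune', 'Saturn']
print(query_vertices(title_contains("god")))                          # ['Jupiter', 'Neptune', 'Pluto']
print(query_vertices(age_greater_than(5000), title_contains("god")))  # ['Jupiter', 'Neptune']
print([e["label"] for e in query_edges(within_circle(38, 23, 100))])  # ['battled']
```

With these assumptions, the battled edge lies roughly 66 km from (38, 23), so it falls inside the 100 km circle.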
  34. IV. Faunus Graph Analytics
  35. Faunus Features
      - Hadoop-based Graph Computing Framework
      - Graph Analytics
      - Breadth-first Traversals
      - Global Graph Computations
      Batch Big Graph Data
  36. Faunus Architecture
      [Figure: Faunus architecture, with the traversal g._()]
  37. Faunus Work Flow
      [Figure: the traversal g.V.out.out.count() alongside HDFS directories under hdfs://user/ubuntu/: output/job-0/, output/job-1/ and output/job-2/, holding graph* and sideeffect* files]
      - Compressed HDFS Graphs
        - stored in sequence files
        - variable-length encoding
        - prefix compression
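Faunus runs a breadth-first traversal such as g.V.out.out.count() from slide 37 as MapReduce work over the whole graph. A simplified way to picture this is that every vertex carries a count of the traversers currently sitting on it: an out step maps each vertex's count onto its out-neighbors, a reduce sums the counts arriving at each vertex, and count() adds up the final counts. The sketch below simulates that idea in a single process on a small, hypothetical adjacency list; it illustrates the principle rather than Faunus's actual job structure, and `map_out`, `reduce_counts` and `out_step` are hypothetical names.

```python
from collections import Counter

# Hypothetical adjacency list: vertex -> out-neighbors.
out_edges = {
    "Hercules": ["Jupiter", "Alcmene", "Cerberus"],
    "Jupiter": ["Saturn", "Neptune", "Pluto"],
    "Pluto": ["Cerberus"],
    "Saturn": [], "Neptune": [], "Alcmene": [], "Cerberus": [],
}


def map_out(vertex, count):
    """Map phase: send this vertex's traverser count along each outgoing edge."""
    for neighbor in out_edges[vertex]:
        yield neighbor, count


def reduce_counts(pairs):
    """Reduce phase: sum the counts that arrive at each vertex."""
    totals = Counter()
    for vertex, count in pairs:
        totals[vertex] += count
    return totals


def out_step(counts):
    """One .out step: a full map pass followed by a reduce pass."""
    return reduce_counts(pair for vertex, count in counts.items() for pair in map_out(vertex, count))


counts = Counter({vertex: 1 for vertex in out_edges})  # g.V: one traverser per vertex
counts = out_step(counts)                              # .out
counts = out_step(counts)                              # .out
print(sum(counts.values()))                            # .count() -> 4
```

Because only per-vertex counts move between steps, the amount of data shuffled per step is bounded by the number of edges, no matter how many distinct paths the traversal represents.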
  38. Aurelius Graph Cluster (repeat of slide 13)
  39. What's New
      - Faunus 0.1 released
      - Bulk Import / Export for Titan
        - loaded graph into Titan
        - loading derivations into Titan
      - RDF support
      - Many optimizations
        - vertex compression
  40. Faunus Setup
      $ bin/gremlin.sh
               \,,,/
               (o o)
      -----oOOo-(_)-oOOo-----
      gremlin> g = FaunusFactory.open('bin/titan-hbase.properties')
      ==>faunusgraph[titanhbaseinputformat]
      gremlin> g.getProperties()
      ==>faunus.graph.input.format=com.thinkaurelius.faunus.formats.titan.hbase.TitanHBaseInputFormat
      ==>faunus.graph.output.format=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
      ==>faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
      ==>faunus.output.location=dbpedia
      ==>faunus.output.location.overwrite=true
      gremlin> g._()
      12/11/09 15:17:45 INFO mapreduce.FaunusCompiler: Compiled to 1 MapReduce job(s)
      12/11/09 15:17:45 INFO mapreduce.FaunusCompiler: Executing job 1 out of 1: MapSequence[com.thinkaurelius.faunus.mapreduce.transform.IdentityMap.Map]
      12/11/09 15:17:50 INFO mapred.JobClient: Running job: job_201211081058_0003
  41. Build a Knowledge Graph
      - Based on DBpedia
        - Graph version of Wikipedia
        - ~290 million edges (~1B triples)
      1. Bulk load RDF into Faunus (6 m1.xlarge instances)
      2. Convert to property graph
      3. Bulk load into Titan (3 m1.xlarge instances with Cassandra)
      4. OLTP + OLAP
      Total time: ~2 hours
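Step 2 of slide 41, converting RDF into a property graph, can be sketched under a simple, assumed mapping: every subject or object URI becomes a vertex named after the URI's local part, a triple whose object is a literal becomes a vertex property, and a triple whose object is a URI becomes an edge labeled with the predicate's local name. The triples are hypothetical DBpedia-style examples and `local_name` and `rdf_to_property_graph` are hypothetical helpers; a production conversion also has to deal with datatypes, language tags and blank nodes, which this sketch ignores.

```python
def local_name(uri):
    """Hypothetical helper: the part of a URI after its last '/' or '#'."""
    return uri.rsplit("/", 1)[-1].rsplit("#", 1)[-1]


def rdf_to_property_graph(triples):
    """Turn (subject, predicate, object, object_is_literal) triples into vertices and edges.

    Assumed mapping: URIs become vertices keyed by local name, literal objects become
    vertex properties, and URI objects become edges labeled with the predicate's local name.
    """
    vertices, edges = {}, []
    for subject, predicate, obj, object_is_literal in triples:
        s = local_name(subject)
        vertices.setdefault(s, {"name": s})
        if object_is_literal:
            vertices[s][local_name(predicate)] = obj
        else:
            o = local_name(obj)
            vertices.setdefault(o, {"name": o})
            edges.append((s, local_name(predicate), o))
    return vertices, edges


# Hypothetical DBpedia-style triples, for illustration only.
triples = [
    ("http://dbpedia.org/resource/Random_walker_algorithm",
     "http://dbpedia.org/ontology/wikiPageWikiLink",
     "http://dbpedia.org/resource/Random_walk", False),
    ("http://dbpedia.org/resource/Random_walker_algorithm",
     "http://www.w3.org/2000/01/rdf-schema#label",
     "Random walker algorithm", True),
]
vertices, edges = rdf_to_property_graph(triples)
print(edges)  # [('Random_walker_algorithm', 'wikiPageWikiLink', 'Random_walk')]
print(vertices["Random_walker_algorithm"])
# {'name': 'Random_walker_algorithm', 'label': 'Random walker algorithm'}
```

Folding literal-valued triples into vertex properties is what shrinks a triple count like "~1B triples" down to a smaller number of edges: only resource-to-resource triples remain as edges.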
  42. Graph OLTP
      gremlin> g = TitanFactory.open('bin/cassandra.local')
      ==>titangraph[cassandrathrift:10.176.213.110]
      gremlin> g.V('name','Random_walker_algorithm').both.name
      ==>Random_walk
      ==>Segmentation_(image_processing)
      ==>Graph_(mathematics)
      ==>Laplacian_matrix
      ==>Graph
      ==>Laplacian_matrix
      ==>Electrical_network
      ==>Resistor
      ==>Electrical_resistance_and_conductance
      ==>Ground_(electricity)
      ==>Direct_current
      ==>Voltage_source
      ==>Precomputation
      ==>Category:Computer_vision
      ==>Random_Walker_(Computer_Vision)
      ==>List_of_algorithms
      ==>Segmentation_(image_processing)
      ==>Watershed_(image_processing)
      ==>Random_walker_(computer_vision)
      ==>Random_Walker_(computer_vision)
  43. gremlin> g.V('name','Learning').out.out.out.out[0..10].name
      ==>Latium
      ==>Roman_Kingdom
      ==>Roman_Republic
      ==>Roman_Empire
      ==>Middle_Ages
      ==>Early_modern_Europe
      ==>Armenian_Kingdom_of_Cilicia
      ==>Lingua_franca
      ==>Vatican_City
      ==>Vulgar_Latin
      ==>Romance_languages
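The [0..10] on slide 43 is a Groovy range, which includes both endpoints, so the traversal emits up to 11 names; the listing on that slide shows exactly 11. Because the pipeline is evaluated lazily, a range filter can stop pulling results once its upper bound is reached instead of expanding the whole four-hop neighborhood. The sketch below mimics that behaviour with Python generators and `itertools.islice` on a hypothetical tree-shaped graph in which vertex n links to 3n+1, 3n+2 and 3n+3. Python slices exclude the end index, so [0..10] corresponds to `islice(..., 0, 11)`.

```python
from itertools import islice

expanded = []  # records which vertices had their out-edges expanded


def out_neighbors(vertex):
    """Hypothetical graph: vertex n links to 3n+1, 3n+2 and 3n+3 (fan-out 3)."""
    expanded.append(vertex)
    return [3 * vertex + 1, 3 * vertex + 2, 3 * vertex + 3]


def out(vertices):
    # Lazy step: a vertex is expanded only when a downstream consumer asks for more results.
    for vertex in vertices:
        yield from out_neighbors(vertex)


def four_hops(start):
    """Analogous to .out.out.out.out starting from a single vertex."""
    return out(out(out(out(iter([start])))))


first_eleven = list(islice(four_hops(0), 0, 11))  # Gremlin [0..10] is inclusive: 11 results
print(len(first_eleven))  # 11
print(len(expanded))      # 8
```

Only 8 vertices are expanded to produce the first 11 results, whereas exhausting the traversal would expand 1 + 3 + 9 + 27 = 40 vertices and produce 81 results.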
  44. Aurelius Graph Cluster (as on slide 13), with the mailing list aureliusgraphs@googlegroups.com
  45. The Graph Landscape
      [Figure: Size of Graph vs. Speed of Traversal/Process; illustration only, not to scale]
  46. TINKERPOP.COM
  47. Thanks! Vadas Gintautas (@vadasg), Marko Rodriguez (@twarko), Stephen Mallette (@spmallette), Daniel LaRocque
  48. We are Hiring