Adding Value through graph analysis using Titan and Faunus

In this presentation we discuss how graph analysis can add value to your data and how to use open source tools like Titan and Faunus to build scalable graph processing systems.
This presentation gives an update on the development status of Titan and Faunus with a preview of what is to come.

Transcript

  • 1. Adding Value Through Graph Analysis. Matthias Broecheler, CTO, Aurelius (@mbroecheler). March V, MMXIII. THINKAURELIUS.COM. Pyramid labels: Knowledge, Information, Data.
  • 2. " " " " " " " " "Communities of Interest Finding Influencers "Understanding Behavior "
  • 3. " " " " " " " " "Information Integration Recommendation "Question Answering "
  • 4. " " " " " " " " "Fraud Detection Risk Analysis "Market Valuation "
  • 5. Value: Knowledge, Information, Data
  • 6. Example at each level:
      • Knowledge: likes(Jane Joe, cute mammals): 0.8
      • Information: userid: 3552 clicked addid: 9914, timestamp: 93932342
      • Data: 2013-03-03 18:52:48:112;12.123.211.192; ACCESS/TRR;http://adserve.domain.com/render.cgi?uid=F32282DA39B&flagtru&xls=trending ; ACTION=CLICK|DELAY=250|x=450|y=632
  • 7. Graph Databases & Graph Analysis (shown alongside the Knowledge / Information / Data example from slide 6)
  • 8. I. Graph Foundation
  • 9. Graph: vertices and their properties
      • name: Saturn, type: titan
      • name: Jupiter, type: god
      • name: Neptune, type: god
      • name: Pluto, type: god
      • name: Alcmene, type: god
      • name: Hercules, type: demigod
      • name: Cerberus, type: monster
  • 10. Graph: edges, edge types and edge properties. Edges shown: father (twice), brother (twice), mother, pet, and battled with the edge property time: 12.
  • 11. Path (illustrated on the same graph)
  • 12. Degree (illustrated on the same graph)
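
The property graph model on slides 9 to 12 (vertices, typed edges, properties on both, paths and degree) can be sketched in a few lines of Python. This is only an illustration, not Titan's data model or API. The class and function names are made up for this sketch, and the edge endpoints are an assumption: the transcript preserves the edge labels but not which vertices each edge connects.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity equality avoids recursing through edge lists
class Vertex:
    properties: dict
    out_edges: list = field(default_factory=list)
    in_edges: list = field(default_factory=list)

    def degree(self):
        """Number of incident edges, counting both directions."""
        return len(self.out_edges) + len(self.in_edges)


@dataclass(eq=False)
class Edge:
    tail: Vertex   # vertex the edge leaves
    label: str     # edge type, e.g. "father"
    head: Vertex   # vertex the edge points to
    properties: dict = field(default_factory=dict)


def add_vertex(**properties):
    return Vertex(properties)


def add_edge(tail, label, head, **properties):
    edge = Edge(tail, label, head, properties)
    tail.out_edges.append(edge)
    head.in_edges.append(edge)
    return edge


saturn = add_vertex(name="Saturn", type="titan")
jupiter = add_vertex(name="Jupiter", type="god")
neptune = add_vertex(name="Neptune", type="god")
pluto = add_vertex(name="Pluto", type="god")
alcmene = add_vertex(name="Alcmene", type="god")
hercules = add_vertex(name="Hercules", type="demigod")
cerberus = add_vertex(name="Cerberus", type="monster")

# ASSUMPTION: endpoints below are a plausible reading of the diagram,
# not something the transcript states.
add_edge(hercules, "father", jupiter)
add_edge(hercules, "mother", alcmene)
add_edge(jupiter, "father", saturn)
add_edge(jupiter, "brother", neptune)
add_edge(jupiter, "brother", pluto)
add_edge(hercules, "battled", cerberus, time=12)
add_edge(pluto, "pet", cerberus)

# A path is a sequence of vertices connected by edges.
path = [hercules, jupiter, pluto, cerberus]
print(" -> ".join(v.properties["name"] for v in path))

# Degree of a vertex: how many edges touch it.
print(jupiter.degree())
```

Under the assumed endpoints, Jupiter has three outgoing edges and one incoming edge, so its degree is 4.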
  • 13. Aurelius Graph Cluster (Apache 2 licensed)
      • Titan: stores a massive-scale property graph allowing real-time traversals and updates
      • Faunus: batch processing of large graphs with Hadoop
      • Fulgora: runs global graph algorithms on large, compressed, in-memory graphs
      • Data flows: Map/Reduce, bulk load, load analysis results back into Titan
  • 14. II. Titan Graph Database
  • 15. Titan Features
      • Numerous concurrent users
      • Many short transactions (read/write)
      • Real-time traversals (OLTP)
      • High availability
      • Dynamic scalability
      • Variable consistency model (ACID or eventual consistency)
      • Real-time Big Graph Data
  • 16. Storage Backends (diagram corners: Partitionability, Consistency, Availability)
  • 17. Gremlin console session:
        $ ./titan-0.2.0/bin/gremlin.sh
                 \,,,/
                 (o o)
        -----oOOo-(_)-oOOo-----
        gremlin> g = TitanFactory.open('/tmp/titan')
        ==>titangraph[local:/tmp/titan]
        gremlin> v = g.V('name','Hercules')
        ==>v[4]
        gremlin> v.out('father').out('brother').name
  • 18. The same traversal shown on the graph from slide 10:
        gremlin> v.out('father').out('brother').name
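
To make the traversal v.out('father').out('brother').name concrete, here is a minimal Python sketch of what each out() step does: it follows every outgoing edge with the given label from the current set of vertices. It does not use Gremlin or Titan. The helper name `out` and the edge endpoints are assumptions made for illustration, since the transcript keeps only the edge labels of the diagram.

```python
from collections import defaultdict

# (source, label, target) triples. ASSUMPTION: endpoints are a plausible
# reading of the diagram; the transcript lists only the labels.
EDGES = [
    ("Hercules", "father", "Jupiter"),
    ("Hercules", "mother", "Alcmene"),
    ("Hercules", "battled", "Cerberus"),
    ("Jupiter", "father", "Saturn"),
    ("Jupiter", "brother", "Neptune"),
    ("Jupiter", "brother", "Pluto"),
    ("Pluto", "pet", "Cerberus"),
]

# Index outgoing edges by (vertex, label) so each step is a dictionary lookup.
out_index = defaultdict(list)
for source, label, target in EDGES:
    out_index[(source, label)].append(target)


def out(vertices, label):
    """One traversal step: follow outgoing `label` edges from every vertex."""
    return [target for v in vertices for target in out_index[(v, label)]]


# Rough equivalent of v.out('father').out('brother').name starting at Hercules.
uncles = out(out(["Hercules"], "father"), "brother")
print(uncles)  # ['Neptune', 'Pluto'] under the assumed endpoints
```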
  • 19. Vertex-Centric Indices
      • Sort and index edges per vertex by primary key (the primary key can be composite)
      • Enables efficient focused traversals (only retrieve edges that matter)
      • Uses push-down predicates for quick, index-driven retrieval
  • 20. v.query(): all edges incident to v: battled (time: 1), battled (time: 3), battled (time: 5), battled (time: 9), mother, father, fought, fought
  • 21. v.query().direction(OUT): battled (time: 1), battled (time: 3), battled (time: 5), battled (time: 9), mother, father
  • 22. v.query().direction(OUT).labels('battled'): battled (time: 1), battled (time: 3), battled (time: 5), battled (time: 9)
  • 23. v.query().direction(OUT).labels('battled').has('time',T.lt,5): battled (time: 1), battled (time: 3)
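
The idea behind slides 19 to 23 can be sketched as follows: if the edges of a single vertex are kept sorted by a composite key, a query on direction, label and a time bound becomes a binary search over a contiguous range instead of a scan of every edge. This is a simplified illustration and not Titan's storage layout. The class name, the choice of (direction, label, time) as sort key, and the neighbour names are assumptions for this sketch.

```python
import bisect


class VertexCentricIndex:
    """Edges of ONE vertex, kept sorted by (direction, label, time)."""

    def __init__(self):
        self._keys = []   # sort keys, always in ascending order
        self._edges = []  # edges, in the same order as their keys

    def add(self, direction, label, other, time=0):
        key = (direction, label, time)
        position = bisect.bisect_right(self._keys, key)
        self._keys.insert(position, key)
        self._edges.insert(position, (direction, label, time, other))

    def query(self, direction, label, time_lt):
        """Edges with this direction and label whose time is below time_lt."""
        low = bisect.bisect_left(self._keys, (direction, label, float("-inf")))
        high = bisect.bisect_left(self._keys, (direction, label, time_lt))
        return self._edges[low:high]  # one contiguous slice, no full scan


index = VertexCentricIndex()
for t in (9, 1, 5, 3):
    index.add("OUT", "battled", f"monster-{t}", time=t)  # hypothetical neighbours
index.add("OUT", "mother", "mother-vertex")
index.add("OUT", "father", "father-vertex")
index.add("IN", "fought", "rival-1")
index.add("IN", "fought", "rival-2")

# Rough equivalent of v.query().direction(OUT).labels('battled').has('time',T.lt,5)
for edge in index.query("OUT", "battled", 5):
    print(edge)
```

The two binary searches touch only a logarithmic number of keys, which is the point of pushing the predicate down into the index: the mother, father and fought edges are never inspected.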
  • 24. Titan Features: I. Data Management, II. Vertex-Centric Indices
  • 25. Titan Features: III. Graph Partitioning, IV. Edge Compression
  • 26. III. Titan 0.3.0 [-SNAPSHOT]
  • 27. Titan Embedding
      • Rexster RexPro (lightweight Gremlin server, binary protocol)
      • Titan Gremlin Engine
      • Embedded Storage Backend (in-JVM method calls)
      • Native clients (Java, Python, Clojure)
  • 28. Graph Indexing
      • Vertex and edge indexing
      • Pluggable index provider (ElasticSearch, Lucene)
      • Full-text search
      • Numeric range search
      • Geographic search
  • 29. Graph extended with indexed properties
      • Saturn: age: 5900
      • Jupiter: age: 4800, title: God of the heaven and skies
      • Neptune: age: 5200, title: God of the earth and ocean
      • Pluto: age: 4900, title: God of the underworld
      • Alcmene: age: 3300
      • Hercules: title: Divine hero
      • Cerberus: title: Ugly beast of the underworld
      • Edges: father (twice), brother (twice), mother, pet, and battled with time: 12 and location: (38.071,23.745)
  • 30. Numeric range search on the same graph: g.query().has('age',Cmp.GREATER_THAN,5000).vertices()
  • 31. Full-text search on the same graph: g.query().has('title',Txt.CONTAINS,'god').vertices()
  • 32. Combined search on the same graph: g.query().has('age',Cmp.GREATER_THAN,5000).has('title',Txt.CONTAINS,'god').vertices()
  • 33. Geographic search on the same graph: g.query().has('location',Geo.WITHIN,Geoshape.circle(38,23,100)).edges()
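
The semantics of the four index queries on slides 30 to 33 can be illustrated with plain Python predicates. This sketch filters by linear scan, so it shows what the queries mean rather than how ElasticSearch or Lucene answer them. Several things here are assumptions: the assignment of ages and titles to vertices is reconstructed from the slide layout, the circle radius is taken to be in kilometres, and all function names are invented for this example.

```python
import math
import re

# ASSUMPTION: which vertex carries which age/title is reconstructed from layout.
VERTICES = [
    {"name": "Saturn", "age": 5900},
    {"name": "Jupiter", "age": 4800, "title": "God of the heaven and skies"},
    {"name": "Neptune", "age": 5200, "title": "God of the earth and ocean"},
    {"name": "Pluto", "age": 4900, "title": "God of the underworld"},
    {"name": "Alcmene", "age": 3300},
    {"name": "Hercules", "title": "Divine hero"},
    {"name": "Cerberus", "title": "Ugly beast of the underworld"},
]
EDGES = [{"label": "battled", "time": 12, "location": (38.071, 23.745)}]


def greater_than(key, bound):
    """Numeric range predicate, in the spirit of Cmp.GREATER_THAN."""
    return lambda element: key in element and element[key] > bound


def text_contains(key, term):
    """Case-insensitive whole-word match, in the spirit of Txt.CONTAINS."""
    def predicate(element):
        tokens = re.findall(r"\w+", element.get(key, "").lower())
        return term.lower() in tokens
    return predicate


def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def geo_within(key, lat, lon, radius_km):
    """Point-in-circle predicate, in the spirit of Geo.WITHIN with a circle."""
    return lambda element: (key in element
                            and haversine_km((lat, lon), element[key]) <= radius_km)


def query(elements, *predicates):
    """Keep the elements that satisfy every predicate."""
    return [e for e in elements if all(p(e) for p in predicates)]


old = query(VERTICES, greater_than("age", 5000))
gods = query(VERTICES, text_contains("title", "god"))
old_gods = query(VERTICES, greater_than("age", 5000), text_contains("title", "god"))
nearby = query(EDGES, geo_within("location", 38, 23, 100))

print([v["name"] for v in old])
print([v["name"] for v in gods])
print([v["name"] for v in old_gods])
print([e["label"] for e in nearby])
```

Chaining predicates in `query` mirrors chaining `.has(...)` calls: each additional condition narrows the result set.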
  • 34. IV. Faunus Graph Analytics
  • 35. Faunus Features
      • Hadoop-based graph computing framework
      • Graph analytics
      • Breadth-first traversals
      • Global graph computations
      • Batch Big Graph Data
  • 36. Faunus Architecture: g._()
  • 37. Faunus Work Flow
      • Traversal: g.V.out.out.count()
      • Output on HDFS: hdfs://user/ubuntu/ with output/job-0/, output/job-1/, output/job-2/, holding graph* and sideeffect* files
      • Compressed HDFS graphs: stored in sequence files, variable length encoding, prefix compression
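
A breadth-first, batch evaluation of g.V.out.out.count() can be sketched without Hadoop. Instead of walking each path individually, every step pushes a counter of "paths ending here" along all outgoing edges, which is the kind of per-vertex work that maps naturally onto one batch pass per step. This is a single-process illustration under that assumption, not Faunus code; the function name and the toy graph are hypothetical.

```python
from collections import Counter


def step_out(path_counts, adjacency):
    """One traversal step: push each vertex's path count along its out-edges."""
    next_counts = Counter()
    for vertex, count in path_counts.items():
        for neighbour in adjacency.get(vertex, ()):
            next_counts[neighbour] += count
    return next_counts


# Hypothetical toy graph: vertex -> list of out-neighbours.
adjacency = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
}

start = Counter({vertex: 1 for vertex in adjacency})      # g.V
after_two_steps = step_out(step_out(start, adjacency), adjacency)  # .out.out
total = sum(after_two_steps.values())                     # .count()
print(total)  # 5 two-step paths: a-b-c, a-c-a, b-c-a, c-a-b, c-a-c
```

Because only counters move between steps, the cost of a step depends on the number of edges, not on the number of paths, which can grow very quickly on large graphs.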
  • 38. Aurelius Graph Cluster: Titan, Faunus, Fulgora (same overview as slide 13)
  • 39. What's New
      • Faunus 0.1 released
      • Bulk Import / Export for Titan (loaded graph into Titan, loading derivations into Titan, RDF support)
      • Many optimizations (vertex compression)
  • 40. Faunus Setup
        $ bin/gremlin.sh
                 \,,,/
                 (o o)
        -----oOOo-(_)-oOOo-----
        gremlin> g = FaunusFactory.open('bin/titan-hbase.properties')
        ==>faunusgraph[titanhbaseinputformat]
        gremlin> g.getProperties()
        ==>faunus.graph.input.format=com.thinkaurelius.faunus.formats.titan.hbase.TitanHBaseInputFormat
        ==>faunus.graph.output.format=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
        ==>faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
        ==>faunus.output.location=dbpedia
        ==>faunus.output.location.overwrite=true
        gremlin> g._()
        12/11/09 15:17:45 INFO mapreduce.FaunusCompiler: Compiled to 1 MapReduce job(s)
        12/11/09 15:17:45 INFO mapreduce.FaunusCompiler: Executing job 1 out of 1: MapSequence[com.thinkaurelius.faunus.mapreduce.transform.IdentityMap.Map]
        12/11/09 15:17:50 INFO mapred.JobClient: Running job: job_201211081058_0003
  • 41. Build a Knowledge Graph
      • Based on DBPedia, a graph version of Wikipedia: ~290 million edges (~1B triples)
      • Step 1: Bulk load RDF into Faunus (6 m1.xlarge)
      • Step 2: Convert to property graph
      • Step 3: Bulk load into Titan (3 m1.xlarge with Cassandra)
      • Step 4: OLTP+OLAP
      • Total time: ~2 hours
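
Step 2 on slide 41, converting RDF to a property graph, is not spelled out in the slides. One common approach, shown here purely as an assumption, is to turn triples whose object is a literal into vertex properties and triples whose object is another resource into edges, which would also be consistent with the edge count being smaller than the triple count. The function name, the tuple layout and the sample triples are hypothetical.

```python
def triples_to_property_graph(triples):
    """Split RDF-style triples into vertex properties and labelled edges.

    Each triple is (subject, predicate, object, object_is_literal).
    """
    vertices = {}  # resource name -> dict of properties
    edges = []     # (subject, predicate, object) for resource-valued triples
    for subject, predicate, obj, object_is_literal in triples:
        vertices.setdefault(subject, {})
        if object_is_literal:
            # Literal objects become properties on the subject vertex.
            vertices[subject][predicate] = obj
        else:
            # Resource objects become vertices connected by a labelled edge.
            vertices.setdefault(obj, {})
            edges.append((subject, predicate, obj))
    return vertices, edges


# Hypothetical sample data; predicate names are invented for illustration.
sample = [
    ("Random_walker_algorithm", "label", "Random walker algorithm", True),
    ("Random_walker_algorithm", "linksTo", "Graph_(mathematics)", False),
    ("Random_walker_algorithm", "linksTo", "Laplacian_matrix", False),
    ("Laplacian_matrix", "label", "Laplacian matrix", True),
]

vertices, edges = triples_to_property_graph(sample)
print(len(sample), "triples ->", len(vertices), "vertices,", len(edges), "edges")
```

In this toy input, four triples collapse into three vertices and two edges, because the two literal-valued triples are stored as properties rather than as edges.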
  • 42. Graph OLTP
        gremlin> g = TitanFactory.open('bin/cassandra.local')
        ==>titangraph[cassandrathrift:10.176.213.110]
        gremlin> g.V('name','Random_walker_algorithm').both.name
        ==>Random_walk
        ==>Segmentation_(image_processing)
        ==>Graph_(mathematics)
        ==>Laplacian_matrix
        ==>Graph
        ==>Laplacian_matrix
        ==>Electrical_network
        ==>Resistor
        ==>Electrical_resistance_and_conductance
        ==>Ground_(electricity)
        ==>Direct_current
        ==>Voltage_source
        ==>Precomputation
        ==>Category:Computer_vision
        ==>Random_Walker_(Computer_Vision)
        ==>List_of_algorithms
        ==>Segmentation_(image_processing)
        ==>Watershed_(image_processing)
        ==>Random_walker_(computer_vision)
        ==>Random_Walker_(computer_vision)
  • 43. Four-step traversal, first eleven results:
        gremlin> g.V('name','Learning').out.out.out.out[0..10].name
        ==>Latium
        ==>Roman_Kingdom
        ==>Roman_Republic
        ==>Roman_Empire
        ==>Middle_Ages
        ==>Early_modern_Europe
        ==>Armenian_Kingdom_of_Cilicia
        ==>Lingua_franca
        ==>Vatican_City
        ==>Vulgar_Latin
        ==>Romance_languages
  • 44. Aurelius Graph Cluster: Titan, Faunus, Fulgora (same overview as slide 13). Contact: aureliusgraphs@googlegroups.com
  • 45. The Graph Landscape (illustration only, not to scale). Axes: Speed of Traversal/Process, Size of Graph.
  • 46. TINKERPOP.COM
  • 47. Thanks! Vadas Gintautas (@vadasg), Marko Rodriguez (@twarko), Stephen Mallette (@spmallette), Daniel LaRocque
  • 48. We are Hiring