Data Day Texas 2013
 

An introduction to graph databases and graph computing frameworks in general, and an overview of the Aurelius Graph Cluster in particular. Discusses Titan and Faunus and demonstrates how to build a knowledge graph using the cluster.

This presentation was given at Data Day Texas in 2013. http://datadaytexas.com/

  • Health care: cancer, personalized medicine; social systems; economy
  • Source: http://www.digitaltrends.com/mobile/inside-knowledge-graph-googles-deep-diving-semantic-search/
  • http://socialmediatoday.com/larry-weintraub/1171711/facebook-seo-comes-life-graph-search-launches

Data Day Texas 2013 Presentation Transcript

  • Graph Databases: Analyzing Relationships at Scale. #DDTX13. Matthias Broecheler, CTO (@mbroecheler), AURELIUS. March XXX, MMXIII. THINKAURELIUS.COM
  • THE BRAIN
  • EMERGENCE
  • Graph: [figure: example property graph, labeling a vertex and its properties]
    Vertices: Saturn (type: titan, age: 10000), Jupiter (type: god), Neptune (type: god, age: 4500), Pluto (type: god, age: 4000), Hercules (type: demigod), Alcmene (type: human, age: 45), Hydra (type: monster), Cerberus (type: monster)
  • Graph: [figure: the same graph, labeling an edge, an edge label, and an edge property]
    Edge labels: father (twice), brother (twice), mother, pet, battled (time: 2), battled (time: 12)
  • Path: [figure: the same graph]
  • Degree: [figure: the same graph]
  • Shortest Paths: [figure: the same graph]
  • Centrality: [figure: the same graph]
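The concepts named on these slides (property graph, degree, shortest paths) can be sketched in a few lines of Python. This is a minimal illustration, not how Titan stores anything. The edge directions below are an assumption: the transcript only preserves the edge labels, so the directions follow the usual reading of the example (Hercules has father Jupiter, and so on).

```python
from collections import deque

# Vertices of the example graph: name -> properties.
VERTICES = {
    "Saturn": {"type": "titan", "age": 10000},
    "Jupiter": {"type": "god"},
    "Neptune": {"type": "god", "age": 4500},
    "Pluto": {"type": "god", "age": 4000},
    "Hercules": {"type": "demigod"},
    "Alcmene": {"type": "human", "age": 45},
    "Hydra": {"type": "monster"},
    "Cerberus": {"type": "monster"},
}

# Edges as (source, label, target, properties).
# ASSUMPTION: directions and battled targets are not recoverable from the slides.
EDGES = [
    ("Jupiter", "father", "Saturn", {}),
    ("Jupiter", "brother", "Neptune", {}),
    ("Jupiter", "brother", "Pluto", {}),
    ("Hercules", "father", "Jupiter", {}),
    ("Hercules", "mother", "Alcmene", {}),
    ("Hercules", "battled", "Hydra", {"time": 2}),
    ("Hercules", "battled", "Cerberus", {"time": 12}),
    ("Pluto", "pet", "Cerberus", {}),
]


def neighbors(name):
    """Adjacent vertex names, ignoring edge direction."""
    outgoing = [t for s, _, t, _ in EDGES if s == name]
    incoming = [s for s, _, t, _ in EDGES if t == name]
    return outgoing + incoming


def degree(name):
    """Number of edges touching the vertex."""
    return len(neighbors(name))


def shortest_path(start, goal):
    """Breadth-first search; returns the list of vertex names or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            path = []
            while current is not None:
                path.append(current)
                current = parents[current]
            return path[::-1]
        for nxt in neighbors(current):
            if nxt not in parents:
                parents[nxt] = current
                queue.append(nxt)
    return None


print(degree("Jupiter"))
print(shortest_path("Saturn", "Cerberus"))
```

Degree is the simplest centrality measure; the deck's later `it.both.both.both.count()` example ranks vertices by counting three-step paths instead.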
  • TinkerPop Graph Stack: Graph Server, Graph Algorithms, Object-Graph Mapper, Traversal Language, Dataflow Processing, Generic Graph API
  • [figure: the example graph] g.V and g.E
  • [figure: the example graph, vertex v marked] v = g.V('name','Hercules')
  • [figure: the example graph, vertex v marked] v.out('father','mother')
  • [figure: the example graph, vertex v marked] v.out('father').out('brother').name
  • [figure: the example graph, vertex v marked] v.outE('battled').has('time',T.gt,5).inV.name
  • [figure: the example graph, vertex v marked] v.out('father').out('brother').has('age',T.lt,4200).name
  • [figure: the example graph] g.query().has('age',T.gt,4200).vertices()
  • [figure: the example graph] g.query().has('time',T.lt,5).edges()
  • [figure: the example graph] saturn.as('x').in('father').loop('x'){it.loops < 3}.next()
  • [figure: the example graph] g.V.sideEffect{ it.rank = it.both.both.both.count() }
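To make the semantics of these traversals concrete, here is a rough Python emulation of a few of them over the same example graph. It is only a sketch: the helper names `out` and `in_` are hypothetical stand-ins for the Gremlin steps, the edge directions are assumed (the transcript keeps only labels), and the loop example follows my reading that `loop('x'){it.loops < 3}` applies the looped step twice.

```python
# (source, label, target, properties); directions are an ASSUMPTION.
EDGES = [
    ("Jupiter", "father", "Saturn", {}),
    ("Jupiter", "brother", "Neptune", {}),
    ("Jupiter", "brother", "Pluto", {}),
    ("Hercules", "father", "Jupiter", {}),
    ("Hercules", "mother", "Alcmene", {}),
    ("Hercules", "battled", "Hydra", {"time": 2}),
    ("Hercules", "battled", "Cerberus", {"time": 12}),
    ("Pluto", "pet", "Cerberus", {}),
]
# Ages as printed on the slides (Jupiter has none on these slides).
AGE = {"Saturn": 10000, "Neptune": 4500, "Pluto": 4000, "Alcmene": 45}


def out(names, *labels):
    """Emulates out(...): follow outgoing edges, optionally filtered by label."""
    return [t for n in names for s, l, t, _ in EDGES
            if s == n and (not labels or l in labels)]


def in_(names, *labels):
    """Emulates in(...): follow incoming edges, optionally filtered by label."""
    return [s for n in names for s, l, t, _ in EDGES
            if t == n and (not labels or l in labels)]


v = ["Hercules"]                                   # v = g.V('name','Hercules')
parents = out(v, "father", "mother")               # v.out('father','mother')
uncles = out(out(v, "father"), "brother")          # v.out('father').out('brother').name
late_battles = [t for s, l, t, p in EDGES          # v.outE('battled').has('time',T.gt,5).inV.name
                if s in v and l == "battled" and p["time"] > 5]
young_uncles = [n for n in uncles                  # ...has('age',T.lt,4200).name
                if n in AGE and AGE[n] < 4200]
grandchild = in_(in_(["Saturn"], "father"), "father")  # saturn...loop('x'){it.loops < 3}

print(parents, uncles, late_battles, young_uncles, grandchild)
```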
  • The Graph Landscape [figure: speed of traversal/processing plotted against size of graph; illustration only, not to scale]
  • Aurelius Graph Cluster (Apache 2)
    TITAN: stores a massive-scale property graph allowing real-time traversals and updates
    FAUNUS: batch processing of large graphs with Hadoop
    FULGORA: runs global graph algorithms on large, compressed, in-memory graphs
    [figure: arrows between the components labeled Map/Reduce Load, Bulk Load, and Analysis results back into Titan]
  • Titan Features: numerous concurrent users; many short transactions (read/write); real-time traversals (OLTP); high availability; dynamic scalability; variable consistency model (ACID or eventual consistency); real-time big graph data
  • Storage Backends [figure: backends positioned among Partitionability, Consistency, and Availability]
  • $ ./titan-0.2.0/bin/gremlin.sh
             \,,,/
             (o o)
    -----oOOo-(_)-oOOo-----
    gremlin> g = TitanFactory.open('/tmp/titan')
    ==>titangraph[local:/tmp/titan]
    gremlin> v = g.V('name','Hercules')
    ==>v[4]
    gremlin> v.out('father').out('brother').name
  • Vertex-Centric Indices: sort and index edges per vertex by primary key (the primary key can be composite); enables efficient focused traversals (only retrieve edges that matter); uses push-down predicates for quick, index-driven retrieval
  • [figure: vertex v with edges battled (time: 1, 3, 5, 9), mother, father, and two fought edges] v.query()
  • [figure: only the outgoing edges remain: battled (time: 1, 3, 5, 9), mother, father] v.query().direction(OUT)
  • [figure: only the battled edges remain: time: 1, 3, 5, 9] v.query().direction(OUT).labels('battled')
  • [figure: only battled edges with time: 1 and time: 3 remain] v.query().direction(OUT).labels('battled').has('time',T.lt,5)
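The idea behind a vertex-centric index can be sketched as follows: keep each vertex's edges grouped by label and sorted by a key, so that a predicate such as `time < 5` becomes a binary search instead of a scan over every incident edge. The class and the target names below are hypothetical, and Titan's real storage layout is not shown in the slides.

```python
from bisect import bisect_left, insort
from collections import defaultdict


class VertexCentricIndex:
    """Out-edges of ONE vertex, grouped by label and sorted by a sort key."""

    def __init__(self):
        self._by_label = defaultdict(list)  # label -> sorted [(sort_key, target)]

    def add_edge(self, label, sort_key, target):
        # insort keeps the per-label list ordered by sort_key.
        insort(self._by_label[label], (sort_key, target))

    def less_than(self, label, bound):
        """Edges with the given label whose sort key is < bound."""
        entries = self._by_label.get(label, [])
        # (bound,) sorts before every (bound, target), so the cut is exclusive.
        return entries[:bisect_left(entries, (bound,))]


index = VertexCentricIndex()
# Edge times from the slide; the target names are made up for illustration.
for time, target in [(9, "opponent_d"), (1, "opponent_a"),
                     (5, "opponent_c"), (3, "opponent_b")]:
    index.add_edge("battled", time, target)
index.add_edge("mother", 0, "mother_vertex")
index.add_edge("father", 0, "father_vertex")

# Rough equivalent of v.query().direction(OUT).labels('battled').has('time',T.lt,5)
early = index.less_than("battled", 5)
print(early)
```

Only the two matching edges are touched; the `mother`, `father`, and later `battled` edges are never read.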
  • Titan Server (REST, REXPRO)
    $ wget http://s3.thinkaurelius.com/downloads/titan/titan-cassandra-0.3.0.zip
    $ unzip titan-cassandra-0.3.0.zip
    $ cd titan-cassandra-0.3.0
    $ sudo bin/titan.sh config/titan-server-rexster.xml config/titan-server-cassandra.properties
  • Graph Indexing: vertex and edge indexing; pluggable index provider (ElasticSearch, Lucene); full-text search; numeric range search; geographic search
  • [figure: the example graph extended with titles and locations]
    Vertices: Neptune (age: 4500, title: God of the earth and ocean), Jupiter (age: 4800, title: God of the heaven and skies), Pluto (age: 4000, title: God of the underworld), Hercules (title: Divine hero), Cerberus (title: Ugly beast of the underworld), Saturn (type: titan, age: 10000), Alcmene (type: human, age: 45), Hydra (type: monster)
    Edges: father (twice), brother (twice), mother, pet, battled (time: 2, location: [37.7,23.9]), battled (time: 12, location: [39,22])
  • [figure: the extended graph] g.query().has('title',Txt.CONTAINS,'god').vertices()
  • [figure: the extended graph] g.query().has('age',GREATER_THAN,4500).has('title',CONTAINS,'god').vertices()
  • [figure: the extended graph] g.query().has('location',WITHIN,Geoshape.circle(38,24,50)).edges()
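The three kinds of indexed lookup on these slides (full-text, numeric range, geographic) can be illustrated with plain Python predicates over the slide data. This sketch scans linearly, whereas an index backend such as ElasticSearch or Lucene would answer from its index. It assumes that `Geoshape.circle(38,24,50)` means latitude 38, longitude 24, radius 50 km, and that text matching is on lower-cased whole words.

```python
import math

# Vertex data as printed on the slides.
VERTICES = [
    {"name": "Neptune", "age": 4500, "title": "God of the earth and ocean"},
    {"name": "Jupiter", "age": 4800, "title": "God of the heaven and skies"},
    {"name": "Pluto", "age": 4000, "title": "God of the underworld"},
    {"name": "Hercules", "title": "Divine hero"},
    {"name": "Cerberus", "title": "Ugly beast of the underworld"},
]
# The two battled edges with their time and (lat, lon) location.
BATTLES = [
    {"time": 2, "location": (37.7, 23.9)},
    {"time": 12, "location": (39.0, 22.0)},
]


def title_contains(vertex, term):
    """Full-text style match: term equals one lower-cased word of the title."""
    return term.lower() in vertex.get("title", "").lower().split()


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


# has('title', CONTAINS, 'god')
gods = [v["name"] for v in VERTICES if title_contains(v, "god")]

# has('age', GREATER_THAN, 4500).has('title', CONTAINS, 'god')
old_gods = [v["name"] for v in VERTICES
            if v.get("age", 0) > 4500 and title_contains(v, "god")]

# has('location', WITHIN, Geoshape.circle(38, 24, 50)); radius assumed to be km
nearby = [e["time"] for e in BATTLES
          if haversine_km((38.0, 24.0), e["location"]) <= 50.0]

print(gods, old_gods, nearby)
```

Note that Neptune drops out of the second query because its age is exactly 4500, which is not greater than 4500.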
  • Faunus Features: Hadoop-based graph computing framework; graph analytics; breadth-first traversals; global graph computations; batch big graph data
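One way to picture a breadth-first traversal in a batch setting is to move path counters across all edges at once, one step per pass, rather than walking from a single vertex. The sketch below evaluates a two-step count in that style. The tiny edge list is hypothetical, and this illustrates the general technique rather than Faunus's actual MapReduce jobs.

```python
from collections import Counter

# Hypothetical directed edges (source, target).
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]


def step_out(counters):
    """One breadth-first step: push each vertex's path count along its out-edges."""
    nxt = Counter()
    for src, dst in EDGES:
        if counters[src]:
            nxt[dst] += counters[src]
    return nxt


# Start with one traverser on every vertex (the analogue of g.V).
start = Counter({v: 1 for edge in EDGES for v in edge})

after_one = step_out(start)       # analogue of .out
after_two = step_out(after_one)   # analogue of .out.out
total = sum(after_two.values())   # analogue of .count()

print(after_two, total)
```

Each pass touches every edge exactly once, which is why this style suits batch processing of graphs that are too large to traverse interactively.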
  • Faunus Architecture [figure] g._()
  • Faunus Work Flow: g.V.out.out.count() [figure: the steps write to hdfs://user/ubuntu/ under output/job-0/, output/job-1/, and output/job-2/, each holding graph* and sideeffect* files]
    Compressed HDFS graphs: stored in sequence files; variable-length encoding; prefix compression
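The two compression techniques named on this slide can be shown generically. The sketch below implements a common 7-bits-per-byte variable-length integer encoding and a simple shared-prefix compression of sorted strings. Both are textbook versions chosen for illustration; the slides do not show Faunus's actual on-disk format.

```python
def encode_varint(n):
    """Encode a non-negative int using 7 data bits per byte, low bits first."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data, pos=0):
    """Decode one varint starting at pos; returns (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def prefix_compress(sorted_keys):
    """Store each key as (chars shared with the previous key, remaining suffix)."""
    previous = ""
    entries = []
    for key in sorted_keys:
        shared = 0
        while (shared < min(len(previous), len(key))
               and previous[shared] == key[shared]):
            shared += 1
        entries.append((shared, key[shared:]))
        previous = key
    return entries


def prefix_decompress(entries):
    """Inverse of prefix_compress."""
    previous = ""
    keys = []
    for shared, suffix in entries:
        previous = previous[:shared] + suffix
        keys.append(previous)
    return keys


keys = ["Random_walk", "Random_walker_algorithm"]
packed = prefix_compress(keys)
print(encode_varint(300), packed)
```

Small ids take one byte instead of a fixed four or eight, and sorted keys with long common prefixes shrink to their differing suffixes.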
  • Faunus Setup
    $ bin/gremlin.sh
             \,,,/
             (o o)
    -----oOOo-(_)-oOOo-----
    gremlin> g = FaunusFactory.open('bin/titan-hbase.properties')
    ==>faunusgraph[titanhbaseinputformat]
    gremlin> g.getProperties()
    ==>faunus.graph.input.format=com.thinkaurelius.faunus.formats.titan.hbase.TitanHBaseInputFormat
    ==>faunus.graph.output.format=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
    ==>faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
    ==>faunus.output.location=dbpedia
    ==>faunus.output.location.overwrite=true
    gremlin> g._()
    12/11/09 15:17:45 INFO mapreduce.FaunusCompiler: Compiled to 1 MapReduce job(s)
    12/11/09 15:17:45 INFO mapreduce.FaunusCompiler: Executing job 1 out of 1: MapSequence[com.thinkaurelius.faunus.mapreduce.transform.IdentityMap.Map]
    12/11/09 15:17:50 INFO mapred.JobClient: Running job: job_201211081058_0003
  • Build a Knowledge Graph
    Based on DBPedia: graph version of Wikipedia; ~290 million edges (~1B triples)
    1. Bulk load RDF into Faunus (6 m1.xlarge)
    2. Convert to property graph
    3. Bulk load into Titan (3 m1.xlarge with Cassandra)
    4. OLTP+OLAP
    Total time: ~2 hours
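Step 2, converting RDF to a property graph, is not spelled out in the slides. A common convention, shown here purely as an assumption, is that triples whose object is a literal become vertex properties, while triples whose object is another resource become labeled edges. The function name and the sample triples are hypothetical.

```python
def triples_to_property_graph(triples):
    """Convert (subject, predicate, object, object_is_literal) tuples.

    ASSUMED convention: literal objects become properties on the subject
    vertex; resource objects become edges labeled with the predicate.
    """
    vertices = {}
    edges = []
    for subject, predicate, obj, object_is_literal in triples:
        properties = vertices.setdefault(subject, {})
        if object_is_literal:
            properties[predicate] = obj
        else:
            vertices.setdefault(obj, {})  # make sure the target vertex exists
            edges.append((subject, predicate, obj))
    return vertices, edges


# Made-up triples in the spirit of DBPedia page links.
sample = [
    ("Random_walker_algorithm", "label", "Random walker algorithm", True),
    ("Random_walker_algorithm", "wikiPageLink", "Random_walk", False),
    ("Random_walker_algorithm", "wikiPageLink", "Laplacian_matrix", False),
]
vertices, edges = triples_to_property_graph(sample)
print(vertices)
print(edges)
```

This is one reason the edge count (~290 million) is well below the triple count (~1B): a triple that carries a literal ends up as a property, not an edge.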
  • Graph OLTP
    gremlin> g = TitanFactory.open('bin/cassandra.local')
    ==>titangraph[cassandrathrift:10.176.213.110]
    gremlin> g.V('name','Random_walker_algorithm').both.name
    ==>Random_walk
    ==>Segmentation_(image_processing)
    ==>Graph_(mathematics)
    ==>Laplacian_matrix
    ==>Graph
    ==>Laplacian_matrix
    ==>Electrical_network
    ==>Resistor
    ==>Electrical_resistance_and_conductance
    ==>Ground_(electricity)
    ==>Direct_current
    ==>Voltage_source
    ==>Precomputation
    ==>Category:Computer_vision
    ==>Random_Walker_(Computer_Vision)
    ==>List_of_algorithms
    ==>Segmentation_(image_processing)
    ==>Watershed_(image_processing)
    ==>Random_walker_(computer_vision)
    ==>Random_Walker_(computer_vision)
  • Graph OLAP
    gremlin> g.V('name','Learning').out.out.out.out[0..10].name
    ==>Latium
    ==>Roman_Kingdom
    ==>Roman_Republic
    ==>Roman_Empire
    ==>Middle_Ages
    ==>Early_modern_Europe
    ==>Armenian_Kingdom_of_Cilicia
    ==>Lingua_franca
    ==>Vatican_City
    ==>Vulgar_Latin
    ==>Romance_languages
  • Complex Problem: 1. Identify Entities; 2. Identify Relationships; 3. Apply Graph Analysis
  • Aurelius Graph Cluster (Apache 2), aureliusgraphs@googlegroups.com
    TITAN: stores a massive-scale property graph allowing real-time traversals and updates
    FAUNUS: batch processing of large graphs with Hadoop
    FULGORA: runs global graph algorithms on large, compressed, in-memory graphs
    [figure: arrows between the components labeled Map/Reduce Load, Bulk Load, and Analysis results back into Titan]
  • TINKERPOP.COM
  • Thanks! Vadas Gintautas (@vadasg), Marko Rodriguez (@twarko), Stephen Mallette (@spmallette), Daniel LaRocque. AURELIUS, THINKAURELIUS.COM
  • We are Hiring AURELIUS THINKAURELIUS.COM @AURELIUSGRAPHS