Faunus: Graph Analytics Engine
 


Faunus is a graph analytics engine built atop the Hadoop distributed computing platform. The graph representation is a distributed adjacency list, whereby a vertex and its incident edges are co-located on the same machine. Querying a Faunus graph is possible with a MapReduce variant of the Gremlin graph traversal language. A Gremlin expression compiles down to a series of MapReduce steps that are sequence-optimized and then executed by Hadoop. Results are stored as transformations to the input graph (graph derivations) or as computational side effects such as aggregates (graph statistics). Beyond querying, Faunus supports a collection of input/output formats that enable it to load and store graphs in the distributed graph database Titan, in various graph formats stored in HDFS, and via arbitrary user-defined functions. This presentation focuses primarily on Faunus, but also reviews the satellite technologies that enable it.

    Presentation Transcript

    • FAUNUS: GRAPH ANALYTICS ENGINE
      MARKO A. RODRIGUEZ, http://THINKAURELIUS.COM
    • ABSTRACT (the abstract above). http://FAUNUS.THINKAURELIUS.COM
    • SPONSORED BY
      ECCO, the Evolution, Complexity and Cognition group, is a multidisciplinary research group directed by Francis Heylighen. It is based at the Vrije Universiteit Brussel (VUB), although its members are distributed across four continents. Researchers come from a wide variety of backgrounds, from the physical sciences and technology to the social sciences and humanities. The group's philosophy is intrinsically transdisciplinary, transcending the traditional boundaries between "hard" and "soft" sciences, and between philosophical foundations and practical applications.
      The Big-Data Interest Group (BIGDIG) is a focus group at LANL that meets monthly to explore big-data methods and architectures. One goal of the group is to identify early adopters and learn from their experiences. The group also aims to involve scientists who are looking for big-data solutions and to foster collaboration with those who might provide the needed technology. BIGDIG includes members from all domains: science, security, sensing, computing, library, and more.
      The EgoSystem project is creating an integrated social model of the Los Alamos National Laboratory and its surroundings using numerous online services such as Twitter, LinkedIn, MS Academic, Wikipedia, and more. The model is seeded with LANL postdocs and their created artifacts, and it continuously grows to encompass their relations to other people and institutions. EgoSystem is a Director-sponsored project engineered by the Digital Library Research and Prototyping Team using Big Graph Data technology provided by Aurelius.
    • VERTICES + EDGES (ELEMENTS)
      A slide sequence builds up a two-vertex example, one element at a time:
        VERTEX 0 (its ID) with PROPERTIES name:faunus, born:2012
        VERTEX 1 with properties name:hadoop, born:2005
        an EDGE from vertex 0 to vertex 1 with ID 5, LABEL dependsOn, and PROPERTIES since:2012
    • AN ADJACENCY-LIST ROW
      A toy graph with VERTEX IDS 0-3, EDGE IDS 4-7, EDGE LABELS A, B, A, C and ELEMENT PROPERTIES a:b, c:d, e:f, g:h, i:j.
      The row for vertex 1 stores the vertex (id, props) followed by its incoming and outgoing edges (id, props, label, id of the adjacent vertex):
        vertex:          1 e:f
        incoming edges:  4 c:d A 2  |  5 B 0
        outgoing edges:  6 g:h A 3  |  7 C 3
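The row layout above can be sketched as a small Python data structure. This is an illustration under stated assumptions: `Row` and `Edge` are hypothetical classes (Faunus itself serializes a `FaunusVertex`), and reading the trailing number of each edge as the id of the vertex at the other end is our interpretation of the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    id: int
    label: str
    other: int  # id of the vertex at the other end (interpretation of the slide)
    props: dict = field(default_factory=dict)


@dataclass
class Row:
    """One adjacency-list row: a vertex together with all of its incident edges."""
    id: int
    props: dict
    in_edges: list
    out_edges: list

    def neighbors(self, direction):
        """Adjacent vertex ids in one direction, read from this row alone."""
        edges = self.in_edges if direction == "in" else self.out_edges
        return [e.other for e in edges]


# The row for vertex 1 from the slide (properties written as key:value pairs).
row1 = Row(
    id=1,
    props={"e": "f"},
    in_edges=[Edge(4, "A", 2, {"c": "d"}), Edge(5, "B", 0)],
    out_edges=[Edge(6, "A", 3, {"g": "h"}), Edge(7, "C", 3)],
)

print(row1.neighbors("in"))   # [2, 0]
print(row1.neighbors("out"))  # [3, 3]
```

Because a row holds both incoming and outgoing edges, the ids of adjacent vertices in either direction are available without reading any other row.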
    • AN ADJACENCY LIST + A CLUSTER = A DISTRIBUTED ADJACENCY LIST
      The rows of an adjacency list (vertices 0-11) are spread across the machines of a cluster (127.0.0.2, 127.0.0.3, 127.0.0.4), so each vertex and its incident edges are stored together on one machine.
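A minimal sketch of spreading rows over the three machines. The placement rule (vertex id modulo the number of machines, similar in spirit to Hadoop's default hash partitioning) is an assumption for illustration; the slide does not say how Faunus or HDFS assigns rows.

```python
from collections import defaultdict

MACHINES = ["127.0.0.2", "127.0.0.3", "127.0.0.4"]


def partition(vertex_ids, machines):
    """Assign each whole adjacency-list row to one machine.

    Hypothetical rule: id modulo machine count. Whatever the rule, a row
    moves as a unit, so a vertex never gets separated from its edges.
    """
    placement = defaultdict(list)
    for vid in vertex_ids:
        placement[machines[vid % len(machines)]].append(vid)
    return dict(placement)


layout = partition(range(12), MACHINES)
for machine, rows in layout.items():
    print(machine, rows)
# 127.0.0.2 [0, 3, 6, 9]
# 127.0.0.3 [1, 4, 7, 10]
# 127.0.0.4 [2, 5, 8, 11]
```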
    • HADOOP (http://hadoop.apache.org)
      Hadoop is a distributed computing platform composed of two key components:
        HDFS: a distributed file system that stores arbitrarily large files within a cluster.
        MapReduce: a parallel functional computing model for key/value pair data.
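The MapReduce model can be imitated in a single process to show its three phases: map each record to key/value pairs, shuffle (group values by key), and reduce each group. This is a teaching sketch, not Hadoop's API; `run_mapreduce` is a hypothetical helper, and the edge triples are taken from the Graph of the Gods GraphSON excerpt later in this deck.

```python
from collections import defaultdict


def run_mapreduce(records, mapper, reducer):
    """Single-process imitation of one MapReduce job.

    map:     record -> iterable of (key, value) pairs
    shuffle: group all values that share a key
    reduce:  (key, [values]) -> iterable of output pairs
    """
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key in sorted(groups):  # Hadoop also hands keys to reducers in sorted order
        output.extend(reducer(key, groups[key]))
    return output


# Count edges per label from (out_vertex, label, in_vertex) triples.
edges = [(1, "father", 0), (1, "lives", 4), (1, "brother", 2),
         (2, "brother", 1), (2, "lives", 5), (7, "father", 1)]
counts = run_mapreduce(
    edges,
    mapper=lambda e: [(e[1], 1)],
    reducer=lambda label, ones: [(label, sum(ones))],
)
print(counts)  # [('brother', 2), ('father', 2), ('lives', 2)]
```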
    • FAUNUS AND HADOOP
      Faunus provides graph input/output formats (structure) and a traversal language for graphs (process).
      (Diagram: the distributed adjacency list of vertices 0-11 across 127.0.0.2, 127.0.0.3 and 127.0.0.4.)
    • PROCESSING GRAPHS WITH FAUNUS
    • GRAPH OF THE GODS
      (Diagram: a toy graph distributed with Faunus. Vertices: 0 saturn (titan); 1 jupiter, 2 neptune, 3 pluto (gods); 4 sky, 5 sea, 6 tartarus (locations); 7 hercules (demigod); 8 alcmene (human); 9 nemean, 10 hydra, 11 cerberus (monsters). Edge labels: father, mother, brother, lives, pet, battled; the battled edges carry time:1, time:2 and time:12.)
    • faunus$ bin/gremlin.sh
               \,,,/
               (o o)
      -----oOOo-(_)-oOOo-----
      gremlin> hdfs.ls()
      gremlin> hdfs.copyFromLocal('graph-of-the-gods.json','graph-of-the-gods.json')
      ==>null
      gremlin> hdfs.ls()
      ==>rw-r--r-- marko supergroup 2028 graph-of-the-gods.json
      gremlin>
    • gremlin> g = FaunusFactory.open('bin/faunus.properties')
      ==>faunusgraph[graphsoninputformat->graphsonoutputformat]
      gremlin> g.getConf('faunus')
      ==>faunus.graph.input.format=com.thinkaurelius.faunus.formats.graphson.GraphSONInputFormat
      ==>faunus.input.location=graph-of-the-gods.json
      ==>faunus.graph.output.format=com.thinkaurelius.faunus.formats.graphson.GraphSONOutputFormat
      ==>faunus.output.location=output
      ==>faunus.output.location.overwrite=true
      ==>faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
    • gremlin> g.V
      13/05/07 12:07:09 INFO mapreduce.FaunusCompiler: Compiled to 1 MapReduce job(s)
      13/05/07 12:07:09 INFO mapreduce.FaunusCompiler: Executing job 1 out of 1: MapSequence[com.thinkaurelius.faunus.mapreduce.transform.VerticesMap.Map]
      13/05/07 12:07:09 INFO mapreduce.FaunusCompiler: Job data location: output/job-0
      13/05/07 12:07:10 INFO input.FileInputFormat: Total input paths to process : 1
      13/05/07 12:07:10 INFO mapred.JobClient: Running job: job_201304251105_0004
      13/05/07 12:07:11 INFO mapred.JobClient: map 0% reduce 0%
      ...
      (Diagram: every vertex, 0 through 11, now holds a traverser count of 1.)
    • gremlin> g.V.has('type','god')
      13/05/07 12:08:55 INFO mapreduce.FaunusCompiler: Compiled to 1 MapReduce job(s)
      13/05/07 12:08:55 INFO mapreduce.FaunusCompiler: Executing job 1 out of 1: MapSequence[com.thinkaurelius.faunus.mapreduce.transform.VerticesMap.Map,com.thinkaurelius.faunus.mapreduce.filter.PropertyFilterMap.Map]
      13/05/07 12:08:55 INFO mapreduce.FaunusCompiler: Job data location: output/job-0
      13/05/07 12:08:56 INFO input.FileInputFormat: Total input paths to process : 1
      13/05/07 12:08:57 INFO mapred.JobClient: Running job: job_201304251105_0005
      13/05/07 12:08:58 INFO mapred.JobClient: map 0% reduce 0%
      ...
      (Diagram: only vertices 1, 2 and 3, the gods, now hold a traverser count of 1; all other vertices hold 0.)
    • gremlin> g.V.has('type','god').in('father')
      13/05/07 12:13:03 INFO mapreduce.FaunusCompiler: Compiled to 1 MapReduce job(s)
      13/05/07 12:13:03 INFO mapreduce.FaunusCompiler: Executing job 1 out of 1: MapSequence[com.thinkaurelius.faunus.mapreduce.transform.VerticesMap.Map,com.thinkaurelius.faunus.mapreduce.filter.PropertyFilterMap.Map,com.thinkaurelius.faunus.mapreduce.transform.VerticesVerticesMapReduce.Map,com.thinkaurelius.faunus.mapreduce.transform.VerticesVerticesMapReduce.Reduce]
      13/05/07 12:13:03 INFO mapreduce.FaunusCompiler: Job data location: output/job-0
      13/05/07 12:13:03 INFO input.FileInputFormat: Total input paths to process : 1
      13/05/07 12:13:04 INFO mapred.JobClient: Running job: job_201304251105_0006
      13/05/07 12:13:05 INFO mapred.JobClient: map 0% reduce 0%
      ...
      (Diagram: only vertex 7, hercules, now holds a traverser count of 1.)
    • gremlin> g.V.has('type','god').in('father').out('mother').name
      13/05/07 12:25:18 INFO mapreduce.FaunusCompiler: Compiled to 3 MapReduce job(s)
      13/05/07 12:25:18 INFO mapreduce.FaunusCompiler: Executing job 1 out of 3: MapSequence[com.thinkaurelius.faunus.mapreduce.transform.VerticesMap.Map,com.thinkaurelius.faunus.mapreduce.filter.PropertyFilterMap.Map,com.thinkaurelius.faunus.mapreduce.transform.VerticesVerticesMapReduce.Map,com.thinkaurelius.faunus.mapreduce.transform.VerticesVerticesMapReduce.Reduce]
      13/05/07 12:25:18 INFO mapreduce.FaunusCompiler: Job data location: output/job-0
      13/05/07 12:25:18 INFO input.FileInputFormat: Total input paths to process : 1
      13/05/07 12:25:18 INFO mapred.JobClient: Running job: job_201305071220_0007
      ...
      ==>alcmene
      gremlin>
      (Diagram: only vertex 8, alcmene, now holds a traverser count of 1.)
    • TRAVERSAL DATA
      Each element in a row (the vertex and each of its incoming and outgoing edges, all with properties such as k1:v1) maintains traversal data as well, in one of two forms:
        1. A long counter denoting how many traversers exist at the element (counter = cheap), or
        2. A list of lists denoting the path history of individual traversers at the element (enumerative = expensive).
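The counter walk shown in the `g.V...` slides above can be replayed in plain Python using the cheap counter form. This is a sketch under stated assumptions: the graph is only the subset of the Graph of the Gods that is visible in the GraphSON excerpt and the query results in this deck, and `V`, `has` and `step` are hypothetical helpers, not Faunus classes.

```python
from collections import Counter

# Subset of the Graph of the Gods (assumption: only what these slides show).
VERTICES = {
    0: {"name": "saturn", "type": "titan"},
    1: {"name": "jupiter", "type": "god"},
    2: {"name": "neptune", "type": "god"},
    3: {"name": "pluto", "type": "god"},
    4: {"name": "sky", "type": "location"},
    5: {"name": "sea", "type": "location"},
    7: {"name": "hercules", "type": "demigod"},
    8: {"name": "alcmene", "type": "human"},
}
EDGES = [  # (out_vertex, label, in_vertex)
    (1, "father", 0), (1, "lives", 4), (1, "brother", 2), (1, "brother", 3),
    (2, "brother", 1), (2, "brother", 3), (2, "lives", 5),
    (3, "brother", 1), (3, "brother", 2),
    (7, "father", 1), (7, "mother", 8),
]


def V():
    """g.V: one traverser (counter = 1) on every vertex."""
    return Counter({vid: 1 for vid in VERTICES})


def has(counts, key, value):
    """Filter: keep counters only on vertices whose property matches."""
    return Counter({v: c for v, c in counts.items() if VERTICES[v].get(key) == value})


def step(counts, label, direction):
    """Transform: move each counter along edges with the given label."""
    moved = Counter()
    for out_v, lab, in_v in EDGES:
        if lab != label:
            continue
        src, dst = (out_v, in_v) if direction == "out" else (in_v, out_v)
        if counts[src]:  # a Counter returns 0 for absent keys
            moved[dst] += counts[src]
    return moved


gods = has(V(), "type", "god")            # vertices 1, 2, 3
fathers_of = step(gods, "father", "in")   # vertex 7
mothers = step(fathers_of, "mother", "out")
print(dict(gods), dict(fathers_of), dict(mothers))  # {1: 1, 2: 1, 3: 1} {7: 1} {8: 1}
print([VERTICES[v]["name"] for v, c in mothers.items() for _ in range(c)])  # ['alcmene']
```

Only a single integer per vertex moves through the walk, which is why the counter form is the cheap one.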
    • gremlin> g.V.has('type','god').in('father').out('mother').path
      13/05/07 14:37:59 WARN mapreduce.FaunusCompiler: Path calculations are enabled for this Faunus job (space and time expensive)
      13/05/07 14:37:59 INFO mapreduce.FaunusCompiler: Compiled to 3 MapReduce job(s)
      13/05/07 14:37:59 INFO mapreduce.FaunusCompiler: Executing job 1 out of 3: MapSequence[com.thinkaurelius.faunus.mapreduce.transform.VerticesMap.Map,com.thinkaurelius.faunus.mapreduce.filter.PropertyFilterMap.Map,com.thinkaurelius.faunus.mapreduce.transform.VerticesVerticesMapReduce.Map,com.thinkaurelius.faunus.mapreduce.transform.VerticesVerticesMapReduce.Reduce]
      13/05/07 14:38:00 INFO mapred.JobClient: Running job: job_201305071220_0005
      ...
      ==>[v[1], v[7], v[8]]
      gremlin>
      (Diagram: vertex 8 now holds the path [1, 7, 8].)
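The same query in the enumerative form, where every traverser carries its own path, can be sketched as follows. As before, the graph is a minimal subset taken from this deck (an assumption), and `start_paths`, `has_type` and `walk` are hypothetical helpers.

```python
# Minimal subset of the Graph of the Gods needed for this query (assumption).
TYPES = {0: "titan", 1: "god", 2: "god", 3: "god", 7: "demigod", 8: "human"}
EDGES = [(1, "father", 0), (7, "father", 1), (7, "mother", 8),
         (1, "brother", 2), (2, "brother", 3)]  # (out_vertex, label, in_vertex)


def start_paths():
    """Enumerative mode: one path (a list of vertex ids) per traverser."""
    return [[v] for v in TYPES]


def has_type(paths, t):
    """Filter on the type of the vertex at the head of each path."""
    return [p for p in paths if TYPES[p[-1]] == t]


def walk(paths, label, direction):
    """Extend every path along matching edges; paths with no match are dropped."""
    out = []
    for p in paths:
        for o, lab, i in EDGES:
            if lab != label:
                continue
            if direction == "out" and o == p[-1]:
                out.append(p + [i])
            elif direction == "in" and i == p[-1]:
                out.append(p + [o])
    return out


paths = walk(walk(has_type(start_paths(), "god"), "father", "in"), "mother", "out")
print(paths)  # [[1, 7, 8]]
```

Each traverser now costs a growing list instead of a shared integer, which is why the compiler warns that path calculations are space and time expensive.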
    • GREMLIN: GRAPH TRAVERSAL LANGUAGE (http://gremlin.tinkerpop.com)
      Gremlin is a functional graph language where traversals are defined using function composition (f1 f2 f3 ...). A set of useful predefined functions is provided with the language, and generic lambdas/closures can be used for arbitrary mappings.
        TRANSFORM, $t : (V \cup E) \to \mathcal{P}(V \cup E)$: transform{}, V, id, label, out, in, outE, inE, outV, inV, map, order, ...
        FILTER, $f : (V \cup E) \to (V \cup E \cup \emptyset)$: filter{}, has, hasNot, [0..10], random, simplePath, back, ...
        SIDE-EFFECT, $s : (V \cup E) \to (V \cup E)$: sideEffect{}, groupCount, groupBy, aggregate, table, store, linkIn, linkOut, count, ...
        BRANCH: loop, copySplit, fairMerge, exhaustMerge, ...
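The three step signatures can be mirrored with ordinary Python functions: each step maps one element to a list of zero or more elements, and a traversal is their composition. This models the semantics only, not Gremlin's implementation; `transform`, `filter_`, `side_effect`, `compose` and the toy adjacency map are hypothetical.

```python
from collections import Counter
from functools import reduce
from typing import Callable, Iterable, List

Step = Callable[[object], List[object]]  # one element in, zero or more elements out


def transform(fn) -> Step:
    """t: element -> set of elements (here a list), e.g. out-neighbors."""
    return lambda x: list(fn(x))


def filter_(pred) -> Step:
    """f: element -> the element itself, or nothing (the empty set)."""
    return lambda x: [x] if pred(x) else []


def side_effect(fn) -> Step:
    """s: run fn for its effect only; the element passes through unchanged."""
    def step(x):
        fn(x)
        return [x]
    return step


def compose(*steps: Step) -> Callable[[Iterable], List]:
    """A traversal is the composition of its steps, applied left to right."""
    def run(elements):
        return reduce(lambda acc, st: [y for x in acc for y in st(x)], steps, list(elements))
    return run


out_adj = {1: [2, 3], 2: [3], 3: [1]}  # hypothetical toy graph
group_count = Counter()
traversal = compose(
    transform(lambda v: out_adj[v]),                 # like .out
    filter_(lambda v: v != 1),                       # like .filter{it != 1}
    side_effect(lambda v: group_count.update([v])),  # like .groupCount
)
print(traversal([1, 2, 3]))  # [2, 3, 3]
print(dict(group_count))     # {2: 1, 3: 2}
```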
    • EXAMPLE TRAVERSALS
      "How many vertices are in the graph?"
        g.V.count()
      "How many 3-step acyclic paths exist in the graph?"
        g.V.out.out.out.simplePath.count()
      "How many people attend each academy?"
        g.V.has('type','person').out('attends').has('type','academy').name.groupCount
      "What is the in-degree distribution of the friendship subgraph?"
        g.V.sideEffect{it.degree = it.inE('friend').count()}.degree.groupCount
        * The only memory structure is the graph, thus all data must be in the graph.
      "Derive all implicit grandfather relations in the graph."
        g.V.as('x').out('father').out('father').linkIn('grandfather','x')
        * Mutates the graph.
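The grandfather derivation can be sketched over plain edge triples. Two assumptions are worth stating: that `linkIn('grandfather','x')` adds an edge pointing into the vertex reached from the vertex named `x` (that is, x --grandfather--> grandfather), and that the father edges below (hercules 7 to jupiter 1, jupiter 1 to saturn 0) are the ones in this deck's GraphSON excerpt. `derive_grandfathers` is a hypothetical helper.

```python
# (out_vertex, label, in_vertex) triples: hercules (7) -> jupiter (1) -> saturn (0).
edges = [(7, "father", 1), (1, "father", 0)]


def derive_grandfathers(edges):
    """Mimic g.V.as('x').out('father').out('father').linkIn('grandfather','x').

    For every vertex x, walk two father hops; at each vertex reached, add an
    edge coming *in* to it from x, i.e. (x, 'grandfather', grandfather).
    The derived edges are appended to the graph (the step mutates it).
    """
    fathers = {}
    for o, label, i in edges:
        if label == "father":
            fathers.setdefault(o, []).append(i)
    derived = []
    for x, dads in fathers.items():
        for dad in dads:
            for grandpa in fathers.get(dad, []):
                derived.append((x, "grandfather", grandpa))
    return edges + derived


print(derive_grandfathers(edges)[-1])  # (7, 'grandfather', 0)
```

Once the derived edges are stored, a later traversal can follow `out('grandfather')` directly instead of recomputing two hops.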
    • FAUNUS DATA FLOW
      Example traversal: g.V.out.out.count()
      Each MapReduce job writes graph* and sideeffect* files to its own HDFS directory under hdfs://user/ubuntu/: output/job-0/, output/job-1/, output/job-2/.
      Key/value types, written as <key, value>:
        MAP-ONLY STEPS (NO REDUCE NEEDED): map <NullWritable, FaunusVertex> -> <NullWritable, FaunusVertex>
        MAP/REDUCE STEPS: map <NullWritable, FaunusVertex> -> <LongWritable, Holder<FaunusElement>>; reduce <LongWritable, Iterable<Holder<FaunusElement>>> -> <NullWritable, FaunusVertex>
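How a step sequence might be packed into jobs can be sketched with one simple rule: consecutive map-only steps share a job, and a job ends right after a step that needs a reduce phase. This rule is an assumption chosen because it reproduces the job counts in the compiler logs shown earlier (1 job for `g.V`, 1 for `has(...).in(...)`, 3 for `has(...).in(...).out(...).name`); it is not taken from the Faunus compiler source. `plan_jobs` is hypothetical.

```python
def plan_jobs(steps):
    """Pack (step_name, needs_reduce) pairs into a list of MapReduce jobs.

    Assumed rule: map-only steps accumulate in the current job; a step that
    needs a reduce phase closes the job it belongs to.
    """
    jobs, current = [], []
    for name, needs_reduce in steps:
        current.append(name)
        if needs_reduce:
            jobs.append(current)
            current = []
    if current:  # trailing map-only steps form a final map-only job
        jobs.append(current)
    return jobs


# g.V.has('type','god').in('father')  -> the log reports 1 job
q1 = [("V", False), ("has", False), ("in", True)]
# g.V.has('type','god').in('father').out('mother').name  -> the log reports 3 jobs
q2 = q1 + [("out", True), ("name", False)]
print(plan_jobs(q1))  # [['V', 'has', 'in']]
print(plan_jobs(q2))  # [['V', 'has', 'in'], ['out'], ['name']]
```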
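    • GREMLIN IN MAP/REDUCE: FILTER, $f : (V \cup E) \to (V \cup E \cup \emptyset)$
      g.V.has('type','god'): f(v) tests whether the type of v is god.
        map(null, vertex, context) {
          key = context.getConf().get('provided.key')
          value = context.getConf().get('provided.value')
          // compare from the configured value so a missing property fails the test instead of throwing
          if (!value.equals(vertex.getProperty(key))) {
            vertex.clearPaths();
          }
          // the vertex is always written, so the graph itself is preserved
          context.write(NullWritable.get(), vertex);
        }
      * Most filters are map-only steps. If the predicate returns false, then all the path metadata is cleared from the element.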
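In Python terms, the filter map might look like this. `Vertex` and `has_map` are hypothetical stand-ins for `FaunusVertex` and the filter map; the point they illustrate is that a failing vertex is still emitted, and only its traversal data is cleared.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: int
    props: dict
    path_count: int = 0                        # counter-mode traversal data
    paths: list = field(default_factory=list)  # enumerative-mode traversal data

    def clear_paths(self):
        self.path_count = 0
        self.paths.clear()


def has_map(vertex, key, value, emit):
    """Map-only filter step: always emit the vertex (the graph survives);
    remove its traversers when the predicate fails."""
    if vertex.props.get(key) != value:  # a missing property also fails
        vertex.clear_paths()
    emit(vertex)


out = []
for v in [Vertex(0, {"type": "titan"}, 1), Vertex(1, {"type": "god"}, 1), Vertex(9, {}, 1)]:
    has_map(v, "type", "god", out.append)
print([(v.id, v.path_count) for v in out])  # [(0, 0), (1, 1), (9, 0)]
```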
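    • GREMLIN IN MAP/REDUCE: TRANSFORM, $t : (V \cup E) \to \mathcal{P}(V \cup E)$
      g.V.out
        map(null, vertex, context) {
          // send this vertex's traversers (paths only) to each out-neighbor
          for (e : vertex.getEdges(OUT)) {
            context.write(e.getVertex(IN).id, holder('p', vertex.pathsOnly()))
          }
          // send the vertex itself to its own id so the graph survives the step
          context.write(vertex.id, holder('v', vertex))
        }
        reduce(long id, iterable<holder> holders, context) {
          vertex = new FaunusVertex(id)
          for (h : holders) {
            if (h.getTag() == 'v')
              vertex.addAll(h.getVertex())
            else
              vertex.addPaths(h.getVertex())
          }
          context.write(NullWritable.get(), vertex)
        }
      (Diagram: the map and reduce phases run across machines 127.0.0.2, 127.0.0.3 and 127.0.0.4.)
      * Traversals implement a reduce-side join.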
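The reduce-side join for `out` can be simulated in-process in counter mode. The three-vertex graph (jupiter, neptune and saturn with edges from the GraphSON excerpt) and the helper names `out_map`/`out_reduce` are assumptions for illustration; the tags 'p' and 'v' follow the pseudocode above.

```python
from collections import defaultdict

# id -> (properties, out-neighbor ids); subset of the GraphSON excerpt (assumption).
graph = {
    1: ({"name": "jupiter"}, [0, 2]),
    2: ({"name": "neptune"}, [1]),
    0: ({"name": "saturn"}, []),
}
counters = {1: 1, 2: 1, 0: 0}  # jupiter and neptune hold one traverser each


def out_map(vid):
    props, out_ids = graph[vid]
    for nbr in out_ids:                    # ship traversers to each neighbor's key
        yield nbr, ("p", counters[vid])
    yield vid, ("v", (props, out_ids))     # ship the vertex itself to its own key


def out_reduce(vid, holders):
    """Join the vertex with the traversers that arrived for it."""
    row, count = None, 0
    for tag, payload in holders:
        if tag == "v":
            row = payload                  # the vertex (its row) arrives exactly once
        else:
            count += payload               # traversers arriving from neighbors
    return vid, row, count


shuffle = defaultdict(list)  # stands in for Hadoop's shuffle/sort
for vid in graph:
    for key, holder in out_map(vid):
        shuffle[key].append(holder)
result = {vid: out_reduce(vid, hs)[2] for vid, hs in shuffle.items()}
print(result)  # {0: 1, 2: 1, 1: 1}
```

Every vertex is re-emitted under its own id even when no traverser reaches it, so the whole graph flows into the next job; that second emit is what makes this a join rather than a plain message pass.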
    • GREMLIN IN MAP/REDUCE: SIDE-EFFECT, $s : (V \cup E) \to (V \cup E)$
      g.V.type.groupCount(): s(v) counts the type of v.
        map(null, vertex, context) {
          key = context.getConf().get('provided.key')
          // the graph passes through unchanged to the 'graph' output
          context.write('graph', null, vertex)
          // emit <property value, number of traversers at this vertex>
          context.write('sideeffect', vertex.getProperty(key), vertex.getPathCount())
        }
        reduce(object, iterable<long> longs, context) {
          sum = 0
          for (l : longs) { sum += l }
          context.write('sideeffect', object, sum)
        }
      * Leverages MultipleInputs/Outputs.
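A single-process sketch of the groupCount job: the map passes each vertex through to a "graph" output and emits a (property value, traverser count) pair; the reduce sums per value. Plain lists stand in for Hadoop's MultipleOutputs and shuffle, and the vertex data (a subset of the Graph of the Gods, one traverser each) is an assumption.

```python
from collections import defaultdict

# Subset of the Graph of the Gods, one traverser per vertex (assumption).
vertices = [
    {"id": 0, "type": "titan", "paths": 1},
    {"id": 1, "type": "god", "paths": 1},
    {"id": 2, "type": "god", "paths": 1},
    {"id": 3, "type": "god", "paths": 1},
    {"id": 4, "type": "location", "paths": 1},
    {"id": 5, "type": "location", "paths": 1},
    {"id": 7, "type": "demigod", "paths": 1},
]

graph_out = []               # stands in for the 'graph' named output
shuffle = defaultdict(list)  # stands in for the shuffle feeding the reducer


def group_count_map(vertex, key="type"):
    graph_out.append(vertex)                      # graph passes through unchanged
    shuffle[vertex[key]].append(vertex["paths"])  # <property value, traverser count>


def group_count_reduce(value, counts):
    return value, sum(counts)                     # total traversers per value


for v in vertices:
    group_count_map(v)
side_effect = dict(group_count_reduce(k, c) for k, c in shuffle.items())
print(side_effect)  # {'titan': 1, 'god': 3, 'location': 2, 'demigod': 1}
```

Summing traverser counts (rather than counting vertices) means a vertex with no traversers contributes nothing, which matches counting only what the traversal has reached.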
    • STRUCTURING GRAPHS WITH FAUNUS
    • INPUT/OUTPUT FORMATS: SequenceFileInputFormat / SequenceFileOutputFormat
      A list of serialized vertex objects in a compressed binary format: <NullWritable, FaunusVertex>.
      The intermediate data format between MapReduce jobs within a Faunus pipeline.
      Fastest available format for both reading and writing.
      Compressed using variable-width and prefix encodings.
        gremlin> g
        ==>faunusgraph[graphsoninputformat->graphsonoutputformat]
        gremlin> g.setGraphOutputFormat(SequenceFileOutputFormat)
        ==>null
        gremlin> g
        ==>faunusgraph[graphsoninputformat->sequencefileoutputformat]
        gremlin>
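Faunus's actual byte layout is not given on the slide. As a generic illustration of why variable-width encoding shrinks adjacency data, here is an unsigned LEB128-style varint combined with gap (delta) encoding of sorted vertex ids; this is a swapped-in, well-known technique, not Faunus's format.

```python
def encode_varint(n: int) -> bytes:
    """Unsigned LEB128-style varint: 7 bits per byte, high bit = more bytes follow."""
    if n < 0:
        raise ValueError("unsigned values only")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varints(data: bytes):
    values, n, shift = [], 0, 0
    for b in data:
        n |= (b & 0x7F) << shift
        if b & 0x80:
            shift += 7
        else:
            values.append(n)
            n, shift = 0, 0
    return values


def encode_ids(ids):
    """Store gaps between sorted ids instead of absolute values, then varint them."""
    prev, out = 0, bytearray()
    for i in sorted(ids):
        out += encode_varint(i - prev)
        prev = i
    return bytes(out)


def decode_ids(data):
    total, ids = 0, []
    for gap in decode_varints(data):
        total += gap
        ids.append(total)
    return ids


ids = [1_000_000, 1_000_003, 1_000_010, 1_000_200]
packed = encode_ids(ids)
print(len(packed), decode_ids(packed))
# 7 [1000000, 1000003, 1000010, 1000200]  (vs. 32 bytes as four fixed 8-byte longs)
```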
    • INPUT/OUTPUT FORMATS: GraphSONInputFormat / GraphSONOutputFormat
      A verbose JSON-based text format. Each vertex is a single JSON document.
      Easy for developers to generate. Useful for testing and examples.
      Limited to JSON-supported datatypes for element property values.
        {"name":"saturn","type":"titan","_id":0,"_inE":[{"_label":"father","_id":12,"_outV":1}]}
        {"name":"jupiter","type":"god","_id":1,"_outE":[{"_label":"lives","_id":13,"_inV":4},{"_label":"brother","_id":16,"_inV":3},{"_label":"brother","_id":14,"_inV":2},{"_label":"father","_id":12,"_inV":0}],"_inE":[{"_label":"brother","_id":17,"_outV":3},{"_label":"brother","_id":15,"_outV":2},{"_label":"father","_id":24,"_outV":7}]}
        {"name":"neptune","type":"god","_id":2,"_outE":[{"_label":"lives","_id":20,"_inV":5},{"_label":"brother","_id":19,"_inV":3},{"_label":"brother","_id":15,"_inV":1}],"_inE":[{"_label":"brother","_id":18,"_outV":3},{"_label":"brother","_id":14,"_outV":1}]}
        ...
      * The JSON specification is available at http://json.org
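Because each line is one JSON document, a row can be checked with the standard `json` module. A sketch, assuming (as the excerpt suggests) that structural keys start with an underscore and every other top-level key is a vertex property; edge properties, if present, are ignored here. `parse_graphson_line` is hypothetical.

```python
import json

line = ('{"name":"saturn","type":"titan","_id":0,'
        '"_inE":[{"_label":"father","_id":12,"_outV":1}]}')


def parse_graphson_line(text):
    """Split one GraphSON vertex document into id, properties and edges.

    Edges are returned as (adjacent vertex id, label, edge id) tuples.
    """
    doc = json.loads(text)
    props = {k: v for k, v in doc.items() if not k.startswith("_")}
    in_edges = [(e["_outV"], e["_label"], e["_id"]) for e in doc.get("_inE", [])]
    out_edges = [(e["_inV"], e["_label"], e["_id"]) for e in doc.get("_outE", [])]
    return doc["_id"], props, in_edges, out_edges


print(parse_graphson_line(line))
# (0, {'name': 'saturn', 'type': 'titan'}, [(1, 'father', 12)], [])
```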
    • INPUT/OUTPUT FORMATS: RDFInputFormat
      Maps popular RDF text formats to a property graph.
      Configurations allow for different mappings of RDF to the property graph model.
      Utilizes a MapReduce step to convert an edge list into an adjacency list.
        faunus.graph.input.format=com.thinkaurelius.faunus.formats.edgelist.rdf.RDFInputFormat
        faunus.input.location=graph-example-1.ntriple
        faunus.graph.input.rdf.format=n-triples
        faunus.graph.input.rdf.as-properties=http://www.w3.org/1999/02/22-rdf-syntax-ns#type
        faunus.graph.input.rdf.use-localname=true
        faunus.graph.input.rdf.literal-as-property=true
      Example: the triple ex:marko foaf:age "33"^^xsd:int becomes vertex 0 with properties uri:ex:marko and age:33.
      * RDF parsers provided by http://openrdf.org
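The edge-list-to-adjacency-list idea can be sketched by grouping triples by subject. The meaning given to the three options here (literals become properties; predicates listed in `as-properties` become properties; local names replace full URIs for keys) is our reading of the option names, not documented behavior. `local_name`, `triples_to_rows` and the two extra triples are hypothetical.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def local_name(uri):
    """Keep the part after the last '#', '/' or ':' (our reading of use-localname=true)."""
    for sep in ("#", "/", ":"):
        uri = uri.rsplit(sep, 1)[-1]
    return uri


def triples_to_rows(triples, as_properties=(RDF_TYPE,)):
    """Group an RDF edge list by subject into adjacency-list rows.

    Literal objects and predicates listed in as_properties become vertex
    properties; other objects become out-edges. Objects that never appear
    as subjects get no row in this sketch.
    """
    rows = defaultdict(lambda: {"props": {}, "out": []})
    for s, p, o, is_literal in triples:
        row = rows[s]
        row["props"].setdefault("uri", s)
        if is_literal:
            row["props"][local_name(p)] = o
        elif p in as_properties:
            row["props"][local_name(p)] = local_name(o)
        else:
            row["out"].append((local_name(p), o))
    return dict(rows)


triples = [
    ("ex:marko", "foaf:age", 33, True),             # the slide's "33"^^xsd:int, already parsed
    ("ex:marko", RDF_TYPE, "foaf:Person", False),   # hypothetical extra triple
    ("ex:marko", "foaf:knows", "ex:vadas", False),  # hypothetical extra triple
]
rows = triples_to_rows(triples)
print(rows["ex:marko"])
# {'props': {'uri': 'ex:marko', 'age': 33, 'type': 'Person'}, 'out': [('knows', 'ex:vadas')]}
```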
    • INPUT/OUTPUT FORMATS: RexsterInputFormat (http://rexster.tinkerpop.com)
      Rexster is a graph server that is accessed via REST and via a Gremlin binary protocol (RexPro). Rexster supports any Blueprints-enabled graph database.
        HTTP:   http://.../vertices/1
                {"results": {"_type":"vertex","_id":1,"name":"tiberius","age":29},"queryTime":0.123}
        REXPRO: g.v(1).out('mother').out('mother').name
                ==>aurelia
    • INPUT/OUTPUT FORMATS: ScriptInputFormat / ScriptOutputFormat
      A Gremlin script stored in HDFS (distributed cache) allows for an arbitrary parse.
      ScriptInputFormat, reading lines such as:
        0:1,2,3,4
        1:2,3
        2:0,3,5,6
        3:1,2
        ...
        def boolean read(FaunusVertex v, String line) {
          parts = line.split(':');
          v.reuse(Long.valueOf(parts[0]))
          // one outgoing linkedTo edge per comma-separated neighbor id (none if the list is empty)
          if (parts.length > 1) {
            parts[1].split(',').each {
              v.addEdge(OUT, 'linkedTo', Long.valueOf(it));
            }
          }
          return true;
        }
      ScriptOutputFormat, writing the same line format:
        def void write(FaunusVertex vertex, DataOutput output) {
          output.writeUTF(vertex.getId().toString() + ':');
          Iterator<Edge> itty = vertex.getEdges(OUT).iterator()
          while (itty.hasNext()) {
            output.writeUTF(itty.next().getVertex(IN).getId().toString());
            // separate ids with commas, without a trailing comma
            if (itty.hasNext()) output.writeUTF(',');
          }
          output.writeUTF('\n');
        }
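The line format itself is easy to check locally before a script goes to HDFS. A sketch: `read_line` and `write_line` are hypothetical Python counterparts of the Gremlin `read`/`write` functions above, and the round trip confirms that writing reproduces the input exactly.

```python
def read_line(line):
    """Parse 'id:n1,n2,...' into (id, [out-neighbor ids]), like read()."""
    vid, _, rest = line.strip().partition(":")
    return int(vid), [int(x) for x in rest.split(",") if x]


def write_line(vid, out_ids):
    """Inverse of read_line: ids separated by commas, no trailing comma."""
    return f"{vid}:{','.join(str(i) for i in out_ids)}\n"


sample = "0:1,2,3,4\n1:2,3\n2:0,3,5,6\n3:1,2\n"
rows = [read_line(l) for l in sample.splitlines()]
print(rows[2])  # (2, [0, 3, 5, 6])
print("".join(write_line(v, n) for v, n in rows) == sample)  # True
```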
    • Adam Jacobs. 2009. The Pathologies of Big Data. Communications of the ACM 52, 8 (August 2009), 36-44. doi:10.1145/1536616.1536632, http://doi.acm.org/10.1145/1536616.1536632
    • GLOBAL VS. LOCAL GRAPH ANALYSIS
      (Diagram: the same graph of vertices 0-11 stored two ways: as a serial key/value data structure, and as an indexed key/indexed value data structure.)
    • TITAN: DISTRIBUTED GRAPH DATABASE (http://titan.thinkaurelius.com)
      (Diagram: application servers reading/writing graph data; a Titan cluster processing Gremlin traversals and writes.)
      The biggest known Titan/Cassandra cluster to date:
        A ~120-billion-edge graph stored on a cluster of 16 hi1.4xlarge machines.
        Ego-centric graph traversals are requested by 80 m1.large machines.
        The cluster serves ~10,000 transactions a second with ~200 ms return times.
      http://thinkaurelius.com/2013/05/13/educating-the-planet-with-pearson/
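A back-of-envelope reading of these figures, assuming edges are spread evenly, ignoring replication, and applying Little's law (average requests in flight = arrival rate x average time in system) to the quoted averages:

```python
edges = 120e9        # ~120 billion edges (slide)
machines = 16        # hi1.4xlarge storage machines (slide)
tx_per_s = 10_000    # ~10,000 transactions per second (slide)
latency_s = 0.200    # ~200 ms return time (slide)

edges_per_machine = edges / machines  # even spread, before any replication
in_flight = tx_per_s * latency_s      # Little's law: L = lambda * W
print(f"{edges_per_machine:.2e} edges/machine, ~{in_flight:.0f} transactions in flight")
# 7.50e+09 edges/machine, ~2000 transactions in flight
```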
    • FAUNUS AND TITAN: SUPPORTED TITAN INPUT/OUTPUT FORMATS
      TitanCassandraInputFormat, TitanCassandraOutputFormat, TitanHBaseInputFormat, TitanHBaseOutputFormat
    • FAUNUS AND TITAN: INTRA-CLUSTER CONFIGURATION
      Faunus/Hadoop and Titan/Cassandra run on the same cluster. Data is processed on the machine where it is located, which limits network communication.
    • FAUNUS AND TITAN: INTER-CLUSTER CONFIGURATION
      Graph data is offloaded to another cluster, so repeated analysis does not interfere with the production graph database.
    • FAUNUS AND TITAN: VERTEX-CENTRIC COMPUTING WITH GREMLIN
        Graph g
        long counter = 0
        def setup(args) {
          g = TitanFactory.open('cassandra:localhost')
        }
        def map(vertex, args) {
          // iterate() forces the lazy Gremlin pipeline to run and create the edges
          g.v(vertex.id).as('x').out('father').out('father').linkIn('grandfather','x').iterate()
          if (counter++ % 1000 == 0) g.commit()
        }
      A Gremlin script is stored in HDFS (distributed cache).
      Vertex long ids are pulled out of Titan (FaunusVertex with id only).
      The Gremlin script is evaluated concurrently for every vertex long id.
      Co-location of the Gremlin script JVM and the Titan vertex is guaranteed.
      * Provided by the Gremlin script()-step.
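The `counter++ % 1000 == 0` test in the script commits on the first vertex and then after every further 1,000, and the script as shown has no final commit for a trailing partial batch. A hedged Python sketch of the batching pattern with an explicit final commit; `FakeGraph` and `run_script` are hypothetical stand-ins, not Titan's API.

```python
class FakeGraph:
    """Stand-in for a transactional graph that just counts commits (hypothetical)."""

    def __init__(self):
        self.commits = 0
        self.pending = 0

    def mutate(self):
        self.pending += 1

    def commit(self):
        self.commits += 1
        self.pending = 0


def run_script(vertex_ids, g, batch=1000):
    """One mutation per vertex; commit every `batch` mutations, then once more
    at the end so the last partial batch is not left uncommitted."""
    for n, _vid in enumerate(vertex_ids, start=1):
        g.mutate()
        if n % batch == 0:
            g.commit()
    if g.pending:
        g.commit()
    return g


g = run_script(range(2500), FakeGraph())
print(g.commits, g.pending)  # 3 0
```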
    • CREDITS
      PRESENTED BY: MARKO A. RODRIGUEZ
      SUPPORTED BY: LOS ALAMOS NATIONAL LABORATORY; LANL RESEARCH LIBRARY; VRIJE UNIVERSITEIT BRUSSEL
      MANY THANKS TO: MATTHIAS BRÖCHELER, STEPHEN MALLETTE, PAVEL YASKEVICH, DAN LAROCQUE, AURELIUS COMMUNITY, TINKERPOP COMMUNITY, KETRINA YIM