BIG GRAPH DATA
Understanding a Complex World


Matthias Broecheler, CTO
@mbroecheler               AURELIUS
November XIII, MMXII       THINKAURELIUS.COM
I
Graph Foundation


[Figure: a property graph. Seven vertices, each with name and type properties: Neptune (god), Alcmene (god), Saturn (titan), Jupiter (god), Hercules (demigod), Pluto (god), Cerberus (monster). Callouts: Vertex, Property, Graph]
[Figure: the same graph with its edges. Edge labels: brother (twice), mother, father (twice), battled (with edge property time: 12), and pet. Callouts: Edge, Edge Property, Edge Type, Graph]
[Figure: the same graph and edges, with the callout Path]
[Figure: the same graph and edges, with the callout Degree]
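The terms introduced on these slides (vertex, property, edge, edge label, edge property, path, degree) can be sketched in a few lines of Python. This is a minimal illustration and not Titan's data model. The class and method names are hypothetical, and the edge endpoints below are assumptions, since the extracted slide text lists only the edge labels.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    id: int
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    label: str          # the edge type, e.g. "father"
    out_v: int          # id of the tail vertex
    in_v: int           # id of the head vertex
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical in-memory property graph, for illustration only."""

    def __init__(self):
        self.vertices = {}
        self.edges = []

    def add_vertex(self, vid, **props):
        self.vertices[vid] = Vertex(vid, props)

    def add_edge(self, label, out_v, in_v, **props):
        self.edges.append(Edge(label, out_v, in_v, props))

    def out(self, vid, label):
        """Ids of vertices reached from vid over outgoing edges with this label."""
        return [e.in_v for e in self.edges if e.out_v == vid and e.label == label]

    def degree(self, vid):
        """Number of edge endpoints incident to vid (a self-loop counts twice)."""
        return sum((e.out_v == vid) + (e.in_v == vid) for e in self.edges)


g = PropertyGraph()
g.add_vertex(1, name="Saturn", type="titan")
g.add_vertex(2, name="Jupiter", type="god")
g.add_vertex(3, name="Hercules", type="demigod")
g.add_vertex(4, name="Cerberus", type="monster")
# Assumed endpoints: the slide text does not say which vertices each edge joins.
g.add_edge("father", 3, 2)
g.add_edge("father", 2, 1)
g.add_edge("battled", 3, 4, time=12)

# A path of length two: Hercules -father-> Jupiter -father-> Saturn.
grandfathers = [
    g.vertices[v].properties["name"]
    for f in g.out(3, "father")
    for v in g.out(f, "father")
]
print(grandfathers)   # ['Saturn']
print(g.degree(3))    # 2
```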
II
Connected World


HEALTH
ECONOMY
Social Systems
III
Titan Graph Database



Titan Features
  Numerous Concurrent Users
  Many Short Transactions
    read/write
  Real-time Traversals (OLTP)
  High Availability
  Dynamic Scalability
  Variable Consistency Model
    ACID or eventual consistency
  Real-time Big Graph Data
Storage Backends
[Figure: triangle with corners Consistency, Availability, and Partitionability, positioning the storage backends]
Titan Features
  I.  Data Management
  II.  Vertex-Centric Indices
  III.  Graph Partitioning
  IV.  Edge Compression
Titan Ecosystem
  Native Blueprints Implementation
  Gremlin Query Language
  Rexster Server
    any Titan graph can be exposed as a REST endpoint
[Figure: stack of Graph Server, Graph Algorithms, Object-Graph Mapper, Traversal Language, Dataflow Processing, and Generic Graph API]
IV
Github Network



Setup

$ ./titan-0.1.0/bin/gremlin.sh
         \,,,/
         (o o)
-----oOOo-(_)-oOOo-----
gremlin> g = TitanFactory.open('/tmp/titan')
==>titangraph[local:/tmp/titan]
Titan Storage Model
  Adjacency list in one column family
  Row key = vertex id
  Each property and edge in one column
    Denormalized, i.e. stored twice
  Direction and label/key as column prefix
    Use slice predicate for quick retrieval
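The layout described above can be sketched as one row per vertex holding sorted columns, where a prefix slice returns only the matching columns. This is an illustrative model under stated assumptions, not Titan's actual byte layout: the Row and Store classes, the tuple-shaped column names, and the "prop"/"out"/"in" prefixes are all hypothetical.

```python
import bisect


class Row:
    """One row per vertex: column names kept sorted, values held separately."""

    def __init__(self):
        self.names = []    # sorted column names (tuples)
        self.values = {}   # column name -> value

    def put(self, name, value):
        if name not in self.values:
            bisect.insort(self.names, name)
        self.values[name] = value

    def slice(self, prefix):
        """Return all columns whose name starts with prefix, in sorted order."""
        start = bisect.bisect_left(self.names, prefix)
        result = []
        for name in self.names[start:]:
            if name[:len(prefix)] != prefix:
                break
            result.append((name, self.values[name]))
        return result


class Store:
    """Hypothetical column-family store keyed by vertex id."""

    def __init__(self):
        self.rows = {}

    def add_property(self, vid, key, value):
        self.rows.setdefault(vid, Row()).put(("prop", key), value)

    def add_edge(self, out_v, label, in_v):
        # Denormalized: the edge is written into both endpoint rows.
        self.rows.setdefault(out_v, Row()).put(("out", label, in_v), {})
        self.rows.setdefault(in_v, Row()).put(("in", label, out_v), {})


store = Store()
store.add_property(5, "name", "Hercules")
store.add_edge(5, "father", 9)
store.add_edge(5, "battled", 7)

# Direction and label act as the column prefix, so one slice finds the edge.
father_edges = store.rows[5].slice(("out", "father"))
incoming_at_9 = store.rows[9].slice(("in",))
```

Because the columns are sorted, the slice touches only the contiguous run of matching columns instead of scanning the whole row.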
[Figure: GitHub graph schema. Vertex types: USER, COMMENT, PAGE, ISSUE, COMMIT, REPOSITORY. Edge labels: created, edited, opened, pushed, on, to, in]
Defining Property Keys

gremlin> g.makeType().name('username').
             dataType(String.class).
             functional().
             indexed().unique().
             makePropertyKey()
gremlin> time = g.makeType().name('time').
             dataType(Long.class).
             functional().makePropertyKey()
Defining Edge Labels

gremlin> g.makeType().name('on').
             makeEdgeLabel()
gremlin> g.makeType().name('pushed').
             primaryKey(time).
             makeEdgeLabel()
gremlin> g.makeType().name('in').
             unidirected().
             makeEdgeLabel()
Create & Retrieve

gremlin> v = g.addVertex([username: 'okram'])
==>v[4]
gremlin> v.map
==>{username=okram}
gremlin> g.V('username','okram')
==>v[4]
Titan Locking
  Locking ensures consistency when it is needed
  Titan uses time stamped consistent reads and writes on separate CFs for locking
  Uses
    Property uniqueness: .unique()
    Functional edges: .functional()
    Global ID management
[Figure: two vertices labeled name : Hercules, plus name : Jupiter and name : Pluto; one father edge, and a second father edge marked with an x]
Titan Indexing
  Vertices can be retrieved by property key + value
  Titan maintains index in a separate column family as graph is updated
  Only need to define a property key as .indexed()
[Figure: index entries name : Hercules -> 5 and name : Jupiter -> 9]
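A key-plus-value index that is maintained as the graph is updated can be sketched as follows. This is a minimal in-memory illustration, not Titan's implementation: the IndexedVertices class and its method names are hypothetical, and a dictionary stands in for the separate column family.

```python
from collections import defaultdict


class IndexedVertices:
    """Hypothetical vertex store that keeps an index for selected property keys."""

    def __init__(self, indexed_keys):
        self.indexed_keys = set(indexed_keys)
        self.props = {}                  # vertex id -> {key: value}
        self.index = defaultdict(set)    # (key, value) -> set of vertex ids

    def set_property(self, vid, key, value):
        vertex = self.props.setdefault(vid, {})
        if key in self.indexed_keys:
            old = vertex.get(key)
            if old is not None:
                # Drop the stale entry so the index tracks the update.
                self.index[(key, old)].discard(vid)
            self.index[(key, value)].add(vid)
        vertex[key] = value

    def lookup(self, key, value):
        """Vertex ids with this key + value, without scanning all vertices."""
        return sorted(self.index.get((key, value), ()))


vertices = IndexedVertices(indexed_keys=["name"])
vertices.set_property(5, "name", "Hercules")
vertices.set_property(9, "name", "Jupiter")
print(vertices.lookup("name", "Jupiter"))   # [9]
```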
Basic Queries

gremlin> v.out('pushed')
gremlin> v.out('pushed').out('to').name
gremlin> v.out('pushed').out('to').dedup.name
gremlin> v.out('pushed').out('to').dedup.
             name.sort{it}
gremlin> v.outE('pushed').has('time',T.gt,1000).inV
                       Query Optimization
Vertex-Centric Indices
  Sort and index edges per
   vertex by primary key
    Primary key can be composite
  Enables efficient focused
   traversals
    Only retrieve edges that matter
  Uses push down predicates for
   quick, index-driven retrieval
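The idea of sorting a vertex's edges by label and primary key, then pushing a predicate down into that sort order, can be sketched with binary search. This is an illustration only and not Titan's code: the VertexEdges class is hypothetical, and `time` is assumed to be the primary key, as in the `pushed` edge label defined earlier in the deck.

```python
import bisect


class VertexEdges:
    """Out-edges of a single vertex, kept sorted by (label, time)."""

    def __init__(self):
        self.keys = []      # sorted (label, time) sort keys
        self.targets = []   # neighbor vertex ids, parallel to self.keys

    def add(self, label, time, target):
        i = bisect.bisect_right(self.keys, (label, time))
        self.keys.insert(i, (label, time))
        self.targets.insert(i, target)

    def query(self, label, time_lt):
        """Edges with this label and time < time_lt, found by two binary searches."""
        lo = bisect.bisect_left(self.keys, (label,))
        hi = bisect.bisect_left(self.keys, (label, time_lt))
        return list(zip(self.keys[lo:hi], self.targets[lo:hi]))


v = VertexEdges()
for time, target in [(9, 14), (1, 11), (5, 13), (3, 12)]:
    v.add("battled", time, target)
v.add("father", 0, 20)
v.add("mother", 0, 21)

early_battles = v.query("battled", 5)
print(early_battles)   # [(('battled', 1), 11), (('battled', 3), 12)]
```

Only the two matching edges are read; the other `battled` edges and the `father` and `mother` edges are never touched, which is the point of a focused traversal.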
[Figure: vertex v with edges battled (time: 1, 3, 5, 9), mother, father, and fought (twice); each added query step narrows the edges retrieved]
v.query()
  battled (time: 1, 3, 5, 9), mother, father, fought, fought
v.query().direction(OUT)
  battled (time: 1, 3, 5, 9), mother, father
v.query().direction(OUT).labels('battled')
  battled (time: 1, 3, 5, 9)
v.query().direction(OUT).labels('battled').has('time',T.lt,5)
  battled (time: 1, 3)
Recommendation Engine

gremlin> v.out('pushed').out('to')[0..9].
             in('to').in('pushed')[0..500].
             except([v]).name.
             groupCount.cap.next().sort{-it.value}[0..4]

v = g.V('username','okram'):
==>lvca=175
==>spmallette=56
==>sgomezvillamor=36
==>mbroecheler=33
==>joshsh=20

v = g.V('username','torvalds'):
==>iksaif=90
==>rjwysocki=22
==>kernel-digger=20
==>giuseppecalderaro=16
==>groeck=15
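The recommendation traversal counts paths from a user to the repositories they pushed to and back out to other users who pushed to the same repositories. A rough Python equivalent, under simplifying assumptions, is sketched below: each push is flattened to a hypothetical (user, repository) pair, the `[0..9]` and `[0..500]` range limits of the Gremlin query are omitted, and the sample data is invented for illustration.

```python
from collections import Counter


def recommend(pushes, user, top=5):
    """Rank other users by the number of shared-repository paths to `user`.

    pushes: iterable of (username, repository) pairs, one per pushed commit.
    """
    mine = Counter(repo for u, repo in pushes if u == user)
    scores = Counter()
    for other, repo in pushes:
        if other != user and repo in mine:
            # One path per (own commit, other user's commit) pair on this repo.
            scores[other] += mine[repo]
    return scores.most_common(top)


# Invented sample data, not taken from the GitHub dataset in the slides.
pushes = [
    ("alice", "r1"), ("alice", "r1"), ("alice", "r2"),
    ("bob", "r1"), ("bob", "r1"),
    ("carol", "r2"),
    ("dave", "r3"),
]
print(recommend(pushes, "alice"))   # [('bob', 4), ('carol', 1)]
```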
Titan Embedding
  Rexster RexPro
    lightweight Gremlin Server
    based on Grizzly
  Titan Gremlin Engine
  Embedded Storage Backend
    in-JVM method calls
Graph Partitioning
Goal: Vertex Co-location
  Titan maintains multiple ID Pools
  Ordered Partitioner in Storage Backend
  Dynamically determines optimal partition and allocates corresponding IDs
What’s coming
  Full-text indexing
    external index system integration
  Bulk Loading
    integration with storage backend
     utilities and Hadoop ingestion
  240 Billion Edge Benchmark
    performance analysis and improvements
     across the entire stack
V
Faunus Graph Analytics



Faunus Features
  Hadoop-based Graph
   Computing Framework
  Graph Analytics
  Breadth-first Traversals
  Global Graph Computations
  Batch Big Graph Data
Faunus Architecture




         g._()!
Faunus Work Flow

g.V.out.out.count()

[Figure: the steps of the traversal map onto a chain of jobs writing to HDFS]
hdfs://user/ubuntu/
    output/job-0/
    output/job-1/
    output/job-2/   { graph*, sideeffect* }
Compressed HDFS Graphs
  stored in sequence files
  variable length encoding
  prefix compression
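The two compression techniques named above can be sketched generically. The code below is not Faunus's on-disk format, which the slides do not specify: it assumes the common base-128 varint scheme for variable length encoding and a shared-prefix-length scheme over sorted strings for prefix compression, and all function names are hypothetical.

```python
def encode_varint(n):
    """Encode a non-negative integer in 7-bit groups, low group first."""
    if n < 0:
        raise ValueError("varint sketch handles non-negative integers only")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)   # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data, pos=0):
    """Decode one varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def prefix_compress(sorted_keys):
    """Store each key as (length of prefix shared with previous key, suffix)."""
    previous = ""
    entries = []
    for key in sorted_keys:
        shared = 0
        while shared < min(len(previous), len(key)) and previous[shared] == key[shared]:
            shared += 1
        entries.append((shared, key[shared:]))
        previous = key
    return entries


def prefix_decompress(entries):
    previous = ""
    keys = []
    for shared, suffix in entries:
        previous = previous[:shared] + suffix
        keys.append(previous)
    return keys


small = encode_varint(5)       # one byte instead of a fixed 8
larger = encode_varint(300)    # two bytes
labels = ["battled", "brother", "father"]
compressed = prefix_compress(labels)
```

Small ids and labels that share prefixes dominate graph data, which is why both techniques tend to pay off for adjacency lists.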
Faunus Setup

$ bin/gremlin.sh

         \,,,/
         (o o)
-----oOOo-(_)-oOOo-----
gremlin> g = FaunusFactory.open('bin/titan-hbase.properties')
==>faunusgraph[titanhbaseinputformat]
gremlin> g.getProperties()
==>faunus.graph.input.format=com.thinkaurelius.faunus.formats.titan.hbase.TitanHBaseInputFormat
==>faunus.graph.output.format=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
==>faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
==>faunus.output.location=dbpedia
==>faunus.output.location.overwrite=true
gremlin> g._()
12/11/09 15:17:45 INFO mapreduce.FaunusCompiler: Compiled to 1 MapReduce job(s)
12/11/09 15:17:45 INFO mapreduce.FaunusCompiler: Executing job 1 out of 1: MapSequence[com.thinkaurelius.faunus.mapreduce.transform.IdentityMap.Map]
12/11/09 15:17:50 INFO mapred.JobClient: Running job: job_201211081058_0003
Graph Analytics

gremlin> g.E.has('label','followed').keep.
             V.sideEffect('{it.degree = it.outE.count()}').
             degree.groupCount

gremlin> g.E.has('label','pushed').keep.
             V.sideEffect('{it.degree = it.outE.count()}').
             degree.groupCount
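The two jobs above compute an out-degree distribution: keep only edges with one label, count each vertex's remaining out-edges, then count how many vertices share each degree. A single-machine Python sketch of the same computation follows. The function name and sample edges are hypothetical, and unlike the Faunus job this sketch leaves out vertices whose out-degree is zero.

```python
from collections import Counter


def out_degree_distribution(edges, label):
    """Map each out-degree k to the number of vertices with that out-degree.

    edges: iterable of (source, label, target) triples.
    """
    degree = Counter(src for src, lbl, _ in edges if lbl == label)
    return Counter(degree.values())


# Invented sample edges for illustration.
edges = [
    ("a", "followed", "b"),
    ("a", "followed", "c"),
    ("b", "followed", "c"),
    ("c", "pushed", "x"),
]
followed = out_degree_distribution(edges, "followed")
print(sorted(followed.items()))   # [(1, 1), (2, 1)]
```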
Follow Degree Distribution
  P(k) ~ k^(-γ)
  γ = 2.2
Pushed Degree Distribution
Global Recommendations

gremlin> g.E.has('label','pushed','to').keep.
             V.out('pushed').out('to').
             in('to').in('pushed').
             sideEffect('{it.score = it.pathCounter}').
             score.order(F.decr,'name')

# Top 5:
Jippi             60892182927
garbear           30095282886
FakeHeal          30038040349
brianchandotcom   24684133382
nyarla            15230275746
What’s coming
  Faunus 0.1
  Bulk Loading
    loaded graph into Titan
    loading derivations into Titan
  Extending Gremlin Support
    currently only a subset of Gremlin is implemented
  Operational Tools
I
Graph = Relationship Centric
II
Graph = Agile Data Model
III
Graph = Algebraic Data Model
Aurelius Graph Cluster (Apache 2)
  Stores a massive-scale property graph allowing real-time traversals and updates
  Batch processing of large graphs with Hadoop
  Runs global graph algorithms on large, compressed, in-memory graphs
[Figure: the three components linked by arrows labeled Map/Reduce, Load & Compress, and Analysis results back into Titan]
The Graph Landscape
[Figure: graph systems plotted by Speed of Traversal/Process against Size of Graph. Illustration only, not to scale]
TINKERPOP.COM
Thanks!


   Vadas Gintautas     @vadasg
   Marko Rodriguez     @twarko
   Stephen Mallette    @spmallette
   Daniel LaRocque
XVX
Benchmark Results



XVX - I
Titan Performance Evaluation on
Twitter-like Benchmark


Twitter Benchmark
  1.47 billion followship edges and 41.7 million users
    Loaded into Titan using BatchGraph
    Twitter in 2009, crawled by Kwak et al.
  4 Transaction Types
    Create Account (1%)
    Publish tweet (15%)
    Read stream (76%)
    Recommendation (8%)
      Follow recommended user (30%)
Kwak, H., Lee, C., Park, H., Moon, S., “What is Twitter, a Social Network or a News Media?,” World Wide Web Conference, 2010.
Benchmark Setup
  6 cc1.4xl Cassandra nodes
    in one placement group
    Cassandra 1.10
  40 m1.small worker machines
    repeatedly running transactions
    simulating servers handling user requests
  EC2 cost: $11/hour
Benchmark Results

Transaction Type   Number of tx   Mean tx time   Std of tx time
Create account          379,019      115.15 ms          5.88 ms
Publish tweet         7,580,995       18.45 ms          6.34 ms
Read stream          37,936,184        6.29 ms          1.62 ms
Recommendation        3,793,863       67.65 ms         13.89 ms
Total                49,690,061

Runtime: 2.3 hours (5,900 tx/sec)
Peak Load Results

Transaction Type   Number of tx   Mean tx time   Std of tx time
Create account          374,860      172.74 ms         10.52 ms
Publish tweet         7,517,667       70.07 ms         19.43 ms
Read stream          37,618,648       24.40 ms          3.18 ms
Recommendation        3,758,266      229.83 ms         29.08 ms
Total                49,269,441

Runtime: 1.3 hours (10,200 tx/sec)
Benchmark Conclusion

Titan can handle 10s of thousands of concurrent users with short response times even for complex traversals on a simulated social networking application based on real-world network data with billions of edges and millions of users in a standard EC2 deployment.
For more information on the benchmark:
http://thinkaurelius.com/2012/08/06/titan-provides-real-time-big-graph-data/

More Related Content

Viewers also liked

Come funziona Internet e perché il software libero è fondamentale
Come funziona Internet e perché il software libero è fondamentaleCome funziona Internet e perché il software libero è fondamentale
Come funziona Internet e perché il software libero è fondamentaleAndrea Lazzarotto
 
Recuperare dati da partizioni NTFS danneggiate
Recuperare dati da partizioni NTFS danneggiateRecuperare dati da partizioni NTFS danneggiate
Recuperare dati da partizioni NTFS danneggiateAndrea Lazzarotto
 
Meetup Big Data User Group Dresden: Gradoop - Scalable Graph Analytics with A...
Meetup Big Data User Group Dresden: Gradoop - Scalable Graph Analytics with A...Meetup Big Data User Group Dresden: Gradoop - Scalable Graph Analytics with A...
Meetup Big Data User Group Dresden: Gradoop - Scalable Graph Analytics with A...Martin Junghanns
 
Big Graph Analytics Systems (Sigmod16 Tutorial)
Big Graph Analytics Systems (Sigmod16 Tutorial)Big Graph Analytics Systems (Sigmod16 Tutorial)
Big Graph Analytics Systems (Sigmod16 Tutorial)Yuanyuan Tian
 
Vbug nov 2010 Visio Validation
Vbug nov 2010   Visio ValidationVbug nov 2010   Visio Validation
Vbug nov 2010 Visio ValidationDavid Parker
 
Ricostruzione forense di NTFS con metadati parzialmente danneggiati
Ricostruzione forense di NTFS con metadati parzialmente danneggiatiRicostruzione forense di NTFS con metadati parzialmente danneggiati
Ricostruzione forense di NTFS con metadati parzialmente danneggiatiAndrea Lazzarotto
 
Graph databases: Tinkerpop and Titan DB
Graph databases: Tinkerpop and Titan DBGraph databases: Tinkerpop and Titan DB
Graph databases: Tinkerpop and Titan DBMohamed Taher Alrefaie
 
Sql saturday and share point saturday cambridge 2015 - david parker - visio
Sql saturday and share point saturday cambridge 2015 - david parker - visioSql saturday and share point saturday cambridge 2015 - david parker - visio
Sql saturday and share point saturday cambridge 2015 - david parker - visioDavid Parker
 
Graph databases in PHP @ PHPCon Poland 10-22-2011
Graph databases in PHP @ PHPCon Poland 10-22-2011 Graph databases in PHP @ PHPCon Poland 10-22-2011
Graph databases in PHP @ PHPCon Poland 10-22-2011 Alessandro Nadalin
 
Visio 2010 tips and techniques handouts
Visio 2010 tips and techniques handoutsVisio 2010 tips and techniques handouts
Visio 2010 tips and techniques handoutsSteven XU
 
Apache Cassandra for Timeseries- and Graph-Data
Apache Cassandra for Timeseries- and Graph-DataApache Cassandra for Timeseries- and Graph-Data
Apache Cassandra for Timeseries- and Graph-DataGuido Schmutz
 
8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics Engine
8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics Engine8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics Engine
8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics EngineLDBC council
 
TinkerPop and Titan from a Python State of Mind
TinkerPop and Titan from a  Python State of MindTinkerPop and Titan from a  Python State of Mind
TinkerPop and Titan from a Python State of MindDenise Gosnell, Ph.D.
 
C* Summit 2013: Distributed Graph Computing with Titan and Faunus by Matthias...
C* Summit 2013: Distributed Graph Computing with Titan and Faunus by Matthias...C* Summit 2013: Distributed Graph Computing with Titan and Faunus by Matthias...
C* Summit 2013: Distributed Graph Computing with Titan and Faunus by Matthias...DataStax Academy
 
Come si creano le app Android
Come si creano le app AndroidCome si creano le app Android
Come si creano le app AndroidAndrea Lazzarotto
 
A walk in graph databases v1.0
A walk in graph databases v1.0A walk in graph databases v1.0
A walk in graph databases v1.0Pierre De Wilde
 
Oracle NoSQL DB & InfiniteGraph - Trends in Big Data and Graph Technology
Oracle NoSQL DB & InfiniteGraph - Trends in Big Data and Graph TechnologyOracle NoSQL DB & InfiniteGraph - Trends in Big Data and Graph Technology
Oracle NoSQL DB & InfiniteGraph - Trends in Big Data and Graph TechnologyInfiniteGraph
 
Graph Theory #searchlove The theory that underpins how all search engines wor...
Graph Theory #searchlove The theory that underpins how all search engines wor...Graph Theory #searchlove The theory that underpins how all search engines wor...
Graph Theory #searchlove The theory that underpins how all search engines wor...Kelvin Newman
 

Viewers also liked (20)

Come funziona Internet e perché il software libero è fondamentale
Come funziona Internet e perché il software libero è fondamentaleCome funziona Internet e perché il software libero è fondamentale
Come funziona Internet e perché il software libero è fondamentale
 
Recuperare dati da partizioni NTFS danneggiate
Recuperare dati da partizioni NTFS danneggiateRecuperare dati da partizioni NTFS danneggiate
Recuperare dati da partizioni NTFS danneggiate
 
Meetup Big Data User Group Dresden: Gradoop - Scalable Graph Analytics with A...
Meetup Big Data User Group Dresden: Gradoop - Scalable Graph Analytics with A...Meetup Big Data User Group Dresden: Gradoop - Scalable Graph Analytics with A...
Meetup Big Data User Group Dresden: Gradoop - Scalable Graph Analytics with A...
 
Big Graph Analytics Systems (Sigmod16 Tutorial)
Big Graph Analytics Systems (Sigmod16 Tutorial)Big Graph Analytics Systems (Sigmod16 Tutorial)
Big Graph Analytics Systems (Sigmod16 Tutorial)
 
Vbug nov 2010 Visio Validation
Vbug nov 2010   Visio ValidationVbug nov 2010   Visio Validation
Vbug nov 2010 Visio Validation
 
Ricostruzione forense di NTFS con metadati parzialmente danneggiati
Ricostruzione forense di NTFS con metadati parzialmente danneggiatiRicostruzione forense di NTFS con metadati parzialmente danneggiati
Ricostruzione forense di NTFS con metadati parzialmente danneggiati
 
Graph databases: Tinkerpop and Titan DB
Graph databases: Tinkerpop and Titan DBGraph databases: Tinkerpop and Titan DB
Graph databases: Tinkerpop and Titan DB
 
Sql saturday and share point saturday cambridge 2015 - david parker - visio
Sql saturday and share point saturday cambridge 2015 - david parker - visioSql saturday and share point saturday cambridge 2015 - david parker - visio
Sql saturday and share point saturday cambridge 2015 - david parker - visio
 
Graph databases in PHP @ PHPCon Poland 10-22-2011
Graph databases in PHP @ PHPCon Poland 10-22-2011 Graph databases in PHP @ PHPCon Poland 10-22-2011
Graph databases in PHP @ PHPCon Poland 10-22-2011
 
Visio 2010 tips and techniques handouts
Visio 2010 tips and techniques handoutsVisio 2010 tips and techniques handouts
Visio 2010 tips and techniques handouts
 
Apache Cassandra for Timeseries- and Graph-Data
Apache Cassandra for Timeseries- and Graph-DataApache Cassandra for Timeseries- and Graph-Data
Apache Cassandra for Timeseries- and Graph-Data
 
8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics Engine
8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics Engine8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics Engine
8th TUC Meeting – Yinglong Xia (Huawei), Big Graph Analytics Engine
 
Flexsim y Visio
Flexsim y VisioFlexsim y Visio
Flexsim y Visio
 
TinkerPop and Titan from a Python State of Mind
TinkerPop and Titan from a  Python State of MindTinkerPop and Titan from a  Python State of Mind
TinkerPop and Titan from a Python State of Mind
 
C* Summit 2013: Distributed Graph Computing with Titan and Faunus by Matthias...
C* Summit 2013: Distributed Graph Computing with Titan and Faunus by Matthias...C* Summit 2013: Distributed Graph Computing with Titan and Faunus by Matthias...
C* Summit 2013: Distributed Graph Computing with Titan and Faunus by Matthias...
 
Come si creano le app Android
Come si creano le app AndroidCome si creano le app Android
Come si creano le app Android
 
Getting Started with Graph Databases
Getting Started with Graph Databases Getting Started with Graph Databases
Getting Started with Graph Databases
 
A walk in graph databases v1.0
A walk in graph databases v1.0A walk in graph databases v1.0
A walk in graph databases v1.0
 
Oracle NoSQL DB & InfiniteGraph - Trends in Big Data and Graph Technology
Oracle NoSQL DB & InfiniteGraph - Trends in Big Data and Graph TechnologyOracle NoSQL DB & InfiniteGraph - Trends in Big Data and Graph Technology
Oracle NoSQL DB & InfiniteGraph - Trends in Big Data and Graph Technology
 
Graph Theory #searchlove The theory that underpins how all search engines wor...
Graph Theory #searchlove The theory that underpins how all search engines wor...Graph Theory #searchlove The theory that underpins how all search engines wor...
Graph Theory #searchlove The theory that underpins how all search engines wor...
 

More from Matthias Broecheler

Titan: Scaling Graphs and TinkerPop3
Titan: Scaling Graphs and TinkerPop3Titan: Scaling Graphs and TinkerPop3
Titan: Scaling Graphs and TinkerPop3Matthias Broecheler
 
Graph Computing @ Strangeloop 2013
Graph Computing @ Strangeloop 2013Graph Computing @ Strangeloop 2013
Graph Computing @ Strangeloop 2013Matthias Broecheler
 
Titan - Graph Computing with Cassandra
Titan - Graph Computing with CassandraTitan - Graph Computing with Cassandra
Titan - Graph Computing with CassandraMatthias Broecheler
 
PMatch: Probabilistic Subgraph Matching on Huge Social Networks
PMatch: Probabilistic Subgraph Matching on Huge Social NetworksPMatch: Probabilistic Subgraph Matching on Huge Social Networks
PMatch: Probabilistic Subgraph Matching on Huge Social NetworksMatthias Broecheler
 
Budget-Match: Cost Effective Subgraph Matching on Large Networks
Budget-Match: Cost Effective Subgraph Matching on Large NetworksBudget-Match: Cost Effective Subgraph Matching on Large Networks
Budget-Match: Cost Effective Subgraph Matching on Large NetworksMatthias Broecheler
 
Computing Marginal in CCMRFs - NIPS 2010
Computing Marginal in CCMRFs - NIPS 2010Computing Marginal in CCMRFs - NIPS 2010
Computing Marginal in CCMRFs - NIPS 2010Matthias Broecheler
 
A Scalable Framework for Modeling Competitive Diffusion in Social Networks
A Scalable Framework for Modeling Competitive Diffusion in Social NetworksA Scalable Framework for Modeling Competitive Diffusion in Social Networks
A Scalable Framework for Modeling Competitive Diffusion in Social NetworksMatthias Broecheler
 
COSI: Cloud Oriented Subgraph Identification in Massive Social Networks
COSI: Cloud Oriented Subgraph Identification in Massive Social NetworksCOSI: Cloud Oriented Subgraph Identification in Massive Social Networks
COSI: Cloud Oriented Subgraph Identification in Massive Social NetworksMatthias Broecheler
 

More from Matthias Broecheler (10)

Titan: Scaling Graphs and TinkerPop3
Titan: Scaling Graphs and TinkerPop3Titan: Scaling Graphs and TinkerPop3
Titan: Scaling Graphs and TinkerPop3
 
Titan NYC Meetup March 2014
Titan NYC Meetup March 2014Titan NYC Meetup March 2014
Titan NYC Meetup March 2014
 
Graph Computing @ Strangeloop 2013
Graph Computing @ Strangeloop 2013Graph Computing @ Strangeloop 2013
Graph Computing @ Strangeloop 2013
 
Titan - Graph Computing with Cassandra
Titan - Graph Computing with CassandraTitan - Graph Computing with Cassandra
Titan - Graph Computing with Cassandra
 
PMatch: Probabilistic Subgraph Matching on Huge Social Networks
PMatch: Probabilistic Subgraph Matching on Huge Social NetworksPMatch: Probabilistic Subgraph Matching on Huge Social Networks
PMatch: Probabilistic Subgraph Matching on Huge Social Networks
 
Budget-Match: Cost Effective Subgraph Matching on Large Networks
Budget-Match: Cost Effective Subgraph Matching on Large NetworksBudget-Match: Cost Effective Subgraph Matching on Large Networks
Budget-Match: Cost Effective Subgraph Matching on Large Networks
 
Probabilistic Soft Logic
Probabilistic Soft LogicProbabilistic Soft Logic
Probabilistic Soft Logic
 
Computing Marginal in CCMRFs - NIPS 2010
Computing Marginal in CCMRFs - NIPS 2010Computing Marginal in CCMRFs - NIPS 2010
Computing Marginal in CCMRFs - NIPS 2010
 
A Scalable Framework for Modeling Competitive Diffusion in Social Networks
A Scalable Framework for Modeling Competitive Diffusion in Social NetworksA Scalable Framework for Modeling Competitive Diffusion in Social Networks
A Scalable Framework for Modeling Competitive Diffusion in Social Networks
 
COSI: Cloud Oriented Subgraph Identification in Massive Social Networks
COSI: Cloud Oriented Subgraph Identification in Massive Social NetworksCOSI: Cloud Oriented Subgraph Identification in Massive Social Networks
COSI: Cloud Oriented Subgraph Identification in Massive Social Networks
 


Big Graph Data

  • 1. BIG GRAPH DATA Understanding a Complex World Matthias Broecheler, CTO @mbroecheler AURELIUS November XIII, MMXII THINKAURELIUS.COM
  • 2. I Graph Foundation AURELIUS THINKAURELIUS.COM
  • 3. Graph: vertices with properties. Vertices shown: Neptune (type: god), Alcmene (type: god), Saturn (type: titan), Jupiter (type: god), Hercules (type: demigod), Pluto (type: god), Cerberus (type: monster). Callouts: Vertex, Property.
  • 4. Graph: the same vertices connected by edges. Edge labels shown: brother (twice), mother, father (twice), battled (edge property time: 12), pet. Callouts: Edge, Edge Property, Edge Type.
  • 5. Path (same graph as slide 4)
  • 6. Degree (same graph as slide 4)
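The vocabulary of slides 3 to 6 (vertex, property, edge, edge label, path, degree) can be sketched in a few lines of Python. The transcript does not preserve which vertices each edge connects, so the edge endpoints below are assumptions chosen for illustration, and the helper names `degree` and `follow` are hypothetical.

```python
# Vertices with their properties, as listed on slide 3.
vertices = {
    "Saturn": {"type": "titan"},
    "Jupiter": {"type": "god"},
    "Hercules": {"type": "demigod"},
    "Cerberus": {"type": "monster"},
}

# Edges as (source, label, target, properties).
# ASSUMPTION: the endpoints are illustrative; the slides only list the labels.
edges = [
    ("Jupiter", "father", "Saturn", {}),
    ("Hercules", "father", "Jupiter", {}),
    ("Hercules", "battled", "Cerberus", {"time": 12}),
]


def degree(name):
    """Number of edge endpoints at the vertex, counting both directions."""
    outgoing = sum(1 for src, _, _, _ in edges if src == name)
    incoming = sum(1 for _, _, dst, _ in edges if dst == name)
    return outgoing + incoming


def follow(start, labels):
    """Walk a path from start, following one outgoing edge label per step."""
    frontier = {start}
    for label in labels:
        frontier = {dst for src, lbl, dst, _ in edges
                    if src in frontier and lbl == label}
    return frontier


print(degree("Hercules"))
print(follow("Hercules", ["father", "father"]))
```

With these assumed edges, following `father` twice from Hercules reaches Saturn, which is the kind of two-step path the "Path" slide illustrates.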
  • 7. I Connected World AURELIUS THINKAURELIUS.COM
  • 23. III Titan Graph Database AURELIUS THINKAURELIUS.COM
  • 24. Titan Features   Numerous Concurrent Users   Many Short Transactions   read/write   Real-time Traversals (OLTP)   High Availability   Dynamic Scalability   Variable Consistency Model   ACID or eventual consistency   Real-time Big Graph Data
  • 25. Storage Backends Partitionability Consistency Availability
  • 26. Titan Features I.  Data Management II.  Vertex-Centric Indices
  • 27. Titan Features III.  Graph Partitioning IV.  Edge Compression
  • 28. Titan Ecosystem   Native Blueprints Implementation   Gremlin Query Language   Rexster Server   any Titan graph can be exposed as a REST endpoint   (Figure labels: Graph Server, Graph Algorithms, Object-Graph Mapper, Traversal Language, Dataflow Processing, Generic Graph API)
  • 29. IV Github Network AURELIUS THINKAURELIUS.COM
  • 30. Setup
        $ ./titan-0.1.0/bin/gremlin.sh
                 \,,,/
                 (o o)
        -----oOOo-(_)-oOOo-----
        gremlin> g = TitanFactory.open('/tmp/titan')
        ==>titangraph[local:/tmp/titan]
  • 31. Titan Storage Model   Adjacency list in one column family   Row key = vertex id   Each property and edge in one column   Denormalized, i.e. stored twice   Direction and label/key as column prefix   Use slice predicate for quick retrieval
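The storage layout on slide 31 can be sketched as a toy wide-row store. This is a simplified in-memory illustration, not Titan's actual encoding: the class `AdjacencyStore`, the tuple-shaped column keys, and the `"OUT"`/`"IN"`/`"P"` prefixes are all assumptions made for the example.

```python
import bisect


class AdjacencyStore:
    """Toy wide-row store: one row per vertex id, column keys kept sorted."""

    def __init__(self):
        self.keys = {}    # vertex id -> sorted list of column keys
        self.values = {}  # (vertex id, column key) -> value

    def _put(self, vertex_id, column, value=None):
        if (vertex_id, column) not in self.values:
            bisect.insort(self.keys.setdefault(vertex_id, []), column)
        self.values[(vertex_id, column)] = value

    def set_property(self, vertex_id, key, value):
        # Properties live in the same row, under their own prefix.
        self._put(vertex_id, ("P", key), value)

    def add_edge(self, src, label, dst):
        # Denormalized: the edge is written into both endpoint rows.
        self._put(src, ("OUT", label, dst))
        self._put(dst, ("IN", label, src))

    def slice(self, vertex_id, *prefix):
        """Return the columns of one row whose key starts with prefix."""
        keys = self.keys.get(vertex_id, [])
        i = bisect.bisect_left(keys, prefix)  # jump to the first match
        result = []
        while i < len(keys) and keys[i][:len(prefix)] == prefix:
            result.append(keys[i])
            i += 1
        return result


store = AdjacencyStore()
store.set_property(1, "username", "okram")
store.add_edge(1, "pushed", 10)
store.add_edge(1, "pushed", 11)
store.add_edge(1, "created", 20)
print(store.slice(1, "OUT", "pushed"))
print(store.slice(10, "IN"))
```

Because direction and label form the column prefix, all `pushed` out-edges of a vertex are contiguous in its row, so one range read retrieves them without scanning the other columns.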
  • 32. created USER edited opened pushed COMMENT PAGE on ISSUE COMMIT on on to in REPOSITORY
  • 33. Defining Property Keys
        gremlin> g.makeType().name('username').
                     dataType(String.class).
                     functional().
                     indexed().unique().
                     makePropertyKey()
        gremlin> g.makeType().name('time').
                     dataType(Long.class).
                     functional().makePropertyKey()
  • 34. Defining Edge Labels
        gremlin> g.makeType().name('on').
                     makeEdgeLabel()
        gremlin> g.makeType().name('pushed').
                     primaryKey(time).
                     makeEdgeLabel()
        gremlin> g.makeType().name('in').
                     unidirected().
                     makeEdgeLabel()
  • 35. Create & Retrieve
        gremlin> v = g.addVertex([username: 'okram'])
        ==>v[4]
        gremlin> v.map
        ==>{username=okram}
        gremlin> g.V('username','okram')
        ==>v[4]
  • 36. Titan Locking   Locking ensures consistency when it is needed   Titan uses time-stamped consistent reads and writes on separate CFs for locking   Used for:   Property uniqueness: .unique()   Functional edges: .functional()   Global ID management   (Figure: vertices Hercules, Jupiter, and Pluto with father edges, one of them crossed out)
  • 37. Titan Indexing   Vertices can be retrieved by property key + value   Titan maintains the index in a separate column family as the graph is updated   Only need to define a property key as .indexed()   (Figure: index entries name: Hercules → 5, name: Jupiter → 9)
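The interplay of `.indexed()` and `.unique()` described on slides 36 and 37 can be sketched as a value-to-vertex lookup that is updated on every write. This is a single-process illustration only; the class `KeyIndex` and its methods are hypothetical, and the locking that a distributed store would need to make the uniqueness check safe is deliberately left out.

```python
class KeyIndex:
    """Toy property index: maps a property value to the vertex ids holding it."""

    def __init__(self, unique=False):
        self.unique = unique
        self.entries = {}  # property value -> set of vertex ids

    def put(self, value, vertex_id):
        owners = self.entries.get(value, set())
        # Uniqueness check: reject the write if another vertex owns the value.
        if self.unique and owners - {vertex_id}:
            raise ValueError(f"value {value!r} is already taken")
        self.entries.setdefault(value, set()).add(vertex_id)

    def get(self, value):
        """Look up vertices by property value without scanning the graph."""
        return sorted(self.entries.get(value, ()))


username_index = KeyIndex(unique=True)
username_index.put("okram", 4)
print(username_index.get("okram"))
```

In a concurrent setting two writers could both pass the check before either writes, which is why the deck pairs uniqueness with locking.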
  • 38. Basic Queries
        gremlin> v.out('pushed')
        gremlin> v.out('pushed').out('to').name
        gremlin> v.out('pushed').out('to').dedup.name
        gremlin> v.out('pushed').out('to').dedup.
                     name.sort{it}
        gremlin> v.outE('pushed').has('time',T.gt,1000).inV
  • 39. Basic Queries (the five queries of slide 38 repeated, with the callout: Query Optimization)
  • 40. Vertex-Centric Indices   Sort and index edges per vertex by primary key   Primary key can be composite   Enables efficient focused traversals   Only retrieve edges that matter   Uses push down predicates for quick, index-driven retrieval
  • 41. Edges of vertex v: battled (time: 1), battled (time: 3), battled (time: 5), battled (time: 9), mother, father, fought, fought
        v.query()
  • 42. Remaining edges: battled (time: 1), battled (time: 3), battled (time: 5), battled (time: 9), mother, father
        v.query()
            .direction(OUT)
  • 43. Remaining edges: battled (time: 1), battled (time: 3), battled (time: 5), battled (time: 9)
        v.query()
            .direction(OUT)
            .labels('battled')
  • 44. Remaining edges: battled (time: 1), battled (time: 3)
        v.query()
            .direction(OUT)
            .labels('battled')
            .has('time',T.lt,5)
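The narrowing shown on slides 41 to 44 can be sketched with a sorted edge list and binary search. The edge set mirrors the slides; the edge ids, the placeholder time 0 for edges without a time property, and the function `query` are assumptions for the example, and the real index is maintained by the storage backend rather than a Python list.

```python
import bisect

# Edges incident to one vertex, sorted by (direction, label, time, edge id).
# The (direction, label, time) part stands in for the primary key on 'battled'.
edges_of_v = sorted([
    ("OUT", "battled", 1, "e1"),
    ("OUT", "battled", 3, "e2"),
    ("OUT", "battled", 5, "e3"),
    ("OUT", "battled", 9, "e4"),
    ("OUT", "father", 0, "e5"),
    ("OUT", "mother", 0, "e6"),
    ("IN", "fought", 0, "e7"),
    ("IN", "fought", 0, "e8"),
])


def query(direction, label, time_lt):
    """Return edges with the given direction and label and time < time_lt.

    Both bounds are found by binary search, so edges with other labels,
    other directions, or later times are never touched.
    """
    lo = bisect.bisect_left(edges_of_v, (direction, label))
    hi = bisect.bisect_left(edges_of_v, (direction, label, time_lt))
    return edges_of_v[lo:hi]


for edge in query("OUT", "battled", 5):
    print(edge)
```

The point of the sort order is that the predicate is pushed down into the lookup: the cost depends on the number of matching edges, not on the total degree of the vertex.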
  • 45. Recommendation Engine
        gremlin> v.out('pushed').out('to')[0..9].
                     in('to').in('pushed')[0..500].
                     except([v]).name.
                     groupCount.cap.next().sort{-it.value}[0..4]
  • 46. Recommendation Engine (query of slide 45) for v = g.V('username','okram'):
        ==>lvca=175
        ==>spmallette=56
        ==>sgomezvillamor=36
        ==>mbroecheler=33
        ==>joshsh=20
  • 47. Recommendation Engine (query of slide 45) for v = g.V('username','torvalds'):
        ==>iksaif=90
        ==>rjwysocki=22
        ==>kernel-digger=20
        ==>giuseppecalderaro=16
        ==>groeck=15
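The shape of the recommendation traversal (user → pushed commits → repositories → commits pushed to those repositories → their authors, then count) can be sketched on toy data. The users, commits, and repositories below are invented for illustration, and `recommend` is a hypothetical helper; only the step order and the limits (first 10 repositories, first 501 co-pushers, top 5) follow the Gremlin query.

```python
from collections import Counter, defaultdict
from itertools import islice

# Hypothetical toy data: user -pushed-> commit -to-> repository.
pushed = {
    "alice": ["c1", "c2"],
    "bob": ["c3", "c4"],
    "carol": ["c5"],
}
to = {"c1": "repoA", "c2": "repoB", "c3": "repoA", "c4": "repoA", "c5": "repoB"}

# Reverse lookups, i.e. the in('to') and in('pushed') steps.
commits_to = defaultdict(list)
for commit, repo in to.items():
    commits_to[repo].append(commit)
pushed_by = {c: user for user, commits in pushed.items() for c in commits}


def recommend(user, top=5):
    """Rank other users by how often they pushed to the same repositories."""
    repos = islice((to[c] for c in pushed[user]), 10)        # [0..9]
    authors = (pushed_by[c] for r in repos for c in commits_to[r])
    authors = islice(authors, 501)                           # [0..500]
    counts = Counter(a for a in authors if a != user)        # except([v])
    return counts.most_common(top)                           # sort, [0..4]


print(recommend("alice"))
```

As in the original query, repositories are not deduplicated, so a user who pushes many commits to a shared repository contributes many paths and ranks higher.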
  • 48. Titan Embedding   Rexster RexPro   lightweight Gremlin Server   based on Grizzly   Titan Gremlin Engine   Embedded Storage Backend   in-JVM method calls
  • 49. Graph Partitioning Goal: Vertex Co-location   Titan maintains multiple ID Pools   Ordered Partitioner in Storage Backend   Dynamically determines optimal partition and allocates corresponding ID Pool IDs
  • 50. What’s coming   Full-text indexing   external index system integration   Bulk Loading   integration with storage backend utilities and Hadoop ingestion   240 Billion Edge Benchmark   performance analysis and improvements across the entire stack
  • 51. V Faunus Graph Analytics AURELIUS THINKAURELIUS.COM
  • 52. Faunus Features   Hadoop-based Graph Computing Framework   Graph Analytics   Breadth-first Traversals   Global Graph Computations   Batch Big Graph Data
  • 54. Faunus Work Flow   Traversal: g.V.out.out.count()   Output under hdfs://user/ubuntu/: output/job-0/, output/job-1/, output/job-2/ (graph* and sideeffect* files)   Compressed HDFS Graphs   stored in sequence files   variable length encoding   prefix compression
  • 55. Faunus Setup
        $ bin/gremlin.sh
                 \,,,/
                 (o o)
        -----oOOo-(_)-oOOo-----
        gremlin> g = FaunusFactory.open('bin/titan-hbase.properties')
        ==>faunusgraph[titanhbaseinputformat]
        gremlin> g.getProperties()
        ==>faunus.graph.input.format=com.thinkaurelius.faunus.formats.titan.hbase.TitanHBaseInputFormat
        ==>faunus.graph.output.format=org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat
        ==>faunus.sideeffect.output.format=org.apache.hadoop.mapreduce.lib.output.TextOutputFormat
        ==>faunus.output.location=dbpedia
        ==>faunus.output.location.overwrite=true
        gremlin> g._()
        12/11/09 15:17:45 INFO mapreduce.FaunusCompiler: Compiled to 1 MapReduce job(s)
        12/11/09 15:17:45 INFO mapreduce.FaunusCompiler: Executing job 1 out of 1: MapSequence[com.thinkaurelius.faunus.mapreduce.transform.IdentityMap.Map]
        12/11/09 15:17:50 INFO mapred.JobClient: Running job: job_201211081058_0003
  • 56. Graph Analytics
        gremlin> g.E.has('label','followed').keep.
                     V.sideEffect('{it.degree = it.outE.count()}').
                     degree.groupCount
        gremlin> g.E.has('label','pushed').keep.
                     V.sideEffect('{it.degree = it.outE.count()}').
                     degree.groupCount
  • 58. Follow Degree Distribution   $P(k) \sim k^{-\gamma}$, $\gamma = 2.2$
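One way to read an exponent off a degree distribution is to fit a line to log P(k) against log k. The sketch below assumes that simple least-squares approach; the slides do not say how the value γ = 2.2 was obtained, and the function names are hypothetical. On real, noisy data this estimator is known to be biased, so treat it as an illustration of the relation rather than a recommended fitting method.

```python
import math
from collections import Counter


def degree_distribution(degrees):
    """Empirical P(k): the fraction of vertices that have degree k."""
    counts = Counter(degrees)
    n = len(degrees)
    return {k: c / n for k, c in sorted(counts.items())}


def fit_gamma(dist):
    """Estimate gamma in P(k) ~ k^(-gamma) from the log-log slope."""
    points = [(math.log(k), math.log(p)) for k, p in dist.items() if k > 0 and p > 0]
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x, _ in points))
    return -slope


# Synthetic check: an exact power law with exponent 2.2 (constant factor ignored).
exact = {k: k ** -2.2 for k in range(1, 50)}
print(round(fit_gamma(exact), 3))
```

A constant factor in P(k) only shifts the intercept of the fitted line, which is why the synthetic distribution does not need to be normalized for the slope to come out right.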
  • 60. Global Recommendations
        gremlin> g.E.has('label','pushed','to').keep.
                     V.out('pushed').out('to').
                     in('to').in('pushed').
                     sideEffect('{it.score = it.pathCounter}').
                     score.order(F.decr,'name')
        # Top 5:
        Jippi              60892182927
        garbear            30095282886
        FakeHeal           30038040349
        brianchandotcom    24684133382
        nyarla             15230275746
  • 61. What’s coming   Faunus 0.1   Bulk Loading   loading graphs into Titan   loading derivations into Titan   Extending Gremlin Support   currently only a subset of Gremlin is implemented   Operational Tools
  • 63. II Graph = Agile Data Model
  • 64. III Graph = Algebraic Data Model
  • 65. Aurelius Graph Cluster (Apache 2)   Titan: stores a massive-scale property graph allowing real-time traversals and updates   Faunus: batch processing of large graphs with Hadoop   In-memory engine: runs global graph algorithms on large, compressed, in-memory graphs   Data flow: Map/Reduce Load & Compress; analysis results back into Titan
  • 66. The Graph Landscape (illustration only, not to scale)   Axes: Speed of Traversal/Process against Size of Graph
  • 68. Thanks!   Vadas Gintautas (@vadasg)   Marko Rodriguez (@twarko)   Stephen Mallette (@spmallette)   Daniel LaRocque   AURELIUS THINKAURELIUS.COM
  • 70. XVX Benchmark Results AURELIUS THINKAURELIUS.COM
  • 71. XVX - I Titan Performance Evaluation on Twitter-like Benchmark AURELIUS THINKAURELIUS.COM
  • 72. Twitter Benchmark
        1.47 billion followship edges and 41.7 million users
            Loaded into Titan using BatchGraph
            Twitter in 2009, crawled by Kwak et al.
        4 Transaction Types
            Create account (1%)
            Publish tweet (15%)
            Read stream (76%)
            Recommendation (8%)
                Follow recommended user (30%)
        Reference: Kwak, H., Lee, C., Park, H., Moon, S., “What is Twitter, a Social Network or a News Media?,” World Wide Web Conference, 2010.
  • 73. Benchmark Setup   6 cc1.4xl Cassandra nodes   in one placement group   Cassandra 1.10   40 m1.small worker machines   repeatedly running transactions   simulating servers handling user requests   EC2 cost: $11/hour
  • 74. Benchmark Results
        Transaction Type   Number of tx   Mean tx time   Std of tx time
        Create account          379,019      115.15 ms          5.88 ms
        Publish tweet         7,580,995       18.45 ms          6.34 ms
        Read stream          37,936,184        6.29 ms          1.62 ms
        Recommendation        3,793,863       67.65 ms         13.89 ms
        Total                49,690,061
        Runtime: 2.3 hours   Throughput: 5,900 tx/sec
  • 75. Peak Load Results
        Transaction Type   Number of tx   Mean tx time   Std of tx time
        Create account          374,860      172.74 ms         10.52 ms
        Publish tweet         7,517,667       70.07 ms         19.43 ms
        Read stream          37,618,648       24.40 ms          3.18 ms
        Recommendation        3,758,266      229.83 ms         29.08 ms
        Total                49,269,441
        Runtime: 1.3 hours   Throughput: 10,200 tx/sec
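The totals and throughput figures on slides 74 and 75 can be cross-checked with a little arithmetic. The sketch below uses only the numbers from the two tables; the function `summarize` is a hypothetical helper. The derived throughput comes out slightly above the reported values, which would be consistent with the runtimes having been rounded to one decimal place (an assumption, since the slides give no exact runtimes).

```python
normal_load = {
    "Create account": 379_019,
    "Publish tweet": 7_580_995,
    "Read stream": 37_936_184,
    "Recommendation": 3_793_863,
}
peak_load = {
    "Create account": 374_860,
    "Publish tweet": 7_517_667,
    "Read stream": 37_618_648,
    "Recommendation": 3_758_266,
}


def summarize(counts, runtime_hours):
    """Return (total tx, share of each type in percent, tx per second)."""
    total = sum(counts.values())
    share = {name: 100 * n / total for name, n in counts.items()}
    throughput = total / (runtime_hours * 3600)
    return total, share, throughput


for label, counts, hours in [("normal", normal_load, 2.3), ("peak", peak_load, 1.3)]:
    total, share, throughput = summarize(counts, hours)
    print(label, total, round(throughput), "tx/sec")
    for name, pct in share.items():
        print(f"  {name}: {pct:.1f}%")
```

The per-type shares can be compared against the transaction mix stated on slide 72.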
  • 76. Benchmark Conclusion   Titan can handle 10s of thousands of concurrent users with short response times even for complex traversals on a simulated social networking application based on real-world network data with billions of edges and millions of users in a standard EC2 deployment.   For more information on the benchmark: http://thinkaurelius.com/2012/08/06/titan-provides-real-time-big-graph-data/