Users, Posts, Comments
Retrieve User’s Posts 
• Let’s create a user and post 
• Link them together 
• Retrieve the user and their posts 
gremlin ...
Retrieve Posts by Time Range 
• Add timestamp property to post 
• Query by range 
gremlin> g.V.has('guid','21EC2020-...
Retrieve All User’s Comments 
• Add comment 
• Link to author and to post 
gremlin> g.addVertex([type: 'comment', guid...
Retrieve top N posts by vote 
• Create “postVote” edge and 
aggregated votes count in post 
• Query and sort by votes 
gre...
Retrieve Post Comments Sorted by Vote 
• Similar to post votes 
gremlin> g.addEdge(g.v(0), g.v(4), 'commentVote', [date...
User Can Only Vote Once 
• Could enforce using external 
unique indexes 
• Or do 2-step incrementing in 
gremlin (small ch...
Graph Visualization
Areas Not Covered 
• Map/Reduce 
• Gremlin has its own built-in M/R API 
• Indexing 
• Titan currently has limitation requ...
References 
http://sql2gremlin.com/ 
http://www.tinkerpopbook.com/ - http://www.tinkerpop.com/ 
https://github.com/thinkau...
THANK YOU 
{ 
“email” : “calebjones@gmail.com”, 
“website” : “http://calebjones.info”, 
“twitter” : “@JonesWCaleb” 
}
Intro to Graph Databases Using Tinkerpop, TitanDB, and Gremlin

A quick overview of the history, motivation, and uses of graph modeling and graph databases in various industries. Covers a brief introduction to graph databases with an emphasis on the Tinkerpop stack and Gremlin query language. These concepts are then solidified through a hands-on lab modeling a blog engine using Titan and Gremlin.

See more at http://allthingsgraphed.com.

Published in: Technology

  1. INTRO TO GRAPH DATABASES
     Using Tinkerpop, TitanDB, and Gremlin
     { “email” : “calebjones@gmail.com”, “website” : “http://calebjones.info”, “twitter” : “@JonesWCaleb” }
  2. Overview
     • Why Graphs?
     • Order to complexity
     • Use cases – major players
     • Graphs & Adjacency Matrices
     • Tinkerpop Framework
     • Blueprints, Frames, Pipes, Furnace, Gremlin, Rexster
     • Titan using Cassandra
     • Blog Application (lab)
     • Traversals using Gremlin
  3. WHY GRAPHS?
  4. Warren Weaver
     • 17th - 19th century
       • Problems of simplicity
       • How one element interacts with another
     • First half of 20th century
       • Problem of disorganized complexity
       • Many elements operating in a system w/o regard to how they interact with each other
     • Predicted
       • Problem of organized complexity
       • Many elements operating in a system taking into account how they interact with each other
       • Would require computational power far beyond what was currently available
     (figure captions: Science and Complexity, 1948; ENIAC, 1946)
  5. Organisms
  6. Knowledge Classification
  7. Organizational Hierarchy
  8. Neurology
  9. Order to Complexity
     • Trees describe order
       • Linear (simple lineage)
       • Categorized
       • Single dimensional
       • Symmetrical
       • Hierarchical
       • Convergent modeling
     • Networks describe complexity
       • Non-linear (multi-lineage)
       • Multi-categorical
       • Multi-dimensional
       • Asymmetrical
       • Decentralized
       • Divergent modeling
  10. Types of Networks
  11. Types of Networks
  12. Types of Networks
  13. Types of Networks
  14. Types of Networks
  15. Types of Networks
  16. Types of Networks
  17. Types of Networks
  18. Types of Networks (figure: collaboration network of comic-book creators; node labels include Jack Kirby, Stan Lee, Steve Ditko, and John Buscema)
  19. Types of Networks
      Neuron Network of Mouse
      Millennium Simulation (2005): largest astronomical simulation ever on the structure and evolution of galaxies in the universe. 25 TB of data and 20 million galaxies
  20. Use Cases
      • Recommendation engines (avoid relational N-JOIN or self-JOIN)
      • Ranking/credibility (Google’s PageRank)
      • Path finding (shortest, longest, mutual friends)
      • Social (friendship, following, key connectors)
  21. Graphs
      • Node/Vertex: An entity that can have zero or more edges connected to it. (figure: nodes 1, 2, 3)
      • Edge: An entity which connects two nodes. May be directed or undirected. (figure: nodes 1, 2 and A, B)
  22. Adjacency Matrix
      • If the graph is undirected, the adjacency matrix is symmetric
      • Thus, the transpose of the matrix describes the same graph
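The symmetry claim can be checked with a few lines of plain Python. This is a minimal sketch, not part of the original deck: the four-node edge list and the helper names `adjacency_matrix` and `transpose` are assumptions chosen for illustration.

```python
def adjacency_matrix(node_count, edges, directed=False):
    """Build a 0/1 adjacency matrix from (source, target) index pairs."""
    matrix = [[0] * node_count for _ in range(node_count)]
    for source, target in edges:
        matrix[source][target] = 1
        if not directed:
            # An undirected edge is recorded in both directions.
            matrix[target][source] = 1
    return matrix


def transpose(matrix):
    """Swap rows and columns."""
    return [list(row) for row in zip(*matrix)]


# Hypothetical path graph 0 - 1 - 2 - 3.
EDGES = [(0, 1), (1, 2), (2, 3)]
undirected = adjacency_matrix(4, EDGES)
directed = adjacency_matrix(4, EDGES, directed=True)

print(undirected == transpose(undirected))  # symmetric for the undirected graph
print(directed == transpose(directed))      # not symmetric once edges have a direction
```

For a directed graph, the transpose corresponds to the same graph with every edge reversed, which is why the second comparison fails.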
  23. Adjacency Matrix
      • Some graphs have different ‘types’ or dimensions of edges
  24. Property Graphs
      Vertices (Attribute: Value)
        id: 1, name: Ivan
        id: 2, name: Bob
        id: 3, name: Eve
        id: 4, name: Alice
      Edges (Attribute: Value)
        id: E1, type: cousin, separation: 1
        id: E2, type: knows, since: 2013-09-01
        id: E3, type: knows, since: 2013-09-01
        id: E4, type: sibling, twins: true
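A property graph like the one on this slide can be sketched with two dictionaries, one for vertices and one for edges. The ids and properties below come from the slide. The edge endpoints (`out` and `in`) were only visible in the lost diagram, so the ones used here are assumptions, as is the helper name `neighbors_by_type`.

```python
# Vertex id -> properties (taken from the slide).
VERTICES = {
    1: {"name": "Ivan"},
    2: {"name": "Bob"},
    3: {"name": "Eve"},
    4: {"name": "Alice"},
}

# Edge id -> properties. "out" and "in" endpoints are ASSUMED for illustration.
EDGES = {
    "E1": {"out": 1, "in": 2, "type": "cousin", "separation": 1},
    "E2": {"out": 2, "in": 3, "type": "knows", "since": "2013-09-01"},
    "E3": {"out": 2, "in": 4, "type": "knows", "since": "2013-09-01"},
    "E4": {"out": 3, "in": 4, "type": "sibling", "twins": True},
}


def neighbors_by_type(vertex_id, edge_type):
    """Names of vertices reached from vertex_id over outgoing edges of one type."""
    return sorted(
        VERTICES[edge["in"]]["name"]
        for edge in EDGES.values()
        if edge["out"] == vertex_id and edge["type"] == edge_type
    )


print(neighbors_by_type(2, "knows"))
```

The point of the model is that edges carry their own properties (`since`, `twins`, `separation`) and can be filtered on them just like vertices.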
  25. Traversals
      • Breadth-first
        • 3, 2, 4, 1
      • Depth-first
        • 3, 2, 1, 4
      • Breadth-first and depth-first search can be combined.
      • Filtering
        • Ability to filter/sort paths in a traversal
      • Aggregating
        • Ability to aggregate/count properties as the traversal occurs and to affect the traversal with the result of the aggregation (e.g. power-grid load distribution)
      • Backtracking
        • Leave a marker in the traversal and come back to it when certain criteria are met in a later step
      (figure: graph with nodes 1, 2, 3, 4)
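The two visit orders on this slide can be reproduced with a short Python sketch. The slide's diagram is lost, so the edge set below (3 connected to 2 and 4, and 2 connected to 1) is an assumption chosen because it yields exactly the orders listed; `bfs` and `dfs` are illustrative helper names.

```python
from collections import deque

# ASSUMED undirected graph: 1 - 2 - 3 - 4, stored as adjacency lists.
GRAPH = {1: [2], 2: [3, 1], 3: [2, 4], 4: [3]}


def bfs(graph, start):
    """Visit nodes level by level using a FIFO queue."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return order


def dfs(graph, start, seen=None):
    """Follow each branch to its end before backtracking."""
    if seen is None:
        seen = set()
    seen.add(start)
    order = [start]
    for neighbor in graph[start]:
        if neighbor not in seen:
            order.extend(dfs(graph, neighbor, seen))
    return order


print(bfs(GRAPH, 3))  # [3, 2, 4, 1]
print(dfs(GRAPH, 3))  # [3, 2, 1, 4]
```

Breadth-first reaches both neighbors of 3 before going deeper, while depth-first follows 3, 2, 1 to the end before returning for 4.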
  26. TINKERPOP
      Graph Framework
  27. Tinkerpop
      • A comprehensive, open-source graph framework (http://www.tinkerpop.com/)
      Blueprints: Property graph model that is DB agnostic. A kind of JDBC for graphs.
      Pipes: Data flow API for processing graphs. Underlying component for graph traversals.
      Gremlin: DSL for traversing property graphs. Implemented in JSR-223.
      Frames: Maps between domain objects and the graph’s nodes and edges. Like ORM for graphs.
      Furnace: Collection of common graph analysis algorithms for property graphs.
      Rexster: Exposes any Blueprints graph via a uniform RESTful API.
  28. Tinkerpop Stack
      • Different components all build on each other
      • Provides abstraction from HTTP layer, to object mapping layer, to traversal scripting, to pluggable graph API
      • Blueprints underpins the stack, making it all DB agnostic
      • Blueprints implementations:
        • Neo4j, Sail, OrientDB, Dex
        • (*) Accumulo, ArangoDB, Bitsy, FluxGraph, FoundationDB, InfiniteGraph, MongoDB, Oracle NoSQL, TitanDB
        * - Implemented by 3rd party
  29. Tinkerpop - Rexster
      • Provides REST and binary (RexPro - Grizzly) protocols
      • Flexible extension model (e.g. ad-hoc Gremlin queries)
      • Server-side stored procedures (Gremlin)
      • Browser-based interface (Dog House)
      • Command-line tool for interacting with API
      • Pluggable security
      • SPARQL plugin to work against Sail graphs (OpenRDF)
      • More information: https://github.com/tinkerpop/rexster/wiki
  30. Tinkerpop - Furnace
      • Collection of industry-standard algorithms for traversing or analyzing graphs.
      • Network generators (by clique or degree distribution)
      • Search: A*, Breadth-first, Depth-first
      • Shortest path
      • Bellman-Ford (like Dijkstra’s but can handle negative edge weights)
      • PageRank
      • Degree Distribution
      • More information: https://github.com/tinkerpop/furnace/wiki
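To make the Bellman-Ford bullet concrete, here is a minimal standalone Python sketch of the algorithm. It is not Furnace's API (the slides do not show it); the function name, the edge-list format, and the sample weights are assumptions for illustration.

```python
def bellman_ford(node_count, edges, source):
    """Shortest distances from source; edges are (from, to, weight) triples.

    Unlike Dijkstra's algorithm, negative edge weights are allowed.
    Raises ValueError if a negative cycle is reachable from the source.
    """
    distance = [float("inf")] * node_count
    distance[source] = 0
    # Relax every edge up to node_count - 1 times.
    for _ in range(node_count - 1):
        changed = False
        for start, end, weight in edges:
            if distance[start] + weight < distance[end]:
                distance[end] = distance[start] + weight
                changed = True
        if not changed:
            break
    # One more pass: any further improvement means a negative cycle.
    for start, end, weight in edges:
        if distance[start] + weight < distance[end]:
            raise ValueError("graph contains a negative cycle")
    return distance


# Hypothetical graph with one negative edge (2 -> 1).
SAMPLE_EDGES = [(0, 1, 4), (0, 2, 5), (2, 1, -3), (1, 3, 2)]
print(bellman_ford(4, SAMPLE_EDGES, 0))  # [0, 2, 5, 4]
```

The negative edge makes the route 0, 2, 1 (cost 2) cheaper than the direct edge 0, 1 (cost 4), a case a greedy Dijkstra-style search is not designed to handle.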
  31. Tinkerpop - Frames
      More information: https://github.com/tinkerpop/frames/wiki
  32. Tinkerpop - Pipes
      • Dataflow framework for process graphs.
      • A computational step becomes a node and an edge is a communication channel between steps.
      • Pipes are then chained and nested.
      • Custom pipes can be created.
      • Pipe types:
        • Transform – emit transformation of object
          • Dozens of different types of transforms
        • Filter – decide whether to include/exclude object in traversal
          • ~20 different types of filters
        • sideEffect – include object but produce side-effect from it
          • ~15 different types of sideEffects (e.g. group, count, table, tree)
        • Branch – decide which step to take next in traversal
          • Several different branching options
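The chaining idea behind Pipes can be sketched with Python generators. This is an analogy under stated assumptions, not the Pipes API: `transform`, `keep_if`, `side_effect`, and `chain` are hypothetical helper names, and the sample records reuse names and ages from the demo dataset later in the deck.

```python
def transform(function):
    """Transform step: emit function(item) for every incoming item."""
    def step(stream):
        for item in stream:
            yield function(item)
    return step


def keep_if(predicate):
    """Filter step: pass an item on only when the predicate holds."""
    def step(stream):
        for item in stream:
            if predicate(item):
                yield item
    return step


def side_effect(function):
    """sideEffect step: pass every item on unchanged, calling function on it."""
    def step(stream):
        for item in stream:
            function(item)
            yield item
    return step


def chain(*steps):
    """Connect steps so the output of one feeds the next."""
    def run(source):
        stream = iter(source)
        for step in steps:
            stream = step(stream)
        return stream
    return run


people = [
    {"name": "marko", "age": 29},
    {"name": "vadas", "age": 27},
    {"name": "josh", "age": 32},
]
seen = []  # filled by the side-effect step
older_names = chain(
    keep_if(lambda person: person["age"] > 28),
    side_effect(seen.append),
    transform(lambda person: person["name"]),
)
result = list(older_names(people))
print(result)     # ['marko', 'josh']
print(len(seen))  # 2
```

Each step is lazy, so items flow through the whole chain one at a time, which is the same reason chained pipes can process large graphs without materializing intermediate results.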
  33. Tinkerpop - Blueprints
      • Like JDBC but for graphs.
      • Common API for Property Graphs, which are very flexible
      • Foundational component for Pipes, Gremlin, Frames, Furnace, and Rexster
      • Supports transactions (if underlying DB engine does)
      • Multi-threaded transactions supported
      • Format readers/writers (GML, GraphML, GraphSON)
      • More information: https://github.com/tinkerpop/blueprints/wiki
  34. Tinkerpop - Gremlin
      • Graph traversal scripting language.
      • Works against the Blueprints API and is “compiled” into Pipes data-flows.
      • Both native Java and Groovy (JSR-223) supported.
      • Step library (https://github.com/tinkerpop/gremlin/wiki/Gremlin-Steps)
        • Transform – emit transformation of object
          • Dozens of different types of transforms
        • Filter – decide whether to include/exclude object in traversal
          • ~20 different types of filters
        • sideEffect – include object but produce side-effect from it
          • ~15 different types of sideEffects (e.g. group, count, table, tree)
        • Branch – decide which step to take next in traversal
          • Several different branching options
  35. SQL → Gremlin (secret decoder ring)
      Query: Get all users
        SQL:     select * from users
        Gremlin: g.V('type', 'user').map()
      Query: Get user names
        SQL:     select name from users
        Gremlin: g.V('type', 'user').name
      Query: Get user names/ages
        SQL:     select name, age from users
        Gremlin: g.V('type', 'user').transform({ [ 'name' : it.getProperty('name'), 'age' : it.getProperty('age') ] })
      Query: Get distinct user ages
        SQL:     select distinct(age) from users
        Gremlin: g.V('type', 'user').age.dedup()
      Query: Get oldest user
        SQL:     select max(age) from users
        Gremlin: g.V('type', 'user').age.max()
  36. SQL → Gremlin (secret decoder ring)
      Query: Select by equality
        SQL:     select * from users where age = 35
        Gremlin: g.V('type', 'user').has('age', 35).map()
      Query: Select by comparison
        SQL:     select * from users where age > 21
        Gremlin: g.V('type', 'user').has('age', T.gt, 21).map()
      Query: Select by multiple criteria
        SQL:     select * from users where sex = "M" and age > 25
        Gremlin: g.V('type', 'user').has('age', T.gt, 25).has('sex', 'M').map()
      Query: Order by age (switch 'a' and 'b' to do asc)
        SQL:     select * from users order by age desc
        Gremlin: g.V('type', 'user').order({ it.b.getProperty('age') <=> it.a.getProperty('age') }).map()
      Query: Paging
        SQL:     select * from users order by age desc limit 5 offset 5
        Gremlin: g.V('type', 'user').order({ it.b.getProperty('age') <=> it.a.getProperty('age') })[5..<10].map()
  37. SQL → Gremlin (secret decoder ring)
      Query: Join
        SQL:     select users.* from users inner join groups on users.gId = groups.id where groups.name = "devs"
        Gremlin: g.V('type', 'groups').has('name', 'devs').in('inGroup').map()
      Query: Join-on-join-on-join …
        SQL:     SELECT TOP (5) [t14].[ProductName] FROM (SELECT COUNT(*) AS [value], [t13].[ProductName] FROM [customers] AS [t0] CROSS APPLY (SELECT [t9].[ProductName] FROM [orders] AS [t1] CROSS JOIN [order details] AS [t2] INNER JOIN [products] AS [t3] ON [t3].[ProductID] = [t2].[ProductID] CROSS JOIN [order details] AS [t4] INNER JOIN [orders] AS [t5] ON [t5].[OrderID] = [t4].[OrderID] LEFT JOIN [customers] AS [t6] ON [t6].[CustomerID] = [t5].[CustomerID] CROSS JOIN ([orders] AS [t7] CROSS JOIN [order details] AS [t8] INNER JOIN [products] AS [t9] ON [t9].[ProductID] = [t8].[ProductID]) WHERE NOT EXISTS(SELECT NULL AS [EMPTY] FROM [orders] AS [t10] CROSS JOIN [order details] AS [t11] INNER JOIN [products] AS [t12] ON [t12].[ProductID] = [t11].[ProductID] WHERE [t9].[ProductID] = [t12].[ProductID] AND [t10].[CustomerID] = [t0].[CustomerID] AND [t11].[OrderID] = [t10].[OrderID]) AND [t6].[CustomerID] <> [t0].[CustomerID] AND [t1].[CustomerID] = [t0].[CustomerID] AND [t2].[OrderID] = [t1].[OrderID] AND [t4].[ProductID] = [t3].[ProductID] AND [t7].[CustomerID] = [t6].[CustomerID] AND [t8].[OrderID] = [t7].[OrderID]) AS [t13] WHERE [t0].[CustomerID] = N'ALFKI' GROUP BY [t13].[ProductName]) AS [t14] ORDER BY [t14].[value] DESC
        Gremlin: g.V('customerId','ALFKI').as('customer')
                  .out('ordered').out('contains').out('is').as('products')
                  .in('is').in('contains').in('ordered').except('customer')
                  .out('ordered').out('contains').out('is').except('products')
                  .groupCount().cap().orderMap(T.decr)[0..<5].productName
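The long traversal on this slide is a "customers who bought what you bought also bought" recommendation. The following Python sketch imitates its shape on a tiny in-memory dataset. Everything in it is an assumption for illustration: the order data, the product names, and the function name `recommend`. It also simplifies the original by collapsing orders into a set of products per customer, so its counts only approximate the path counts the Gremlin query would produce.

```python
from collections import Counter

# HYPOTHETICAL data: customer id -> set of products they have ordered.
ORDERS = {
    "ALFKI": {"chai", "tofu"},
    "BONAP": {"chai", "ikura", "konbu"},
    "CHOPS": {"tofu", "ikura"},
    "DRACD": {"pavlova"},
}


def recommend(customer, orders, top_n=5):
    """Rank products bought by customers who share purchases with `customer`."""
    mine = orders[customer]
    counts = Counter()
    for other, theirs in orders.items():
        if other == customer:
            continue  # corresponds to except('customer')
        shared = len(mine & theirs)
        if shared == 0:
            continue  # no common product, so no path to this customer
        for product in theirs - mine:  # corresponds to except('products')
            counts[product] += shared
    # groupCount, order descending, take the first top_n
    return [product for product, _ in counts.most_common(top_n)]


print(recommend("ALFKI", ORDERS))  # ['ikura', 'konbu']
```

In the graph version each hop is an edge lookup from the current vertex, which is the reason the deck lists recommendation engines as a case where traversals avoid repeated relational joins.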
  38. Gremlin Resources
      • Tinkerpop resources
        • https://github.com/tinkerpop/gremlin/wiki/Basic-Graph-Traversals
        • https://github.com/tinkerpop/gremlin/wiki/Gremlin-Steps
        • https://github.com/tinkerpop/gremlin/wiki/Using-Gremlin-through-Java
        • https://groups.google.com/forum/#!forum/gremlin-users
        • https://github.com/tinkerpop/gremlin/wiki/SPARQL-vs.-Gremlin
        • http://markorodriguez.com/2011/08/03/on-the-nature-of-pipes/
        • http://sql2gremlin.com/
        • http://gremlindocs.com/
      • Groovy
        • http://groovy.codehaus.org/Beginners+Tutorial
        • http://groovy.codehaus.org/Collections
      • Misc
        • http://www.fromdev.com/2013/09/Gremlin-Example-Query-Snippets-Graph-DB.html
        • http://markorodriguez.com/2011/06/15/graph-pattern-matching-with-gremlin-1-1/
  39. GREMLIN
      Demo Dataset Lab
  40. Tinkerpop - Gremlin
      gremlin> g = TinkerGraphFactory.createTinkerGraph()
      ==>tinkergraph[vertices:6 edges:6]
      gremlin> g.V.count()
      ==>6
      gremlin> g.E.count()
      ==>6
      gremlin> g.v(1)
      ==>v[1]
      gremlin> g.v(1).map
      ==>{age=29, name=marko}
      gremlin> g.v(1).outE
      ==>e[7][1-knows-2]
      ==>e[8][1-knows-4]
      ==>e[9][1-created-3]
      gremlin> g.v(1).outE('knows')
      ==>e[7][1-knows-2]
      ==>e[8][1-knows-4]
      gremlin> g.v(1).outE('knows').map
      ==>{weight=0.5}
      ==>{weight=1.0}
  41. Tinkerpop - Gremlin
      // get vertices known by marko
      gremlin> g.v(1).outE('knows').inV
      ==>v[2]
      ==>v[4]
      // get properties of vertices known by marko
      gremlin> g.v(1).outE('knows').inV.map
      ==>{age=27, name=vadas}
      ==>{age=32, name=josh}
      // filter by those older than 30
      gremlin> g.v(1).outE('knows').inV.filter{it.age > 30}.map
      ==>{age=32, name=josh}
      // just get name
      gremlin> g.v(1).outE('knows').inV.filter{it.age > 30}.name
      ==>josh
      // find nodes who 'know' someone older than 30
      gremlin> g.V.as('x').outE('knows').inV.has('age', T.gt, 30).back('x').map
      ==>{age=29, name=marko}
  42. Tinkerpop - Gremlin
      // find edges with weight > .5
      gremlin> g.E.filter{it.weight > 0.5}
      ==>e[10][4-created-5]
      ==>e[8][1-knows-4]
      // find edges w/ weight > .5 from marko
      gremlin> g.E.filter{it.weight > 0.5}.as('x').outV.has('name', T.eq, 'marko').back('x')
      ==>e[8][1-knows-4]
      // find nodes 'created' by other nodes
      gremlin> g.V.as('x').inE('created').back('x').map
      ==>{name=lop, lang=java}
      ==>{name=ripple, lang=java}
      gremlin> g.E.filter{it.label == 'created'}.inV.dedup().map
      ==>{name=lop, lang=java}
      ==>{name=ripple, lang=java}
      // find nodes 'created' by more than 1 node
      gremlin> g.E.filter{it.label == 'created'}.inV.groupCount().cap()
      ==>{v[3]=3, v[5]=1}
      // find nodes 'created' by marko's friends
      gremlin> g.v(1).outE('knows').inV.outE('created').inV.map
      ==>{name=ripple, lang=java}
      ==>{name=lop, lang=java}
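For readers without a Gremlin console, the demo queries above can be mirrored in plain Python. This sketch hard-codes the standard TinkerGraph toy dataset; vertex 6 (peter) and edges 11 and 12 do not appear individually in the slides and are filled in from that dataset. The variable names are illustrative, and the code imitates what the traversals compute rather than how Gremlin evaluates them.

```python
from collections import Counter

# Vertex id -> properties (TinkerGraph toy graph).
VERTICES = {
    1: {"name": "marko", "age": 29},
    2: {"name": "vadas", "age": 27},
    3: {"name": "lop", "lang": "java"},
    4: {"name": "josh", "age": 32},
    5: {"name": "ripple", "lang": "java"},
    6: {"name": "peter", "age": 35},
}

# Edge id -> (out vertex, label, in vertex, weight).
EDGES = {
    7: (1, "knows", 2, 0.5),
    8: (1, "knows", 4, 1.0),
    9: (1, "created", 3, 0.4),
    10: (4, "created", 5, 1.0),
    11: (4, "created", 3, 0.4),
    12: (6, "created", 3, 0.2),
}

# g.E.filter{it.weight > 0.5}
heavy_edges = sorted(
    edge_id for edge_id, (_, _, _, weight) in EDGES.items() if weight > 0.5
)

# g.E.filter{it.label == 'created'}.inV.groupCount().cap()
created_counts = dict(
    Counter(target for (_, label, target, _) in EDGES.values() if label == "created")
)

# g.V.as('x').outE('knows').inV.has('age', T.gt, 30).back('x')
knows_someone_over_30 = sorted(
    {
        VERTICES[source]["name"]
        for (source, label, target, _) in EDGES.values()
        if label == "knows" and VERTICES[target].get("age", 0) > 30
    }
)

print(heavy_edges)            # [8, 10]
print(created_counts)         # {3: 3, 5: 1}
print(knows_someone_over_30)  # ['marko']
```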
  43. Tinkerpop - Gremlin
      // add some new nodes
      gremlin> g.addVertex([name:'bob',age:'60'])
      ==>v[0]
      gremlin> g.addVertex([name:'eve',age:'40'])
      ==>v[7]
      gremlin> g.addVertex([name:'timmy',age:'5'])
      ==>v[8]
      // add some edges
      gremlin> g.addEdge(g.v(0), g.v(7),'friend')
      ==>e[13][0-friend-7]
      gremlin> g.addEdge(g.v(0), g.v(8),'child')
      ==>e[14][0-child-8]
      gremlin> g.V.filter{it.name == 'bob'}.outE('child').as('x').inV.filter{it.name == 'timmy'}.back('x')
      ==>e[14][0-child-8]
      gremlin> g.removeEdge(g.e(14))
      ==>null
      gremlin> g.V.filter{it.name == 'bob'}.outE('child').as('x').inV.filter{it.name == 'timmy'}.back('x')
      // no results
  44. Tinkerpop - Gremlin
      // previously
      gremlin> g.addVertex([name:'bob',age:'60'])
      ==>v[0]
      gremlin> g.addVertex([name:'eve',age:'40'])
      ==>v[7]
      gremlin> g.addEdge(g.v(0), g.v(7),'friend')
      ==>e[13][0-friend-7]
      // query for edge
      gremlin> g.v(0).outE
      ==>e[13][0-friend-7]
      // remove vertex (auto removes orphaned edge)
      gremlin> g.removeVertex(g.v(7))
      ==>null
      gremlin> g.v(0).outE
      // no results
      gremlin> g.e(13)
      ==>null
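The cascade shown above, where removing a vertex also removes the edges attached to it, can be sketched as a tiny in-memory structure. `TinyGraph` and its methods are hypothetical names for illustration and are not the Blueprints API.

```python
class TinyGraph:
    """Minimal in-memory graph: vertices and edges keyed by id."""

    def __init__(self):
        self.vertices = {}  # vertex id -> properties
        self.edges = {}     # edge id -> (out vertex, label, in vertex)

    def add_vertex(self, vertex_id, **properties):
        self.vertices[vertex_id] = properties

    def add_edge(self, edge_id, out_vertex, in_vertex, label):
        self.edges[edge_id] = (out_vertex, label, in_vertex)

    def remove_vertex(self, vertex_id):
        """Delete the vertex and every edge that starts or ends at it."""
        del self.vertices[vertex_id]
        self.edges = {
            edge_id: edge
            for edge_id, edge in self.edges.items()
            if vertex_id not in (edge[0], edge[2])
        }

    def out_edges(self, vertex_id):
        return sorted(
            edge_id for edge_id, edge in self.edges.items() if edge[0] == vertex_id
        )


graph = TinyGraph()
graph.add_vertex(0, name="bob", age="60")
graph.add_vertex(7, name="eve", age="40")
graph.add_edge(13, 0, 7, "friend")
edges_before = graph.out_edges(0)

graph.remove_vertex(7)
edges_after = graph.out_edges(0)
print(edges_before, edges_after)  # [13] []
```

Without the cascade, edge 13 would be left pointing at a vertex that no longer exists.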
  45. TITAN
      A Distributed Graph Database
  46. Titan Graph Database
      • Optimized to work against billions of nodes and edges
      • Theoretical limitation of 2^60 edges and half as many (2^59) nodes
      • Works with several different distributed DBs including Cassandra and HBase
      • Supports many concurrent users doing complex graph traversals simultaneously
      • Native integration with Tinkerpop stack
      • Supports integration with search technologies such as Lucene and Elasticsearch
      • Created by Thinkaurelius (http://thinkaurelius.com/)
  47. Titan Distributed Architecture
      • TitanDB can integrate with distributed architectures in a few different ways
      Native
        • Connects remotely to cluster (or local)
        • Can scale size as far as cluster can
        • Native Titan API
        • Possible processing bottleneck
      Remote
        • Put Rexster in front to allow RESTful access
        • Connects remotely to cluster
        • Can scale size as far as cluster can
        • Possible processing bottleneck
      Embedded
        • TitanDB and Rexster run on each node in the cluster
        • Can run on same JVM
        • Considerable performance/scalability improvement
  48. Titan Indexing
      • Standard index
        • Internal to Titan
        • Very fast but only supports exact matches
      • External index
        • Use indexing engine external to Titan (Lucene or Elasticsearch)
        • Supports range queries
        • Lucene
          • Limited to only one machine (small-sized datasets)
          • Also has a richer set of search features (than Elasticsearch)
        • Elasticsearch
          • Distributed
          • Not as feature-filled as Lucene
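The difference between an exact-match index and one that supports range queries can be illustrated with two small Python classes. This is a conceptual sketch only and says nothing about how Titan, Lucene, or Elasticsearch implement their indexes; the class names and sample ages are assumptions.

```python
import bisect


class ExactIndex:
    """Hash-style index: very fast lookups, but only for exact values."""

    def __init__(self):
        self._ids_by_value = {}

    def add(self, value, item_id):
        self._ids_by_value.setdefault(value, []).append(item_id)

    def lookup(self, value):
        return list(self._ids_by_value.get(value, []))


class RangeIndex:
    """Sorted index: keeps values ordered so ranges can be scanned."""

    def __init__(self):
        self._values = []  # kept sorted
        self._ids = []     # ids aligned with self._values

    def add(self, value, item_id):
        position = bisect.bisect_right(self._values, value)
        self._values.insert(position, value)
        self._ids.insert(position, item_id)

    def between(self, low, high):
        """Ids whose value lies in the inclusive range [low, high]."""
        start = bisect.bisect_left(self._values, low)
        end = bisect.bisect_right(self._values, high)
        return self._ids[start:end]


exact, ranged = ExactIndex(), RangeIndex()
for name, age in [("marko", 29), ("vadas", 27), ("josh", 32)]:
    exact.add(age, name)
    ranged.add(age, name)

print(exact.lookup(29))        # ['marko']
print(ranged.between(28, 35))  # ['marko', 'josh']
```

A query such as "posts within a time range" needs the second kind of structure, which is why the slide points range queries at an external index.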
  49. Distributed Titan Limitations/Gotchas
• Current limitations that are scheduled to be remedied
  • Property indexes must be created before the property is first used
  • Indexes cannot be dropped
  • Types cannot be changed once created
• Gotchas
  • Multiple graphs on the same backend require specific configuration per graph
  • Ghost vertices: certain concurrency circumstances can leave traces of vertices behind. The recommendation is to tolerate this and clean them up periodically
  50. Titan Graph Database - Gremlin
[Diagram, built up over slides 50-54: a graph G = (V, E, λ) made of vertices V, edges E, and properties λ, with an Application layer on top]
  55. DATA MODELING EXAMPLE
A Blogging Application
  56. “Bloggie Blog” Requirements
• Create users, posts, and comments
• Retrieve all posts for a user
• Retrieve posts by time range
• Retrieve all comments for a user
• Retrieve all comments for a post, sorted by vote
• Retrieve the top N posts, sorted by vote
• User can only vote *once* on a post or comment
  57. Get Cassandra Titan
• https://github.com/thinkaurelius/titan/wiki/Downloads (0.3.2 stable)
$ $TITAN_LOCATION/bin/gremlin.sh
         \,,,/
         (o o)
-----oOOo-(_)-oOOo-----
gremlin> g = new TinkerGraph();
==>tinkergraph[vertices:0 edges:0]
gremlin>
  58. Modeling Entities (User, Post, Comment)
• There’s no one way to model this.
• General rules to follow:
  • 1-N relationships can be modeled as one node with N edges pointing to other nodes
  • 1-1 relationships can be modeled as a simple edge between two nodes
  • M-N relationships are just more edges
  • Categorize (label) the different types of edges, since many different types of edges will connect to a single node
  • Don’t shy away from attaching properties to edges. Remember that edges are just as queryable as nodes.
  • A common practice is to model “actions” as edges and “actors”/“artifacts” as nodes
  • Denormalize to minimize traversals
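The rules above can be sketched with a tiny adjacency-list model in plain Python. It is a simplified illustration, not Titan's storage model; the vertex keys and the helpers `add_edge` and `out` are hypothetical, while the edge labels `postAuthor` and `postVote` follow the ones used later in the deck.

```python
from collections import defaultdict

# Actors and artifacts are nodes.
vertices = {
    "bob": {"type": "user", "name": "Robert"},
    "post1": {"type": "post", "title": "Hello World"},
    "post2": {"type": "post", "title": "Learning Gremlin"},
}

# out_edges[source] -> list of (label, target, edge properties)
out_edges = defaultdict(list)


def add_edge(source, label, target, **props):
    """Actions are edges; the label categorizes them, props describe them."""
    out_edges[source].append((label, target, props))


def out(source, label):
    """Follow only the outgoing edges that carry the given label."""
    return [target for lbl, target, _ in out_edges[source] if lbl == label]


# 1-N: one user node with N 'postAuthor' edges.
add_edge("bob", "postAuthor", "post1")
add_edge("bob", "postAuthor", "post2")
# A different edge type on the same node, with a property on the edge itself.
add_edge("bob", "postVote", "post2", date=1383726600)
```

Because every edge carries a label, `out("bob", "postAuthor")` and `out("bob", "postVote")` return different neighbours even though all three edges start at the same node.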
  59. Users, Posts, Comments
  60. Retrieve User’s Posts
• Let’s create a user and post
• Link them together
• Retrieve the user and their posts
gremlin> g.addVertex([type: 'user', email: 'bob@test.com', name: 'Robert', password: 'asdf'])
==>v[0]
gremlin> g.addVertex([type: 'post', guid: '21EC2020-3AEA-1069-A2DD-08002B30309D', title: 'Hello World', text: 'My first post!', userDisplayName: 'Bob'])
==>v[1]
gremlin> g.addEdge(g.v(0), g.v(1), 'postAuthor')
==>e[3][0-postAuthor-1]
gremlin> g.V.has('type', 'post').as('posts').inE('postAuthor').outV.has('email', 'bob@test.com').back('posts').map()
==>{guid=21EC2020-3AEA-1069-A2DD-08002B30309D, text=My first post!, title=Hello World, userDisplayName=Bob, type=post}
  61. Retrieve Posts by Time Range
• Add timestamp property to post
• Query by range
gremlin> g.V.has('guid', '21EC2020-3AEA-1069-A2DD-08002B30309D').has('type', 'post').sideEffect({it.createTimestamp = 1383726500});
==>v[1]
gremlin> g.V.has('createTimestamp', T.gt, 1383726400).has('createTimestamp', T.lt, 1383726600).map()
==>{guid=21EC2020-3AEA-1069-A2DD-08002B30309D, createTimestamp=1383726500, text=My first post!, title=Hello World, userDisplayName=Bob, type=post}
  62. Retrieve All User’s Comments
• Add comment
• Link to author and to post
gremlin> g.addVertex([type: 'comment', guid: '3F2504E0-4F89-11D3-9A0C-0305E82C3301', text: 'I like it!', userDisplayName: 'Sally', createTimestamp: 1383736500])
==>v[4]
gremlin> g.addEdge(g.v(1), g.v(4), 'postComment')
==>e[5][1-postComment-4]
gremlin> g.addVertex([type: 'user', email: 'sally@test.com', name: 'Sally', password: 'qwerty'])
==>v[6]
gremlin> g.addEdge(g.v(6), g.v(4), 'commentAuthor')
==>e[7][6-commentAuthor-4]
gremlin> g.V.has('type', 'comment').as('comments').inE('commentAuthor').outV.has('email', 'sally@test.com').back('comments').map()
==>{guid=3F2504E0-4F89-11D3-9A0C-0305E82C3301, createTimestamp=1383736500, text=I like it!, userDisplayName=Sally, type=comment}
  63. Retrieve top N posts by vote
• Create “postVote” edge and keep an aggregated vote count on the post
• Query and sort by votes
gremlin> g.addEdge(g.v(6), g.v(1), 'postVote', [date: 1383726600])
==>e[8][6-postVote-1]
gremlin> g.V.has('type', 'post').has('guid', '21EC2020-3AEA-1069-A2DD-08002B30309D').sideEffect({it.votes = 1})
==>v[1]
gremlin> g.addVertex([type: 'post', guid: '21EC2020-3AEA-1069-A2DD-08002B30309E', createTimestamp: 1383726600, title: 'Learning Gremlin', text: 'Gremlin is neat.', userDisplayName: 'Bob', votes: 2])
==>v[9]
// sort descending by votes (b compared to a), then keep the first results
gremlin> g.V('type', 'post').order({it.b.getProperty('votes') <=> it.a.getProperty('votes')}).transform({['title' : it.getProperty('title'), 'votes' : it.getProperty('votes')]})[0..5]
==>{title=Learning Gremlin, votes=2}
==>{title=Hello World, votes=1}
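The "denormalized vote count, then sort descending and take the first N" idea can be sketched in plain Python. This mirrors the logic of the query above rather than its execution inside Titan; `top_posts` is a hypothetical helper, and the third post is made-up data added to show a post without a `votes` property.

```python
import heapq

posts = [
    {"title": "Hello World", "votes": 1},
    {"title": "Learning Gremlin", "votes": 2},
    {"title": "Untitled draft"},  # hypothetical post with no votes yet
]


def top_posts(posts, n):
    """Return the n posts with the most votes, highest first.

    A missing 'votes' property is treated as zero votes (an assumption).
    """
    return heapq.nlargest(n, posts, key=lambda p: p.get("votes", 0))


top_two = [(p["title"], p["votes"]) for p in top_posts(posts, 2)]
```

Storing the count on the post itself means the ranking reads one property per post instead of counting `postVote` edges on every query, which is the "denormalize to minimize traversals" rule in practice.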
  64. Retrieve Post Comments Sorted by Vote
• Similar to post votes
gremlin> g.addEdge(g.v(0), g.v(4), 'commentVote', [date: 1383726700])
==>e[10][0-commentVote-4]
gremlin> g.V.has('type', 'comment').has('guid', '3F2504E0-4F89-11D3-9A0C-0305E82C3301').sideEffect({it.votes = 1})
==>v[4]
gremlin> g.addVertex([type: 'comment', guid: '3F2504E0-4F89-11D3-9A0C-0305E82C3302', text: 'Thanks.', userDisplayName: 'Bob', createTimestamp: 1383736500])
==>v[11]
gremlin> g.addEdge(g.v(1), g.v(11), 'postComment')
gremlin> g.addEdge(g.v(0), g.v(11), 'commentAuthor')
// sort the post's comments descending by votes
gremlin> g.v(1).outE('postComment').inV.order({it.b.getProperty('votes') <=> it.a.getProperty('votes')}).map()
==>{guid=3F2504E0-4F89-11D3-9A0C-0305E82C3301, createTimestamp=1383736500, text=I like it!, votes=1, userDisplayName=Sally, type=comment}
==>{guid=3F2504E0-4F89-11D3-9A0C-0305E82C3302, createTimestamp=1383736500, text=Thanks., userDisplayName=Bob, type=comment}
  65. User Can Only Vote Once
• Could enforce using external unique indexes
• Or do 2-step incrementing in Gremlin (small chance of duplicates)
gremlin> user = g.v(0); post = g.v(1);
         if (post.inE('postVote').outV.has('email', user.email).count() == 0) {
           g.addEdge(user, post, 'postVote', [date: new Date().getTime()]);
           if (post.getProperty('votes') != null) {
             post.votes++;
           } else {
             post.votes = 1;
           }
         }
==>1
gremlin> // same command as above
==>null
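The "small chance of duplicates" comes from the gap between checking for an existing vote and adding the new one. A minimal Python sketch of closing that gap inside a single process follows; `VoteCounter` is a hypothetical helper, and a lock like this does not extend across a distributed cluster, where something like the unique index mentioned above would be needed instead.

```python
import threading


class VoteCounter:
    """Hypothetical helper: allows one vote per (user, target) pair."""

    def __init__(self):
        self._lock = threading.Lock()
        self._voted = set()   # (user, target) pairs that already voted
        self.votes = {}       # target -> aggregated vote count

    def vote(self, user, target):
        """Record the vote and return True, or return False if already cast."""
        # The check and the update happen under one lock, so two concurrent
        # calls for the same pair cannot both pass the check.
        with self._lock:
            if (user, target) in self._voted:
                return False
            self._voted.add((user, target))
            self.votes[target] = self.votes.get(target, 0) + 1
            return True


counter = VoteCounter()
first = counter.vote("bob@test.com", "post-1")
second = counter.vote("bob@test.com", "post-1")
other = counter.vote("sally@test.com", "post-1")
```

Here `first` and `other` succeed while `second` is rejected, so the post ends up with two votes.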
  66. Graph Visualization
  67. Areas Not Covered
• Map/Reduce
  • Gremlin has its own built-in M/R API
• Indexing
  • Titan currently requires all indexes to be created up front
• Integration with other backends
  • HBase, Oracle Berkeley DB, Hazelcast, Persistit
• Detailed full-text search through external indexes
• Graph analytics engine (Faunus)
• Deep dive into the Gremlin query language and Groovy
  • Seriously, there’s a TON there.
  68. References
http://sql2gremlin.com/
http://www.tinkerpopbook.com/
http://www.tinkerpop.com/
https://github.com/thinkaurelius/titan/wiki/Getting-Started
https://groups.google.com/forum/#!forum/gremlin-users
https://groups.google.com/forum/#!forum/aureliusgraphs
http://thinkaurelius.com/
  69. THANK YOU
{ "email" : "calebjones@gmail.com", "website" : "http://calebjones.info", "twitter" : "@JonesWCaleb" }
