
Graph Databases in Python (PyCon Canada 2012)


Since the NoSQL concept burst onto the market, graph databases have traditionally been designed to be used from Java or C. With some honorable exceptions, there isn't an easy way to manage graph databases from Python. In this talk, I will introduce some of the tools that you can use today to work with these new and challenging databases from our favorite language, Python.

Presentation Transcript

  • GRAPH DATABASES IN PYTHON Javier de la Rosa @versae The CulturePlex Lab Western University, London, ON PyCon Canada 2012
  • WHO I AM
    ● Javier de la Rosa
    ● versae
    ● versae
    ● Computer Scientist and Humanist
    ● CulturePlex Lab
    ● CulturePlex
    Graph Databases in Python, Javier de la Rosa, PyCon Canada, 2012 2
  • FIRST OF ALL
    “You do not really understand something unless you can explain it to your grandmother” – (Frequently attributed to) Richard Feynman
  • DATABASES (in the last 30 years)
    ● Data in tables, rows and columns
    ● Pretty basic mechanism to make connections:
      – Primary keys, foreign keys, and... that's all
    ● Relational, ahem, really?
  • DATABASES (in the last 30 years)
    ● Rigid data schemas
      – Have you ever tried to do a schema migration?
    ● Relational algebra and SQL
      – Terrible for highly interconnected data
      – JOINs can take ages to finish (a bit overdramatized)
  • NoSQL, Not Only SQL
    ● Document
      – MongoDB, CouchDB, etc.
    ● Key-value stores
      – Redis, Riak, Voldemort, Dynamo, etc.
    ● Big Tables
      – Cassandra, HBase, etc.
    ● Analytic
      – Hadoop
    ● Graph
      – Neo4j, OrientDB, HyperGraphDB, Titan, etc.
    ● Other
      – Objectivity/DB, ZODB, etc.
  • DATABASES LANDSCAPE
    [Figure: the database landscape]
    Source: 451 Research, https://451research.com/report-long?icid=2289
  • WHO IS USING GRAPHS?
    ● Mozilla with Pancake and Pacer
      – https://wiki.mozilla.org/Pancake & http://pangloss.github.com/pacer/
    ● Twitter with FlockDB
      – https://github.com/twitter/flockdb
    ● Facebook with Open Graph
      – https://developers.facebook.com/docs/opengraph/
    ● Google with Knowledge Graph
      – http://www.google.ca/insidesearch/.../knowledge.html
  • WHY GRAPHS?
    ● Data is getting more and more connected
      – From text documents, to wikis, to ontologies, to folksonomies, etc.
    ● And more semi-structured
      – Think about the decentralization of content generation
    ● And more complex
      – Social networks, semantic trending, etc.
    Source: Neo Technology, http://www.slideshare.net/emileifrem/neo4j-the-benefits-of-graph-databases-oscon-2009
  • A FEW OF THE CURRENT USES
    ● Social Networking and Recommendations
    ● Network and Cloud Management
    ● Master Data Management
    ● Geospatial
    ● Bioinformatics
    ● Content Management and Security and Access Control
    Source: Mashable, http://mashable.com/2012/09/26/graph-databases/
  • AND WHY ELSE?
    ● Because graphs are cool!
    [Image: Leonhard Euler]
  • WHAT IS A GRAPH?
    ● G = (V, E), where
      – G is a graph
      – V is a set of vertices
      – E is a set of edges
    Source: Wikipedia, https://en.wikipedia.org/wiki/Graph_(mathematics)
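To make the definition concrete, here is a minimal sketch that stores G = (V, E) with plain Python sets. The representation (vertices as strings, each undirected edge as a `frozenset` of its two endpoints) and the `neighbours` helper are illustrative assumptions, not any library's API.

```python
# G = (V, E): vertices as strings, each undirected edge as a frozenset {u, v}.
V = {"a", "b", "c"}
E = {frozenset({"a", "b"}), frozenset({"b", "c"})}
G = (V, E)


def neighbours(graph, v):
    """Return the set of vertices that share an edge with v."""
    _, edges = graph
    return {w for edge in edges if v in edge for w in edge if w != v}


print(sorted(neighbours(G, "b")))  # ['a', 'c']
```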
  • WHAT IS A GRAPH?
    ● G = (V, E)
      – Graph, aka network, diagram, etc.
      – Vertex, aka point, dot, node, element, etc.
      – Edge, aka relationship, arc, line, link, etc.
    ● Basically, “a graph states that something is related to something else” – Svetlana Sicular, Research Director at Gartner
    Source: Gartner, http://blogs.gartner.com/svetlana-sicular/think-graph/
  • TYPES OF GRAPH
    [Figures: Undirected graph; Digraph]
    Source: Wikipedia, https://en.wikipedia.org/wiki/Graph_(mathematics)
  • TYPES OF GRAPH
    [Figures: Multigraph; Hypergraph]
    Source: Wikipedia, https://en.wikipedia.org/wiki/Graph_(mathematics)
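The four types above differ mainly in what counts as an edge. A sketch using only the standard library, under the assumption that vertices are plain strings:

```python
from collections import Counter

# Undirected graph: an edge is an unordered pair, so {a, b} equals {b, a}.
undirected = {frozenset(("a", "b"))}

# Digraph: an edge is an ordered (tail, head) pair, so direction matters.
digraph = {("a", "b")}

# Multigraph: the same pair of vertices may be joined more than once,
# so we count how many parallel edges each pair has.
multigraph = Counter([
    frozenset(("a", "b")),
    frozenset(("a", "b")),
    frozenset(("b", "c")),
])

# Hypergraph: a single (hyper)edge may join any number of vertices.
hypergraph = {frozenset(("a", "b", "c")), frozenset(("c", "d"))}

print(frozenset(("b", "a")) in undirected)    # True: order is irrelevant
print(("b", "a") in digraph)                  # False: order matters
print(multigraph[frozenset(("a", "b"))])      # 2 parallel edges
print(max(len(edge) for edge in hypergraph))  # 3 vertices in one edge
```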
  • SOME GRAPHS EVEN HAVE A NAME
    ● Complete graphs
    [Figures: K3, K5, K8]
    Source: Wikipedia, http://en.wikipedia.org/wiki/Gallery_of_named_graphs
  • SOME GRAPHS EVEN HAVE A NAME
    ● Stars
    [Figure: the star graphs S3, S4, S5 and S6]
    Source: Wikipedia, http://en.wikipedia.org/wiki/Gallery_of_named_graphs
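Complete and star graphs are easy to generate programmatically. This sketch assumes vertices are numbered from 0 and follows the convention that the star S_k has one centre joined to k leaves (some authors count vertices instead, in which case S_k would have k - 1 leaves). A complete graph K_n always has n(n - 1)/2 edges, which the loop below confirms for the three graphs on the slide.

```python
from itertools import combinations


def complete_graph(n):
    """Edges of K_n: every pair of the n vertices 0..n-1 is connected."""
    return {frozenset(pair) for pair in combinations(range(n), 2)}


def star_graph(k):
    """Edges of S_k, taken here as one centre (vertex 0) joined to k leaves."""
    return {frozenset((0, leaf)) for leaf in range(1, k + 1)}


for n in (3, 5, 8):
    # K_n has n(n - 1)/2 edges.
    print(f"K{n}: {len(complete_graph(n))} edges")
# K3: 3 edges
# K5: 10 edges
# K8: 28 edges

print(len(star_graph(6)))  # 6 edges, one per leaf
```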
  • SOME GRAPHS EVEN HAVE A NAME
    ● Snarks
    [Figures: Blanuša (second), Szekeres, Double star]
    Source: Wikipedia, http://en.wikipedia.org/wiki/Gallery_of_named_graphs
  • THINGS CAN GET COMPLICATED...
    [Figure: Local McLaughlin graph]
    Source: Wikipedia, http://en.wikipedia.org/wiki/Gallery_of_named_graphs
  • WAIT A SEC...
  • DON'T WORRY
    ● Just one more type: the Property Graph
    [Figure: a small example graph with numbered nodes and edges]
  • THE PROPERTY GRAPH
    ● Directed, attributed and multi-relational
    [Figure: example property graph. Nodes: "Name: Javi", "Name: David", "Name: John", and a book "Title: The Art of Computer Programming, Price: $135". Relationships: "Knows" (Since: 2009), "Knows" (Since: 1990), and two "Likes"]
  • THE PROPERTY GRAPH
    ● A set of nodes, and each node has:
      – A unique identifier.
      – A set of outgoing edges.
      – A set of incoming edges.
      – A collection of properties defined by a map from key to value.
    ● A set of relationships, and each relationship has:
      – A unique identifier.
      – An outgoing tail vertex.
      – An incoming head vertex.
      – A collection of properties defined by a map from key to value.
    Source: TinkerPop, https://github.com/tinkerpop/gremlin/wiki/Defining-a-Property-Graph
  • IN SHORT
    ● A Property Graph is composed of:
      – A set of nodes
      – A set of relationships
      – Properties and ids on both
    ● Sometimes, nodes and relationships can be typed
      – In Blueprints and Neo4j, a label denotes the type of relationship between its two nodes.
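The definition above maps naturally onto a couple of small Python classes. This is an in-memory sketch for illustration only: the names `PropertyGraph`, `add_node` and `add_relationship` are hypothetical and do not belong to Blueprints or Neo4j, and the direction of the example "Knows" relationship is an assumption (the slide's figure does not survive in the transcript).

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass(eq=False)  # identity-based equality: elements are told apart by id
class Node:
    id: int
    properties: dict = field(default_factory=dict)
    # Back-references are hidden from repr to avoid printing a cycle.
    outgoing: list = field(default_factory=list, repr=False)
    incoming: list = field(default_factory=list, repr=False)


@dataclass(eq=False)
class Relationship:
    id: int
    tail: Node   # the node the relationship leaves from
    head: Node   # the node the relationship points to
    label: str   # the relationship type, e.g. "Knows"
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical in-memory property graph: nodes, relationships, ids, properties."""

    def __init__(self):
        self._node_ids = count(1)
        self._rel_ids = count(1)
        self.nodes = {}
        self.relationships = {}

    def add_node(self, **properties):
        node = Node(next(self._node_ids), dict(properties))
        self.nodes[node.id] = node
        return node

    def add_relationship(self, tail, label, head, **properties):
        rel = Relationship(next(self._rel_ids), tail, head, label, dict(properties))
        tail.outgoing.append(rel)  # outgoing edge of the tail
        head.incoming.append(rel)  # incoming edge of the head
        self.relationships[rel.id] = rel
        return rel


g = PropertyGraph()
javi = g.add_node(name="Javi")
david = g.add_node(name="David")
knows = g.add_relationship(javi, "Knows", david, since=2009)
print(knows.tail.properties["name"], knows.label, knows.head.properties["name"])
# Javi Knows David
```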
  • GRAPH DATABASES
    ● A graph database uses graph structures with nodes, edges, and properties to represent and store data
      – ...but there is not an easy way to visualize this
    Source: Wikipedia, https://en.wikipedia.org/wiki/Graph_database
  • HOW IT LOOKS IN PYTHON?
        # Assuming neo4j-rest-client and a Neo4j server running locally
        >>> from neo4jrestclient.client import GraphDatabase
        >>> g = GraphDatabase("http://localhost:7474/db/data/")
        # Let's create a graph
        >>> silvester = g.nodes.create(name="Silvester")
        >>> arnold = g.nodes.create(name="Arnold")
        >>> punch = arnold.punches(silvester)
    [Figure: node "Name: Arnold" linked to node "Name: Silvester" by a "punches" relationship]
  • HOW IT LOOKS IN PYTHON?
        >>> chuck = g.nodes.create(name="Chuck")
        >>> chuck.dropkicks(silvester)
        >>> chuck.dropkicks(arnold)
    [Figure: Arnold "punches" Silvester; Chuck "dropkicks" Silvester; Chuck "dropkicks" Arnold]
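The `arnold.punches(silvester)` style suggests that any unknown attribute on a node acts as a relationship factory. One way such an API could be built is Python's `__getattr__` hook. The toy, database-free sketch below is hypothetical (the classes `Node`, `NodeManager` and `Graph` are illustrative names) and is not a description of how any particular client is actually implemented.

```python
class Node:
    """Toy node: unknown attributes become relationship-creating methods."""

    def __init__(self, graph, **properties):
        self.graph = graph
        self.properties = properties

    def __getattr__(self, rel_type):
        # Python calls __getattr__ only when normal lookup fails,
        # so arnold.punches resolves here with rel_type == "punches".
        if rel_type.startswith("_"):
            raise AttributeError(rel_type)  # leave dunder/private lookups alone

        def create_relationship(other, **properties):
            rel = (self, rel_type, other, properties)
            self.graph.relationships.append(rel)
            return rel

        return create_relationship


class NodeManager:
    def __init__(self, graph):
        self._graph = graph

    def create(self, **properties):
        return Node(self._graph, **properties)


class Graph:
    def __init__(self):
        self.relationships = []
        self.nodes = NodeManager(self)


g = Graph()
silvester = g.nodes.create(name="Silvester")
arnold = g.nodes.create(name="Arnold")
chuck = g.nodes.create(name="Chuck")
punch = arnold.punches(silvester)
chuck.dropkicks(silvester)
chuck.dropkicks(arnold)

for tail, rel_type, head, _ in g.relationships:
    print(tail.properties["name"], rel_type, head.properties["name"])
# Arnold punches Silvester
# Chuck dropkicks Silvester
# Chuck dropkicks Arnold
```

The design trade-off is discoverability: relationship types never appear in the class definition, so a typo such as `arnold.punchs(...)` silently creates a new type.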
  • GRAPH DATABASES LANDSCAPE
    Database | Data Model | Query Method | License | Python Binding
    ● Neo4j | Property Graph | Cypher, Gremlin, Traversal | GPL, AGPL | Native, Blueprints, REST
    ● OrientDB | Property Graph | Gremlin, Traversal | Apache 2 | Blueprints
    ● HyperGraphDB | Typed Hypergraph | HGQuery, Traversal | LGPL | Nope
    ● DEX | Property Graph | Traversal | Commercial | Blueprints
    ● Titan | Property Graph | Gremlin | Apache 2 | Blueprints
    ● InfoGrid | Property Graph | Traversal | AGPL, Commercial | Nope
    ● InfiniteGraph | Property Graph | Gremlin | Commercial | Nope
    Source: Wikipedia, https://en.wikipedia.org/wiki/Graph_database
  • GRAPH DATABASES LANDSCAPE
    And more:
      – AffinityDB
      – YarcData uRiKA
      – Apache Giraph
      – Cassovary
      – StigDB
      – NuvolaBase
      – Pegasus
      – Microsoft Trinity
      – Sherlock
      – And so on
  • GREMLIN, BLUEPRINTS, WAT?
    Let me introduce the TinkerPop stack
    [Figure: the TinkerPop stack]
    Source: TinkerPop, http://www.tinkerpop.com/
  • BLUEPRINTS AND REXSTER
    ● Blueprints is a property graph model interface
    ● Rexster is a server that exposes any Blueprints graph through REST
    Source: TinkerPop, http://www.tinkerpop.com/
  • AND WHAT ABOUT PYTHON?
    ● Options to connect to a Blueprints graph database
    [Diagram: OrientDB, Neo4j, DEX and Titan; the Blueprints API; Rexster; REST; Python clients bulbflow, python-blueprints and pyblueprints]
  • BULBFLOW
    ● Create
        # bulbflow client for a Neo4j server
        >>> from bulbs.neo4jserver import Graph
        >>> g = Graph()
        >>> alice = g.vertices.create(name="Alice")
        >>> bob = g.vertices.create(name="Bob")
        >>> g.edges.create(alice, "knows", bob)
    ● Get
        >>> alice = g.vertices.get(1)
        >>> bob = g.vertices.get(2)
    ● Update
        >>> alice.age = 21
        >>> alice.save()
    ● Delete
        >>> alice.delete()
    Source: Bulbflow, http://bulbflow.com/docs/
  • PYBLUEPRINTS
    ● Create
        >>> alice = g.addVertex()
        >>> alice.setProperty("name", "Alice")
        >>> bob = g.addVertex()
        >>> bob.setProperty("name", "Bob")
        >>> g.addEdge(alice, bob, "knows")
    ● Get
        >>> alice = g.getVertex(1)
        >>> bob = g.getVertex(2)
    ● Update
        >>> alice.setProperty("age", 21)
    ● Delete
        >>> g.removeVertex(alice.getId())
    Source: PyBlueprints, https://github.com/escalant3/pyblueprints
  • BUT NEO4J HAS ITS OWN CLIENTS!
    ● REST clients for Neo4j
    [Diagram: OrientDB, Neo4j, DEX and Titan; the Blueprints API; Rexster; REST; Python clients neo4j-rest-client, py2neo, bulbflow, python-blueprints and pyblueprints]
  • HOW CAN I LOOK UP?
    ● An index is a data structure that supports the fast lookup of elements by some key/value pair
    Source: TinkerPop, https://github.com/tinkerpop/blueprints/wiki/Graph-Indices
  • INDICES
    ● In Python bindings, indices are similar to dicts
      – bulbflow
        # bulbflow creates automatic indices to make basic lookups easier
        >>> nodes = g.vertices.index.lookup(name="Alice")
        >>> for node in nodes:
        ...     print(node)
      – PyBlueprints
        >>> index = g.getIndex("names", "vertex")
        >>> index.put("name", alice.getProperty("name"), alice)
        >>> nodes = index.get("name", "Alice")
        >>> for node in nodes:
        ...     print(node)
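Since an exact-match index behaves much like a dict keyed by (key, value) pairs, a minimal in-memory sketch is short. The `Index` class is hypothetical; it only loosely mirrors the `put(key, value, element)` / `get(key, value)` argument order of the PyBlueprints example, and nodes are stood in for by plain dicts.

```python
from collections import defaultdict


class Index:
    """Hypothetical exact-match index mapping (key, value) pairs to elements."""

    def __init__(self):
        self._entries = defaultdict(list)

    def put(self, key, value, element):
        self._entries[(key, value)].append(element)

    def get(self, key, value):
        # .get avoids inserting empty lists for pairs that were never indexed.
        return list(self._entries.get((key, value), []))


alice = {"name": "Alice"}
bob = {"name": "Bob"}
names = Index()
for person in (alice, bob):
    names.put("name", person["name"], person)

for node in names.get("name", "Alice"):
    print(node)  # {'name': 'Alice'}
print(names.get("name", "Carol"))  # []
```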
  • INDICES
    ● Some graph databases provide full-text queries
      – bulbflow
        >>> nodes = g.vertices.index.query(name="ali*")
        >>> for node in nodes:
        ...     print(node)
      – PyBlueprints
        >>> index = g.getIndex("names", "vertex")
        >>> nodes = index.query("name", "ali*")
        >>> for node in nodes:
        ...     print(node)
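Wildcard queries such as `ali*` are answered by the database's full-text engine. As a rough local stand-in, the sketch below uses the standard library's `fnmatch` and assumes case-insensitive matching; real full-text indices (for example Lucene-backed ones) analyse and tokenize text and can behave differently.

```python
from fnmatch import fnmatchcase


def query(elements, key, pattern):
    """Return elements whose `key` property matches a shell-style wildcard.

    Both sides are lower-cased, so matching is case-insensitive (an assumption).
    """
    pattern = pattern.lower()
    return [
        element for element in elements
        if fnmatchcase(str(element.get(key, "")).lower(), pattern)
    ]


people = [{"name": "Alice"}, {"name": "Alicia"}, {"name": "Bob"}]
for node in query(people, "name", "ali*"):
    print(node["name"])
# Alice
# Alicia
```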
  • ...MORE COMPLEX SEARCHES?
    “Without traversals [FlockDB] is only a persisted graph. But not a graph database.” – Alex Popescu
    Source: myNoSQL, http://nosql.mypopescu.com/
  • LET'S TRAVERSE THE GRAPH!
    ● “A graph traversal is the problem of visiting all the nodes in a graph in a particular manner”
      – A* search
      – Alpha-beta pruning
      – Breadth-First Search (BFS)
      – Depth-First Search (DFS)
      – Dijkstra's algorithm
      – Floyd-Warshall's algorithm
      – Etc.
    Source: Wikipedia, https://en.wikipedia.org/wiki/Graph_traversal
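Breadth-first and depth-first search are the two basic strategies in that list. A compact sketch over a hypothetical adjacency dict (node name to list of neighbours): BFS uses a FIFO queue and visits nodes level by level, while DFS uses a LIFO stack and follows one branch as deep as it can before backtracking.

```python
from collections import deque

# Hypothetical "knows" graph as an adjacency dict.
knows = {
    "Alice": ["Bob", "Carol"],
    "Bob": ["Dave"],
    "Carol": ["Dave"],
    "Dave": [],
}


def bfs(adjacency, start):
    """Breadth-first order: all neighbours before neighbours of neighbours."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return order


def dfs(adjacency, start):
    """Depth-first order using an explicit stack."""
    seen, order, stack = set(), [], [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        # Push neighbours in reverse so they are explored in listed order.
        stack.extend(reversed(adjacency[node]))
    return order


print(bfs(knows, "Alice"))  # ['Alice', 'Bob', 'Carol', 'Dave']
print(dfs(knows, "Alice"))  # ['Alice', 'Bob', 'Dave', 'Carol']
```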
  • NEO4J TRAVERSAL API
    ● Python-embedded (native Neo4j Python binding)
        >>> traverser = gdb.traversal().relationships('knows').traverse(alice)
        # The graph is traversed as you loop through the result
        >>> for node in traverser.nodes:
        ...     print(node)
    ● neo4j-rest-client
        >>> traverser = alice.traverse(types=[client.All.knows])
        # The graph is traversed as you loop through the result
        >>> for node in traverser:
        ...     print(node)
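The comment "The graph is traversed as you loop through the result" describes lazy evaluation. A Python generator reproduces that behaviour in a few lines. This sketch is purely local and hypothetical (`lazy_bfs` and the `visited_log` list are illustrative), whereas the real clients fetch results from the database.

```python
from collections import deque


def lazy_bfs(adjacency, start, visited_log):
    """Yield nodes breadth-first; no work happens until the caller iterates."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        visited_log.append(node)  # record the moment the node is actually visited
        yield node
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)


knows = {"Alice": ["Bob"], "Bob": ["Carol"], "Carol": []}
log = []
traverser = lazy_bfs(knows, "Alice", log)
print(log)              # [] : creating the traverser visits nothing
print(next(traverser))  # Alice
print(log)              # ['Alice']
print(list(traverser))  # ['Bob', 'Carol']
```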
  • BLUEPRINTS GREMLIN
    ● Gremlin is a domain-specific language for traversing property graphs
      – Defines how to do a query based on the graph structure
        >>> gremlin = g.extensions.GremlinPlugin.execute_script
        >>> params = {'alice_id': alice.id}
        >>> script = "g.v(alice_id).out('knows')"
        >>> node = gremlin(script=script, params=params)
        >>> node == bob
    Source: TinkerPop Gremlin, https://github.com/tinkerpop/gremlin/wiki
    Source: Marko Rodríguez, The Graph Traversal Programming Pattern, http://www.slideshare.net/slidarko/graph-windycitydb2010
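Gremlin steps such as `out` map a set of elements to their neighbours along a labelled edge, and steps can be chained into longer paths. A hypothetical, in-memory imitation in Python: the `Pipe` class and the `v` helper are illustrative names, not part of any TinkerPop binding, and edges are assumed to be `(tail, label, head)` triples.

```python
class Pipe:
    """Hypothetical chainable traversal loosely modelled on Gremlin steps."""

    def __init__(self, edges, elements):
        self._edges = edges  # list of (tail, label, head) triples
        self._elements = list(elements)

    def out(self, label):
        # Follow outgoing edges with this label from every current element.
        return Pipe(self._edges, [head
                                  for element in self._elements
                                  for tail, rel, head in self._edges
                                  if tail == element and rel == label])

    def in_(self, label):
        # "in" is a Python keyword, hence the trailing underscore.
        return Pipe(self._edges, [tail
                                  for element in self._elements
                                  for tail, rel, head in self._edges
                                  if head == element and rel == label])

    def to_list(self):
        return list(self._elements)


def v(edges, element):
    """Start a traversal from a single element."""
    return Pipe(edges, [element])


edges = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "likes", "taocp"),
]
print(v(edges, "alice").out("knows").to_list())               # ['bob']
print(v(edges, "alice").out("knows").out("knows").to_list())  # ['carol']
print(v(edges, "carol").in_("knows").to_list())               # ['bob']
```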
  • NEO4J CYPHER QUERY LANGUAGE
    ● Declarative graph query language
      – Expressive and efficient querying
      – Focused on expressing what to retrieve from a graph
      – Inspired by SQL
      – Pattern matching expressions from SPARQL
    [Figure: node 1 connected to node 2 by a relationship labelled "label"]
        (1) -[:label]- (2)
        START n=node(1), m=node(2) MATCH n-[r:label]-m RETURN r
    Source: Wikipedia, https://en.wikipedia.org/wiki/Graph_database
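The pattern `n-[r:label]-m` has no arrow, so it matches a `label` relationship between n and m in either direction. A hypothetical in-memory sketch of that matching rule, with relationships stored as plain dicts (the field names `start`, `end` and `type` are assumptions for illustration):

```python
def match(relationships, n, label, m):
    """Emulate `MATCH n-[r:label]-m RETURN r` for two fixed nodes n and m.

    The pattern has no arrow, so a relationship in either direction matches.
    """
    return [
        r for r in relationships
        if r["type"] == label and {r["start"], r["end"]} == {n, m}
    ]


relationships = [
    {"id": 10, "start": 1, "end": 2, "type": "label"},
    {"id": 11, "start": 2, "end": 1, "type": "label"},
    {"id": 12, "start": 1, "end": 2, "type": "other"},
    {"id": 13, "start": 1, "end": 3, "type": "label"},
]
print([r["id"] for r in match(relationships, 1, "label", 2)])  # [10, 11]
```

A directed pattern such as `n-[r:label]->m` would instead compare `r["start"] == n and r["end"] == m`.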
  • PY2NEO CYPHER HELPERS
    ● Get or create elements
        >>> g.get_or_create_relationships(
        ...     (bob, "WORKS WITH", carol, {"since": 2004}),
        ...     (alice, "DISLIKES!", carol, {"reason": "youth"}),
        ...     (bob, "WORKS WITH", dave, {"since": 2009}),
        ... )
    ● Get counts
        >>> nodes_count = g.get_node_count()
        >>> rels_count = g.get_relationship_count()
    ● Delete
        >>> g.delete()
    Source: py2neo, http://py2neo.org/
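The point of a get-or-create helper is idempotence: running the same call twice must not duplicate relationships. A hypothetical sketch of that behaviour, not py2neo's implementation, assuming for illustration only that a relationship is identified by its (start, type, end) triple and that properties are applied only when the relationship is first created:

```python
def get_or_create_relationships(store, *specs):
    """Hypothetical idempotent helper over a dict keyed by (start, type, end)."""
    results = []
    for start, rel_type, end, properties in specs:
        key = (start, rel_type, end)
        if key not in store:
            store[key] = dict(properties)  # create only when missing
        results.append(store[key])         # otherwise reuse the existing one
    return results


specs = [
    ("bob", "WORKS WITH", "carol", {"since": 2004}),
    ("alice", "DISLIKES!", "carol", {"reason": "youth"}),
    ("bob", "WORKS WITH", "dave", {"since": 2009}),
]
store = {}
get_or_create_relationships(store, *specs)
get_or_create_relationships(store, *specs)  # second call creates nothing new
print(len(store))  # 3
```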
  • NEO4J-REST-CLIENT CYPHER HELPERS
    ● Query casting
        >>> from neo4jrestclient.client import Node, Relationship
        >>> q = ("start n=node(*) match n-[r:punches]-() "
        ...      "return n, n.name, r, r.since")
        >>> results = g.query(q, returns=(Node, unicode, Relationship, int))
    ● Complex filtering
        >>> from neo4jrestclient.query import Q
        >>> lookups = (
        ...     Q("name", exact="Arnold") &
        ...     (Q("surname", istartswith="swar") &
        ...      ~Q("surname", iendswith="chenegger"))
        ... )
        >>> arnolds = g.nodes.filter(lookups)
    Source: neo4j-rest-client, https://github.com/versae/neo4j-rest-client
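Q lookups compose with `&`, `|` and `~`. The hypothetical sketch below imitates that composition with plain Python predicates over property dicts and supports only the three lookups used on the slide (`exact`, `istartswith`, `iendswith`); it is not neo4j-rest-client's implementation, and the sample surnames are made up for illustration.

```python
class Q:
    """Hypothetical composable filter over property dicts."""

    LOOKUPS = {
        "exact": lambda value, arg: value == arg,
        "istartswith": lambda value, arg: value.lower().startswith(arg.lower()),
        "iendswith": lambda value, arg: value.lower().endswith(arg.lower()),
    }

    def __init__(self, prop=None, predicate=None, **lookup):
        if predicate is None:
            # Exactly one lookup per Q, e.g. Q("name", exact="Arnold").
            (op, arg), = lookup.items()
            test = self.LOOKUPS[op]
            predicate = lambda props: prop in props and test(props[prop], arg)
        self._predicate = predicate

    def __call__(self, props):
        return self._predicate(props)

    def __and__(self, other):
        return Q(predicate=lambda props: self(props) and other(props))

    def __or__(self, other):
        return Q(predicate=lambda props: self(props) or other(props))

    def __invert__(self):
        return Q(predicate=lambda props: not self(props))


people = [
    {"name": "Arnold", "surname": "Swarzenegger"},
    {"name": "Arnold", "surname": "Swarchenegger"},
    {"name": "Chuck", "surname": "Norris"},
]
lookups = (
    Q("name", exact="Arnold") &
    (Q("surname", istartswith="swar") & ~Q("surname", iendswith="chenegger"))
)
print([p["surname"] for p in people if lookups(p)])  # ['Swarzenegger']
```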
  • LET'S PLAY!
    ● Deploy Neo4j on Heroku or Amazon
    ● Use one of the available clients
  • NEO4J HEROKU ADD-ON
    ● Create a Heroku app and add the Neo4j add-on
        $ heroku apps:create pyconca
        $ heroku addons:add neo4j --app pyconca
        $ xdg-open `heroku config:get NEO4J_URL --app pyconca`
        $ export NEO4J_URL=`heroku config:get NEO4J_URL --app pyconca`
    ● Create a virtualenv with neo4j-rest-client
        $ mkvirtualenv --no-site-packages pyconca
        $ workon pyconca
        $ pip install ipython neo4jrestclient
        $ ipython
  • NEO4J HEROKU ADD-ON
    ● Run IPython and that's it!
        >>> import os
        >>> NEO4J_URL = os.environ["NEO4J_URL"]
        >>> from neo4jrestclient import client
        >>> gdb = client.GraphDatabase(NEO4J_URL + "/db/data")
        >>> gdb.url
  • THANKS! Questions?
    Javier de la Rosa, @versae
    The CulturePlex Lab, Western University, London, ON
    PyCon Canada 2012
  • APPENDIX: DATA MODELS
    ● neo4django – https://github.com/scholrly/neo4django
    ● neomodel – https://github.com/robinedwards/neomodel
    ● bulbflow models – http://bulbflow.com/quickstart/#models
  • APPENDIX: VISUALIZE YOUR GRAPH
    ● Export somehow to .gexf for Gephi – http://gephi.org/
    ● Use D3.js – http://d3js.org/
    ● Use sigma.js – http://sigmajs.org/
    ● Take a look at Max De Marzi's work – http://maxdemarzi.com/category/visualization/
    ● Use Sylva (for newbies) – http://www.sylvadb.com/
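Gephi reads the GEXF format. A minimal sketch that builds a GEXF 1.2 document with the standard library's `xml.etree.ElementTree`; the input shapes (a dict of node ids to labels and a list of `(source, target, label)` tuples) are assumptions for illustration, and a real export would usually also declare node and edge attributes.

```python
import xml.etree.ElementTree as ET


def to_gexf(nodes, edges):
    """Serialize nodes {id: label} and edges [(source, target, label)] as GEXF 1.2."""
    gexf = ET.Element("gexf", {"xmlns": "http://www.gexf.net/1.2draft",
                               "version": "1.2"})
    graph = ET.SubElement(gexf, "graph", {"mode": "static",
                                          "defaultedgetype": "directed"})
    xml_nodes = ET.SubElement(graph, "nodes")
    for node_id, label in nodes.items():
        ET.SubElement(xml_nodes, "node", {"id": str(node_id), "label": label})
    xml_edges = ET.SubElement(graph, "edges")
    for i, (source, target, label) in enumerate(edges):
        ET.SubElement(xml_edges, "edge", {"id": str(i), "source": str(source),
                                          "target": str(target), "label": label})
    return ET.tostring(gexf, encoding="unicode")


nodes = {1: "Silvester", 2: "Arnold", 3: "Chuck"}
edges = [(2, 1, "punches"), (3, 1, "dropkicks"), (3, 2, "dropkicks")]
document = to_gexf(nodes, edges)
print(document.count("<edge "))  # 3
```

The returned string can then be written to a `.gexf` file and opened in Gephi.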