Real-Time stream computation on graphs using Storm, Neo4j and Python
Sonal Raj
http://www.sonalraj.com
Presented at PyCon India 2013, Bangalore, India
Copyrights © 2013, Sonal Raj, http://www.sonalraj.com
Real Time Graph Computations in Storm, Neo4J, Python - PyCon India 2013


This talk briefly outlines the Storm framework and the Neo4J graph database, and shows how to combine them to perform computations on complex graphs in Python using the Petrel and Py2neo packages. The talk was given at PyCon India 2013.

Published in: Technology, Business


  1. Real-Time stream computation on graphs using Storm, Neo4j and Python
     Sonal Raj, http://www.sonalraj.com
     Presented at PyCon India 2013, Bangalore, India
  2. Introduction
     • With data multiplying each day, storage and knowledge extraction are major concerns.
     • Social data analysis, business intelligence
     • Constraints of real-time and fault-tolerant processing
  3. In this Talk
     • A look at Storm as a distributed computation framework
     • Neo4J as a NoSQL graph database
     • Some cool pictures
     • What are we trying to achieve?
  4. Disclaimer!
     • This talk presents an overview of Storm and Neo4J . . fewer dirty details :)
     • I'm going to go pretty fast . . . Please hang on.
  5. Part 1: Storm, the Hadoop of Real Time
  6. Don't we have Hadoop?
  7. Storm v/s Hadoop
     Storm and Hadoop:
     • Distributed processing
     • Fault tolerance
  8. Storm v/s Hadoop
     Hadoop:
     • Large but finite jobs
     • Processes a lot of data at once
     • High latency
  9. Storm v/s Hadoop
     Hadoop:
     • Large but finite jobs
     • Processes a lot of data at once
     • High latency
     Storm:
     • Infinite computations called topologies
     • Processes infinite streams of data one tuple at a time
     • Low latency
  10. So, what Storm gives us . .
      • Real-time computations
      • Guaranteed data processing
      • Horizontal scalability and fault tolerance
      • No intermediate message brokers
      • A higher abstraction than message passing, so it makes sense!
  11. A little deeper . . Concepts
      Streams: an unbounded sequence of tuples
  12. A little deeper . . Concepts
      Streams: an unbounded sequence of tuples
      So, what kind of a tuple is this?
  13. A little deeper . . Concepts
      Spouts: a source of streams
  14. A little deeper . . Concepts
      Spouts: a source of streams
      But what is the source FOR the spouts?
  15. A little deeper . . Concepts
      Bolts: computational units that process input streams and produce new streams
  16. A little deeper . . Concepts
      Bolts: computational units that process input streams and produce new streams
      Just 1 stream?
  17. A little deeper . . Concepts
      Topologies: a network of spouts and bolts
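The four concepts above (tuples, spouts, bolts, topologies) can be sketched without Storm at all. The following is a toy, single-process illustration using Python generators; the names `sentence_spout`, `split_bolt`, `count_bolt` and `topology` are made up for this sketch and are not part of Storm or Petrel.

```python
import itertools


def sentence_spout():
    """Spout stand-in: a source of a stream that never ends."""
    sentences = ["storm is fast", "graphs are fun"]
    for sentence in itertools.cycle(sentences):
        yield (sentence,)  # a tuple with a single field


def split_bolt(stream):
    """Bolt stand-in: consumes one stream and produces a new one."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word,)


def count_bolt(stream):
    """Bolt stand-in that keeps running word counts as state."""
    counts = {}
    for (word,) in stream:
        counts[word] = counts.get(word, 0) + 1
        yield (word, counts[word])


def topology():
    """Topology stand-in: spouts and bolts wired into a network."""
    return count_bolt(split_bolt(sentence_spout()))


# The stream is unbounded, so only a finite slice is taken here.
first_seven = list(itertools.islice(topology(), 7))
```

In a real Storm cluster each of these units would run as many parallel tasks on different machines; the generator chain only shows the direction of data flow.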
  18. Is that it . . . ?
      Tasks and Parallelism: a spout or bolt can execute as multiple tasks across the cluster
  19. Mr. Tuple: "O shoot, where do I go now?"
  20. Groupings . . To the rescue of Mr. Tuple!
      • Shuffle Grouping  # pick a random task
      • Fields Grouping  # mod hashing on a subset of tuple fields
      • All Grouping  # send to all tasks
      • Global Grouping  # pick the task with the lowest task id
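As a rough illustration of the four groupings listed above, each one can be written as a function that maps a tuple to the list of task ids that should receive it. This is a sketch under assumptions: task ids are taken to be `0 .. num_tasks - 1`, and CRC32 stands in for whatever hash Storm actually uses; none of these function names come from Storm.

```python
import random
import zlib


def shuffle_grouping(tup, num_tasks, rng=random):
    """Pick a random task."""
    return [rng.randrange(num_tasks)]


def fields_grouping(tup, num_tasks, field_indexes):
    """Mod hashing on a subset of tuple fields: equal keys reach the same task."""
    key = "|".join(str(tup[i]) for i in field_indexes)
    # CRC32 is an assumption of this sketch; it is stable across runs,
    # unlike Python's built-in hash() for strings.
    return [zlib.crc32(key.encode("utf-8")) % num_tasks]


def all_grouping(tup, num_tasks):
    """Send the tuple to every task."""
    return list(range(num_tasks))


def global_grouping(tup, num_tasks):
    """Send the tuple to the task with the lowest id (0 in this sketch)."""
    return [0]
```

Fields grouping is the one to reach for when a bolt keeps per-key state, such as a per-user counter, because every tuple carrying the same key lands on the same task.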
  21. A Storm Cluster
      (Diagram: one Nimbus node, three ZooKeeper nodes, five Supervisor nodes)
  22. A Storm Cluster
      (Diagram: one Nimbus node, three ZooKeeper nodes, five Supervisor nodes)
      If this were Hadoop: Job Tracker (Nimbus), Task Tracker (Supervisor)
  23. A Storm Cluster
      (Diagram: one Nimbus node, three ZooKeeper nodes, five Supervisor nodes)
      But it's NOT Hadoop! Co-ordinates everything
  24. Salient Features . .
      • Storm 0.7 and later supports Transactional Topologies
        - Processes small batches of tuples
        - If a failure occurs during commit, both the batch and the commit are retried
      • Storm guarantees message processing using acknowledgements
      • Petrel by AirSage is a Python wrapper for Storm; you can write and submit topologies in Python.
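The idea behind "guaranteed processing using acknowledgements" can be shown with a toy replay loop: a message counts as done only once processing succeeds (an ack), and a failure puts it back in the queue. This is a deliberately simplified sketch; Storm's real mechanism tracks whole trees of tuples, and `run_with_acks`, `max_attempts` and `flaky` are names invented for this example.

```python
from collections import deque


def run_with_acks(messages, process, max_attempts=3):
    """Process every message, replaying failed ones up to max_attempts times."""
    pending = deque((msg, 1) for msg in messages)
    acked, failed = [], []
    while pending:
        msg, attempt = pending.popleft()
        try:
            process(msg)
        except Exception:
            if attempt < max_attempts:
                pending.append((msg, attempt + 1))  # fail: replay later
            else:
                failed.append(msg)                  # give up after max_attempts
        else:
            acked.append(msg)                       # ack: fully processed
    return acked, failed


# Hypothetical processing step that fails the first time it sees "b".
seen = set()


def flaky(msg):
    if msg == "b" and msg not in seen:
        seen.add(msg)
        raise RuntimeError("transient failure")


acked, failed = run_with_acks(["a", "b", "c"], flaky)
```

Note that replay means a message may be processed more than once, which is why the transactional topologies on the same slide retry the batch and the commit together.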
  25. Part 2: Neo4J, “Get Graphed”
  26. This is how graph data was represented in RDBMS.
  27. Enter NoSQL Databases
  28. Types of NoSQL Databases
      • Graph databases
      • Document databases
      • Column-family stores
      • Key-value stores
      (Chart axes: data complexity, data size)
  29. Why NoSQL Databases
      • Easily horizontally scalable
      • Dynamic schemas; handle unstructured data really well
      • Excel in speed and volume
      • Trade off consistency for efficiency (except in graph databases . . we'll see why :) )
      • Pleasure to code
      • Free to use any query language (even SQL!)
      • Downtime? What downtime?
  30. The Property Graph Model of Graph Databases
      • Core abstractions
        - Nodes
        - Relationships between nodes
        - Properties on both
      • Traversal framework: high-performance queries on connected datasets
      • Bindings: REST, Gremlin, etc.
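A minimal in-memory sketch of the property graph model described above: nodes, typed relationships between nodes, and key-value properties on both. This is not Neo4J's or Py2neo's API; the `PropertyGraph` class and its method names are assumptions made purely for illustration.

```python
class PropertyGraph:
    """Toy property graph: nodes and relationships both carry properties."""

    def __init__(self):
        self.nodes = {}          # node id -> dict of properties
        self.relationships = []  # (start id, type, end id, dict of properties)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def relate(self, start, rel_type, end, **properties):
        self.relationships.append((start, rel_type, end, properties))

    def neighbours(self, node_id, rel_type):
        """One traversal step: follow outgoing relationships of a given type."""
        return [end for start, kind, end, _ in self.relationships
                if start == node_id and kind == rel_type]


graph = PropertyGraph()
graph.add_node("alice", name="Alice")
graph.add_node("bob", name="Bob")
graph.relate("alice", "KNOWS", "bob", since=2012)
```

A real graph database stores relationships so that each traversal step is cheap regardless of the total graph size; the list scan in `neighbours` is only there to keep the sketch short.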
  31. Neo4J
      • Fully ACID with rollback support (unbelievable!)
      • Schema-less, with efficient storage of semi-structured data
      • Fast deep traversals instead of slow SQL queries that span many table joins
      • Whiteboard friendly
      • Very natural to express graph-related problems with traversals (recommendation engines, shortest path, etc.)
  32. Neo4J Pythonized!
      • Py2Neo is an excellent binding for Neo4J
      • Accesses Neo4J using its RESTful API
      • Still under development . . features like labels are yet to be included!
  33. So, will relational databases be extinct? OOPS!
  34. Categories of Graphical Data
      • Social networks
      • Citations
      • Product co-purchasing
      • Internet peer-to-peer
      • Road network and map data
      • Web graphs
      Excellent source of sample graph data: http://snap.Stanford.edu/data/
  35. Part 3: Get your hands dirty!
  36. A demo . .
      • Sample social network data set
      • Data includes sign-up info, adding friends, unfriending, etc., for a month's activity
      • Neo4J → store and update the social data
      • Storm → calculate the “friendship-index”
  37. A demo . .
      • “friendship-index”
        - n = the number of people through whom person “A” is connected to person “B”
        - Gives an idea of how close two people are!
        - Useful while searching for friends on social networks (something like the friends-of-friends concept in Facebook's Graph Search)
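One way to read the “friendship-index” definition above is as the number of intermediate people on the shortest friendship path between A and B, which a breadth-first search finds directly. The sketch below works on a plain adjacency dictionary rather than Neo4J; the function name, the data layout, and the choice to return `None` for unconnected people are all assumptions of this example, not the talk's code.

```python
from collections import deque


def friendship_index(friends, a, b):
    """Number of people on the shortest path between a and b, or None."""
    if a == b:
        return 0
    visited = {a}
    queue = deque([(a, 0)])  # (person, number of hops from a)
    while queue:
        person, hops = queue.popleft()
        for friend in friends.get(person, ()):
            if friend == b:
                # `person` is `hops` edges away from a, so exactly `hops`
                # people sit strictly between a and b on this path.
                return hops
            if friend not in visited:
                visited.add(friend)
                queue.append((friend, hops + 1))
    return None  # a and b are not connected


# Hypothetical data: A - C - D - B, plus an isolated person E.
friends = {
    "A": ["C"],
    "C": ["A", "D"],
    "D": ["C", "B"],
    "B": ["D"],
    "E": [],
}
```

With this reading, direct friends have an index of 0 and a smaller index means a closer pair.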
  38. The Topology . .
      (Diagram: two sources, one feeding Update Spout → Update Bolt, the other feeding Query Spout → Query Bolt)
  39. Update Spout
  40. Update Spout
      Define what kind of tuples are emitted
  41. Update Spout
      Gets and emits tuple streams
  42. Update Bolt
  43. Update Bolt
      Objects for database access and indexing service
  44. Update Bolt
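The code on the Update Spout and Update Bolt slides was shown as images and is not part of this transcript. As a stand-in, here is a rough sketch of what an update step could look like if the update tuples had the hypothetical form `(action, person, other)` and the store were a plain dictionary instead of Neo4J. The tuple layout, the action names and `apply_update` are all invented for this example.

```python
def apply_update(friends, update):
    """Apply one (action, person, other) update tuple to an adjacency dict."""
    action, person, other = update
    if action == "signup":
        friends.setdefault(person, set())
    elif action == "friend":
        # Friendship is treated as symmetric in this sketch.
        friends.setdefault(person, set()).add(other)
        friends.setdefault(other, set()).add(person)
    elif action == "unfriend":
        friends.get(person, set()).discard(other)
        friends.get(other, set()).discard(person)
    else:
        raise ValueError("unknown action: %r" % (action,))
    return friends


# Hypothetical stream of update tuples, as a spout might emit them.
updates = [
    ("signup", "alice", None),
    ("signup", "bob", None),
    ("friend", "alice", "bob"),
    ("friend", "bob", "carol"),
    ("unfriend", "alice", "bob"),
]

store = {}
for update in updates:
    apply_update(store, update)
```

In the demo the equivalent writes would go to Neo4J through Py2neo, so that the query side always sees the latest graph.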
  45. Query Spout
  46. Query Spout
      The tuple to be emitted can contain multiple entities.
  47. Query Bolt
  48. Query Bolt
  49. Query Bolt
      Retrieve caller friend and requested friend ids
  50. Query Bolt
      Retrieve caller friend and requested friend ids as per database
  51. Create Topology
  52. Create Topology
      Import all spout and bolt files
  53. Create Topology
      Unfortunately, there was no option in Petrel to turn off console debug, so the console view is really messy.
  54. Topology.yaml
      Configurations for the topology are specified in this file
  55. A little More . .
      (Diagram: two sources, one feeding Update Spout → Update Bolt, the other feeding Query Spout → Query Bolt)
  56. Final Thoughts
      • A Storm-Neo4j framework is a boon for real-time graph computations
      • Quite flexible in Java; Python bindings and implementations still have a long way to go.
      • If you are an admin or developer, analyse your data and computing requirements before narrowing down on a framework.
  57. …to play with Storm and Neo4J
      • My PyCon talk repo (slides, code skeletons, etc.): http://www.sonalraj.com/neo-storm.html
      • Storm documentation (official): http://github.com/nathanmarz/storm
      • Storm book: http://www.amazon.com/Getting-Started-Storm-Jonathan-Leibiusky/dp/1449324010
      • Deployment of Storm on AWS: http://github.com/nathanmarz/storm-deploy
      • Neo4J documentation: http://www.neo4j.org
  58. Ex-terminated . . .
      - That's it
      - Thanks for listening!
      - Questions