Real-Time stream computation on graphs using Storm, Neo4j and Python
Sonal Raj
http://www.sonalraj.com
Presented at Pycon India 2013
Bangalore, India
Copyrights © 2013, Sonal Raj, http://www.sonalraj.com
1
Introduction
2
• With data multiplying each day, storage and knowledge extraction are major concerns.
• Typical uses: social data analysis, business intelligence
• Constraints of real-time and fault-tolerant processing
In this Talk
3
• A look at Storm as a distributed computation framework
• Neo4J as a NoSQL graph database
• Some cool pictures
• What are we trying to achieve?
Disclaimer!
4
• This talk presents an overview of Storm and Neo4J, with fewer of the dirty details.
• I’m going to go pretty fast, so please hang on.
5
Part 1
Storm – The Hadoop of Real Time
Don’t we have Hadoop?
6
Storm v/s Hadoop
7
STORM and HADOOP both offer:
• Distributed processing
• Fault tolerance
Storm v/s Hadoop
8
HADOOP
• Large but Finite Jobs
• Processes a Lot of Data at Once
• High Latency
Storm v/s Hadoop
9
HADOOP
• Large but Finite Jobs
• Processes a Lot of Data at Once
• High Latency
STORM
• Infinite computations called topologies
• Processes infinite streams of data, one tuple at a time
• Low latency
So, what Storm gives us
10
• Real-time computations
• Guaranteed data processing
• Horizontal scalability and fault tolerance
• No intermediate message brokers
• A higher abstraction than message passing, so it makes sense!
A little deeper: Concepts
11
Streams
[Figure: a stream drawn as a row of tuples]
An unbounded sequence of tuples
A little deeper: Concepts
12
Streams
[Figure: a stream drawn as a row of tuples]
An unbounded sequence of tuples
So, what kind of a tuple is this?
A little deeper: Concepts
13
Spouts
A source of streams
A little deeper: Concepts
14
Spouts
A source of streams
But what is the source FOR the spouts?
A little deeper: Concepts
15
Bolts
Computational units that process input streams and produce new streams
A little deeper: Concepts
16
Bolts
Computational units that process input streams and produce new streams
Just one stream?
A little deeper: Concepts
17
Topologies
A network of spouts and bolts
Is that it...?
18
Tasks and Parallelism
A spout or bolt can execute multiple tasks across the cluster
19
[Figure: Mr. Tuple]
"Oh shoot, where do I go now?"
Groupings: To the rescue of Mr. Tuple!
20
• Shuffle Grouping # picks a random task
• Fields Grouping # mod hashing on a subset of tuple fields
• All Grouping # sends to all tasks
• Global Grouping # picks the task with the lowest task id
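To make the four groupings concrete, here is a minimal Python sketch of how each one could pick the receiving task(s) for a tuple. The task ids, the function names and the use of `zlib.crc32` as the hash are assumptions for illustration; Storm computes its own hash for fields grouping, and its shuffle grouping also balances tuples evenly across tasks rather than picking purely at random.

```python
import random
import zlib

# Hypothetical task ids of one bolt running with a parallelism of 4.
TASK_IDS = [7, 8, 9, 10]


def shuffle_grouping(values, task_ids, rng=random):
    # Pick one task at random.
    return [rng.choice(task_ids)]


def fields_grouping(values, fields, group_on, task_ids):
    # Hash only the selected fields, then take the hash modulo the task count,
    # so tuples with equal values in those fields always reach the same task.
    key = "|".join(str(values[fields.index(name)]) for name in group_on)
    return [task_ids[zlib.crc32(key.encode("utf-8")) % len(task_ids)]]


def all_grouping(values, task_ids):
    # Replicate the tuple to every task.
    return list(task_ids)


def global_grouping(values, task_ids):
    # Send everything to the task with the lowest id.
    return [min(task_ids)]


fields = ["person", "friend"]
print(fields_grouping(("alice", "bob"), fields, ["person"], TASK_IDS))
print(fields_grouping(("alice", "carol"), fields, ["person"], TASK_IDS))  # same task as above
```

Grouping on `person` alone is what keeps all of one person's updates on a single task, which matters when that task keeps per-person state.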
A Storm Cluster
21
[Figure: one Nimbus node, three ZooKeeper nodes and five Supervisor nodes]
A Storm Cluster
22
[Figure: the same cluster, annotated "If this were Hadoop": Nimbus corresponds to the Job Tracker and each Supervisor to a Task Tracker]
A Storm Cluster
23
[Figure: the same cluster, annotated "But it’s NOT Hadoop!": the ZooKeeper nodes co-ordinate everything]
Salient Features
24
• Storm 0.7 and later supports transactional topologies
  - Processes small batches of tuples
  - If a failure occurs during commit, both the batch and the commit are retried
• Storm guarantees message processing using acknowledgements
• Petrel by AirSage is a Python wrapper for Storm; you can write and submit topologies in Python.
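The acknowledgement mechanism can be illustrated with the XOR trick described in Storm's documentation on guaranteeing message processing: an acker keeps, per spout tuple, the XOR of every tuple id in its tree, and the tree counts as fully processed when that value returns to zero. The sketch below is a simplified, single-process model; the class and method names are hypothetical, the small fixed ids are for readability, and real Storm uses random 64-bit ids (so a false zero is possible but vanishingly unlikely).

```python
class Acker:
    """Simplified model of Storm's acker: one running XOR per spout tuple."""

    def __init__(self):
        self.pending = {}    # spout tuple id -> XOR of all outstanding tuple ids
        self.completed = []  # spout tuple ids whose whole tree was processed

    def spout_emit(self, root_id, tuple_id):
        # The spout registers the id of the tuple it just emitted.
        self.pending[root_id] = tuple_id

    def bolt_ack(self, root_id, acked_id, anchored_child_ids=()):
        # A bolt acks its input and, in the same step, registers any new
        # tuples it emitted anchored to that input. Each id is XORed in once
        # when created and once when acked, so it cancels itself out.
        value = acked_id
        for child_id in anchored_child_ids:
            value ^= child_id
        self.pending[root_id] ^= value
        if self.pending[root_id] == 0:
            del self.pending[root_id]
            self.completed.append(root_id)
            return True  # tree complete: the spout's ack() would now fire
        return False


acker = Acker()
acker.spout_emit("update-1", 0b0011)
acker.bolt_ack("update-1", 0b0011, [0b0101, 0b1001])  # bolt splits the tuple in two
acker.bolt_ack("update-1", 0b0101)                     # first child done
done = acker.bolt_ack("update-1", 0b1001)              # second child done
print(done, acker.completed)  # True ['update-1']
```

The acker never stores the tree itself, only one integer per spout tuple, which is why this bookkeeping stays cheap even for large tuple trees.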
25
Part 2
Neo4J – “Get Graphed”
26
This is how graph data was represented in an RDBMS.
27
ENTER, NOSQL DATABASES
28
Types of NoSQL Databases
[Figure: Key-Value Stores, Column-Family, Document databases and Graph databases placed on axes of Data Size versus Data Complexity]
29
Why NoSQL Databases?
• Easily horizontally scalable
• Dynamic schemas; they handle unstructured data really well
• Excel in speed and volume
• Trade off consistency for efficiency (except graph databases... we’ll see why)
• Pleasure to code
• Free to use any query language (even SQL!)
• Downtime? What downtime?
30
The Property Graph Model of Graph Databases
• Core Abstractions
  - Nodes
  - Relationships between nodes
  - Properties on both
• Traversal Framework
  - High-performance queries on connected datasets
• Bindings
  - REST, Gremlin, etc.
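The three core abstractions map naturally onto a few small Python classes. This is a minimal in-memory sketch of the property graph model (nodes, typed relationships, and properties on both), not Neo4J's storage engine or API; every name in it is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start_id: int
    end_id: int
    rel_type: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Minimal in-memory property graph: nodes, typed relationships, properties."""

    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def create_node(self, **properties):
        node = Node(len(self.nodes), dict(properties))
        self.nodes[node.node_id] = node
        return node

    def relate(self, start, rel_type, end, **properties):
        rel = Relationship(start.node_id, end.node_id, rel_type, dict(properties))
        self.relationships.append(rel)
        return rel

    def neighbours(self, node, rel_type=None):
        # Follow outgoing relationships, optionally filtered by type.
        return [
            self.nodes[r.end_id]
            for r in self.relationships
            if r.start_id == node.node_id and (rel_type is None or r.rel_type == rel_type)
        ]


graph = PropertyGraph()
alice = graph.create_node(name="Alice")
bob = graph.create_node(name="Bob")
carol = graph.create_node(name="Carol")
graph.relate(alice, "FRIEND", bob, since=2013)
graph.relate(alice, "FRIEND", carol)
print([n.properties["name"] for n in graph.neighbours(alice, "FRIEND")])  # ['Bob', 'Carol']
```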
31
Neo4J
• Fully ACID, with rollback support (unbelievable!)
• Schema-less, efficient storage of semi-structured data
• Fast deep traversals instead of slow SQL queries that span many table joins
• Whiteboard friendly
• Very natural to express graph-related problems as traversals (recommendation engines, shortest paths, etc.)
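As an example of expressing a graph problem as a traversal, a simple friend recommender walks two hops out from a person and ranks the friends-of-friends by the number of mutual friends. The plain-Python sketch below uses a dict of sets in place of a Neo4J traversal; the scoring rule and the sample names are assumptions chosen for illustration.

```python
from collections import Counter


def recommend_friends(friends, person, limit=3):
    """Rank friends-of-friends of `person` by their number of mutual friends."""
    direct = friends.get(person, set())
    scores = Counter()
    for friend in direct:
        for candidate in friends.get(friend, set()):
            if candidate != person and candidate not in direct:
                scores[candidate] += 1  # one more mutual friend
    # Highest score first; break ties by name so the output is deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked[:limit]]


friends = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave", "erin"},
    "carol": {"alice", "dave"},
    "dave": {"bob", "carol"},
    "erin": {"bob"},
}
print(recommend_friends(friends, "alice"))  # ['dave', 'erin']
```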
32
Neo4J Pythonized!
• Py2Neo is an excellent binding for Neo4J
• Accesses Neo4J using its RESTful API
• Still under development; features such as labels are yet to be included!
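Under the hood, a REST binding such as Py2Neo exchanges JSON over HTTP. As a hedged, standard-library-only illustration (this is not Py2Neo's API), the function below builds, without sending, a Cypher request for the legacy `/db/data/cypher` endpoint that Neo4J servers of this era exposed on port 7474. Later Neo4J releases retired that endpoint; the query uses Neo4J 2.x Cypher syntax, and the `FRIEND` relationship type and `name` property are hypothetical.

```python
import json
import urllib.request


def build_cypher_request(query, params, base_url="http://localhost:7474"):
    """Build, but do not send, a POST to Neo4J's legacy Cypher REST endpoint."""
    payload = json.dumps({"query": query, "params": params}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/db/data/cypher",
        data=payload,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


request = build_cypher_request(
    "MATCH (a)-[:FRIEND]->(b) WHERE a.name = {name} RETURN b.name",
    {"name": "Alice"},
)
print(request.get_method(), request.full_url)
# Against a running server of that era: urllib.request.urlopen(request).read()
```

Passing values through `params` rather than string-formatting them into the query lets the server cache the query plan and avoids injection problems.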
33
So, will relational databases become extinct?
OOPS!
34
Categories of Graph Data
• Social networks
• Citations
• Product co-purchasing
• Internet peer-to-peer networks
• Road networks and map data
• Web graphs
Excellent source of sample graph data: http://snap.Stanford.edu/data/
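SNAP distributes most of its graphs as plain-text edge lists: comment lines start with `#` and every other line holds a source and a target node id. A minimal loader into an adjacency map might look like this; treating the graph as undirected is an assumption that suits friendship data, and the sample lines are made up.

```python
from collections import defaultdict


def load_snap_edge_list(lines, undirected=True):
    """Build {node_id: set(neighbour_ids)} from SNAP-style edge-list lines."""
    adjacency = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#' header comments
        source, target = line.split()[:2]  # tab- or space-separated ids
        adjacency[source].add(target)
        if undirected:
            adjacency[target].add(source)
    return dict(adjacency)


sample = [
    "# Undirected graph: toy example",
    "# FromNodeId\tToNodeId",
    "0\t1",
    "0\t2",
    "1\t2",
    "2\t3",
]
graph = load_snap_edge_list(sample)
print(sorted(graph["2"]))  # ['0', '1', '3']
```

Because the function accepts any iterable of lines, a downloaded file can be passed directly, e.g. `with open("edges.txt") as f: graph = load_snap_edge_list(f)` (the file name is hypothetical).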
35
Part 3
Get your hands dirty!
36
A Demo
• Sample social network data set
• The data includes people signing up, adding friends, unfriending, etc., covering a month’s activity
• Neo4J
  - Store and update the social data
• Storm
  - Calculate the “friendship-index”
37
A Demo
• “friendship-index”
  - n = the number of people through whom person “A” is connected to person “B”
  - Gives an idea of how close two people are!
  - Useful when searching for friends on social networks (something like the friends-of-friends concept in Facebook’s Graph Search)
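One way to compute the friendship-index is a breadth-first search over the friend graph: the first time B is reached, the number of hops minus one is the number of people standing between A and B. The sketch assumes "through how many people" means the shortest such chain, that direct friends have an index of 0, and that unconnected people get `None`; the demo may define it differently.

```python
from collections import deque


def friendship_index(friends, a, b):
    """Intermediaries on the shortest friend chain from a to b:
    0 for direct friends, None if the two are not connected."""
    if a == b:
        raise ValueError("a and b must be different people")
    seen = {a}
    queue = deque([(a, 0)])  # (person, people passed through so far)
    while queue:
        person, between = queue.popleft()
        for friend in friends.get(person, ()):
            if friend == b:
                return between
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, between + 1))
    return None


friends = {
    "alice": {"bob"},
    "bob": {"alice", "carol"},
    "carol": {"bob", "dave"},
    "dave": {"carol"},
    "erin": set(),
}
print(friendship_index(friends, "alice", "bob"))   # 0
print(friendship_index(friends, "alice", "dave"))  # 2
print(friendship_index(friends, "alice", "erin"))  # None
```

Breadth-first order guarantees the first chain found is a shortest one, which is why no distance comparison is needed.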
38
The Topology
[Figure: Source → Update Spout → Update Bolt; Source → Query Spout → Query Bolt]
39
Update Spout
40
Update Spout
Define what kind of tuples are emitted
41
Update Spout
Gets and emits tuple streams
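The Update Spout's two jobs (declaring the tuple's field names and emitting one tuple per call) can be sketched in plain Python. This is a hypothetical stand-in, not Petrel's spout class: the method names only echo the roles of Storm's `declareOutputFields` and `nextTuple`, and the `(action, person, friend)` record layout is an assumption about the month-long activity log.

```python
class UpdateSpoutSketch:
    """Hypothetical update spout: replays social-network activity records."""

    def __init__(self, events):
        # events: iterable of (action, person, friend) records (assumed layout).
        self._events = iter(events)

    @staticmethod
    def declare_output_fields():
        # Field names shared by every tuple this spout emits.
        return ["action", "person", "friend"]

    def next_tuple(self):
        # Return the next record, or None when there is nothing to emit.
        return next(self._events, None)


spout = UpdateSpoutSketch([
    ("signup", "alice", None),
    ("add", "alice", "bob"),
    ("remove", "alice", "bob"),
])
fields = spout.declare_output_fields()
while (values := spout.next_tuple()) is not None:
    print(dict(zip(fields, values)))
```

Returning `None` instead of blocking mirrors the Storm convention that a spout should return quickly when it has nothing to emit.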
42
Update Bolt
43
Update Bolt
Objects for database access and the indexing service
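The "objects for database access and indexing service" can be pictured as a database handle plus a name-to-node index that the bolt consults for every update. The sketch below keeps both in memory as a stand-in for Neo4J and its index; all class names, the `signup`/`add`/`remove` actions and the undirected friend edges are assumptions.

```python
class InMemoryGraphDB:
    """Hypothetical stand-in for Neo4J: node ids and undirected friend edges."""

    def __init__(self):
        self._next_id = 0
        self.friends = {}  # node id -> set of friend node ids

    def create_node(self):
        node_id = self._next_id
        self._next_id += 1
        self.friends[node_id] = set()
        return node_id


class UpdateBoltSketch:
    """Hypothetical update bolt: applies one activity tuple to the graph."""

    def __init__(self):
        self.db = InMemoryGraphDB()  # object for database access
        self.index = {}              # indexing service: person name -> node id

    def _node_for(self, name):
        # Look the person up in the index, creating a node on first sight.
        if name not in self.index:
            self.index[name] = self.db.create_node()
        return self.index[name]

    def process(self, action, person, friend=None):
        me = self._node_for(person)
        if action == "signup":
            return
        other = self._node_for(friend)
        if action == "add":
            self.db.friends[me].add(other)
            self.db.friends[other].add(me)
        elif action == "remove":
            self.db.friends[me].discard(other)
            self.db.friends[other].discard(me)
        else:
            raise ValueError(f"unknown action: {action!r}")


bolt = UpdateBoltSketch()
for values in [("signup", "alice", None), ("add", "alice", "bob"), ("add", "bob", "carol")]:
    bolt.process(*values)
print(bolt.index)  # {'alice': 0, 'bob': 1, 'carol': 2}
```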
44
Update Bolt
45
Query Spout
46
Query Spout
The tuple to be emitted can contain multiple entities.
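A single query tuple can carry two entities at once, here the person asking and the person being looked up. This hypothetical sketch (not Petrel's API) models the tuple as a `namedtuple` so its fields are named; the field names `caller` and `requested` are assumptions based on the Query Bolt captions.

```python
from collections import namedtuple

# Hypothetical query tuple carrying two entities at once.
FriendshipQuery = namedtuple("FriendshipQuery", ["caller", "requested"])


class QuerySpoutSketch:
    """Hypothetical query spout: emits (caller, requested) pairs."""

    def __init__(self, requests):
        self._requests = iter(requests)

    @staticmethod
    def declare_output_fields():
        # The declared fields match the namedtuple's fields, in order.
        return list(FriendshipQuery._fields)

    def next_tuple(self):
        # Wrap the next raw pair, or return None when the source is exhausted.
        pair = next(self._requests, None)
        return None if pair is None else FriendshipQuery(*pair)


spout = QuerySpoutSketch([("alice", "dave"), ("bob", "erin")])
query = spout.next_tuple()
print(query.caller, query.requested)  # alice dave
```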
47
Query Bolt
48
Query Bolt
49
Query Bolt
Retrieve the caller’s id and the requested friend’s id
50
Query Bolt
Retrieve the caller’s and the requested friend’s ids from the database
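The Query Bolt's job can be sketched as two steps: resolve both names to node ids through the index, then count the intermediaries on the shortest friend chain between the two nodes. In this hypothetical, in-memory version the index and adjacency sets stand in for Neo4J, and a breadth-first search stands in for a database traversal; treating unknown people, or a person asking about themselves, as `None` is an assumption.

```python
from collections import deque


class QueryBoltSketch:
    """Hypothetical query bolt: resolves both people to node ids, then
    counts the intermediaries on the shortest friend chain between them."""

    def __init__(self, index, friends):
        self.index = index      # person name -> node id (stand-in for the DB index)
        self.friends = friends  # node id -> set of friend node ids

    def process(self, caller, requested):
        caller_id = self.index.get(caller)
        requested_id = self.index.get(requested)
        if caller_id is None or requested_id is None or caller_id == requested_id:
            return (caller, requested, None)  # unknown person, or same person
        return (caller, requested, self._intermediaries(caller_id, requested_id))

    def _intermediaries(self, start, goal):
        seen = {start}
        queue = deque([(start, 0)])
        while queue:
            node, hops = queue.popleft()
            for neighbour in self.friends.get(node, ()):
                if neighbour == goal:
                    return hops  # 'hops' people stand between start and goal
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append((neighbour, hops + 1))
        return None  # not connected


index = {"alice": 0, "bob": 1, "carol": 2, "dave": 3}
friends = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
bolt = QueryBoltSketch(index, friends)
print(bolt.process("alice", "dave"))  # ('alice', 'dave', 2)
```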
51
Create Topology
52
Create Topology
Import all spout and bolt files
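Once the spout and bolt modules are imported, creating the topology amounts to naming each component, choosing its parallelism, and declaring which stream (and grouping) each bolt subscribes to. Storm's Java `TopologyBuilder` does this with `setSpout`/`setBolt` plus grouping calls such as `shuffleGrouping`; the pure-Python sketch below only models that bookkeeping, and its class, component names, parallelism values and shuffle groupings are assumptions rather than the demo's actual settings.

```python
class TopologySketch:
    """Hypothetical in-memory topology description (not Storm's or Petrel's API)."""

    def __init__(self):
        self.spouts = {}  # component name -> parallelism
        self.bolts = {}   # component name -> {"parallelism": n, "inputs": [...]}

    def set_spout(self, name, parallelism=1):
        self.spouts[name] = parallelism

    def set_bolt(self, name, parallelism=1, inputs=()):
        # inputs: (upstream component, grouping) pairs this bolt subscribes to.
        self.bolts[name] = {"parallelism": parallelism, "inputs": list(inputs)}

    def validate(self):
        # Every bolt must subscribe to a component that was declared.
        known = set(self.spouts) | set(self.bolts)
        for bolt, spec in self.bolts.items():
            for upstream, _grouping in spec["inputs"]:
                if upstream not in known:
                    raise ValueError(f"{bolt} subscribes to unknown component {upstream!r}")
        return True

    def task_count(self):
        # Total number of tasks the cluster would need to run.
        return sum(self.spouts.values()) + sum(b["parallelism"] for b in self.bolts.values())


topology = TopologySketch()
topology.set_spout("update_spout")
topology.set_bolt("update_bolt", parallelism=2, inputs=[("update_spout", "shuffle")])
topology.set_spout("query_spout")
topology.set_bolt("query_bolt", parallelism=2, inputs=[("query_spout", "shuffle")])
print(topology.validate(), topology.task_count())  # True 6
```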
53
Create Topology
Unfortunately, there was no option in Petrel to turn off console debugging, so the console view is really messy.
54
Topology.yaml
The topology’s configuration is specified in this file
55
A Little More
[Figure: the topology diagram again, Source → Update Spout → Update Bolt; Source → Query Spout → Query Bolt]
56
Final Thoughts
• A Storm-Neo4j framework is a boon for real-time graph computations
• It is quite flexible in Java; the Python bindings and implementations still have a long way to go.
• If you are an admin or developer, analyse your data and computing requirements before settling on a framework.
57
…to play with Storm and Neo4J
• My PyCon talk repo: slides, code skeletons, etc.
http://www.sonalraj.com/neo-storm.html
• Storm documentation (official)
http://github.com/nathanmarz/storm
• Storm book
http://www.amazon.com/Getting-Started-Storm-Jonathan-Leibiusky/dp/1449324010
• Deployment of Storm on AWS
http://github.com/nathanmarz/storm-deploy
• Neo4J documentation
http://www.neo4j.org
58
Ex-terminated...
- That’s it
- Thanks for listening!
- Questions?
Knowledge engineering: from people to machines and back
Elena Simperl
 

Recently uploaded (20)

JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 

Real Time Graph Computations in Storm, Neo4J, Python - PyCon India 2013

  • 1. Real-Time Stream Computation on Graphs using Storm, Neo4j and Python. Sonal Raj, http://www.sonalraj.com. Presented at PyCon India 2013, Bangalore, India. Copyrights © 2013, Sonal Raj, http://www.sonalraj.com
  • 2. Introduction • With data multiplying every day, storing it and extracting knowledge from it are major concerns. • Applications: social data analysis, business intelligence • Constraints: real-time and fault-tolerant processing
  • 3. In This Talk • A look at Storm as a distributed computation framework • Neo4j as a NoSQL graph database • Some cool pictures • What are we trying to achieve?
  • 4. Disclaimer! • This talk presents an overview of Storm and Neo4j, light on the dirty details. • I'm going to go pretty fast, so please hang on.
  • 5. Part 1: Storm, the Hadoop of Real Time
  • 6. Don't we have Hadoop?
  • 7. Storm v/s Hadoop • Shared by both Storm and Hadoop: distributed processing and fault tolerance
  • 8. Storm v/s Hadoop • Hadoop: large but finite jobs; processes a lot of data at once; high latency
  • 9. Storm v/s Hadoop • Hadoop: large but finite jobs; processes a lot of data at once; high latency • Storm: infinite computations called topologies; processes infinite streams of data one tuple at a time; low latency
  • 10. So, What Storm Gives Us • Real-time computations • Guaranteed data processing • Horizontal scalability and fault tolerance • No intermediate message brokers • A higher abstraction than message passing, so it makes sense!
  • 11. A Little Deeper: Concepts • Streams: an unbounded sequence of tuples
  • 12. A Little Deeper: Concepts • Streams: an unbounded sequence of tuples. So, what kind of a tuple is this?
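In Storm, a tuple is a named list of values, and a stream is an unbounded sequence of such tuples. The plain-Python sketch below (not Storm's API) models both ideas. The `FriendEvent` type and its field names are hypothetical and were chosen to match the social-network demo. A generator stands in for the unbounded stream, so a consumer only ever pulls a bounded slice of it.

```python
from collections import namedtuple
from itertools import count, islice

# A tuple is an ordered list of values whose positions are named by the
# fields the emitter declares. "FriendEvent" and its fields are hypothetical.
FriendEvent = namedtuple("FriendEvent", ["action", "person_a", "person_b"])


def event_stream():
    """An unbounded stream: yields one tuple at a time, forever."""
    for i in count():
        yield FriendEvent("add", f"user{i}", f"user{i + 1}")


# A consumer never sees "the whole stream"; it pulls tuples as they arrive.
first_three = list(islice(event_stream(), 3))
for t in first_three:
    print(t.action, t.person_a, t.person_b)
```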
  • 13. A Little Deeper: Concepts • Spouts: a source of streams
  • 14. A Little Deeper: Concepts • Spouts: a source of streams. But what is the source FOR the spouts?
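A spout usually wraps an external system, such as a message queue, a log file or a web feed. The minimal sketch below assumes a `queue.Queue` stands in for that source. The class name `QueueSpout` and its snake_case methods are hypothetical, but they mirror the reliable-spout idea:
- `next_tuple` must not block.
- Emitted values are kept until acknowledged.
- A failed tuple is pushed back to the source so it is replayed.

```python
import queue


class QueueSpout:
    """Hypothetical spout reading from an external source (a queue.Queue
    standing in for a message broker, log file, HTTP feed, ...)."""

    def __init__(self, source):
        self.source = source
        self.pending = {}   # msg_id -> values, kept until acked
        self.next_id = 0

    def next_tuple(self):
        # Must return quickly: give back None when the source is empty.
        try:
            values = self.source.get_nowait()
        except queue.Empty:
            return None
        msg_id = self.next_id
        self.next_id += 1
        self.pending[msg_id] = values
        return msg_id, values

    def ack(self, msg_id):
        # Fully processed downstream: forget it.
        self.pending.pop(msg_id, None)

    def fail(self, msg_id):
        # Processing failed: put the values back so they are replayed.
        values = self.pending.pop(msg_id, None)
        if values is not None:
            self.source.put(values)


source = queue.Queue()
for line in ["alice signup", "bob signup"]:
    source.put(line)

spout = QueueSpout(source)
first = spout.next_tuple()
second = spout.next_tuple()
spout.ack(0)
spout.fail(1)                # "bob signup" goes back to the source
third = spout.next_tuple()   # replayed with a new message id
print(first, second, third, spout.next_tuple())
```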
  • 15. A Little Deeper: Concepts • Bolts: computational units that process input streams and produce new streams
  • 16. A Little Deeper: Concepts • Bolts: computational units that process input streams and produce new streams. Just one stream?
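The answer to "just one stream?" is no: a bolt can subscribe to several input streams and emit on several output streams. The plain-Python sketch below is not Storm's or Petrel's API. The stream names `signup`, `friend`, `edge` and `rejected` are hypothetical. The bolt routes each input by its stream id and decides which output stream each result goes to.

```python
class FriendshipBolt:
    """Hypothetical bolt subscribed to two input streams ("signup" and
    "friend") that emits on two output streams ("edge" and "rejected")."""

    def __init__(self):
        self.known_users = set()

    def process(self, stream_id, values):
        """Handle one input tuple; return a list of (out_stream, values)."""
        if stream_id == "signup":
            (user,) = values
            self.known_users.add(user)
            return []
        if stream_id == "friend":
            a, b = values
            if a in self.known_users and b in self.known_users:
                return [("edge", (a, b))]
            return [("rejected", (a, b))]
        raise ValueError(f"unexpected stream {stream_id!r}")


bolt = FriendshipBolt()
inputs = [
    ("signup", ("alice",)),
    ("signup", ("bob",)),
    ("friend", ("alice", "bob")),
    ("friend", ("alice", "carol")),  # carol never signed up
]
emitted = []
for stream_id, values in inputs:
    emitted.extend(bolt.process(stream_id, values))
print(emitted)
```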
  • 17. A Little Deeper: Concepts • Topologies: a network of spouts and bolts
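To show what "a network of spouts and bolts" means, here is a toy, single-process stand-in for a topology. The `Topology` class and its `set_spout`/`set_bolt`/`run` methods are hypothetical and are not Storm's `TopologyBuilder`. Each bolt subscribes to upstream components, and every tuple is pushed through the graph. A real Storm topology instead runs each component as parallel tasks spread over a cluster.

```python
from collections import deque


class Topology:
    """Toy stand-in for a topology: spouts (sources) and bolts (functions
    returning lists of output tuples) wired together by subscriptions."""

    def __init__(self):
        self.spouts = {}       # name -> iterable of tuples
        self.bolts = {}        # name -> function(tuple) -> list of tuples
        self.subscribers = {}  # component name -> [bolt names]

    def set_spout(self, name, source):
        self.spouts[name] = source
        self.subscribers.setdefault(name, [])

    def set_bolt(self, name, fn, inputs):
        self.bolts[name] = fn
        self.subscribers.setdefault(name, [])
        for upstream in inputs:
            self.subscribers.setdefault(upstream, []).append(name)

    def run(self):
        """Push every spout tuple through the graph; return what each bolt emitted."""
        outputs = {name: [] for name in self.bolts}
        for spout_name, source in self.spouts.items():
            for tup in source:
                work = deque([(spout_name, tup)])
                while work:
                    producer, value = work.popleft()
                    for bolt_name in self.subscribers[producer]:
                        for out in self.bolts[bolt_name](value):
                            outputs[bolt_name].append(out)
                            work.append((bolt_name, out))
        return outputs


topo = Topology()
topo.set_spout("sentences", ["storm is fast", "neo4j stores graphs"])
topo.set_bolt("split", lambda s: s.split(), inputs=["sentences"])
topo.set_bolt("upper", lambda w: [w.upper()], inputs=["split"])
result = topo.run()
print(result["upper"])
```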
  • 18. Is That It . . . ? • Tasks and parallelism: a spout or bolt can run as multiple tasks across the cluster
  • 19. Mr. Tuple: "Shoot, where do I go now?"
  • 20. Groupings to the Rescue of Mr. Tuple! • Shuffle grouping: picks a random task • Fields grouping: mod hashing on a subset of tuple fields • All grouping: sends to all tasks • Global grouping: picks the task with the lowest task id
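The four groupings can be expressed as routing functions that map a tuple to the task ids that should receive it. The sketch below is plain Python with hypothetical function names, not Storm's implementation. For fields grouping, it assumes CRC32 as the hash, because Python's built-in `hash()` of a string changes between runs. Storm uses its own hash, but the "hash mod number of tasks" idea is the same.

```python
import random
import zlib


def shuffle_grouping(task_ids, values, rng=random):
    """Pick one random task."""
    return [rng.choice(task_ids)]


def fields_grouping(task_ids, values, fields, key_fields):
    """Hash the selected fields, mod the number of tasks: equal keys
    always reach the same task."""
    key = "|".join(str(values[fields.index(f)]) for f in key_fields)
    return [task_ids[zlib.crc32(key.encode()) % len(task_ids)]]


def all_grouping(task_ids, values):
    """Replicate the tuple to every task."""
    return list(task_ids)


def global_grouping(task_ids, values):
    """Send everything to the task with the lowest id."""
    return [min(task_ids)]


tasks = [7, 3, 5]
fields = ["user", "friend"]
tup = ("alice", "bob")
print(global_grouping(tasks, tup))
print(all_grouping(tasks, tup))
a = fields_grouping(tasks, tup, fields, ["user"])
b = fields_grouping(tasks, ("alice", "carol"), fields, ["user"])
print(a == b)  # same "user" value, so the same task
```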
  • 21. A Storm Cluster • One Nimbus node, a ZooKeeper cluster (three nodes shown) and several Supervisor nodes (five shown)
  • 22. A Storm Cluster • If this were Hadoop, Nimbus would be the Job Tracker and the Supervisors would be the Task Trackers.
  • 23. A Storm Cluster • But it's NOT Hadoop! ZooKeeper co-ordinates everything.
  • 24. Salient Features • Storm 0.7 and later supports transactional topologies: tuples are processed in small batches, and if a failure occurs during the commit, both the batch and the commit are retried. • Storm guarantees message processing using acknowledgements. • Petrel by AirSage is a Python wrapper for Storm; you can write and submit topologies in Python.
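Storm's documentation describes acknowledgement tracking as follows:
- An acker keeps one 64-bit "ack val" per spout tuple.
- That value is the XOR of every tuple id created or acked in the tuple tree.
- Each id is XORed in twice (once when emitted, once when acked), so the value returns to zero exactly when the whole tree is processed.

The sketch below illustrates only that idea. The `Acker` class and its `track`/`ack` methods are hypothetical, and the real implementation differs in detail.

```python
import random


class Acker:
    """Sketch of XOR-based tuple-tree tracking (the idea behind Storm's acker)."""

    def __init__(self):
        self.ack_val = {}  # spout message id -> running XOR of tuple ids

    def track(self, root_id, tuple_id):
        # Called when a tuple is emitted (anchored) in root_id's tree.
        self.ack_val[root_id] = self.ack_val.get(root_id, 0) ^ tuple_id

    def ack(self, root_id, tuple_id):
        # Called when a tuple is acked; True once the whole tree is done.
        self.ack_val[root_id] ^= tuple_id
        if self.ack_val[root_id] == 0:
            del self.ack_val[root_id]
            return True
        return False


rng = random.Random(42)
acker = Acker()
root = "msg-1"
spout_tuple = rng.getrandbits(64)
acker.track(root, spout_tuple)
# A bolt processes the spout tuple and emits two children anchored to it.
child_a, child_b = rng.getrandbits(64), rng.getrandbits(64)
acker.track(root, child_a)
acker.track(root, child_b)
done_1 = acker.ack(root, spout_tuple)  # children still outstanding
done_2 = acker.ack(root, child_a)
done_3 = acker.ack(root, child_b)      # last ack: XOR returns to zero
print(done_1, done_2, done_3)
```

Random 64-bit ids make an accidental early zero extremely unlikely, which is why a single integer per spout tuple is enough.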
  • 25. Part 2: Neo4j, "Get Graphed"
  • 26. This is how graph data was represented in an RDBMS.
  • 27. Enter NoSQL Databases
  • 28. Types of NoSQL Databases • Graph databases, document databases, column-family stores and key-value stores, compared by data complexity and data size
  • 29. Why NoSQL Databases • Easily horizontally scalable • Dynamic schemas; handle unstructured data really well • Excel in speed and volume • Trade off consistency for efficiency (except in graph databases; we'll see why) • Pleasure to code • Free to use any query language (even SQL!) • Downtime? What downtime?
  • 30. The Property Graph Model of Graph Databases • Core abstractions: nodes, relationships between nodes, and properties on both • Traversal framework: high-performance queries on connected datasets • Bindings: REST, Gremlin, etc.
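The in-memory toy below mirrors the three core abstractions of the property graph model: nodes, typed relationships between nodes, and properties on both. It adds a breadth-first traversal that follows only one relationship type. This is not Neo4j's storage engine or its traversal API. The classes, the `FRIEND`/`BLOCKED` relationship types and the sample data are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: int
    end: int
    type: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    def __init__(self):
        self.nodes = {}
        self.out_edges = {}  # node id -> [Relationship]

    def add_node(self, node_id, **props):
        self.nodes[node_id] = Node(node_id, props)
        self.out_edges.setdefault(node_id, [])
        return self.nodes[node_id]

    def relate(self, start, end, rel_type, **props):
        rel = Relationship(start, end, rel_type, props)
        self.out_edges[start].append(rel)
        return rel

    def traverse(self, start, rel_type, max_depth):
        """Breadth-first traversal along `rel_type` edges only; yields
        (node, depth) for nodes reachable within max_depth hops."""
        seen = {start}
        frontier = deque([(start, 0)])
        while frontier:
            node_id, depth = frontier.popleft()
            if depth == max_depth:
                continue
            for rel in self.out_edges[node_id]:
                if rel.type == rel_type and rel.end not in seen:
                    seen.add(rel.end)
                    frontier.append((rel.end, depth + 1))
                    yield self.nodes[rel.end], depth + 1


g = PropertyGraph()
for i, name in enumerate(["alice", "bob", "carol", "dave"]):
    g.add_node(i, name=name)
g.relate(0, 1, "FRIEND", since=2012)
g.relate(1, 2, "FRIEND", since=2013)
g.relate(2, 3, "FRIEND")
g.relate(0, 3, "BLOCKED")
reached = [(n.properties["name"], d) for n, d in g.traverse(0, "FRIEND", max_depth=2)]
print(reached)
```

The traversal touches only the edges it follows, instead of joining whole tables, which is the point the next slide makes about deep traversals.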
  • 31. Neo4j • Fully ACID, with rollback support (unbelievable!) • Schema-less, efficient storage of semi-structured data • Fast deep traversals instead of slow SQL queries that span many table joins • Whiteboard friendly • Very natural for expressing graph-related problems as traversals (recommendation engines, shortest paths, etc.)
  • 32. Neo4j Pythonized! • py2neo is an excellent binding for Neo4j • It accesses Neo4j through its RESTful API • Still under development: features such as labels are yet to be included!
  • 33. So, Will Relational Databases Become Extinct? OOPS!
  • 34. Categories of Graph Data • Social networks • Citations • Product co-purchasing • Internet peer-to-peer • Road networks and map data • Web graphs • An excellent source of sample graph data: http://snap.stanford.edu/data/
  • 35. Part 3: Get Your Hands Dirty!
  • 36. A Demo • A sample social network dataset • The data covers a month's activity: people signing up, adding friends, unfriending, etc. • Neo4j: stores and updates the social data • Storm: calculates the "friendship-index"
  • 37. A Demo • "friendship-index": n = the number of people through whom person "A" is connected to person "B" • Gives an idea of how close two people are! • Useful when searching for friends on social networks (something like the friends-of-friends concept in Facebook's Graph Search)
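One way to compute the friendship-index is a breadth-first search over the friendship graph. This sketch assumes the index is the number of intermediaries on the shortest friend chain: 0 for direct friends, `None` when the two people are not connected. That is an interpretation of the slide, not the talk's actual code. The function name `friendship_index` and the adjacency-dict input are hypothetical.

```python
from collections import deque


def friendship_index(friends, a, b):
    """Number of intermediaries on the shortest friend chain from a to b:
    0 for direct friends, None if not connected. `friends` maps each
    person to the set of their friends (friendship is undirected)."""
    if a == b:
        raise ValueError("a and b must be different people")
    dist = {a: 0}
    frontier = deque([a])
    while frontier:
        person = frontier.popleft()
        for friend in friends.get(person, ()):
            if friend not in dist:
                dist[friend] = dist[person] + 1
                if friend == b:
                    return dist[friend] - 1  # hops minus one = people in between
                frontier.append(friend)
    return None


friends = {
    "alice": {"bob"},
    "bob": {"alice", "carol"},
    "carol": {"bob", "dave"},
    "dave": {"carol"},
    "erin": set(),
}
print(friendship_index(friends, "alice", "bob"))   # 0: direct friends
print(friendship_index(friends, "alice", "dave"))  # 2: via bob and carol
print(friendship_index(friends, "alice", "erin"))  # None: not connected
```

BFS is used because it finds the shortest chain first. Stopping as soon as `b` is discovered keeps queries for close friends cheap.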
  • 38. The Topology • Source → Update Spout → Update Bolt • Source → Query Spout → Query Bolt
  • 39. Update Spout
  • 40. Update Spout: defines what kind of tuples are emitted
  • 41. Update Spout: gets and emits tuple streams
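The update spout's two jobs are declaring the shape of its tuples and emitting them from a source. The plain-Python sketch below assumes a hypothetical activity-log format: `date action person [person]`, where the action is `signup`, `add` or `remove`. It is not the demo's Petrel code. The class `UpdateSpout`, its snake_case methods and the field names are all illustrative.

```python
class UpdateSpout:
    """Hypothetical update spout: turns lines of a month-long activity log
    into (action, person_a, person_b) tuples, skipping malformed lines."""

    OUTPUT_FIELDS = ("action", "person_a", "person_b")
    VALID_ACTIONS = {"signup", "add", "remove"}

    def __init__(self, lines):
        self.lines = iter(lines)

    def declare_output_fields(self):
        # What kind of tuples this spout emits.
        return self.OUTPUT_FIELDS

    def next_tuple(self):
        """Return the next well-formed tuple, or None when the log is exhausted."""
        for line in self.lines:
            parts = line.split()
            if len(parts) < 3 or parts[1] not in self.VALID_ACTIONS:
                continue  # not a line we can interpret
            _date, action, *people = parts
            expected = 1 if action == "signup" else 2
            if len(people) != expected:
                continue
            person_b = people[1] if expected == 2 else None
            return (action, people[0], person_b)
        return None


log = [
    "2013-08-01 signup alice",
    "2013-08-01 signup bob",
    "garbage",
    "2013-08-02 add alice bob",
    "2013-08-20 remove alice bob",
]
spout = UpdateSpout(log)
emitted = []
while (t := spout.next_tuple()) is not None:
    emitted.append(t)
print(emitted)
```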
  • 42. Update Bolt
  • 43. Update Bolt: objects for database access and the indexing service
  • 44. Update Bolt
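The update bolt applies each update tuple to the stored social graph. In the sketch below, plain dicts stand in for the two objects the slide mentions:
- `index` plays the indexing service, mapping a person's name to a node id.
- `friends` plays the graph store, holding undirected friend links.

This is not py2neo or Neo4j code. `UpdateBolt`, `process` and the `signup`/`add`/`remove` actions are hypothetical.

```python
class UpdateBolt:
    """Hypothetical update bolt; dicts stand in for Neo4j and its index."""

    def __init__(self):
        self.index = {}    # person name -> node id ("indexing service")
        self.friends = {}  # node id -> set of friend node ids ("database")

    def _node(self, name):
        # Look the person up in the index, creating a node if needed.
        if name not in self.index:
            node_id = len(self.index)
            self.index[name] = node_id
            self.friends[node_id] = set()
        return self.index[name]

    def process(self, action, person_a, person_b=None):
        a = self._node(person_a)
        if action == "signup":
            return
        b = self._node(person_b)
        if action == "add":
            self.friends[a].add(b)
            self.friends[b].add(a)
        elif action == "remove":
            self.friends[a].discard(b)
            self.friends[b].discard(a)
        else:
            raise ValueError(f"unknown action {action!r}")


bolt = UpdateBolt()
for t in [
    ("signup", "alice", None),
    ("signup", "bob", None),
    ("add", "alice", "bob"),
    ("add", "alice", "carol"),
    ("remove", "alice", "bob"),
]:
    bolt.process(*t)
print(bolt.index)
print(bolt.friends)
```

Friendships are stored in both directions, so a later query can walk the graph from either person.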
  • 45. Query Spout
  • 46. Query Spout: the tuple to be emitted can contain multiple entities.
  • 47. Query Bolt
  • 48. Query Bolt
  • 49. Query Bolt: retrieves the caller's id and the requested friend's id
  • 50. Query Bolt: retrieves the caller's id and the requested friend's id as stored in the database
  • 51. Create Topology
  • 52. Create Topology: imports all the spout and bolt files
  • 53. Create Topology: unfortunately, there was no option in Petrel to turn off console debug output, so the console view is really messy.
  • 54. Topology.yaml: the topology's configuration is specified in this file
  • 55. A Little More . . • Source → Update Spout → Update Bolt • Source → Query Spout → Query Bolt
  • 56. Final Thoughts • A Storm-Neo4j framework is a boon for real-time graph computations • It is quite flexible in Java; the Python bindings and implementations still have a long way to go • If you are an admin or developer, analyse your data and computing requirements before settling on a framework
  • 57. …to Play with Storm and Neo4j • My PyCon talk repo (slides, code skeletons, etc.): http://www.sonalraj.com/neo-storm.html • Storm documentation (official): http://github.com/nathanmarz/storm • Storm book: http://www.amazon.com/Getting-Started-Storm-Jonathan-Leibiusky/dp/1449324010 • Deployment of Storm on AWS: http://github.com/nathanmarz/storm-deploy • Neo4j documentation: http://www.neo4j.org
  • 58. Ex-terminated . . . That's it. Thanks for listening! Questions?