SlideShare a Scribd company logo
Big Graph Analytics on Neo4j with
Apache Spark
Kenny Bastani
FOSDEM '15, Graph Processing Devroom
My background
Second year speaking in the graph devroom
Extremely thankful for all the organizers of this
devroom have done for this technology
Thank you organizers!
I apologize in advance for these
incredibly low-budget slides
Engineer + Evangelism
I evangelize about Neo4j and graph database
technology
I am an engineer at Digital Insight, a Silicon Valley
based SaaS banking platform provider.
Engineering
NCR/Digital Insight
$18 Trillion per year in ATM withdrawals
As a Graph Database Evangelist
I make cool things with graphs and blog about it
Just so we're clear…
I'm not selling you anything today
Agenda
Let's try and have some fun.
We're going to run PageRank on all human knowledge.
After a lot of low-budget slides.
The Problem
It's hard to analyze graphs at scale
The importance of graph algorithms
PageRank gave us Google
Friend of a friend gave us Facebook
Every hacker and tinkerer should be
able to turn world changing ideas into a
reality
Every research scientist who needs
graph analytics to save millions of lives
should have that power
The simple fact is that you are brilliant
but your brilliant ideas require complex
big data analytics
So then you need to learn a lot of
things or steal a lot of money to change
the world
Why is it so hard to do this stuff?
Enemy #1:
Relational Databases
Relational databases store data in ways that make it
difficult to extract graphs for analysis
– This guy represents every business person that controls your future as an
engineer
"I need to combine 32 different tables in 5
different system of records and put it in a CSV
every hour."
You're probably thinking…
Sorry business, no PageRank for you!
But you courageously move forward
regardless
This is what is likely to happen next.
Enemy #2:
Big Data
If you still think Big Data is a buzz word
You haven't had to feel the pain of failing at it.
When you hit a wall because your data is
too big
You start to see what this big data thing is all about.
What seems to be the problem? Where's
my PageRank
All those records you merged from those 32 different
tables turns out to be petabytes of data.
Frantic and scared
You turn to your most dependable friend
You might search for a
Now that you know that Big Data is the
real deal
You read "Big Data for Dummies" and continue to
tackle the PageRank problem
Distributed File Systems
Distributed file systems are a foundational component
of big data analytics
Chops things into manageable sized blocks, usually
64mb
Spreads those blocks out across a cluster of VM
resources
Hadoop MapReduce
Worth mentioning, Hadoop started this whole
MapReduce craze
You could translate the raw data from a CSV and turn
it into a map of keys to values
Keys are distributed per node and used to reduce the
values into a partitioned analysis
Ok so now you know about Big Data,
Hadoop, HDFS
You fire up your Amazon EC2 Hadoop cluster...
This guy is still waiting…
You hold your breath..
And submit the PageRank job…
You wait…
3 hours later…
Out of memory: heap space exceeded!
It must be the configs
You check the configs
Increase the heap space
Do some Stackoverflow trolling
And you submit the PageRank job again…
Graph algorithms can be evil at scale
It depends on the complexity of your graph
How many strongly connected components you have
But since some graph algorithms like PageRank are
iterative
You have to iterate from one stage and use the results
of the previous stage
It doesn't matter how many nodes you
have in your cluster
For iterative graph algorithms the complexity of the
graph will make you or break you
Graphs with high complexity need a lot of memory to
be processed iteratively
This guy is going to have to settle for
collaborative filtering
Neo4j Mazerunner Project
What is Neo4j Mazerunner?
The basic idea is…
Graph databases need ETL so you can analyze your
data and look it up later.
Graph databases are great and all, but…
No platform in the open source world should be the
one platform that does everything.
Especially a database.
Docker
If you're not up on Docker, let me give you a quick
intro.
Docker
Docker is a VM framework that lets you easily create a
recipe for an image and deploy applications with ease.
The idea is that infrastructure and operational
complexity makes it hard for agile development of new
products.
Why?
If I am an engineer on a product team, I want to
choose my own software libraries and languages to
solve problems.
Microservices for the win
So here is the future of software development:
• Cloud OS like Apache Mesos manages datacenter
resources
• If you build a new service, use whatever application
framework you want. As long as you communicate
over REST.
Microservices cont.
Docker gives you the freedom to use Neo4j, or
OrientDB, or MongoDB or whatever application
dependency you want inside your container.
Because of something called graceful degradation, if
OrientDB or Neo4j fail at being everything, they'll fault
only within their container and not bring your entire
SaaS platform to its knees.
Beware of the monolith…
Monolithic apps are those software platforms that just
try and do every possible damn thing. They're like
Swiss army knives of the software world.
If you rely on one service to do everything, your entire
platform is going to come down when it fails.
And it will fail…
Docker cont.
Summarizing:
• Docker containerizes your bad engineering
decisions without bringing down your platform.
• So I'm pretty much a fan of that.
So is this guy…
Mazerunner runs on Docker
You can pull it down and deploy it safely and roll the
dice on some awesome analytics capabilities that lend
well to graph data models.
HDFS and Apache Spark
Apache Spark is a really interesting open source
project.
It is a scalable big data and machine learning platform.
It also lets you do in-memory analytics on a graph
dataset.
So I wrote the world's first analysis
service for a graph database that does
2-way ETL
That scared a lot of people in the
"graph databases are infallible" club
Analysis is not a lookup
Analytics on graphs takes massive amounts of
system resources and might bring down your OLTP
capabilities as it competes to share system
resources
Now let's fire up Neo4j Mazerunner
Demo Goals:
• I will hopefully be successful at showing you how
to install Mazerunner on Docker
• I will demo you an analysis job scheduler that
extracts subgraphs, analyzes them, and pops the
results back to Neo4j
Where do we go now?
Become a committer to the project and let's make it better
Find the link on my blog — www.kennybastani.com
Thanks!
Follow me on Twitter:
http://www.twitter.com/kennybastani

More Related Content

What's hot

How Graph Databases used in Police Department?
How Graph Databases used in Police Department?How Graph Databases used in Police Department?
How Graph Databases used in Police Department?
Samet KILICTAS
 
Graph Analytics for big data
Graph Analytics for big dataGraph Analytics for big data
Graph Analytics for big data
Sigmoid
 
Congressional PageRank: Graph Analytics of US Congress With Neo4j
Congressional PageRank: Graph Analytics of US Congress With Neo4jCongressional PageRank: Graph Analytics of US Congress With Neo4j
Congressional PageRank: Graph Analytics of US Congress With Neo4j
William Lyon
 
GraphX: Graph analytics for insights about developer communities
GraphX: Graph analytics for insights about developer communitiesGraphX: Graph analytics for insights about developer communities
GraphX: Graph analytics for insights about developer communities
Paco Nathan
 
Strata sf - Amundsen presentation
Strata sf - Amundsen presentationStrata sf - Amundsen presentation
Strata sf - Amundsen presentation
Tao Feng
 
Machine Learning and GraphX
Machine Learning and GraphXMachine Learning and GraphX
Machine Learning and GraphX
Andy Petrella
 
Signals from outer space
Signals from outer spaceSignals from outer space
Signals from outer space
GraphAware
 
Big Data Analytics - Best of the Worst : Anti-patterns & Antidotes
Big Data Analytics - Best of the Worst : Anti-patterns & AntidotesBig Data Analytics - Best of the Worst : Anti-patterns & Antidotes
Big Data Analytics - Best of the Worst : Anti-patterns & Antidotes
Krishna Sankar
 
Share and analyze geonomic data at scale by Andy Petrella and Xavier Tordoir
Share and analyze geonomic data at scale by Andy Petrella and Xavier TordoirShare and analyze geonomic data at scale by Andy Petrella and Xavier Tordoir
Share and analyze geonomic data at scale by Andy Petrella and Xavier Tordoir
Spark Summit
 
NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...
NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...
NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...
South London Geek Nights
 
Apache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big DataApache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big Data
Paco Nathan
 
Use of standards and related issues in predictive analytics
Use of standards and related issues in predictive analyticsUse of standards and related issues in predictive analytics
Use of standards and related issues in predictive analytics
Paco Nathan
 
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingTiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Paco Nathan
 
Gephi, Graphx, and Giraph
Gephi, Graphx, and GiraphGephi, Graphx, and Giraph
Gephi, Graphx, and Giraph
Doug Needham
 
How Lyft Drives Data Discovery
How Lyft Drives Data DiscoveryHow Lyft Drives Data Discovery
How Lyft Drives Data Discovery
Neo4j
 
Disrupting Data Discovery
Disrupting Data DiscoveryDisrupting Data Discovery
Disrupting Data Discovery
markgrover
 
Open Source Big Graph Analytics on Neo4j with Apache Spark
Open Source Big Graph Analytics on Neo4j with Apache SparkOpen Source Big Graph Analytics on Neo4j with Apache Spark
Open Source Big Graph Analytics on Neo4j with Apache Spark
Kenny Bastani
 
An excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphXAn excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphX
Krishna Sankar
 
Data council sf amundsen presentation
Data council sf    amundsen presentationData council sf    amundsen presentation
Data council sf amundsen presentation
Tao Feng
 
Meetup SF - Amundsen
Meetup SF  -  AmundsenMeetup SF  -  Amundsen
Meetup SF - Amundsen
Philippe Mizrahi
 

What's hot (20)

How Graph Databases used in Police Department?
How Graph Databases used in Police Department?How Graph Databases used in Police Department?
How Graph Databases used in Police Department?
 
Graph Analytics for big data
Graph Analytics for big dataGraph Analytics for big data
Graph Analytics for big data
 
Congressional PageRank: Graph Analytics of US Congress With Neo4j
Congressional PageRank: Graph Analytics of US Congress With Neo4jCongressional PageRank: Graph Analytics of US Congress With Neo4j
Congressional PageRank: Graph Analytics of US Congress With Neo4j
 
GraphX: Graph analytics for insights about developer communities
GraphX: Graph analytics for insights about developer communitiesGraphX: Graph analytics for insights about developer communities
GraphX: Graph analytics for insights about developer communities
 
Strata sf - Amundsen presentation
Strata sf - Amundsen presentationStrata sf - Amundsen presentation
Strata sf - Amundsen presentation
 
Machine Learning and GraphX
Machine Learning and GraphXMachine Learning and GraphX
Machine Learning and GraphX
 
Signals from outer space
Signals from outer spaceSignals from outer space
Signals from outer space
 
Big Data Analytics - Best of the Worst : Anti-patterns & Antidotes
Big Data Analytics - Best of the Worst : Anti-patterns & AntidotesBig Data Analytics - Best of the Worst : Anti-patterns & Antidotes
Big Data Analytics - Best of the Worst : Anti-patterns & Antidotes
 
Share and analyze geonomic data at scale by Andy Petrella and Xavier Tordoir
Share and analyze geonomic data at scale by Andy Petrella and Xavier TordoirShare and analyze geonomic data at scale by Andy Petrella and Xavier Tordoir
Share and analyze geonomic data at scale by Andy Petrella and Xavier Tordoir
 
NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...
NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...
NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...
 
Apache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big DataApache Spark and the Emerging Technology Landscape for Big Data
Apache Spark and the Emerging Technology Landscape for Big Data
 
Use of standards and related issues in predictive analytics
Use of standards and related issues in predictive analyticsUse of standards and related issues in predictive analytics
Use of standards and related issues in predictive analytics
 
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark StreamingTiny Batches, in the wine: Shiny New Bits in Spark Streaming
Tiny Batches, in the wine: Shiny New Bits in Spark Streaming
 
Gephi, Graphx, and Giraph
Gephi, Graphx, and GiraphGephi, Graphx, and Giraph
Gephi, Graphx, and Giraph
 
How Lyft Drives Data Discovery
How Lyft Drives Data DiscoveryHow Lyft Drives Data Discovery
How Lyft Drives Data Discovery
 
Disrupting Data Discovery
Disrupting Data DiscoveryDisrupting Data Discovery
Disrupting Data Discovery
 
Open Source Big Graph Analytics on Neo4j with Apache Spark
Open Source Big Graph Analytics on Neo4j with Apache SparkOpen Source Big Graph Analytics on Neo4j with Apache Spark
Open Source Big Graph Analytics on Neo4j with Apache Spark
 
An excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphXAn excursion into Graph Analytics with Apache Spark GraphX
An excursion into Graph Analytics with Apache Spark GraphX
 
Data council sf amundsen presentation
Data council sf    amundsen presentationData council sf    amundsen presentation
Data council sf amundsen presentation
 
Meetup SF - Amundsen
Meetup SF  -  AmundsenMeetup SF  -  Amundsen
Meetup SF - Amundsen
 

Viewers also liked

Introduction to Graph Databases
Introduction to Graph DatabasesIntroduction to Graph Databases
Introduction to Graph Databases
Max De Marzi
 
Natural Language Processing with Neo4j
Natural Language Processing with Neo4jNatural Language Processing with Neo4j
Natural Language Processing with Neo4j
Kenny Bastani
 
Data Modeling with Neo4j
Data Modeling with Neo4jData Modeling with Neo4j
Data Modeling with Neo4j
Neo4j
 
Natural language search using Neo4j
Natural language search using Neo4jNatural language search using Neo4j
Natural language search using Neo4j
Kenny Bastani
 
Converting Relational to Graph Databases
Converting Relational to Graph DatabasesConverting Relational to Graph Databases
Converting Relational to Graph Databases
Antonio Maccioni
 
Intro to Neo4j presentation
Intro to Neo4j presentationIntro to Neo4j presentation
Intro to Neo4j presentation
jexp
 
Document Classification with Neo4j
Document Classification with Neo4jDocument Classification with Neo4j
Document Classification with Neo4j
Kenny Bastani
 
Webinar: RDBMS to Graphs
Webinar: RDBMS to GraphsWebinar: RDBMS to Graphs
Webinar: RDBMS to Graphs
Neo4j
 
Neo4j - 5 cool graph examples
Neo4j - 5 cool graph examplesNeo4j - 5 cool graph examples
Neo4j - 5 cool graph examples
Peter Neubauer
 
Graph database Use Cases
Graph database Use CasesGraph database Use Cases
Graph database Use Cases
Max De Marzi
 
Big Graph Analytics Systems (Sigmod16 Tutorial)
Big Graph Analytics Systems (Sigmod16 Tutorial)Big Graph Analytics Systems (Sigmod16 Tutorial)
Big Graph Analytics Systems (Sigmod16 Tutorial)
Yuanyuan Tian
 
Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)
Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)
Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)
Spark Summit
 
Using spark for timeseries graph analytics
Using spark for timeseries graph analyticsUsing spark for timeseries graph analytics
Using spark for timeseries graph analytics
Sigmoid
 
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink & Neo4j Meetup Be...
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink & Neo4j Meetup Be...Gradoop: Scalable Graph Analytics with Apache Flink @ Flink & Neo4j Meetup Be...
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink & Neo4j Meetup Be...
Martin Junghanns
 
Building a Graph of all US Businesses Using Spark Technologies by Alexis Roos
Building a Graph of all US Businesses Using Spark Technologies by Alexis RoosBuilding a Graph of all US Businesses Using Spark Technologies by Alexis Roos
Building a Graph of all US Businesses Using Spark Technologies by Alexis Roos
Spark Summit
 
Graphs are everywhere! Distributed graph computing with Spark GraphX
Graphs are everywhere! Distributed graph computing with Spark GraphXGraphs are everywhere! Distributed graph computing with Spark GraphX
Graphs are everywhere! Distributed graph computing with Spark GraphX
Andrea Iacono
 
Intro to Graphs and Neo4j
Intro to Graphs and Neo4jIntro to Graphs and Neo4j
Intro to Graphs and Neo4j
Neo4j
 
GraphDay Stockholm - Levaraging Graph-Technology to fight Financial Fraud
GraphDay Stockholm - Levaraging Graph-Technology to fight Financial FraudGraphDay Stockholm - Levaraging Graph-Technology to fight Financial Fraud
GraphDay Stockholm - Levaraging Graph-Technology to fight Financial Fraud
Neo4j
 
GraphDay Stockholm - Graphs in Action
GraphDay Stockholm - Graphs in ActionGraphDay Stockholm - Graphs in Action
GraphDay Stockholm - Graphs in Action
Neo4j
 
GraphDay Stockholm - iKnow Solutions - The Value Add of Graphs to Analytics a...
GraphDay Stockholm - iKnow Solutions - The Value Add of Graphs to Analytics a...GraphDay Stockholm - iKnow Solutions - The Value Add of Graphs to Analytics a...
GraphDay Stockholm - iKnow Solutions - The Value Add of Graphs to Analytics a...
Neo4j
 

Viewers also liked (20)

Introduction to Graph Databases
Introduction to Graph DatabasesIntroduction to Graph Databases
Introduction to Graph Databases
 
Natural Language Processing with Neo4j
Natural Language Processing with Neo4jNatural Language Processing with Neo4j
Natural Language Processing with Neo4j
 
Data Modeling with Neo4j
Data Modeling with Neo4jData Modeling with Neo4j
Data Modeling with Neo4j
 
Natural language search using Neo4j
Natural language search using Neo4jNatural language search using Neo4j
Natural language search using Neo4j
 
Converting Relational to Graph Databases
Converting Relational to Graph DatabasesConverting Relational to Graph Databases
Converting Relational to Graph Databases
 
Intro to Neo4j presentation
Intro to Neo4j presentationIntro to Neo4j presentation
Intro to Neo4j presentation
 
Document Classification with Neo4j
Document Classification with Neo4jDocument Classification with Neo4j
Document Classification with Neo4j
 
Webinar: RDBMS to Graphs
Webinar: RDBMS to GraphsWebinar: RDBMS to Graphs
Webinar: RDBMS to Graphs
 
Neo4j - 5 cool graph examples
Neo4j - 5 cool graph examplesNeo4j - 5 cool graph examples
Neo4j - 5 cool graph examples
 
Graph database Use Cases
Graph database Use CasesGraph database Use Cases
Graph database Use Cases
 
Big Graph Analytics Systems (Sigmod16 Tutorial)
Big Graph Analytics Systems (Sigmod16 Tutorial)Big Graph Analytics Systems (Sigmod16 Tutorial)
Big Graph Analytics Systems (Sigmod16 Tutorial)
 
Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)
Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)
Interactive Graph Analytics with Spark-(Daniel Darabos, Lynx Analytics)
 
Using spark for timeseries graph analytics
Using spark for timeseries graph analyticsUsing spark for timeseries graph analytics
Using spark for timeseries graph analytics
 
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink & Neo4j Meetup Be...
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink & Neo4j Meetup Be...Gradoop: Scalable Graph Analytics with Apache Flink @ Flink & Neo4j Meetup Be...
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink & Neo4j Meetup Be...
 
Building a Graph of all US Businesses Using Spark Technologies by Alexis Roos
Building a Graph of all US Businesses Using Spark Technologies by Alexis RoosBuilding a Graph of all US Businesses Using Spark Technologies by Alexis Roos
Building a Graph of all US Businesses Using Spark Technologies by Alexis Roos
 
Graphs are everywhere! Distributed graph computing with Spark GraphX
Graphs are everywhere! Distributed graph computing with Spark GraphXGraphs are everywhere! Distributed graph computing with Spark GraphX
Graphs are everywhere! Distributed graph computing with Spark GraphX
 
Intro to Graphs and Neo4j
Intro to Graphs and Neo4jIntro to Graphs and Neo4j
Intro to Graphs and Neo4j
 
GraphDay Stockholm - Levaraging Graph-Technology to fight Financial Fraud
GraphDay Stockholm - Levaraging Graph-Technology to fight Financial FraudGraphDay Stockholm - Levaraging Graph-Technology to fight Financial Fraud
GraphDay Stockholm - Levaraging Graph-Technology to fight Financial Fraud
 
GraphDay Stockholm - Graphs in Action
GraphDay Stockholm - Graphs in ActionGraphDay Stockholm - Graphs in Action
GraphDay Stockholm - Graphs in Action
 
GraphDay Stockholm - iKnow Solutions - The Value Add of Graphs to Analytics a...
GraphDay Stockholm - iKnow Solutions - The Value Add of Graphs to Analytics a...GraphDay Stockholm - iKnow Solutions - The Value Add of Graphs to Analytics a...
GraphDay Stockholm - iKnow Solutions - The Value Add of Graphs to Analytics a...
 

Similar to Big Graph Analytics on Neo4j with Apache Spark

GraphLab Conference 2014 Keynote - Carlos Guestrin
GraphLab Conference 2014 Keynote - Carlos GuestrinGraphLab Conference 2014 Keynote - Carlos Guestrin
GraphLab Conference 2014 Keynote - Carlos Guestrin
Turi, Inc.
 
2951085 dzone-2016guidetobigdata
2951085 dzone-2016guidetobigdata2951085 dzone-2016guidetobigdata
2951085 dzone-2016guidetobigdata
balu kvm
 
Databases benoitg 2009-03-10
Databases benoitg 2009-03-10Databases benoitg 2009-03-10
Databases benoitg 2009-03-10
benoitg
 
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...
Benjamin Nussbaum
 
Neurodb Engr245 2021 Lessons Learned
Neurodb Engr245 2021 Lessons LearnedNeurodb Engr245 2021 Lessons Learned
Neurodb Engr245 2021 Lessons Learned
Stanford University
 
The Future of Computing is Distributed
The Future of Computing is DistributedThe Future of Computing is Distributed
The Future of Computing is Distributed
Alluxio, Inc.
 
The Big Data Developer (@pavlobaron)
The Big Data Developer (@pavlobaron)The Big Data Developer (@pavlobaron)
The Big Data Developer (@pavlobaron)
Pavlo Baron
 
Bringing Deep Learning into production
Bringing Deep Learning into production Bringing Deep Learning into production
Bringing Deep Learning into production
Paolo Platter
 
Ncku csie talk about Spark
Ncku csie talk about SparkNcku csie talk about Spark
Ncku csie talk about Spark
Giivee The
 
Hadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University TalksHadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University Talks
yhadoop
 
Is Spark the right choice for data analysis ?
Is Spark the right choice for data analysis ?Is Spark the right choice for data analysis ?
Is Spark the right choice for data analysis ?
Ahmed Kamal
 
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
Daniel Zivkovic
 
PPT5: Neuron Introduction
PPT5: Neuron IntroductionPPT5: Neuron Introduction
PPT5: Neuron Introduction
akira-ai
 
10-Hot-Data-Analytics-Tre-8904178.ppsx
10-Hot-Data-Analytics-Tre-8904178.ppsx10-Hot-Data-Analytics-Tre-8904178.ppsx
10-Hot-Data-Analytics-Tre-8904178.ppsx
SangeetaTripathi8
 
Lambda architecture for real time big data
Lambda architecture for real time big dataLambda architecture for real time big data
Lambda architecture for real time big data
Trieu Nguyen
 
Big Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onBig Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-on
Dony Riyanto
 
Introduction to Distributed Computing Engines for Data Processing - Simone Ro...
Introduction to Distributed Computing Engines for Data Processing - Simone Ro...Introduction to Distributed Computing Engines for Data Processing - Simone Ro...
Introduction to Distributed Computing Engines for Data Processing - Simone Ro...
Data Science Milan
 
DataDay 2023 Presentation - Notes
DataDay 2023 Presentation - NotesDataDay 2023 Presentation - Notes
DataDay 2023 Presentation - Notes
Max De Marzi
 
A gentle introduction to the world of BigData and Hadoop
A gentle introduction to the world of BigData and HadoopA gentle introduction to the world of BigData and Hadoop
A gentle introduction to the world of BigData and Hadoop
Stefano Paluello
 
Anything Data: Big, Streaming, NoSQL, Cloud, Science ... A Sloppy Travel Guide
Anything Data: Big, Streaming, NoSQL, Cloud, Science ... A Sloppy Travel GuideAnything Data: Big, Streaming, NoSQL, Cloud, Science ... A Sloppy Travel Guide
Anything Data: Big, Streaming, NoSQL, Cloud, Science ... A Sloppy Travel Guide
Ahmet Akyol
 

Similar to Big Graph Analytics on Neo4j with Apache Spark (20)

GraphLab Conference 2014 Keynote - Carlos Guestrin
GraphLab Conference 2014 Keynote - Carlos GuestrinGraphLab Conference 2014 Keynote - Carlos Guestrin
GraphLab Conference 2014 Keynote - Carlos Guestrin
 
2951085 dzone-2016guidetobigdata
2951085 dzone-2016guidetobigdata2951085 dzone-2016guidetobigdata
2951085 dzone-2016guidetobigdata
 
Databases benoitg 2009-03-10
Databases benoitg 2009-03-10Databases benoitg 2009-03-10
Databases benoitg 2009-03-10
 
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...
Knowledge Graphs - Journey to the Connected Enterprise - Data Strategy and An...
 
Neurodb Engr245 2021 Lessons Learned
Neurodb Engr245 2021 Lessons LearnedNeurodb Engr245 2021 Lessons Learned
Neurodb Engr245 2021 Lessons Learned
 
The Future of Computing is Distributed
The Future of Computing is DistributedThe Future of Computing is Distributed
The Future of Computing is Distributed
 
The Big Data Developer (@pavlobaron)
The Big Data Developer (@pavlobaron)The Big Data Developer (@pavlobaron)
The Big Data Developer (@pavlobaron)
 
Bringing Deep Learning into production
Bringing Deep Learning into production Bringing Deep Learning into production
Bringing Deep Learning into production
 
Ncku csie talk about Spark
Ncku csie talk about SparkNcku csie talk about Spark
Ncku csie talk about Spark
 
Hadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University TalksHadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University Talks
 
Is Spark the right choice for data analysis ?
Is Spark the right choice for data analysis ?Is Spark the right choice for data analysis ?
Is Spark the right choice for data analysis ?
 
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
All in AI: LLM Landscape & RAG in 2024 with Mark Ryan (Google) & Jerry Liu (L...
 
PPT5: Neuron Introduction
PPT5: Neuron IntroductionPPT5: Neuron Introduction
PPT5: Neuron Introduction
 
10-Hot-Data-Analytics-Tre-8904178.ppsx
10-Hot-Data-Analytics-Tre-8904178.ppsx10-Hot-Data-Analytics-Tre-8904178.ppsx
10-Hot-Data-Analytics-Tre-8904178.ppsx
 
Lambda architecture for real time big data
Lambda architecture for real time big dataLambda architecture for real time big data
Lambda architecture for real time big data
 
Big Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-onBig Data Analytics (ML, DL, AI) hands-on
Big Data Analytics (ML, DL, AI) hands-on
 
Introduction to Distributed Computing Engines for Data Processing - Simone Ro...
Introduction to Distributed Computing Engines for Data Processing - Simone Ro...Introduction to Distributed Computing Engines for Data Processing - Simone Ro...
Introduction to Distributed Computing Engines for Data Processing - Simone Ro...
 
DataDay 2023 Presentation - Notes
DataDay 2023 Presentation - NotesDataDay 2023 Presentation - Notes
DataDay 2023 Presentation - Notes
 
A gentle introduction to the world of BigData and Hadoop
A gentle introduction to the world of BigData and HadoopA gentle introduction to the world of BigData and Hadoop
A gentle introduction to the world of BigData and Hadoop
 
Anything Data: Big, Streaming, NoSQL, Cloud, Science ... A Sloppy Travel Guide
Anything Data: Big, Streaming, NoSQL, Cloud, Science ... A Sloppy Travel GuideAnything Data: Big, Streaming, NoSQL, Cloud, Science ... A Sloppy Travel Guide
Anything Data: Big, Streaming, NoSQL, Cloud, Science ... A Sloppy Travel Guide
 

More from Kenny Bastani

In the Eventual Consistency of Succeeding at Microservices
In the Eventual Consistency of Succeeding at MicroservicesIn the Eventual Consistency of Succeeding at Microservices
In the Eventual Consistency of Succeeding at Microservices
Kenny Bastani
 
Building Cloud Native Architectures with Spring
Building Cloud Native Architectures with SpringBuilding Cloud Native Architectures with Spring
Building Cloud Native Architectures with Spring
Kenny Bastani
 
Extending the Platform with Spring Boot and Cloud Foundry
Extending the Platform with Spring Boot and Cloud FoundryExtending the Platform with Spring Boot and Cloud Foundry
Extending the Platform with Spring Boot and Cloud Foundry
Kenny Bastani
 
Back your app with MySQL and Redis on Cloud Foundry
Back your app with MySQL and Redis on Cloud FoundryBack your app with MySQL and Redis on Cloud Foundry
Back your app with MySQL and Redis on Cloud Foundry
Kenny Bastani
 
Using Docker, Neo4j, and Spring Cloud for Developing Microservices
Using Docker, Neo4j, and Spring Cloud for Developing MicroservicesUsing Docker, Neo4j, and Spring Cloud for Developing Microservices
Using Docker, Neo4j, and Spring Cloud for Developing Microservices
Kenny Bastani
 
Cloud Native Java Microservices
Cloud Native Java MicroservicesCloud Native Java Microservices
Cloud Native Java Microservices
Kenny Bastani
 
Building REST APIs with Spring Boot and Spring Cloud
Building REST APIs with Spring Boot and Spring CloudBuilding REST APIs with Spring Boot and Spring Cloud
Building REST APIs with Spring Boot and Spring Cloud
Kenny Bastani
 
Neo4j Graph Data Modeling
Neo4j Graph Data ModelingNeo4j Graph Data Modeling
Neo4j Graph Data Modeling
Kenny Bastani
 
Building Killer Apps with Neo4j 2.0
Building Killer Apps with Neo4j 2.0Building Killer Apps with Neo4j 2.0
Building Killer Apps with Neo4j 2.0
Kenny Bastani
 

More from Kenny Bastani (9)

In the Eventual Consistency of Succeeding at Microservices
In the Eventual Consistency of Succeeding at MicroservicesIn the Eventual Consistency of Succeeding at Microservices
In the Eventual Consistency of Succeeding at Microservices
 
Building Cloud Native Architectures with Spring
Building Cloud Native Architectures with SpringBuilding Cloud Native Architectures with Spring
Building Cloud Native Architectures with Spring
 
Extending the Platform with Spring Boot and Cloud Foundry
Extending the Platform with Spring Boot and Cloud FoundryExtending the Platform with Spring Boot and Cloud Foundry
Extending the Platform with Spring Boot and Cloud Foundry
 
Back your app with MySQL and Redis on Cloud Foundry
Back your app with MySQL and Redis on Cloud FoundryBack your app with MySQL and Redis on Cloud Foundry
Back your app with MySQL and Redis on Cloud Foundry
 
Using Docker, Neo4j, and Spring Cloud for Developing Microservices
Using Docker, Neo4j, and Spring Cloud for Developing MicroservicesUsing Docker, Neo4j, and Spring Cloud for Developing Microservices
Using Docker, Neo4j, and Spring Cloud for Developing Microservices
 
Cloud Native Java Microservices
Cloud Native Java MicroservicesCloud Native Java Microservices
Cloud Native Java Microservices
 
Building REST APIs with Spring Boot and Spring Cloud
Building REST APIs with Spring Boot and Spring CloudBuilding REST APIs with Spring Boot and Spring Cloud
Building REST APIs with Spring Boot and Spring Cloud
 
Neo4j Graph Data Modeling
Neo4j Graph Data ModelingNeo4j Graph Data Modeling
Neo4j Graph Data Modeling
 
Building Killer Apps with Neo4j 2.0
Building Killer Apps with Neo4j 2.0Building Killer Apps with Neo4j 2.0
Building Killer Apps with Neo4j 2.0
 

Recently uploaded

Types of Weaving loom machine & it's technology
Types of Weaving loom machine & it's technologyTypes of Weaving loom machine & it's technology
Types of Weaving loom machine & it's technology
ldtexsolbl
 
Opencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of MünsterOpencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of Münster
Matthias Neugebauer
 
(CISOPlatform Summit & SACON 2024) Orientation by CISO Platform_ Using CISO P...
(CISOPlatform Summit & SACON 2024) Orientation by CISO Platform_ Using CISO P...(CISOPlatform Summit & SACON 2024) Orientation by CISO Platform_ Using CISO P...
(CISOPlatform Summit & SACON 2024) Orientation by CISO Platform_ Using CISO P...
Priyanka Aash
 
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptxUse Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
SynapseIndia
 
Three New Criminal Laws in India 1 July 2024
Three New Criminal Laws in India 1 July 2024Three New Criminal Laws in India 1 July 2024
Three New Criminal Laws in India 1 July 2024
aakash malhotra
 
Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
Evolution of iPaaS - simplify IT workloads to provide a unified view of  data...Evolution of iPaaS - simplify IT workloads to provide a unified view of  data...
Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
Torry Harris
 
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
aslasdfmkhan4750
 
Computer HARDWARE presenattion by CWD students class 10
Computer HARDWARE presenattion by CWD students class 10Computer HARDWARE presenattion by CWD students class 10
Computer HARDWARE presenattion by CWD students class 10
ankush9927
 
Acumatica vs. Sage Intacct _Construction_July (1).pptx
Acumatica vs. Sage Intacct _Construction_July (1).pptxAcumatica vs. Sage Intacct _Construction_July (1).pptx
Acumatica vs. Sage Intacct _Construction_July (1).pptx
BrainSell Technologies
 
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
Priyanka Aash
 
The Role of IoT in Australian Mobile App Development - PDF Guide
The Role of IoT in Australian Mobile App Development - PDF GuideThe Role of IoT in Australian Mobile App Development - PDF Guide
The Role of IoT in Australian Mobile App Development - PDF Guide
Shiv Technolabs
 
"Mastering Graphic Design: Essential Tips and Tricks for Beginners and Profes...
"Mastering Graphic Design: Essential Tips and Tricks for Beginners and Profes..."Mastering Graphic Design: Essential Tips and Tricks for Beginners and Profes...
"Mastering Graphic Design: Essential Tips and Tricks for Beginners and Profes...
Anant Gupta
 
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdfBT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
Neo4j
 
(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf
(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf
(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf
Priyanka Aash
 
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
maigasapphire
 
The importance of Quality Assurance for ICT Standardization
The importance of Quality Assurance for ICT StandardizationThe importance of Quality Assurance for ICT Standardization
The importance of Quality Assurance for ICT Standardization
Axel Rennoch
 
Patch Tuesday de julio
Patch Tuesday de julioPatch Tuesday de julio
Patch Tuesday de julio
Ivanti
 
Semantic-Aware Code Model: Elevating the Future of Software Development
Semantic-Aware Code Model: Elevating the Future of Software DevelopmentSemantic-Aware Code Model: Elevating the Future of Software Development
Semantic-Aware Code Model: Elevating the Future of Software Development
Baishakhi Ray
 
What's new in android: jetpack compose 2024
What's new in android: jetpack compose 2024What's new in android: jetpack compose 2024
What's new in android: jetpack compose 2024
Toru Wonyoung Choi
 
(CISOPlatform Summit & SACON 2024) Gen AI & Deepfake In Overall Security.pdf
(CISOPlatform Summit & SACON 2024) Gen AI & Deepfake In Overall Security.pdf(CISOPlatform Summit & SACON 2024) Gen AI & Deepfake In Overall Security.pdf
(CISOPlatform Summit & SACON 2024) Gen AI & Deepfake In Overall Security.pdf
Priyanka Aash
 

Recently uploaded (20)

Types of Weaving loom machine & it's technology
Types of Weaving loom machine & it's technologyTypes of Weaving loom machine & it's technology
Types of Weaving loom machine & it's technology
 
Opencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of MünsterOpencast Summit 2024 — Opencast @ University of Münster
Opencast Summit 2024 — Opencast @ University of Münster
 
(CISOPlatform Summit & SACON 2024) Orientation by CISO Platform_ Using CISO P...
(CISOPlatform Summit & SACON 2024) Orientation by CISO Platform_ Using CISO P...(CISOPlatform Summit & SACON 2024) Orientation by CISO Platform_ Using CISO P...
(CISOPlatform Summit & SACON 2024) Orientation by CISO Platform_ Using CISO P...
 
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptxUse Cases & Benefits of RPA in Manufacturing in 2024.pptx
Use Cases & Benefits of RPA in Manufacturing in 2024.pptx
 
Three New Criminal Laws in India 1 July 2024
Three New Criminal Laws in India 1 July 2024Three New Criminal Laws in India 1 July 2024
Three New Criminal Laws in India 1 July 2024
 
Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
Evolution of iPaaS - simplify IT workloads to provide a unified view of  data...Evolution of iPaaS - simplify IT workloads to provide a unified view of  data...
Evolution of iPaaS - simplify IT workloads to provide a unified view of data...
 
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
High Profile Girls Call ServiCe Hyderabad 0000000000 Tanisha Best High Class ...
 
Computer HARDWARE presenattion by CWD students class 10
Computer HARDWARE presenattion by CWD students class 10Computer HARDWARE presenattion by CWD students class 10
Computer HARDWARE presenattion by CWD students class 10
 
Acumatica vs. Sage Intacct _Construction_July (1).pptx
Acumatica vs. Sage Intacct _Construction_July (1).pptxAcumatica vs. Sage Intacct _Construction_July (1).pptx
Acumatica vs. Sage Intacct _Construction_July (1).pptx
 
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
(CISOPlatform Summit & SACON 2024) Keynote _ Power Digital Identities With AI...
 
The Role of IoT in Australian Mobile App Development - PDF Guide
The Role of IoT in Australian Mobile App Development - PDF GuideThe Role of IoT in Australian Mobile App Development - PDF Guide
The Role of IoT in Australian Mobile App Development - PDF Guide
 
"Mastering Graphic Design: Essential Tips and Tricks for Beginners and Profes...
"Mastering Graphic Design: Essential Tips and Tricks for Beginners and Profes..."Mastering Graphic Design: Essential Tips and Tricks for Beginners and Profes...
"Mastering Graphic Design: Essential Tips and Tricks for Beginners and Profes...
 
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdfBT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
BT & Neo4j: Knowledge Graphs for Critical Enterprise Systems.pptx.pdf
 
(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf
(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf
(CISOPlatform Summit & SACON 2024) Digital Personal Data Protection Act.pdf
 
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
Girls Call Churchgate 9910780858 Provide Best And Top Girl Service And No1 in...
 
The importance of Quality Assurance for ICT Standardization
The importance of Quality Assurance for ICT StandardizationThe importance of Quality Assurance for ICT Standardization
The importance of Quality Assurance for ICT Standardization
 
Patch Tuesday de julio
Patch Tuesday de julioPatch Tuesday de julio
Patch Tuesday de julio
 
Semantic-Aware Code Model: Elevating the Future of Software Development
Semantic-Aware Code Model: Elevating the Future of Software DevelopmentSemantic-Aware Code Model: Elevating the Future of Software Development
Semantic-Aware Code Model: Elevating the Future of Software Development
 
What's new in android: jetpack compose 2024
What's new in android: jetpack compose 2024What's new in android: jetpack compose 2024
What's new in android: jetpack compose 2024
 
(CISOPlatform Summit & SACON 2024) Gen AI & Deepfake In Overall Security.pdf
(CISOPlatform Summit & SACON 2024) Gen AI & Deepfake In Overall Security.pdf(CISOPlatform Summit & SACON 2024) Gen AI & Deepfake In Overall Security.pdf
(CISOPlatform Summit & SACON 2024) Gen AI & Deepfake In Overall Security.pdf
 

Big Graph Analytics on Neo4j with Apache Spark

  • 1. Big Graph Analytics on Neo4j with Apache Spark Kenny Bastani FOSDEM '15, Graph Processing Devroom
  • 2. My background Second year speaking in the graph devroom Extremely thankful for all the organizers of this devroom have done for this technology Thank you organizers!
  • 3. I apologize in advance for these incredibly low-budget slides
  • 4. Engineer + Evangelism I evangelize about Neo4j and graph database technology I am an engineer at Digital Insight, a Silicon Valley based SaaS banking platform provider.
  • 5. Engineering NCR/Digital Insight $18 Trillion per year in ATM withdrawals
  • 6. As a Graph Database Evangelist I make cool things with graphs and blog about it
  • 7. Just so we're clear… I'm not selling you anything today
  • 8. Agenda Let's try and have some fun. We're going to run PageRank on all human knowledge. After a lot of low-budget slides.
  • 9. The Problem It's hard to analyze graphs at scale
  • 10. The importance of graph algorithms PageRank gave us Google Friend of a friend gave us Facebook
  • 11. Every hacker and tinkerer should be able to turn world changing ideas into a reality
  • 12. Every research scientist who needs graph analytics to save millions of lives should have that power
  • 13. The simple fact is that you are brilliant but your brilliant ideas require complex big data analytics
  • 14. So then you need to learn a lot of things or steal a lot of money to change the world
  • 15. Why is it so hard to do this stuff?
  • 16. Enemy #1: Relational Databases Relational databases store data in ways that make it difficult to extract graphs for analysis
  • 17. – This guy represents every business person that controls your future as an engineer "I need to combine 32 different tables in 5 different system of records and put it in a CSV every hour."
  • 19. Sorry business, no PageRank for you!
  • 20. But you courageously move forward regardless This is what is likely to happen next.
  • 21. Enemy #2: Big Data If you still think Big Data is a buzz word You haven't had to feel the pain of failing at it.
  • 22. When you hit a wall because your data is too big You start to see what this big data thing is all about.
  • 23. What seems to be the problem? Where's my PageRank All those records you merged from those 32 different tables turns out to be petabytes of data.
  • 24. Frantic and scared You turn to your most dependable friend
  • 26. Now that you know that Big Data is the real deal You read "Big Data for Dummies" and continue to tackle the PageRank problem
  • 27. Distributed File Systems Distributed file systems are a foundational component of big data analytics Chops things into manageable sized blocks, usually 64mb Spreads those blocks out across a cluster of VM resources
  • 28. Hadoop MapReduce Worth mentioning, Hadoop started this whole MapReduce craze You could translate the raw data from a CSV and turn it into a map of keys to values Keys are distributed per node and used to reduce the values into a partitioned analysis
  • 29. Ok so now you know about Big Data, Hadoop, HDFS You fire up your Amazon EC2 Hadoop cluster...
  • 30. This guy is still waiting…
  • 31. You hold your breath.. And submit the PageRank job… You wait…
  • 32. 3 hours later… Out of memory: heap space exceeded!
  • 33. It must be the configs You check the configs Increase the heap space Do some Stackoverflow trolling And you submit the PageRank job again…
  • 34. Graph algorithms can be evil at scale It depends on the complexity of your graph How many strongly connected components you have But since some graph algorithms like PageRank are iterative You have to iterate from one stage and use the results of the previous stage
  • 35. It doesn't matter how many nodes you have in your cluster For iterative graph algorithms the complexity of the graph will make you or break you Graphs with high complexity need a lot of memory to be processed iteratively
  • 36. This guy is going to have to settle for collaborative filtering
  • 38. What is Neo4j Mazerunner?
  • 39. The basic idea is… Graph databases need ETL so you can analyze your data and look it up later. Graph databases are great and all, but… No platform in the open source world should be the one platform that does everything. Especially a database.
  • 40. Docker If you're not up on Docker, let me give you a quick intro.
  • 41. Docker Docker is a VM framework that lets you easily create a recipe for an image and deploy applications with ease. The idea is that infrastructure and operational complexity makes it hard for agile development of new products.
  • 42. Why? If I am an engineer on a product team, I want to choose my own software libraries and languages to solve problems.
  • 43. Microservices for the win So here is the future of software development: • Cloud OS like Apache Mesos manages datacenter resources • If you build a new service, use whatever application framework you want. As long as you communicate over REST.
  • 44. Microservices cont. Docker gives you the freedom to use Neo4j, or OrientDB, or MongoDB or whatever application dependency you want inside your container. Because of something called graceful degradation, if OrientDB or Neo4j fail at being everything, they'll fault only within their container and not bring your entire SaaS platform to its knees.
  • 45. Beware of the monolith… Monolithic apps are those software platforms that just try and do every possible damn thing. They're like Swiss army knives of the software world. If you rely on one service to do everything, your entire platform is going to come down when it fails. And it will fail…
  • 46. Docker cont. Summarizing: • Docker containerizes your bad engineering decisions without bringing down your platform. • So I'm pretty much a fan of that.
  • 47. So is this guy…
  • 48. Mazerunner runs on Docker You can pull it down and deploy it safely and roll the dice on some awesome analytics capabilities that lend well to graph data models.
  • 49. HDFS and Apache Spark Apache Spark is a really interesting open source project. It is a scalable big data and machine learning platform. It also lets you do in-memory analytics on a graph dataset.
  • 50. So I wrote the world's first analysis service for a graph database that does 2-way ETL
  • 51. That scared a lot of people in the "graph databases are infallible" club
  • 52. Analysis is not a lookup
  • 53. Analytics on graphs takes massive amounts of system resources and might bring down your OLTP capabilities as it competes to share system resources
  • 54. Now let's fire up Neo4j Mazerunner Demo Goals: • I will hopefully be successful at showing you how to install Mazerunner on Docker • I will demo you an analysis job scheduler that extracts subgraphs, analyzes them, and pops the results back to Neo4j
  • 55. Where do we go now? Become a committer to the project and let's make it better Find the link on my blog — www.kennybastani.com
  • 56. Thanks! Follow me on Twitter: http://www.twitter.com/kennybastani