SlideShare a Scribd company logo
1 of 21
Download to read offline
Realtime

Distributed Analysis

of Datastreams

Philipp Nolte – University of Passau – January 2014

1
Learn
Why we need fancy Big Data frameworks.
How the lambda architecture looks like.
How twitter used to do real-time analytics.
Why twitter created Storm.
How Storm works.
2
Limits

Imagine a traditional web analytics software:
Every page view increments

the url’s database row.

3
First Aid

Queue your writes and write in batches.
Shard your data: Partition horizontally.

4
Chronic Issues
Fault-tolerance is hard.
Applications become more and more complex.
You have to do all the work.

5
New Tools
Large scale computation systems such as Hadoop.
Scalable databases such as Casandra and Riak.
Easy to use frameworks such as Storm and Dempsy.

6
Lambda Architecture
Theoretical, abstract architecture for working with big data.

Speed Layer
Serving Layer
Batch Layer
7
Goal

Compute arbitrary functions on arbitrary data.
query = function ( all data )

8
Properties
Robust and fault-tolerant.
Low latency reads and updates.
Scalable.
Minimal maintenance.

9
Batch Layer

Speed Layer
Serving Layer

Stores the immutable master dataset.
Precomputes arbitrary batch views.
Home of batch processing and map

reduce systems such as Hadoop.

10

Batch Layer
Serving Layer

Speed Layer
Serving Layer
Batch Layer

Read-only random-access to batch views.
Updated by batch layer.
Indexes batch views.
Home of real-time query systems

such as Cloudera Impala for Hadoop.
11
Speed Layer

Speed Layer
Serving Layer
Batch Layer

Compensates for high-latency batch views.
Fast, incremental algorithms.
More complex because of random-writes.
Home of Apache HBase or Storm.

12
Lambda Architecture
Speed Layer
Realtime Views
Batch Views

Data

Serving Layer
Batch Layer
13

Query
Available Data
Batch View
Time

Realtime View
Discard Realtime View

as soon as it is represented
in the batch view.

Batch View

Realtime View

14
Twitter’s Early Days
Worker

Queue

Queue

Worker

Worker

Queue

Worker

Map

Queue

Worker

Queue

Worker

Tweets

Worker

Queue

Worker

URLs
Hadoop

Cassandra
15
Storm
Guaranteed message processing without

message brokers.
Horizontal scalability.
Fault-tolerance.
High level of abstraction.
Just works.
16
Storm Topologies
Stream

Spout

⚡️Bolt

⚡️Bolt

Spout

⚡️Bolt

⚡️Bolt

17
Parallel Tasks
Task

Spout
T

T

⚡️Bolt
T

Spout
T

Stream

T

⚡️Bolt

T

⚡️Bolt
T

18

T

T

⚡️Bolt
T

T

T
Demo

Storm in action

19
Know
Why we need fancy Big Data frameworks.
How the lambda architecture looks like.
How twitter used to do real-time analytics.
Why twitter created Storm.
How Storm works.
20
The End.

Questions?

21

More Related Content

What's hot

Atlanta hadoop users group july 2013
Atlanta hadoop users group july 2013Atlanta hadoop users group july 2013
Atlanta hadoop users group july 2013
Christopher Curtin
 

What's hot (20)

Cassandra Essentials Day Cambridge
Cassandra Essentials Day CambridgeCassandra Essentials Day Cambridge
Cassandra Essentials Day Cambridge
 
Final deck
Final deckFinal deck
Final deck
 
Big data vahidamiri-datastack.ir
Big data vahidamiri-datastack.irBig data vahidamiri-datastack.ir
Big data vahidamiri-datastack.ir
 
Managed Cluster Services
Managed Cluster ServicesManaged Cluster Services
Managed Cluster Services
 
Spark Summit EU talk by Ahsan Javed Awan
Spark Summit EU talk by Ahsan Javed AwanSpark Summit EU talk by Ahsan Javed Awan
Spark Summit EU talk by Ahsan Javed Awan
 
Big Data - Part IV
Big Data - Part IVBig Data - Part IV
Big Data - Part IV
 
Big data architecture on cloud computing infrastructure
Big data architecture on cloud computing infrastructureBig data architecture on cloud computing infrastructure
Big data architecture on cloud computing infrastructure
 
What's next for Big Data? -- Apache Spark
What's next for Big Data? -- Apache SparkWhat's next for Big Data? -- Apache Spark
What's next for Big Data? -- Apache Spark
 
Realtime search
Realtime searchRealtime search
Realtime search
 
Big Data - Part III
Big Data - Part IIIBig Data - Part III
Big Data - Part III
 
Ajug april 2011
Ajug april 2011Ajug april 2011
Ajug april 2011
 
How to teach your data scientist to leverage an analytics cluster with Presto...
How to teach your data scientist to leverage an analytics cluster with Presto...How to teach your data scientist to leverage an analytics cluster with Presto...
How to teach your data scientist to leverage an analytics cluster with Presto...
 
Hadoop and Big Data: Revealed
Hadoop and Big Data: RevealedHadoop and Big Data: Revealed
Hadoop and Big Data: Revealed
 
Developing high frequency indicators using real time tick data on apache supe...
Developing high frequency indicators using real time tick data on apache supe...Developing high frequency indicators using real time tick data on apache supe...
Developing high frequency indicators using real time tick data on apache supe...
 
"Einstürzenden Neudaten: Building an Analytics Engine from Scratch", Tobias J...
"Einstürzenden Neudaten: Building an Analytics Engine from Scratch", Tobias J..."Einstürzenden Neudaten: Building an Analytics Engine from Scratch", Tobias J...
"Einstürzenden Neudaten: Building an Analytics Engine from Scratch", Tobias J...
 
Big Data - Part II
Big Data - Part IIBig Data - Part II
Big Data - Part II
 
Amazon Web Services: Lessons for Architecting Data in the Cloud
Amazon Web Services: Lessons for Architecting Data in the CloudAmazon Web Services: Lessons for Architecting Data in the Cloud
Amazon Web Services: Lessons for Architecting Data in the Cloud
 
Hadoop Tutorial For Beginners
Hadoop Tutorial For BeginnersHadoop Tutorial For Beginners
Hadoop Tutorial For Beginners
 
Atlanta hadoop users group july 2013
Atlanta hadoop users group july 2013Atlanta hadoop users group july 2013
Atlanta hadoop users group july 2013
 
Big Data - Part I
Big Data - Part IBig Data - Part I
Big Data - Part I
 

Viewers also liked (8)

Presentació tdr Ignasi Güell
Presentació tdr Ignasi GüellPresentació tdr Ignasi Güell
Presentació tdr Ignasi Güell
 
Fortschritte im Bereich Collaborative Filtering
Fortschritte im Bereich Collaborative FilteringFortschritte im Bereich Collaborative Filtering
Fortschritte im Bereich Collaborative Filtering
 
Robustheit in Empfehlungssystemen
Robustheit in EmpfehlungssystemenRobustheit in Empfehlungssystemen
Robustheit in Empfehlungssystemen
 
Trust-based recommender systems
Trust-based recommender systemsTrust-based recommender systems
Trust-based recommender systems
 
Ansätze für gemeinschaftliches Filtering
Ansätze für gemeinschaftliches FilteringAnsätze für gemeinschaftliches Filtering
Ansätze für gemeinschaftliches Filtering
 
Trust und Interest Similarity und deren Anwendung für Empfehlungssysteme
Trust und Interest Similarity und deren Anwendung für EmpfehlungssystemeTrust und Interest Similarity und deren Anwendung für Empfehlungssysteme
Trust und Interest Similarity und deren Anwendung für Empfehlungssysteme
 
Effiziente Verarbeitung von großen Datenmengen
Effiziente Verarbeitung von großen DatenmengenEffiziente Verarbeitung von großen Datenmengen
Effiziente Verarbeitung von großen Datenmengen
 
Profile Injection Attack Detection in Recommender System
Profile Injection Attack Detection in Recommender SystemProfile Injection Attack Detection in Recommender System
Profile Injection Attack Detection in Recommender System
 

Similar to Realtime
 Distributed Analysis
 of Datastreams

Similar to Realtime
 Distributed Analysis
 of Datastreams (20)

Real time analytics
Real time analyticsReal time analytics
Real time analytics
 
Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...
Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...
Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...
 
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...
Artur Borycki - Beyond Lambda - how to get from logical to physical - code.ta...
 
Hadoop-Quick introduction
Hadoop-Quick introductionHadoop-Quick introduction
Hadoop-Quick introduction
 
How can Hadoop & SAP be integrated
How can Hadoop & SAP be integratedHow can Hadoop & SAP be integrated
How can Hadoop & SAP be integrated
 
Front Range PHP NoSQL Databases
Front Range PHP NoSQL DatabasesFront Range PHP NoSQL Databases
Front Range PHP NoSQL Databases
 
Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist
 
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap Vikram Andem Big Data Strategy @ IATA Technology Roadmap
Vikram Andem Big Data Strategy @ IATA Technology Roadmap
 
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
Voldemort & Hadoop @ Linkedin, Hadoop User Group Jan 2010
 
Hadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedInHadoop and Voldemort @ LinkedIn
Hadoop and Voldemort @ LinkedIn
 
Big Data, Mob Scale.
Big Data, Mob Scale.Big Data, Mob Scale.
Big Data, Mob Scale.
 
Big Events, Mob Scale - Darach Ennis (Push Technology)
Big Events, Mob Scale - Darach Ennis (Push Technology)Big Events, Mob Scale - Darach Ennis (Push Technology)
Big Events, Mob Scale - Darach Ennis (Push Technology)
 
Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...
Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...
Part 2: Apache Kudu: Extending the Capabilities of Operational and Analytic D...
 
Introduction to Redis and its features.pptx
Introduction to Redis and its features.pptxIntroduction to Redis and its features.pptx
Introduction to Redis and its features.pptx
 
Real time data processing frameworks
Real time data processing frameworksReal time data processing frameworks
Real time data processing frameworks
 
Big data and hadoop product page
Big data and hadoop product pageBig data and hadoop product page
Big data and hadoop product page
 
Big data analytics: Technology's bleeding edge
Big data analytics: Technology's bleeding edgeBig data analytics: Technology's bleeding edge
Big data analytics: Technology's bleeding edge
 
Simplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache KuduSimplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache Kudu
 
Need for Time series Database
Need for Time series DatabaseNeed for Time series Database
Need for Time series Database
 
Big Data , Big Problem?
Big Data , Big Problem?Big Data , Big Problem?
Big Data , Big Problem?
 

More from Florian Stegmaier

More from Florian Stegmaier (10)

Musikempfehlungssysteme
MusikempfehlungssystemeMusikempfehlungssysteme
Musikempfehlungssysteme
 
Linked Open Data als Basis für Empfehlungssysteme
Linked Open Data als Basis für EmpfehlungssystemeLinked Open Data als Basis für Empfehlungssysteme
Linked Open Data als Basis für Empfehlungssysteme
 
Entscheidungshilfe: Recommender System
Entscheidungshilfe: Recommender SystemEntscheidungshilfe: Recommender System
Entscheidungshilfe: Recommender System
 
Funktionsweise und Ansätze von inhaltsbasiertem Filtern
Funktionsweise und Ansätze von inhaltsbasiertem FilternFunktionsweise und Ansätze von inhaltsbasiertem Filtern
Funktionsweise und Ansätze von inhaltsbasiertem Filtern
 
Context Basierte Personalisierungsansätze
Context Basierte PersonalisierungsansätzeContext Basierte Personalisierungsansätze
Context Basierte Personalisierungsansätze
 
Evaluierung von Empfehlungssystemen
Evaluierung von EmpfehlungssystemenEvaluierung von Empfehlungssystemen
Evaluierung von Empfehlungssystemen
 
Effiziente Verarbeitung von grossen Datenmengen
Effiziente Verarbeitung von grossen DatenmengenEffiziente Verarbeitung von grossen Datenmengen
Effiziente Verarbeitung von grossen Datenmengen
 
Introduction to the FP7 CODE project @ BDBC
Introduction to the FP7 CODE project @ BDBCIntroduction to the FP7 CODE project @ BDBC
Introduction to the FP7 CODE project @ BDBC
 
Generische Datenintegration zur semantischen Diagnoseunterstützung im Projekt...
Generische Datenintegration zur semantischen Diagnoseunterstützung im Projekt...Generische Datenintegration zur semantischen Diagnoseunterstützung im Projekt...
Generische Datenintegration zur semantischen Diagnoseunterstützung im Projekt...
 
AIR: Architecture for Interoperable Retrieval on Distributed and Heterogeneou...
AIR: Architecture for Interoperable Retrieval on Distributed and Heterogeneou...AIR: Architecture for Interoperable Retrieval on Distributed and Heterogeneou...
AIR: Architecture for Interoperable Retrieval on Distributed and Heterogeneou...
 

Recently uploaded

CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Enterprise Knowledge
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Recently uploaded (20)

From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 

Realtime
 Distributed Analysis
 of Datastreams