SlideShare a Scribd company logo
1 of 91
Modern Fast
Streaming Data
Todd L. Montgomery
@toddlmontgomery
InfoQ.com: News & Community Site
• 750,000 unique visitors/month
• Published in 4 languages (English, Chinese, Japanese and Brazilian
Portuguese)
• Post content from our QCon conferences
• News 15-20 / week
• Articles 3-4 / week
• Presentations (videos) 12-15 / week
• Interviews 2-3 / week
• Books 1 / month
Watch the video with slide
synchronization on InfoQ.com!
http://www.infoq.com/presentations
/myths-data-streaming
Presented at QCon New York
www.qconnewyork.com
Purpose of QCon
- to empower software development by facilitating the spread of
knowledge and innovation
Strategy
- practitioner-driven conference designed for YOU: influencers of
change and innovation in your teams
- speakers and topics driving the evolution and innovation
- connecting and catalyzing the influencers and innovators
Highlights
- attended by more than 12,000 delegates since 2007
- held in 9 cities worldwide
Why Should We Care?
Myths & Misconceptions
You can’t escape the Math
Technologies & Techniques
Why Should We Care?
Human Knowledge is now
doubling every year*
* by discipline, 12-18 months
Fueled by Technology
Middle Ages - 1500 yrs
Renaissance - 250 yrs
Industrial Revolution - 150 yrs
WWII - 25 years
Buckminster Fuller - Critical Path 1981
IoT
IoT
Ubiquitous Computing
In the near future,
Human Knowledge could
double every 72 hours
What this could mean for
our systems…
Updates/Sec
=
Devices * Frequency * Market Share
Either ingest or streaming.
2x for Request/Response
Updates/Sec
=
Devices * Frequency * Market Share
9 Billion (Today)
50 Billion by 2020 (Cisco)
26 Billion by 2020 (Smartphone/Tablet - Gartner)
75 Billion by 2020 (Morgan Stanley)
Updates/Sec
=
50 Billion * 6/min * 1%
=
50 Million/sec
Bandwidth
=
50 Billion * 6/min * 1% * 200 bytes
=
9.3 GB/s (74.5 Gb/s)
And…
Geographic Distribution
30%
15%
10%
15%
20%
10%
Social & Societal demands
will require
processing an intense stream
of data in real-time
Myths & Misconceptions
Excuses, Excuses!
Myth
(CPUs, Storage, Networks) are not
capable of processing in real-time*
* for some unknown, unquantified data volume
Accumulated
Improvement
Time
Network
Bandwidth
Response Time
Storage
Capacity
CPU Cores
Memory
Capacity
http://en.wikipedia.org/wiki/Instructions_per_second
Year Processor MIPS
1974 Intel 8080 0.29
1982 Intel 286 1.28
1993 PowerPC 601 157
2003 Pentium 4 Extreme 9,726
2008
Intel Core i7 920
(Quad)
82,300
2011
Intel Core i7 2600K
(4/8) Sandy Bridge
128,300
2014
Intel Core i7 5960x
(8/16) Haswell
298,190
Raspberry Pi 2
(Quad)
1,186 MIPS!
http://en.wikipedia.org/wiki/Instructions_per_second
DDRSSD
PCIe - 3
100 GbE
…
OmniPath
1 thread of awesome
>
128 cores of so-so
http://www.frankmcsherry.org/graph/scalability/cost/2015/01/15/COST.html
http://blog.acolyer.org/2015/06/05/scalability-but-at-what-cost/
Misconception
Data needs to come to rest to
be processed
Data at rest is a liability
Data at rest is a liability
MDMILM
WarehouseETL
Why would you want transient
data at rest?
But what about sort?
…REALLY?!
You can’t escape the Math
… and you don’t want to
The Math will guide you to a solution
"AmdahlsLaw" by Daniels220 at English Wikipedia - Own work based on: File:AmdahlsLaw.png. Licensed under CC BY-SA 3.0 via Wikimedia Commons
Setup & Scheduling
Work Unit Work UnitWork UnitWork Unit
Post Processing
Setup & Scheduling
Work Unit Work UnitWork UnitWork Unit
Post Processing
Contention
Contention
Contention isn’t the biggest enemy
Coherence is!
Universal Scalability Law
0
2
4
6
8
10
12
14
16
18
20
1 2 4 8 16 32 64 128 256 512 1024
Speedup
Processors
Amdahl USL
Setup & Scheduling
Work Unit Work UnitWork UnitWork Unit
Post Processing
Contention
Contention
Contention +
Coherence
Contention +
Coherence
Up Front Partitioning
Work Unit Work UnitWork UnitWork Unit
… and more
Queuing Theory ⭐
Complexity Theory
CAP Theorem
Technologies & Techniques
The Essence of Architecture
Data Structures
Protocols of Interaction
Mechanical Sympathy
Understanding is essential
Accumulated
Improvement
Time
Network
Bandwidth
Response Time
Storage
Capacity
CPU Cores
Memory
Capacity
Accumulated
Improvement
Time
Network
Bandwidth
Response Time
Storage
Capacity
CPU Cores
Memory
Capacity
Batching…
Technique
Smart Batching (Natural Batching)
http://mechanical-sympathy.blogspot.com/2011/10/smart-batching.html
Resource
Resource
Ring Buffer
Batching Thread
Resource
Pull off as much waiting
data as possible
Single Writer Principle
Avoid Resource Contention
Batching only when needed
Rate Decoupling
Back Pressure
Techniques
Freedom!
Lock-Free, Wait-Free
http://en.wikipedia.org/wiki/Non-blocking_algorithm
Words Matter
Obstruction-Freedom
Partially completed operations
aborted & changes made rolled back
Lock-Freedom
Individual thread may starve, but
guaranteed system-wide throughput
Lock-Free is Obstruction-Free
Wait-Freedom
Starvation free and guaranteed
system-wide throughput
Wait-Free is Lock-Free
These properties are
awesome!
Who wouldn’t want them?
System-wide properties start
at the lowest level
Essence
Just because we could take an
action right now, doesn’t mean
we should
Technology
CRDTs
http://en.wikipedia.org/wiki/Conflict-free_replicated_data_type
0
0
0
0
Node
0
1
2
N
Value
sum(0,N) = 0
…
0
1
0
0
Node
0
1
2
N
Value
sum(0,N) = 1
…
1
1
0
0
Node
0
1
2
N
Value
sum(0,N) = 2
…
Gossip for visibility
[2] = 0 [1] = 2 [0] = 4[N] = 0
4 2 0 0…
4200
Shared View
Technology
Append-only Data Structures
https://github.com/real-logic/Aeron
Header
Message
Log
Header
Message
Header
Message
Header
Message
Log
Efficiently
Replicating an Append-only Log
What If…?
The Data Structure could be
directly sent to the “network”?
Header
Message
Header
Message
Position in Log
Length
Header
Message
Position in Log
Length
Version/Flags
Type
etc.
+
Header
Message
Fragment 0
Header
Message
Header
Message
Fragment 0
Header
Message
Header
Message
Header
Message
Header
Message
Fragment 0
Header
Message
Header
Message
Header
Message
Header
Message
Fragment 0
Fragment 1
Header
Message
Header
Message
Header
Message
Header
Message
Header
Message
Header
Message
Fragment 0
Fragment 1
Natural for broadcast replication
In Closing…
A flood of data is coming,
Many say it is already here,
How will you deal with it?
Ever seen a PitBull drink from a
Fire Hydrant?
It won’t give up…
Be the Pitbull at the Fire Hydrant!
@toddlmontgomery
Questions?
• Aeron https://github.com/real-logic/Aeron
• SlideShare http://www.slideshare.com/toddleemontgomery
• Twitter @toddlmontgomery
Thank You!
Watch the video with slide synchronization on
InfoQ.com!
http://www.infoq.com/presentations/myths-
data-streaming

More Related Content

Viewers also liked

Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...
Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...
Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...
Spark Summit
 

Viewers also liked (16)

Data Streaming (in a Nutshell) ... and Spark's window operations
Data Streaming (in a Nutshell) ... and Spark's window operationsData Streaming (in a Nutshell) ... and Spark's window operations
Data Streaming (in a Nutshell) ... and Spark's window operations
 
Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014
 
Using Windows Azure for Solving Identity Management Challenges (Visual Studio...
Using Windows Azure for Solving Identity Management Challenges (Visual Studio...Using Windows Azure for Solving Identity Management Challenges (Visual Studio...
Using Windows Azure for Solving Identity Management Challenges (Visual Studio...
 
An Introduction to Distributed Data Streaming
An Introduction to Distributed Data StreamingAn Introduction to Distributed Data Streaming
An Introduction to Distributed Data Streaming
 
Data Streaming Technology Overview
Data Streaming Technology OverviewData Streaming Technology Overview
Data Streaming Technology Overview
 
Advanced Threat Detection on Streaming Data
Advanced Threat Detection on Streaming DataAdvanced Threat Detection on Streaming Data
Advanced Threat Detection on Streaming Data
 
Streaming Data Analysis and Online Learning by John Myles White
Streaming Data Analysis and Online Learning by John Myles White Streaming Data Analysis and Online Learning by John Myles White
Streaming Data Analysis and Online Learning by John Myles White
 
Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...
Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...
Interactive Visualization of Streaming Data Powered by Spark by Ruhollah Farc...
 
The Art of Scalability - Managing growth
The Art of Scalability - Managing growthThe Art of Scalability - Managing growth
The Art of Scalability - Managing growth
 
Streaming Data in R
Streaming Data in RStreaming Data in R
Streaming Data in R
 
Case Studies on Big-Data Processing and Streaming - Iranian Java User Group
Case Studies on Big-Data Processing and Streaming - Iranian Java User GroupCase Studies on Big-Data Processing and Streaming - Iranian Java User Group
Case Studies on Big-Data Processing and Streaming - Iranian Java User Group
 
Real-Time Streaming Data on AWS
Real-Time Streaming Data on AWSReal-Time Streaming Data on AWS
Real-Time Streaming Data on AWS
 
Detecting Anomalies in Streaming Data
Detecting Anomalies in Streaming DataDetecting Anomalies in Streaming Data
Detecting Anomalies in Streaming Data
 
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
 
High Performance Processing of Streaming Data
High Performance Processing of Streaming DataHigh Performance Processing of Streaming Data
High Performance Processing of Streaming Data
 
Big Data Streaming processing using Apache Storm - FOSSCOMM 2016
Big Data Streaming processing using Apache Storm - FOSSCOMM 2016Big Data Streaming processing using Apache Storm - FOSSCOMM 2016
Big Data Streaming processing using Apache Storm - FOSSCOMM 2016
 

More from C4Media

More from C4Media (20)

Streaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live VideoStreaming a Million Likes/Second: Real-Time Interactions on Live Video
Streaming a Million Likes/Second: Real-Time Interactions on Live Video
 
Next Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy MobileNext Generation Client APIs in Envoy Mobile
Next Generation Client APIs in Envoy Mobile
 
Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020Software Teams and Teamwork Trends Report Q1 2020
Software Teams and Teamwork Trends Report Q1 2020
 
Understand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java ApplicationsUnderstand the Trade-offs Using Compilers for Java Applications
Understand the Trade-offs Using Compilers for Java Applications
 
Kafka Needs No Keeper
Kafka Needs No KeeperKafka Needs No Keeper
Kafka Needs No Keeper
 
High Performing Teams Act Like Owners
High Performing Teams Act Like OwnersHigh Performing Teams Act Like Owners
High Performing Teams Act Like Owners
 
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to JavaDoes Java Need Inline Types? What Project Valhalla Can Bring to Java
Does Java Need Inline Types? What Project Valhalla Can Bring to Java
 
Service Meshes- The Ultimate Guide
Service Meshes- The Ultimate GuideService Meshes- The Ultimate Guide
Service Meshes- The Ultimate Guide
 
Shifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CDShifting Left with Cloud Native CI/CD
Shifting Left with Cloud Native CI/CD
 
CI/CD for Machine Learning
CI/CD for Machine LearningCI/CD for Machine Learning
CI/CD for Machine Learning
 
Fault Tolerance at Speed
Fault Tolerance at SpeedFault Tolerance at Speed
Fault Tolerance at Speed
 
Architectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep SystemsArchitectures That Scale Deep - Regaining Control in Deep Systems
Architectures That Scale Deep - Regaining Control in Deep Systems
 
ML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.jsML in the Browser: Interactive Experiences with Tensorflow.js
ML in the Browser: Interactive Experiences with Tensorflow.js
 
Build Your Own WebAssembly Compiler
Build Your Own WebAssembly CompilerBuild Your Own WebAssembly Compiler
Build Your Own WebAssembly Compiler
 
User & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix ScaleUser & Device Identity for Microservices @ Netflix Scale
User & Device Identity for Microservices @ Netflix Scale
 
Scaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's EdgeScaling Patterns for Netflix's Edge
Scaling Patterns for Netflix's Edge
 
Make Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home EverywhereMake Your Electron App Feel at Home Everywhere
Make Your Electron App Feel at Home Everywhere
 
The Talk You've Been Await-ing For
The Talk You've Been Await-ing ForThe Talk You've Been Await-ing For
The Talk You've Been Await-ing For
 
Future of Data Engineering
Future of Data EngineeringFuture of Data Engineering
Future of Data Engineering
 
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and MoreAutomated Testing for Terraform, Docker, Packer, Kubernetes, and More
Automated Testing for Terraform, Docker, Packer, Kubernetes, and More
 

Recently uploaded

Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Peter Udo Diehl
 
Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
UXDXConf
 

Recently uploaded (20)

Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024Enterprise Knowledge Graphs - Data Summit 2024
Enterprise Knowledge Graphs - Data Summit 2024
 
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdfHow Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
How Red Hat Uses FDO in Device Lifecycle _ Costin and Vitaliy at Red Hat.pdf
 
Optimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through ObservabilityOptimizing NoSQL Performance Through Observability
Optimizing NoSQL Performance Through Observability
 
The Metaverse: Are We There Yet?
The  Metaverse:    Are   We  There  Yet?The  Metaverse:    Are   We  There  Yet?
The Metaverse: Are We There Yet?
 
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří Karpíšek
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
Syngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdfSyngulon - Selection technology May 2024.pdf
Syngulon - Selection technology May 2024.pdf
 
IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024IoT Analytics Company Presentation May 2024
IoT Analytics Company Presentation May 2024
 
Strategic AI Integration in Engineering Teams
Strategic AI Integration in Engineering TeamsStrategic AI Integration in Engineering Teams
Strategic AI Integration in Engineering Teams
 
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCustom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
 
IESVE for Early Stage Design and Planning
IESVE for Early Stage Design and PlanningIESVE for Early Stage Design and Planning
IESVE for Early Stage Design and Planning
 
WebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM PerformanceWebAssembly is Key to Better LLM Performance
WebAssembly is Key to Better LLM Performance
 
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
Measures in SQL (a talk at SF Distributed Systems meetup, 2024-05-22)
 
AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101AI presentation and introduction - Retrieval Augmented Generation RAG 101
AI presentation and introduction - Retrieval Augmented Generation RAG 101
 
ECS 2024 Teams Premium - Pretty Secure
ECS 2024   Teams Premium - Pretty SecureECS 2024   Teams Premium - Pretty Secure
ECS 2024 Teams Premium - Pretty Secure
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
 
Top 10 Symfony Development Companies 2024
Top 10 Symfony Development Companies 2024Top 10 Symfony Development Companies 2024
Top 10 Symfony Development Companies 2024
 
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
 
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdfThe Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
The Value of Certifying Products for FDO _ Paul at FIDO Alliance.pdf
 
Structuring Teams and Portfolios for Success
Structuring Teams and Portfolios for SuccessStructuring Teams and Portfolios for Success
Structuring Teams and Portfolios for Success
 

Modern Fast Streaming Data