SlideShare a Scribd company logo
1 of 36
Download to read offline
The History and Future of the Blurring of
Stream Processing & OLTP
John Hugg
June 12th, 2017
@johnhugg
jhugg@voltdb.com
Who Am I?
• Engineer #1 at
• Responsible for many poor
decisions and even a few
good ones.
• jhugg@voltdb.com
• @johnhugg
• http://chat.voltdb.com
Beatlejuice App
Minimum Viable Product
• Alexa/Siri-ish device in your house
• Sends discernible words to your servers
• If it hears “beetlejuice” three times, play sound clips from the movie
• Because everyone hates “Word Count”
“Just Use Postgres"
Dumbish Client
Running in IaaS or Datacenter
Audio
Play Sound
Rich Client
Logic
Speech to text
Etc…
Add record
Count records
Record Action
& Reset Count
“Just Use Postgres"
CREATE TABLE BEETLECOUNT (
user_id BIGINT NOT NULL,
spoken_count INTEGER NOT NULL,
PRIMARY KEY(user_id)
);
Storm?
Client Running in IaaS or Datacenter
?
Why Storm?
• Would I use Storm for pretty much anything now? No.
• It’s decently exemplar, well known, and not fancy.
• But there are soooo many other choices.
Storm
Client
Audio
Instructions to play soundSometimes…
Counts for a
partition of users
User
Code
Counts for a
partition of users
User
Code
Counts for a
partition of users
User
Code
Failure
Postgres Failure
Audio
Play Sound
• Postgres software or hardware stops
Postgres Failure
Audio
Play Sound
• Postgres software or hardware stops
• Network fails
Storm
Client
Audio
Instructions to play soundSometimes…
Counts for a
partition of users
User
Code
Counts for a
partition of users
User
Code
Counts for a
partition of users
User
Code
Side Effects Ruin Everything
• Playing a sound on the speaker is not a transaction you can roll
back. It’s an external action — A side effect.
• Good examples are SMS or a REST-API call.
Use Idempotence with At-Least-Once
delivery for Exactly Once Semantics
Does Kafka Fix Failure?
All better?
Audio
Play
Sound
Rich Client
Logic
• More robust to Postgres failure in some ways
• Side effects can’t be helped much
• Latency hit
• But many good reasons to do this (coming up)
Commonalities: Stream & DB
• In order to ensure delivery, the original source has
to be prepared to resend an event until
acknowledged.
• Idempotency is the key to exactly-once-semantics.
• It’s impossible to guarantee a side effect happens
exactly once.

So we do our best.
• Systems that are flexible/safe/accurate are really
really hard.
What makes a “hard” app?
• Scale / Complexity
• Velocity of requirements changes
• Precision 

“exactly three times”

Chaining precise conditions and actions

Non-commutative math
• Side Effects
• Partial Control
Stream vs Database
2017 Edition
In the tradition of: Tabs vs Spaces and Emacs vs Vim
Javascript Floating Point vs Anything Resembling Sanityand comes:
Why Stream / Log?
• Can easily Tee (split into two identical streams) to prod and test, or to A/B test.
• Often allows for simpler clients, as richer processing is moved into the
streaming system.
• Often easier to understand performance characteristics, especially under load.
• By replaying a logged stream, can often roll-back/forward to any recent state.

Need truncating snapshots for this.
• Can sometimes make multi-DC easier and more consistent.
• Horizontal scalability and fault tolerance often easier.
Why Database
• Truncating a stream requires compaction or a snapshot. This is hard
to get right for many apps.
• You can query it!
• Secondary indexes, Materialized views, Constraints, Joins, FKs
• The tooling is really mature.
• Typically more appropriate for apps that require lower latency.
Use A Streaming System with a DB?
• Sometimes you can get a lot of the benefits of both
• More integration points and more ways to fail.
BUT IT’S OK TO WANT IT ALL
You promised blurring…
Add Databaseness to Streaming
Counts for a partition of users
User Code
Probably terrible user glue code
Storm punts on this
User Code
Counts for users
TABULAR!
Add Databaseness to Streaming
A library that handles much of the hard
interactions between the stream (log)
and tabular state
Client Logic Kafka Log
Kafka Log
Downstream
Consumer
More Kafka Logs?
Downstream
Consumer
• Other systems have ways to integrate state with streams, but none
seem as ambitious as Kafka Streams
Make a database
that smells streamy
Put Processing in the DB
Rich Client
Logic
Many
RPC Calls
User logic in
stored
procedures
Simpler Client
Logic
Fewer
RPC Calls
Beyond Postgres:
• Better tools for managing user code.
• Better tools for debugging.
• Better monitoring and transparency
for user code running in the DB.
Give the DB a A-Priori-Log
WAL
Client Logic
RPC CallsClient Logic
A-Priori Logical Log
RPC Responses
RPC Calls
Could do this as
But an integrated product
has many advantages.
+ +
Give the DB an A-Posteriori Log
RPC CallsClient Logic
A-Priori Logical Log
RPC Responses
A-Posteriori Log
Downstream
Consumer
Emited Events
RPC CallsClient Logic
A-Priori Logical Log
Consume Event
A-Posteriori Log
Emited Events
Horizontally Partition the DB
Client 1
A-Priori Log A-Posteriori Log
A-Priori Log A-Posteriori Log
A-Priori Log A-Posteriori Log
Client 2
Consumer 1
Consumer 2
Client 3
Client 4
Throw out but keep
• Needs to present as a
single, managed entity
• Global stats
• Global reads without extra
work
• SQL, JDBC, ODBC, etc…
• Global writes? Maybe…
Throw out but keep
Because Latency
Where is this going?
We’re building a lot of this at
• Horizonitally partitioned, but acts as a single system
• Per partition ordered input and ordered output

All based on an a-priori logical log
• Debuggable Java stored procedures that can use 3rd party libs
• Ability to emit events into an a-posteriori log
• Native support for secondary indexes, ranking, materialized views,
transactions, cross-partition operations, JDBC/ODBC, etc…
Conclusion
• Note: VoltDB is a not as general as we
would like yet.
• The future of operations and OLTP is
going to be a mix of streams, logs and
state.

We are getting good at these things
individually.
• Let’s build systems that tackle integration
and the in-between problems.

There’s an opportunity here.
chat.voltdb.com
jhugg
@voltdb.com
@johnhugg
@voltdb
Thank You!
• Please ask me questions now or later.
• Feedback on what was interesting,
helpful, confusing, boring is ALWAYS
welcome.
• Happy to talk about:

Data management

Systems software dev

Distributed systems

Japanese preschools
BS
Stuff I Don't Know
Stuff I Know
T H I S TA L K

More Related Content

Recently uploaded

Mastering Windows 7 A Comprehensive Guide for Power Users .pdf
Mastering Windows 7 A Comprehensive Guide for Power Users .pdfMastering Windows 7 A Comprehensive Guide for Power Users .pdf
Mastering Windows 7 A Comprehensive Guide for Power Users .pdf
mbmh111980
 

Recently uploaded (20)

AI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning FrameworkAI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning Framework
 
Implementing KPIs and Right Metrics for Agile Delivery Teams.pdf
Implementing KPIs and Right Metrics for Agile Delivery Teams.pdfImplementing KPIs and Right Metrics for Agile Delivery Teams.pdf
Implementing KPIs and Right Metrics for Agile Delivery Teams.pdf
 
Mastering Windows 7 A Comprehensive Guide for Power Users .pdf
Mastering Windows 7 A Comprehensive Guide for Power Users .pdfMastering Windows 7 A Comprehensive Guide for Power Users .pdf
Mastering Windows 7 A Comprehensive Guide for Power Users .pdf
 
The Impact of PLM Software on Fashion Production
The Impact of PLM Software on Fashion ProductionThe Impact of PLM Software on Fashion Production
The Impact of PLM Software on Fashion Production
 
COMPUTER AND ITS COMPONENTS PPT.by naitik sharma Class 9th A mittal internati...
COMPUTER AND ITS COMPONENTS PPT.by naitik sharma Class 9th A mittal internati...COMPUTER AND ITS COMPONENTS PPT.by naitik sharma Class 9th A mittal internati...
COMPUTER AND ITS COMPONENTS PPT.by naitik sharma Class 9th A mittal internati...
 
KLARNA - Language Models and Knowledge Graphs: A Systems Approach
KLARNA -  Language Models and Knowledge Graphs: A Systems ApproachKLARNA -  Language Models and Knowledge Graphs: A Systems Approach
KLARNA - Language Models and Knowledge Graphs: A Systems Approach
 
10 Essential Software Testing Tools You Need to Know About.pdf
10 Essential Software Testing Tools You Need to Know About.pdf10 Essential Software Testing Tools You Need to Know About.pdf
10 Essential Software Testing Tools You Need to Know About.pdf
 
AI/ML Infra Meetup | ML explainability in Michelangelo
AI/ML Infra Meetup | ML explainability in MichelangeloAI/ML Infra Meetup | ML explainability in Michelangelo
AI/ML Infra Meetup | ML explainability in Michelangelo
 
how-to-download-files-safely-from-the-internet.pdf
how-to-download-files-safely-from-the-internet.pdfhow-to-download-files-safely-from-the-internet.pdf
how-to-download-files-safely-from-the-internet.pdf
 
StrimziCon 2024 - Transition to Apache Kafka on Kubernetes with Strimzi.pdf
StrimziCon 2024 - Transition to Apache Kafka on Kubernetes with Strimzi.pdfStrimziCon 2024 - Transition to Apache Kafka on Kubernetes with Strimzi.pdf
StrimziCon 2024 - Transition to Apache Kafka on Kubernetes with Strimzi.pdf
 
IT Software Development Resume, Vaibhav jha 2024
IT Software Development Resume, Vaibhav jha 2024IT Software Development Resume, Vaibhav jha 2024
IT Software Development Resume, Vaibhav jha 2024
 
CompTIA Security+ (Study Notes) for cs.pdf
CompTIA Security+ (Study Notes) for cs.pdfCompTIA Security+ (Study Notes) for cs.pdf
CompTIA Security+ (Study Notes) for cs.pdf
 
GraphSummit Stockholm - Neo4j - Knowledge Graphs and Product Updates
GraphSummit Stockholm - Neo4j - Knowledge Graphs and Product UpdatesGraphSummit Stockholm - Neo4j - Knowledge Graphs and Product Updates
GraphSummit Stockholm - Neo4j - Knowledge Graphs and Product Updates
 
A Comprehensive Appium Guide for Hybrid App Automation Testing.pdf
A Comprehensive Appium Guide for Hybrid App Automation Testing.pdfA Comprehensive Appium Guide for Hybrid App Automation Testing.pdf
A Comprehensive Appium Guide for Hybrid App Automation Testing.pdf
 
Workforce Efficiency with Employee Time Tracking Software.pdf
Workforce Efficiency with Employee Time Tracking Software.pdfWorkforce Efficiency with Employee Time Tracking Software.pdf
Workforce Efficiency with Employee Time Tracking Software.pdf
 
Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)
Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)
Tree in the Forest - Managing Details in BDD Scenarios (live2test 2024)
 
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
 
How to pick right visual testing tool.pdf
How to pick right visual testing tool.pdfHow to pick right visual testing tool.pdf
How to pick right visual testing tool.pdf
 
Microsoft 365 Copilot; An AI tool changing the world of work _PDF.pdf
Microsoft 365 Copilot; An AI tool changing the world of work _PDF.pdfMicrosoft 365 Copilot; An AI tool changing the world of work _PDF.pdf
Microsoft 365 Copilot; An AI tool changing the world of work _PDF.pdf
 
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAGAI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
AI/ML Infra Meetup | Reducing Prefill for LLM Serving in RAG
 

Featured

Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
Kurio // The Social Media Age(ncy)
 

Featured (20)

Skeleton Culture Code
Skeleton Culture CodeSkeleton Culture Code
Skeleton Culture Code
 
PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024PEPSICO Presentation to CAGNY Conference Feb 2024
PEPSICO Presentation to CAGNY Conference Feb 2024
 
Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)Content Methodology: A Best Practices Report (Webinar)
Content Methodology: A Best Practices Report (Webinar)
 
How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024How to Prepare For a Successful Job Search for 2024
How to Prepare For a Successful Job Search for 2024
 
Social Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie InsightsSocial Media Marketing Trends 2024 // The Global Indie Insights
Social Media Marketing Trends 2024 // The Global Indie Insights
 
Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024Trends In Paid Search: Navigating The Digital Landscape In 2024
Trends In Paid Search: Navigating The Digital Landscape In 2024
 
5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary5 Public speaking tips from TED - Visualized summary
5 Public speaking tips from TED - Visualized summary
 
ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
Unlocking the Power of ChatGPT and AI in Testing - A Real-World Look, present...
 
12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work12 Ways to Increase Your Influence at Work
12 Ways to Increase Your Influence at Work
 
ChatGPT webinar slides
ChatGPT webinar slidesChatGPT webinar slides
ChatGPT webinar slides
 
More than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike RoutesMore than Just Lines on a Map: Best Practices for U.S Bike Routes
More than Just Lines on a Map: Best Practices for U.S Bike Routes
 
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
Ride the Storm: Navigating Through Unstable Periods / Katerina Rudko (Belka G...
 

The History and Future of the Blurring of Stream Processing & OLTP

  • 1. The History and Future of the Blurring of Stream Processing & OLTP John Hugg June 12th, 2017 @johnhugg jhugg@voltdb.com
  • 2. Who Am I? • Engineer #1 at • Responsible for many poor decisions and even a few good ones. • jhugg@voltdb.com • @johnhugg • http://chat.voltdb.com
  • 4. Minimum Viable Product • Alexa/Siri-ish device in your house • Sends discernible words to your servers • If it hears “beetlejuice” three times, play sound clips from the movie • Because everyone hates “Word Count”
  • 5. “Just Use Postgres" Dumbish Client Running in IaaS or Datacenter Audio Play Sound Rich Client Logic Speech to text Etc… Add record Count records Record Action & Reset Count
  • 6. “Just Use Postgres" CREATE TABLE BEETLECOUNT ( user_id BIGINT NOT NULL, spoken_count INTEGER NOT NULL, PRIMARY KEY(user_id) );
  • 7. Storm? Client Running in IaaS or Datacenter ?
  • 8. Why Storm? • Would I use Storm for pretty much anything now? No. • It’s decently exemplar, well known, and not fancy. • But there are soooo many other choices.
  • 9. Storm Client Audio Instructions to play soundSometimes… Counts for a partition of users User Code Counts for a partition of users User Code Counts for a partition of users User Code
  • 11. Postgres Failure Audio Play Sound • Postgres software or hardware stops
  • 12. Postgres Failure Audio Play Sound • Postgres software or hardware stops • Network fails
  • 13. Storm Client Audio Instructions to play soundSometimes… Counts for a partition of users User Code Counts for a partition of users User Code Counts for a partition of users User Code
  • 14. Side Effects Ruin Everything • Playing a sound on the speaker is not a transaction you can roll back. It’s an external action — A side effect. • Good examples are SMS or a REST-API call. Use Idempotence with At-Least-Once delivery for Exactly Once Semantics
  • 15. Does Kafka Fix Failure?
  • 16. All better? Audio Play Sound Rich Client Logic • More robust to Postgres failure in some ways • Side effects can’t be helped much • Latency hit • But many good reasons to do this (coming up)
  • 17. Commonalities: Stream & DB • In order to ensure delivery, the original source has to be prepared to resend an event until acknowledged. • Idempotency is the key to exactly-once-semantics. • It’s impossible to guarantee a side effect happens exactly once.
 So we do our best. • Systems that are flexible/safe/accurate are really really hard.
  • 18. What makes a “hard” app? • Scale / Complexity • Velocity of requirements changes • Precision 
 “exactly three times”
 Chaining precise conditions and actions
 Non-commutative math • Side Effects • Partial Control
  • 19. Stream vs Database 2017 Edition In the tradition of: Tabs vs Spaces and Emacs vs Vim Javascript Floating Point vs Anything Resembling Sanityand comes:
  • 20. Why Stream / Log? • Can easily Tee (split into two identical streams) to prod and test, or to A/B test. • Often allows for simpler clients, as richer processing is moved into the streaming system. • Often easier to understand performance characteristics, especially under load. • By replaying a logged stream, can often roll-back/forward to any recent state.
 Need truncating snapshots for this. • Can sometimes make multi-DC easier and more consistent. • Horizontal scalability and fault tolerance often easier.
  • 21. Why Database • Truncating a stream requires compaction or a snapshot. This is hard to get right for many apps. • You can query it! • Secondary indexes, Materialized views, Constraints, Joins, FKs • The tooling is really mature. • Typically more appropriate for apps that require lower latency.
  • 22. Use A Streaming System with a DB? • Sometimes you can get a lot of the benefits of both • More integration points and more ways to fail. BUT IT’S OK TO WANT IT ALL
  • 24. Add Databaseness to Streaming Counts for a partition of users User Code Probably terrible user glue code Storm punts on this
  • 25. User Code Counts for users TABULAR! Add Databaseness to Streaming A library that handles much of the hard interactions between the stream (log) and tabular state Client Logic Kafka Log Kafka Log Downstream Consumer More Kafka Logs? Downstream Consumer • Other systems have ways to integrate state with streams, but none seem as ambitious as Kafka Streams
  • 26. Make a database that smells streamy
  • 27. Put Processing in the DB Rich Client Logic Many RPC Calls User logic in stored procedures Simpler Client Logic Fewer RPC Calls Beyond Postgres: • Better tools for managing user code. • Better tools for debugging. • Better monitoring and transparency for user code running in the DB.
  • 28. Give the DB a A-Priori-Log WAL Client Logic RPC CallsClient Logic A-Priori Logical Log RPC Responses RPC Calls Could do this as But an integrated product has many advantages. + +
  • 29. Give the DB an A-Posteriori Log RPC CallsClient Logic A-Priori Logical Log RPC Responses A-Posteriori Log Downstream Consumer Emited Events RPC CallsClient Logic A-Priori Logical Log Consume Event A-Posteriori Log Emited Events
  • 30. Horizontally Partition the DB Client 1 A-Priori Log A-Posteriori Log A-Priori Log A-Posteriori Log A-Priori Log A-Posteriori Log Client 2 Consumer 1 Consumer 2 Client 3 Client 4
  • 31. Throw out but keep • Needs to present as a single, managed entity • Global stats • Global reads without extra work • SQL, JDBC, ODBC, etc… • Global writes? Maybe…
  • 32. Throw out but keep Because Latency
  • 33. Where is this going?
  • 34. We’re building a lot of this at • Horizonitally partitioned, but acts as a single system • Per partition ordered input and ordered output
 All based on an a-priori logical log • Debuggable Java stored procedures that can use 3rd party libs • Ability to emit events into an a-posteriori log • Native support for secondary indexes, ranking, materialized views, transactions, cross-partition operations, JDBC/ODBC, etc…
  • 35. Conclusion • Note: VoltDB is a not as general as we would like yet. • The future of operations and OLTP is going to be a mix of streams, logs and state.
 We are getting good at these things individually. • Let’s build systems that tackle integration and the in-between problems.
 There’s an opportunity here.
  • 36. chat.voltdb.com jhugg @voltdb.com @johnhugg @voltdb Thank You! • Please ask me questions now or later. • Feedback on what was interesting, helpful, confusing, boring is ALWAYS welcome. • Happy to talk about:
 Data management
 Systems software dev
 Distributed systems
 Japanese preschools BS Stuff I Don't Know Stuff I Know T H I S TA L K