1
Zen and the Art of Streaming Joins
Nick Dearden
2
3
What is Quality?
What makes a thing or a person or an idea good?
‘Classic’ vs ‘Romantic’
4
Why Join?
5
Join Types
6
Stream/Table Duality
7
TABLE -> STREAM -> TABLE

Stream record   | Table after the record is applied
(“alice”, 1)    | alice 1
(“charlie”, 1)  | alice 1, charlie 1
(“alice”, 2)    | alice 2, charlie 1
(“bob”, 1)      | alice 2, charlie 1, bob 1
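The duality on this slide can be sketched in a few lines of plain Python. This is an illustration only, not Kafka or KSQL code: the function name `table_snapshots` is hypothetical, and the records are the four shown on the slide. Each stream record is applied as an upsert, and the table is whatever state has accumulated so far.

```python
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, int]  # (key, value)


def table_snapshots(stream: Iterable[Record]) -> List[Dict[str, int]]:
    """Apply each stream record as an upsert; return the table after each step."""
    table: Dict[str, int] = {}
    snapshots: List[Dict[str, int]] = []
    for key, value in stream:
        table[key] = value             # the latest value per key wins
        snapshots.append(dict(table))  # copy, so earlier snapshots stay unchanged
    return snapshots


stream = [("alice", 1), ("charlie", 1), ("alice", 2), ("bob", 1)]
snapshots = table_snapshots(stream)
for record, snapshot in zip(stream, snapshots):
    print(record, "->", snapshot)
```

Reading the snapshots in order gives the table states from the slide; reading the records in order gives the stream (the table's changelog).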
8
Streams & Tables
● STREAM and TABLE as first-class citizens
● Interpretations of topic content
● STREAM - data in motion
● TABLE - collected state of a stream
• One record per key (per window)
• Current values (compacted topic)
• Changelog
● STREAM – TABLE Joins
9
Join          | Type         | INNER | LEFT OUTER | FULL OUTER
Stream-Stream | Windowed     | yes   | yes        | yes
Table-Table   | Non-windowed | yes   | yes        | yes
Stream-Table  | Non-windowed | yes   | yes        | no
10
Do you think that’s a table you are querying?
11
Pre-requisites
● Equi-Join on message keys (only)
● Co-partitioning
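Why co-partitioning matters can be shown with a toy partitioner. This is a sketch under stated assumptions: `partition_for` and `co_located` are hypothetical helpers, and `zlib.crc32` is only a stand-in for whatever hash the real partitioner uses. The point is that a key lands in the same partition number on both topics only when both use the same partitioner and the same partition count.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition. crc32 is a stand-in hash, not Kafka's own."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def co_located(key: str, left_partitions: int, right_partitions: int) -> bool:
    """True when both topics would place this key in the same partition number."""
    return partition_for(key, left_partitions) == partition_for(key, right_partitions)


keys = [f"user-{i}" for i in range(100)]

# Same partitioner and same partition count: every key lines up.
all_aligned = all(co_located(k, 6, 6) for k in keys)

# Different partition counts: some keys end up in different partitions,
# so a task reading partition N of both topics would miss those matches.
misaligned = [k for k in keys if not co_located(k, 4, 6)]

print(all_aligned, len(misaligned))
```

When the inputs are not co-partitioned, one side usually has to be re-keyed or repartitioned before the join.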
12
Why?
● Relate 2 streams of ongoing facts or events
● Ad impressions -> ad clicks
● Orders -> Shipments
Stream-Stream
13
How?
● Equi-join on the key from each side
● Co-partitioning
● Stream–Stream joins are time-windowed
● Use asymmetric windowing to indicate happens-before or happens-after
● Each input record triggers an output for every match from the other side
● Input records with NULL key or value are ignored and don’t trigger an output
Stream-Stream
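The bullets above can be sketched as a small in-memory simulation. It is not the Kafka Streams or KSQL implementation: `windowed_inner_join`, its `before` and `after` parameters (in seconds) and the order and shipment events are all hypothetical, and the sketch never expires buffered records. A right record matches a left record with the same key when its timestamp lies between `left_ts - before` and `left_ts + after`.

```python
from typing import List, Optional, Tuple

# (side, timestamp in seconds, key, value); all names here are illustrative.
Event = Tuple[str, int, Optional[str], Optional[str]]
Joined = Tuple[str, str, str]  # (key, left value, right value)


def windowed_inner_join(events: List[Event], before: int, after: int) -> List[Joined]:
    """Inner-join records with equal keys when
    left_ts - before <= right_ts <= left_ts + after."""
    left_buf: List[Tuple[int, str, str]] = []
    right_buf: List[Tuple[int, str, str]] = []
    out: List[Joined] = []
    for side, ts, key, value in events:
        if key is None or value is None:
            continue  # NULL key or value: ignored, no output
        if side == "left":
            # One output per matching record already seen on the right.
            for r_ts, r_key, r_val in right_buf:
                if r_key == key and ts - before <= r_ts <= ts + after:
                    out.append((key, value, r_val))
            left_buf.append((ts, key, value))
        else:
            # One output per matching record already seen on the left.
            for l_ts, l_key, l_val in left_buf:
                if l_key == key and l_ts - before <= ts <= l_ts + after:
                    out.append((key, l_val, value))
            right_buf.append((ts, key, value))
    return out


events: List[Event] = [
    ("left", 0, "order-1", "placed"),
    ("right", 1800, "order-1", "shipped"),  # 30 minutes after the order
    ("left", 100, "order-2", "placed"),
    ("right", 4000, "order-2", "shipped"),  # 65 minutes after the order
    ("right", 50, "order-3", "shipped"),    # shipment recorded before the order
    ("left", 60, "order-3", "placed"),
    ("left", 70, None, "placed"),           # NULL key
]

one_hour = 3600
shipped_after_order = windowed_inner_join(events, before=0, after=one_hour)
either_order = windowed_inner_join(events, before=one_hour, after=one_hour)
print(shipped_after_order)
print(either_order)
```

With `before=0` only order-1 joins, because the window demands that the shipment happens after the order. With a symmetric one-hour window order-3 joins as well, while order-2 stays outside the window in both cases.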
14
15
CREATE STREAM orders_shipped_within_hour AS
SELECT o.order_id, o.item_id FROM orders o
LEFT JOIN shipments s WITHIN 1 HOUR
ON o.order_id = s.order_id;
16
Stream-Stream Join - What

Time | Left Stream | Right Stream | INNER JOIN             | LEFT JOIN              | OUTER JOIN
1    | null        |              |                        |                        |
2    | A           |              |                        | [A, null]              | [A, null]
3    |             | a            | [A, a]                 | [A, a]                 | [A, a]
4    | B           |              | [B, a]                 | [B, a]                 | [B, a]
5    |             | b            | [A, b], [B, b]         | [A, b], [B, b]         | [A, b], [B, b]
6    | null        |              |                        |                        |
7    | C           |              | [C, a], [C, b]         | [C, a], [C, b]         | [C, a], [C, b]
8    |             | c            | [A, c], [B, c], [C, c] | [A, c], [B, c], [C, c] | [A, c], [B, c], [C, c]

Condition: all incoming records have the same key
Condition: all incoming records arrive within the join window
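The Stream-Stream table above can be replayed with a short simulation. This is a sketch of the behaviour the table shows, not the real implementation, and it relies on the two stated conditions: one shared key and every record inside the join window. The function name `stream_stream_join` is hypothetical, and the side assigned to the NULL rows is an assumption that does not affect the result, because NULL records are ignored on either side.

```python
from typing import List, Optional, Tuple

Pair = Tuple[Optional[str], Optional[str]]
Event = Tuple[str, Optional[str]]  # (side, value)


def stream_stream_join(events: List[Event], join_type: str) -> List[List[Pair]]:
    """Return the outputs triggered by each input event, in order."""
    left_seen: List[str] = []
    right_seen: List[str] = []
    outputs: List[List[Pair]] = []
    for side, value in events:
        emitted: List[Pair] = []
        if value is not None:  # NULL values are ignored and trigger nothing
            if side == "left":
                emitted = [(value, r) for r in right_seen]
                if not emitted and join_type in ("left", "outer"):
                    emitted = [(value, None)]  # no match yet on the right
                left_seen.append(value)
            else:
                emitted = [(l, value) for l in left_seen]
                if not emitted and join_type == "outer":
                    emitted = [(None, value)]  # no match yet on the left
                right_seen.append(value)
        outputs.append(emitted)
    return outputs


# Rows 1 to 8 of the table; the side of the NULL rows is assumed.
events: List[Event] = [
    ("left", None), ("left", "A"), ("right", "a"), ("left", "B"),
    ("right", "b"), ("left", None), ("left", "C"), ("right", "c"),
]

for join_type in ("inner", "left", "outer"):
    print(join_type, stream_stream_join(events, join_type))
```

Each inner list corresponds to one row of the table: empty for rows without output, otherwise one pair per match from the other side.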
17
Why?
● Relate 2 evolving sets of state
● Hotel rooms -> Room Rates
Table-Table
18
How?
● Equi-join on the key from each side
● Co-partitioning
● Table–Table joins are NOT time-windowed
● Each input record triggers an output for at most one match from the other side
● Input records with NULL key are ignored and don’t trigger an output
● Input records with NULL value are tombstones and don’t trigger an output
Table-Table
19
20
CREATE TABLE priced_rooms AS
SELECT h.hotel_id, r.room_rate FROM hotels h
JOIN rates r ON h.hotel_id = r.hotel_id;
21
Table-Table Join - What

Time | Left Table | Right Table | INNER JOIN | LEFT JOIN | OUTER JOIN
1    | null       |             |            |           |
2    | A          |             |            | [A, null] | [A, null]
3    |            | a           | [A, a]     | [A, a]    | [A, a]
4    | B          |             | [B, a]     | [B, a]    | [B, a]
5    |            | b           | [B, b]     | [B, b]    | [B, b]
6    | null       |             | null       | null      | [null, b]
7    |            | null        |            |           | null
8    | C          |             |            | [C, null] | [C, null]
9    |            | c           | [C, c]     | [C, c]    | [C, c]

Condition: all incoming records have the same key
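The Table-Table rows above can be reproduced by tracking the current value on each side and comparing the join result before and after every update. This is a sketch for a single key, not the actual implementation: `joined` and `table_table_join` are hypothetical names, and putting the row 1 NULL on the left side is an assumption that does not change the output. An output of `None` stands for a tombstone on the result table.

```python
from typing import List, Optional, Tuple

Row = Optional[Tuple[Optional[str], Optional[str]]]
Event = Tuple[str, Optional[str]]  # (side, value); value None is a tombstone


def joined(left: Optional[str], right: Optional[str], join_type: str) -> Row:
    """Current join result for the key, or None when no result row exists."""
    if join_type == "inner":
        exists = left is not None and right is not None
    elif join_type == "left":
        exists = left is not None
    else:  # "outer"
        exists = left is not None or right is not None
    return (left, right) if exists else None


def table_table_join(events: List[Event], join_type: str) -> List[List[Row]]:
    """Return the outputs triggered by each table update, in order."""
    left: Optional[str] = None
    right: Optional[str] = None
    outputs: List[List[Row]] = []
    for side, value in events:
        before = joined(left, right, join_type)
        if side == "left":
            left = value
        else:
            right = value
        after = joined(left, right, join_type)
        if after is not None:
            outputs.append([after])  # new or updated result row
        elif before is not None:
            outputs.append([None])   # result row deleted: emit a tombstone
        else:
            outputs.append([])       # nothing existed, nothing to emit
    return outputs


# Rows 1 to 9 of the table; the side of the row 1 NULL is assumed.
events: List[Event] = [
    ("left", None), ("left", "A"), ("right", "a"), ("left", "B"), ("right", "b"),
    ("left", None), ("right", None), ("left", "C"), ("right", "c"),
]

for join_type in ("inner", "left", "outer"):
    print(join_type, table_table_join(events, join_type))
```

Unlike the stream-stream case, each update yields at most one output, because a table holds a single current value per key.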
22
Why?
● Lookup / enrichment
● Clickstream-with-user-id -> user-details
Stream-Table
23
How?
● Equi-join on the key from each side
● Co-partitioning
● Stream–Table joins are NOT time-windowed
● Each input record from stream side triggers at most one output
● Input stream records with NULL key or value are ignored and don’t trigger an output
● Input table records with NULL value are tombstones and don’t trigger an output
Stream-Table
24
25
CREATE STREAM vip_actions AS
SELECT userid, page, action FROM clickstream c
LEFT JOIN users u ON c.userid = u.user_id
WHERE u.level = 'Platinum';
26
Stream-Table Join - What

Time | Left Stream | Right Table | INNER JOIN | LEFT JOIN
1    | null        |             |            |
2    | A           |             |            | [A, null]
3    |             | a           |            |
4    | B           |             | [B, a]     | [B, a]
5    |             | b           |            |
6    | null        |             |            |
7    |             | null        |            |
8    | C           |             |            | [C, null]

Condition: all incoming records have the same key
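The Stream-Table rows above follow from one rule: only stream records trigger output, and each one looks up the table's current value. The sketch below replays the rows for a single key. It is an illustration, not the real implementation: `stream_table_join` is a hypothetical name, and the side assigned to each NULL row is an assumption (one of rows 6 and 7 has to be a table tombstone for row 8 to produce [C, null]).

```python
from typing import List, Optional, Tuple

Pair = Tuple[str, Optional[str]]
Event = Tuple[str, Optional[str]]  # (side, value)


def stream_table_join(events: List[Event], join_type: str) -> List[List[Pair]]:
    """Return the outputs triggered by each input event, in order."""
    table_value: Optional[str] = None
    outputs: List[List[Pair]] = []
    for side, value in events:
        emitted: List[Pair] = []
        if side == "table":
            # Upsert, or delete when value is None (tombstone); no output.
            table_value = value
        elif value is not None:  # stream records with NULL value are ignored
            if table_value is not None:
                emitted = [(value, table_value)]
            elif join_type == "left":
                emitted = [(value, None)]  # no table row to enrich with
        outputs.append(emitted)
    return outputs


# Rows 1 to 8 of the table; the side of the NULL rows is assumed.
events: List[Event] = [
    ("stream", None), ("stream", "A"), ("table", "a"), ("stream", "B"),
    ("table", "b"), ("stream", None), ("table", None), ("stream", "C"),
]

for join_type in ("inner", "left"):
    print(join_type, stream_table_join(events, join_type))
```

Table updates on their own never produce output here, which is why rows 3, 5 and 7 stay empty: a later table change does not re-emit earlier stream records.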
27
Stream-Table Join - What
Condition: all incoming records have the same key
28
29

Zen and the Art of Streaming Joins—The What, When and Why
