Follow the (Kafka) Streams
HelloWorld!
○ Mario Molina
○ Big Data Engineer @ Datio
○ Working in data & all things related since 2005.
○ You can find me at:
mmolimar_
mmolimar
mmolimar
A distributed streaming platform
○ Distributed and ordered commit log.
○ Pull-based publish/subscribe messaging with message
retention on disk.
○ Fault-tolerant and horizontally scalable.
○ Isolated topics and partitions per consumer group.
○ Binary TCP-based communication protocol.
○ Actively developed.
○ Great stability. It’s used by industry-leading companies.
○ Excellent APIs (JVM languages mainly).
○ Optimized for reading records in the same order they were written.
○ Optimized for massive writes.
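The commit-log and pull-based points above can be pictured with a toy model. This is a minimal sketch in plain Python, not Kafka client code: the class `PartitionLog` and its methods are hypothetical names, and it models a single partition only. It shows an append-only log where each consumer group pulls from its own offset.

```python
# Toy model of one topic partition: an append-only log plus per-group offsets.
# Hypothetical names throughout; this is not the Kafka client API.

class PartitionLog:
    def __init__(self):
        self.records = []      # append-only, ordered by offset
        self.committed = {}    # consumer group -> next offset to read

    def append(self, value):
        """Producer side: append a record and return its offset."""
        self.records.append(value)
        return len(self.records) - 1

    def poll(self, group, max_records=10):
        """Consumer side: pull records starting at the group's own offset."""
        start = self.committed.get(group, 0)
        batch = self.records[start:start + max_records]
        self.committed[group] = start + len(batch)
        return batch


log = PartitionLog()
for v in ["a", "b", "c"]:
    log.append(v)

first = log.poll("group-1", max_records=2)   # group-1 reads from offset 0
second = log.poll("group-1")                 # group-1 continues where it left off
other = log.poll("group-2")                  # group-2 has its own position
```

Because records stay in the log after being read, a second group can replay everything without affecting the first one.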
The ecosystem
[Diagram: apps acting as producers and consumers, Connect source and sink connectors, and Streams apps.]
Kafka APIs
○ Producer API.
○ Consumer API.
○ Connect API: sources and sinks.
○ Streams API.
Streams API
○ Library (Java & Scala) for stream processing (one-record-at-a-time).
○ Lightweight, with a low barrier to entry.
○ High-level DSL & low-level Processor API.
○ Semantics: at-least-once & exactly-once.
○ Fault tolerance.
○ Scalable & recoverable (improved with KIP-429 and KIP-441).
○ No external dependencies.
Key concepts in Streams API
○ The processor topology: represented by a directed
acyclic graph (DAG).
○ Types of nodes in the processor topology:
  ○ Source processor.
  ○ Stream processor.
  ○ Sink processor.
○ State stores.
○ Sub-topologies.
○ Abstractions: KStream, KTable and GlobalKTable.
[Diagram: a topology made of two sub-topologies. Labels: Source, Processor, Processor*, Sink, state store.]
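To make the source / stream processor / sink vocabulary concrete, here is a minimal sketch in plain Python. It is not the Kafka Streams Processor API: `Node`, `to` and `process` are hypothetical names, and the "sink topic" is just a list. Each record enters at the source and is pushed one at a time through the graph.

```python
# Toy processor topology: nodes are plain functions wired as a DAG.
# Hypothetical names; this is not the Kafka Streams Processor API.

class Node:
    def __init__(self, name, fn=None):
        self.name = name
        self.fn = fn           # record -> list of records (None = pass through)
        self.children = []

    def to(self, child):
        """Connect this node to a downstream node and return the child."""
        self.children.append(child)
        return child

    def process(self, record):
        """Apply this node's function and forward each output downstream."""
        outputs = [record] if self.fn is None else self.fn(record)
        for out in outputs:
            for child in self.children:
                child.process(out)


sink_topic = []  # stands in for an output topic

source = Node("source")
non_empty = Node("non-empty", lambda r: [r] if r[1] else [])
upper = Node("upper", lambda r: [(r[0], r[1].upper())])
sink = Node("sink", lambda r: sink_topic.append(r) or [])

source.to(non_empty).to(upper).to(sink)

for record in [("k1", "hello"), ("k2", ""), ("k1", "kafka")]:
    source.process(record)
```

The record with the empty value is dropped by the filtering node, so only two records reach the sink.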
Abstractions
KStream
○ Partitioned record stream.
○ Immutable data (append only).
○ “INSERT” mode (from the SQL perspective).
KTable
○ Partitioned table.
○ Each record represents the latest state/value of its key.
○ “UPSERT” mode (from the SQL perspective).
GlobalKTable
○ Not partitioned.
○ Same as a KTable but with data from all partitions.
○ Just for the DSL.
Stream-table duality

Time   KStream (new record)   KTable (current state)
T0     k1 -> A                k1 -> A
T1     k2 -> B                k1 -> A, k2 -> B
T2     k1 -> C                k1 -> C, k2 -> B
T3     k2 -> D                k1 -> C, k2 -> D
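The duality can be replayed in a few lines. This is a minimal sketch in plain Python rather than the Streams DSL: the list `stream` plays the role of a KStream (every record is kept), the dict `table` plays the role of a KTable (latest value per key), and the records are the ones from the slide.

```python
# Replay the same records as a stream (INSERT) and as a table (UPSERT).
changelog = [("k1", "A"), ("k2", "B"), ("k1", "C"), ("k2", "D")]

stream = []      # KStream-like view: append only
table = {}       # KTable-like view: latest value per key
snapshots = []   # table contents after each record (T0..T3)

for key, value in changelog:
    stream.append((key, value))
    table[key] = value
    snapshots.append(dict(table))
```

After the four records the stream still holds all four entries, while the table holds only `k1 -> C` and `k2 -> D`.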
Types of operations (DSL)
Stateless
○ filter / filterNot.
○ mapValues.
○ flatMapValues.
○ branch.
○ toStream.
○ map(*).
○ flatMap(*).
○ selectKey(*).
○ groupByKey.
○ groupBy.
○ ...
Stateful
○ aggregate.
○ joins (inner, left, outer).
○ count.
○ reduce.
○ windowed ops.
Terminal (stateless)
○ print.
○ foreach.
○ to.
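The difference between the stateless and stateful groups can be sketched in plain Python. This is an illustration only, not the Streams DSL: the purchase records, the threshold of 10 and the variable names are made up for the example, and a `Counter` stands in for a state store.

```python
from collections import Counter

# Hypothetical (customer, amount) records.
purchases = [("alice", 30), ("bob", 5), ("alice", 12), ("bob", 50)]

# Stateless: each record is handled on its own (a filter, then a mapValues).
large = [(k, v) for k, v in purchases if v >= 10]
in_cents = [(k, v * 100) for k, v in large]

# Stateful: a count per key needs a store that survives between records.
store = Counter()
updates = []
for key, _ in large:
    store[key] += 1
    updates.append((key, store[key]))
```

The stateless steps could run on any record in isolation; the count cannot, because its output for a record depends on what was seen before for the same key.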
Parallelism
[Diagram: two samples (Sample 1, Sample 2) of apps running stream threads. Each thread has its own consumer and producer and runs one or more tasks.]
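A rough way to picture how tasks land on threads is sketched below in plain Python. It rests on stated assumptions: a single sub-topology reading one input topic, one task per input partition, and a simple round-robin spread. Kafka Streams uses its own assignor, so `assign_tasks` is a hypothetical helper and not the library's algorithm.

```python
def assign_tasks(num_partitions, instances):
    """Spread one task per partition over all stream threads, round-robin.

    instances: dict of app instance name -> number of stream threads.
    Simplification: the real assignor also considers state and stickiness.
    """
    threads = [f"{name}-thread-{i}"
               for name, n in instances.items()
               for i in range(n)]
    assignment = {t: [] for t in threads}
    for task_id in range(num_partitions):
        thread = threads[task_id % len(threads)]
        assignment[thread].append(f"task-{task_id}")
    return assignment


# Three partitions, two app instances with one thread each.
plan = assign_tasks(3, {"app-1": 1, "app-2": 1})
```

With three partitions and two single-threaded instances, one thread ends up with two tasks and the other with one.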
The typical WordCount
Physical plan
Topologies:
   Sub-topology: 0
    Source: KSTREAM-SOURCE-0000000000 (topics: [TextLinesTopic])
      --> KSTREAM-FLATMAPVALUES-0000000001
    Processor: KSTREAM-FLATMAPVALUES-0000000001 (stores: [])
      --> KSTREAM-KEY-SELECT-0000000002
      <-- KSTREAM-SOURCE-0000000000
    Processor: KSTREAM-KEY-SELECT-0000000002 (stores: [])
      --> counts-store-repartition-filter
      <-- KSTREAM-FLATMAPVALUES-0000000001
    Processor: counts-store-repartition-filter (stores: [])
      --> counts-store-repartition-sink
      <-- KSTREAM-KEY-SELECT-0000000002
    Sink: counts-store-repartition-sink (topic: counts-store-repartition)
      <-- counts-store-repartition-filter

   Sub-topology: 1
    Source: counts-store-repartition-source (topics: [counts-store-repartition])
      --> KSTREAM-AGGREGATE-0000000003
    Processor: KSTREAM-AGGREGATE-0000000003 (stores: [counts-store])
      --> KTABLE-MAPVALUES-0000000008
      <-- counts-store-repartition-source
    Processor: KTABLE-MAPVALUES-0000000008 (stores: [])
      --> KTABLE-TOSTREAM-0000000009
      <-- KSTREAM-AGGREGATE-0000000003
    Processor: KTABLE-TOSTREAM-0000000009 (stores: [])
      --> KSTREAM-SINK-0000000010
      <-- KTABLE-MAPVALUES-0000000008
    Sink: KSTREAM-SINK-0000000010 (topic: WordsWithCountsTopic)
      <-- KTABLE-TOSTREAM-0000000009
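The two sub-topologies in this description can be imitated in plain Python to see why the repartition topic sits between them. This is a sketch under assumptions, not the WordCount code from the talk: the partition count of 2, the sample lines and the function names are made up, and `zlib.crc32` stands in for Kafka's own key partitioner.

```python
from collections import Counter, defaultdict
import zlib

NUM_PARTITIONS = 2  # assumed partition count for the repartition topic

def sub_topology_0(lines):
    """flatMapValues + selectKey, then write to the repartition topic by key."""
    repartition = defaultdict(list)
    for line in lines:
        for word in line.lower().split():
            # Same word -> same partition, so one task sees all its occurrences.
            partition = zlib.crc32(word.encode()) % NUM_PARTITIONS
            repartition[partition].append((word, word))
    return repartition

def sub_topology_1(repartition):
    """Count per partition; each partition has its own slice of the store."""
    output = []
    for partition in sorted(repartition):
        store = Counter()
        for word, _ in repartition[partition]:
            store[word] += 1
            output.append((word, store[word]))   # one update per input record
    return output

updates = sub_topology_1(sub_topology_0(
    ["all streams lead to kafka", "hello kafka streams"]))

# Reading the update stream as a table gives the final counts.
final = {}
for word, count in updates:
    final[word] = count
```

Changing the key in sub-topology 0 is what forces the repartition step: without it, occurrences of the same word could be counted by different tasks.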
Other interesting features
○ Windowing.
○ Interactive queries.
○ Topology optimization.
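As an illustration of the windowing feature, here is a minimal tumbling-window count in plain Python. It is a sketch of the idea only, not the Streams windowing API: the one-minute window size, the function name and the sample events are assumptions made for the example.

```python
from collections import Counter

WINDOW_MS = 60_000  # assumed window size: one minute

def tumbling_counts(events):
    """Count events per (window_start, key) using fixed, non-overlapping windows.

    events: iterable of (timestamp_ms, key).
    """
    counts = Counter()
    for ts, key in events:
        window_start = ts - (ts % WINDOW_MS)   # align timestamp to its window
        counts[(window_start, key)] += 1
    return dict(counts)


result = tumbling_counts([
    (1_000, "k1"),
    (59_999, "k1"),
    (60_000, "k1"),   # first millisecond of the next window
    (61_000, "k2"),
])
```

Each event belongs to exactly one window, which is what distinguishes tumbling windows from overlapping ones.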
What if I don’t use it?
○ It’s OK if you just need to move data from one place to another.
○ But if you need to process, enrich, or otherwise transform the data, you have to either:
  ○ Code your specific use case using the producer and consumer APIs.
  ○ Integrate another processing framework (e.g. Spark, Flink...).
Demoooooo!
Demo - Product purchases
[Diagram labels: Kafka Connect, voluble, kukulcan.]
kukulcan
○ A REPL for Apache Kafka.
○ Supports POSIX and Windows operating systems.
○ Written in Scala, Java and Python.
○ Shells in:
  ○ Ammonite REPL.
  ○ Scala REPL.
  ○ JShell.
  ○ Python shell.
○ APIs for Admin, Producer, Consumer, Connect and Streams.
https://github.com/mmolimar/kukulcan
voluble
○ Intelligent data generator.
○ Source code:
  ○ https://github.com/MichaelDrogalis/voluble
○ Confluent Hub:
  ○ https://www.confluent.io/hub/mdrogalis/voluble
Ammonite scripts
○ Scripts to run the demo in Kukulcan.
○ Source code:
  ○ https://github.com/mmolimar/meetups
○ Documentation:
  ○ https://github.com/mmolimar/meetups/tree/master/kafka-streams
Getting involved with Apache Kafka
○ Website: http://kafka.apache.org
○ Join the mailing lists:
  ○ users@kafka.apache.org
  ○ dev@kafka.apache.org
○ Slack: https://confluentcommunity.slack.com
○ Meetups: https://www.meetup.com/<LOCATION>-Kafka
○ Contribute: https://github.com/apache/kafka
○ Kafka Summit 2020: https://kafka-summit.org
Thanks!
mmolimar
mmolimar
mmolimar_
