SlideShare a Scribd company logo
1 of 17
Download to read offline
© 2018 Bloomberg Finance L.P. All rights reserved.
© 2018 Bloomberg Finance L.P. All rights reserved.
Lei Chen
Software Engineer, Team Lead
lchen576@bloomberg.net
Real-time* Market Data Processing
Using Kafka Streams
* Actually, just low-latency
© 2018 Bloomberg Finance L.P. All rights reserved.
© 2018 Bloomberg Finance L.P. All rights reserved.
Agenda
• Use cases and challenges
• Why Kafka Streams
• Deep dive into our implementation
• Some other tips & tricks
© 2018 Bloomberg Finance L.P. All rights reserved.
© 2018 Bloomberg Finance L.P. All rights reserved.
Who we are & What we do
• Derivative market data group in Bloomberg
• Builds market data pipelines
• Apply big data & AI technologies in financial domain
© 2018 Bloomberg Finance L.P. All rights reserved.
© 2018 Bloomberg Finance L.P. All rights reserved.
Streaming use cases
Market movement
(bid/ask/trade)
Composite price
(Bloomberg Generated Market Indicator)
Option price
(Calculate option price using
calculated volatilities)
© 2018 Bloomberg Finance L.P. All rights reserved.
© 2018 Bloomberg Finance L.P. All rights reserved.
Challenges and why Kafka Streams
• Zero data loss
• Ultra-Low latency
• Huge data volume
• Large state
• Corporate DR compliance
• Maintenance
‘Exactly once’ delivery
Super fast
Highly scalable
State store
Fault tolerant
Minimal management overhead
*Data already in Kafka!
© 2018 Bloomberg Finance L.P. All rights reserved.
© 2018 Bloomberg Finance L.P. All rights reserved.
Deep dive into our implementation
© 2018 Bloomberg Finance L.P. All rights reserved.
© 2018 Bloomberg Finance L.P. All rights reserved.
Key takeaways
• transform()/process() combines the best of both DSL and PAPI
• Kryo for state serialization/deserialization
• Think stream/table duality
• Avoid unnecessary DSL call by accessing state directly
• Use Kubernetes as runtime
• Monitoring is important (leverage built-in metrics)
© 2018 Bloomberg Finance L.P. All rights reserved.
© 2018 Bloomberg Finance L.P. All rights reserved.
DSL vs Processor API
• Declarative vs Imperative
• Usability vs Flexibility
• High-level API vs low-level programming model
© 2018 Bloomberg Finance L.P. All rights reserved.
© 2018 Bloomberg Finance L.P. All rights reserved.
Monitoring
• Built-in webserver
• Queryable state
• Metrics
• Internal topics
• Lags (topic/partition)
© 2018 Bloomberg Finance L.P. All rights reserved.
© 2018 Bloomberg Finance L.P. All rights reserved.
Hot load configuration
KStream
KTable
Stream & Table Join
Joined Stream
KStream
GlobalKTable
Joined Stream
Stream & Table Join
Conf. topic
Data topic
Comdb2 Kafka Connect
Conf. topic
Data topic
DB Kafka Connect
KStream
KTable state
© 2018 Bloomberg Finance L.P. All rights reserved.
© 2018 Bloomberg Finance L.P. All rights reserved.
Community Involvement
• KIP-362 - Dynamic Gap Session Window
— Versus fixed-gap session window
session1 session2 session3
gap3gap2gap1
© 2018 Bloomberg Finance L.P. All rights reserved.
Some other tips & tricks
© 2018 Bloomberg Finance L.P. All rights reserved.
© 2018 Bloomberg Finance L.P. All rights reserved.
Thread model and depth first topology
APP
KAFKA
STREAMS
topology
© 2018 Bloomberg Finance L.P. All rights reserved.
© 2018 Bloomberg Finance L.P. All rights reserved.
Trick 1 - Batch Processing In Kafka Streams
10/15/18
Micro batch?
Possible!
Batch?
Harder!
watermark
State
App2
App1
© 2018 Bloomberg Finance L.P. All rights reserved.
© 2018 Bloomberg Finance L.P. All rights reserved.
Trick 2 - Chain Multiple Kafka Streams App
• Kafka as message bus
• Compose pipeline using multiple Kafka Streams apps
• Could leverage third-party pipeline framework – Spring Cloud Data Flow, etc.
KSTREAMS
APP1
topology
KSTREAMS
APPN
topology
KSTREAMS
APP2
topology
© 2018 Bloomberg Finance L.P. All rights reserved.
© 2018 Bloomberg Finance L.P. All rights reserved.
Our Streaming Platform
© 2018 Bloomberg Finance L.P. All rights reserved.
© 2018 Bloomberg Finance L.P. All rights reserved.
We are hiring!
Questions?
https://tinyurl.com/y7bepre9

More Related Content

What's hot

PostgreSQL major version upgrade using built in Logical Replication
PostgreSQL major version upgrade using built in Logical ReplicationPostgreSQL major version upgrade using built in Logical Replication
PostgreSQL major version upgrade using built in Logical Replication
Atsushi Torikoshi
 
Meetup: Streaming Data Pipeline Development
Meetup:  Streaming Data Pipeline DevelopmentMeetup:  Streaming Data Pipeline Development
Meetup: Streaming Data Pipeline Development
Timothy Spann
 

What's hot (20)

A Deep Dive into Kafka Controller
A Deep Dive into Kafka ControllerA Deep Dive into Kafka Controller
A Deep Dive into Kafka Controller
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Securing data in hybrid environments using Apache Ranger
Securing data in hybrid environments using Apache RangerSecuring data in hybrid environments using Apache Ranger
Securing data in hybrid environments using Apache Ranger
 
Apache Kafka at LinkedIn
Apache Kafka at LinkedInApache Kafka at LinkedIn
Apache Kafka at LinkedIn
 
NATS Streaming - an alternative to Apache Kafka?
NATS Streaming - an alternative to Apache Kafka?NATS Streaming - an alternative to Apache Kafka?
NATS Streaming - an alternative to Apache Kafka?
 
CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®CDC patterns in Apache Kafka®
CDC patterns in Apache Kafka®
 
PostgreSQL major version upgrade using built in Logical Replication
PostgreSQL major version upgrade using built in Logical ReplicationPostgreSQL major version upgrade using built in Logical Replication
PostgreSQL major version upgrade using built in Logical Replication
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
CoC23_ Looking at the New Features of Apache NiFi
CoC23_ Looking at the New Features of Apache NiFiCoC23_ Looking at the New Features of Apache NiFi
CoC23_ Looking at the New Features of Apache NiFi
 
Battle of the Stream Processing Titans – Flink versus RisingWave
Battle of the Stream Processing Titans – Flink versus RisingWaveBattle of the Stream Processing Titans – Flink versus RisingWave
Battle of the Stream Processing Titans – Flink versus RisingWave
 
Grafana.pptx
Grafana.pptxGrafana.pptx
Grafana.pptx
 
Integrating Apache Kafka Into Your Environment
Integrating Apache Kafka Into Your EnvironmentIntegrating Apache Kafka Into Your Environment
Integrating Apache Kafka Into Your Environment
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
 
Apache Sedona: how to process petabytes of agronomic data with Spark
Apache Sedona: how to process petabytes of agronomic data with SparkApache Sedona: how to process petabytes of agronomic data with Spark
Apache Sedona: how to process petabytes of agronomic data with Spark
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
 
Meetup: Streaming Data Pipeline Development
Meetup:  Streaming Data Pipeline DevelopmentMeetup:  Streaming Data Pipeline Development
Meetup: Streaming Data Pipeline Development
 
Delta from a Data Engineer's Perspective
Delta from a Data Engineer's PerspectiveDelta from a Data Engineer's Perspective
Delta from a Data Engineer's Perspective
 
Change Data Feed in Delta
Change Data Feed in DeltaChange Data Feed in Delta
Change Data Feed in Delta
 
Delight: An Improved Apache Spark UI, Free, and Cross-Platform
Delight: An Improved Apache Spark UI, Free, and Cross-PlatformDelight: An Improved Apache Spark UI, Free, and Cross-Platform
Delight: An Improved Apache Spark UI, Free, and Cross-Platform
 
Apache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals ExplainedApache Kafka Architecture & Fundamentals Explained
Apache Kafka Architecture & Fundamentals Explained
 

Similar to Real-Time Market Data Analytics Using Kafka Streams

HBaseConAsia2018 Track3-1: Serving billions of queries in millisecond latencies
HBaseConAsia2018 Track3-1: Serving billions of queries in millisecond latenciesHBaseConAsia2018 Track3-1: Serving billions of queries in millisecond latencies
HBaseConAsia2018 Track3-1: Serving billions of queries in millisecond latencies
Michael Stack
 
Learning to Rank: From Theory to Production - Malvina Josephidou & Diego Cecc...
Learning to Rank: From Theory to Production - Malvina Josephidou & Diego Cecc...Learning to Rank: From Theory to Production - Malvina Josephidou & Diego Cecc...
Learning to Rank: From Theory to Production - Malvina Josephidou & Diego Cecc...
Lucidworks
 

Similar to Real-Time Market Data Analytics Using Kafka Streams (20)

Building a Data Subscription Service with Kafka Connect (Danica Fine & Ajay V...
Building a Data Subscription Service with Kafka Connect (Danica Fine & Ajay V...Building a Data Subscription Service with Kafka Connect (Danica Fine & Ajay V...
Building a Data Subscription Service with Kafka Connect (Danica Fine & Ajay V...
 
Kafka Connect and KSQL: Useful Tools in Migrating from a Legacy System to Kaf...
Kafka Connect and KSQL: Useful Tools in Migrating from a Legacy System to Kaf...Kafka Connect and KSQL: Useful Tools in Migrating from a Legacy System to Kaf...
Kafka Connect and KSQL: Useful Tools in Migrating from a Legacy System to Kaf...
 
Why My Streaming Job is Slow - Profiling and Optimizing Kafka Streams Apps (L...
Why My Streaming Job is Slow - Profiling and Optimizing Kafka Streams Apps (L...Why My Streaming Job is Slow - Profiling and Optimizing Kafka Streams Apps (L...
Why My Streaming Job is Slow - Profiling and Optimizing Kafka Streams Apps (L...
 
Replaying KStreams Apps Using State Snapshots (Nishchay Sinha & Yan Wang, Blo...
Replaying KStreams Apps Using State Snapshots (Nishchay Sinha & Yan Wang, Blo...Replaying KStreams Apps Using State Snapshots (Nishchay Sinha & Yan Wang, Blo...
Replaying KStreams Apps Using State Snapshots (Nishchay Sinha & Yan Wang, Blo...
 
HBaseConAsia2018 Track3-1: Serving billions of queries in millisecond latencies
HBaseConAsia2018 Track3-1: Serving billions of queries in millisecond latenciesHBaseConAsia2018 Track3-1: Serving billions of queries in millisecond latencies
HBaseConAsia2018 Track3-1: Serving billions of queries in millisecond latencies
 
Serving queries at low latency using HBase
Serving queries at low latency using HBaseServing queries at low latency using HBase
Serving queries at low latency using HBase
 
Multi-Tenant HBase Cluster - HBaseCon2018-final
Multi-Tenant HBase Cluster - HBaseCon2018-finalMulti-Tenant HBase Cluster - HBaseCon2018-final
Multi-Tenant HBase Cluster - HBaseCon2018-final
 
IMCSummit 2015 - 1 IT Business - The Evolution of Pivotal Gemfire
IMCSummit 2015 - 1 IT Business  - The Evolution of Pivotal GemfireIMCSummit 2015 - 1 IT Business  - The Evolution of Pivotal Gemfire
IMCSummit 2015 - 1 IT Business - The Evolution of Pivotal Gemfire
 
QCon 2018 | Gimel | PayPal's Analytic Platform
QCon 2018 | Gimel | PayPal's Analytic PlatformQCon 2018 | Gimel | PayPal's Analytic Platform
QCon 2018 | Gimel | PayPal's Analytic Platform
 
Gimel at Dataworks Summit San Jose 2018
Gimel at Dataworks Summit San Jose 2018Gimel at Dataworks Summit San Jose 2018
Gimel at Dataworks Summit San Jose 2018
 
Dataworks | 2018-06-20 | Gimel data platform
Dataworks | 2018-06-20 | Gimel data platformDataworks | 2018-06-20 | Gimel data platform
Dataworks | 2018-06-20 | Gimel data platform
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
 
Learning to Rank: From Theory to Production - Malvina Josephidou & Diego Cecc...
Learning to Rank: From Theory to Production - Malvina Josephidou & Diego Cecc...Learning to Rank: From Theory to Production - Malvina Josephidou & Diego Cecc...
Learning to Rank: From Theory to Production - Malvina Josephidou & Diego Cecc...
 
HBase coprocessors, Uses, Abuses, Solutions
HBase coprocessors, Uses, Abuses, SolutionsHBase coprocessors, Uses, Abuses, Solutions
HBase coprocessors, Uses, Abuses, Solutions
 
How Trek10 Uses Datadog's Distributed Tracing to Improve AWS Lambda Projects ...
How Trek10 Uses Datadog's Distributed Tracing to Improve AWS Lambda Projects ...How Trek10 Uses Datadog's Distributed Tracing to Improve AWS Lambda Projects ...
How Trek10 Uses Datadog's Distributed Tracing to Improve AWS Lambda Projects ...
 
From Mainframe to Microservices with Pivotal Platform and Kafka: Bridging the...
From Mainframe to Microservices with Pivotal Platform and Kafka: Bridging the...From Mainframe to Microservices with Pivotal Platform and Kafka: Bridging the...
From Mainframe to Microservices with Pivotal Platform and Kafka: Bridging the...
 
Pivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise
Pivotal Digital Transformation Forum: Journey to Become a Data-Driven EnterprisePivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise
Pivotal Digital Transformation Forum: Journey to Become a Data-Driven Enterprise
 
KNIME Software Overview
KNIME Software OverviewKNIME Software Overview
KNIME Software Overview
 
How data modelling helps serve billions of queries in millisecond latency wit...
How data modelling helps serve billions of queries in millisecond latency wit...How data modelling helps serve billions of queries in millisecond latency wit...
How data modelling helps serve billions of queries in millisecond latency wit...
 
Build High-Throughput, Bursty Data Apps with Amazon SQS, SNS, & Lambda (API30...
Build High-Throughput, Bursty Data Apps with Amazon SQS, SNS, & Lambda (API30...Build High-Throughput, Bursty Data Apps with Amazon SQS, SNS, & Lambda (API30...
Build High-Throughput, Bursty Data Apps with Amazon SQS, SNS, & Lambda (API30...
 

More from confluent

More from confluent (20)

Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Santander Stream Processing with Apache Flink
Santander Stream Processing with Apache FlinkSantander Stream Processing with Apache Flink
Santander Stream Processing with Apache Flink
 
Unlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insightsUnlocking the Power of IoT: A comprehensive approach to real-time insights
Unlocking the Power of IoT: A comprehensive approach to real-time insights
 
Workshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con FlinkWorkshop híbrido: Stream Processing con Flink
Workshop híbrido: Stream Processing con Flink
 
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
Industry 4.0: Building the Unified Namespace with Confluent, HiveMQ and Spark...
 
AWS Immersion Day Mapfre - Confluent
AWS Immersion Day Mapfre   -   ConfluentAWS Immersion Day Mapfre   -   Confluent
AWS Immersion Day Mapfre - Confluent
 
Eventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalkEventos y Microservicios - Santander TechTalk
Eventos y Microservicios - Santander TechTalk
 
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent CloudQ&A with Confluent Experts: Navigating Networking in Confluent Cloud
Q&A with Confluent Experts: Navigating Networking in Confluent Cloud
 
Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streams
 

Recently uploaded

Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 

Recently uploaded (20)

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 

Real-Time Market Data Analytics Using Kafka Streams

  • 1. © 2018 Bloomberg Finance L.P. All rights reserved. © 2018 Bloomberg Finance L.P. All rights reserved. Lei Chen Software Engineer, Team Lead lchen576@bloomberg.net Real-time* Market Data Processing Using Kafka Streams * Actually, just low-latency
  • 2. © 2018 Bloomberg Finance L.P. All rights reserved. © 2018 Bloomberg Finance L.P. All rights reserved. Agenda • Use cases and challenges • Why Kafka Streams • Deep dive into our implementation • Some other tips & tricks
  • 3. © 2018 Bloomberg Finance L.P. All rights reserved. © 2018 Bloomberg Finance L.P. All rights reserved. Who we are & What we do • Derivative market data group in Bloomberg • Builds market data pipelines • Apply big data & AI technologies in financial domain
  • 4. © 2018 Bloomberg Finance L.P. All rights reserved. © 2018 Bloomberg Finance L.P. All rights reserved. Streaming use cases Market movement (bid/ask/trade) Composite price (Bloomberg Generated Market Indicator) Option price (Calculate option price using calculated volatilities)
  • 5. © 2018 Bloomberg Finance L.P. All rights reserved. © 2018 Bloomberg Finance L.P. All rights reserved. Challenges and why Kafka Streams • Zero data loss • Ultra-Low latency • Huge data volume • Large state • Corporate DR compliance • Maintenance ‘Exactly once’ delivery Super fast Highly scalable State store Fault tolerant Minimal management overhead *Data already in Kafka!
  • 6. © 2018 Bloomberg Finance L.P. All rights reserved. © 2018 Bloomberg Finance L.P. All rights reserved. Deep dive into our implementation
  • 7. © 2018 Bloomberg Finance L.P. All rights reserved. © 2018 Bloomberg Finance L.P. All rights reserved. Key takeaways • transform()/process() combines the best of both DSL and PAPI • Kryo for state serialization/deserialization • Think stream/table duality • Avoid unnecessary DSL call by accessing state directly • Use Kubernetes as runtime • Monitoring is important (leverage built-in metrics)
  • 8. © 2018 Bloomberg Finance L.P. All rights reserved. © 2018 Bloomberg Finance L.P. All rights reserved. DSL vs Processor API • Declarative vs Imperative • Usability vs Flexibility • High-level API vs low-level programming model
  • 9. © 2018 Bloomberg Finance L.P. All rights reserved. © 2018 Bloomberg Finance L.P. All rights reserved. Monitoring • Built-in webserver • Queryable state • Metrics • Internal topics • Lags (topic/partition)
  • 10. © 2018 Bloomberg Finance L.P. All rights reserved. © 2018 Bloomberg Finance L.P. All rights reserved. Hot load configuration KStream KTable Stream & Table Join Joined Stream KStream GlobalKTable Joined Stream Stream & Table Join Conf. topic Data topic Comdb2 Kafka Connect Conf. topic Data topic DB Kafka Connect KStream KTable state
  • 11. © 2018 Bloomberg Finance L.P. All rights reserved. © 2018 Bloomberg Finance L.P. All rights reserved. Community Involvement • KIP-362 - Dynamic Gap Session Window — Versus fixed-gap session window session1 session2 session3 gap3gap2gap1
  • 12. © 2018 Bloomberg Finance L.P. All rights reserved. Some other tips & tricks
  • 13. © 2018 Bloomberg Finance L.P. All rights reserved. © 2018 Bloomberg Finance L.P. All rights reserved. Thread model and depth first topology APP KAFKA STREAMS topology
  • 14. © 2018 Bloomberg Finance L.P. All rights reserved. © 2018 Bloomberg Finance L.P. All rights reserved. Trick 1 - Batch Processing In Kafka Streams 10/15/18 Micro batch? Possible! Batch? Harder! watermark State App2 App1
  • 15. © 2018 Bloomberg Finance L.P. All rights reserved. © 2018 Bloomberg Finance L.P. All rights reserved. Trick 2 - Chain Multiple Kafka Streams App • Kafka as message bus • Compose pipeline using multiple Kafka Streams apps • Could leverage third-party pipeline framework – Spring Cloud Data Flow, etc. KSTREAMS APP1 topology KSTREAMS APPN topology KSTREAMS APP2 topology
  • 16. © 2018 Bloomberg Finance L.P. All rights reserved. © 2018 Bloomberg Finance L.P. All rights reserved. Our Streaming Platform
  • 17. © 2018 Bloomberg Finance L.P. All rights reserved. © 2018 Bloomberg Finance L.P. All rights reserved. We are hiring! Questions? https://tinyurl.com/y7bepre9