Concord: Simple & Flexible
Stream Processing on Apache Mesos
Shinji Kim
Co-founder, Concord Systems
@concord
@databythebay #datagrid
Overview
•  What is Stream Processing?
•  Today’s Stream Processing
•  Introducing Concord
1. Concepts & API
2. Job Topology Management
3. Operations, Tooling, Performance
4. Message Delivery Guarantees
•  Future Development Plans
Page 2
What is stream processing?
Page 3
•  Processing Data in motion
•  Sits between message queues and databases
•  Used for faster:
–  Data enrichment
–  Aggregation
–  Filtering / deduplication
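The three uses above can be illustrated with plain Python generators. Each stage handles one event as it arrives instead of waiting for a complete dataset. This is a generic sketch, not Concord's API; the event fields (`id`, `ip`) and the `COUNTRY_BY_IP` lookup table are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical reference data used to enrich events (illustration only).
COUNTRY_BY_IP: Dict[str, str] = {"10.0.0.1": "US", "10.0.0.2": "KR"}


def dedupe(events: Iterable[dict]) -> Iterator[dict]:
    """Filtering / deduplication: drop events whose id was already seen.

    The `seen` set grows without bound; a long-running job would expire old ids.
    """
    seen = set()
    for event in events:
        if event["id"] not in seen:
            seen.add(event["id"])
            yield event


def enrich(events: Iterable[dict]) -> Iterator[dict]:
    """Enrichment: attach a country looked up from the event's IP address."""
    for event in events:
        yield {**event, "country": COUNTRY_BY_IP.get(event["ip"], "unknown")}


def running_counts(events: Iterable[dict]) -> Iterator[Tuple[str, int]]:
    """Aggregation: emit the updated per-country count after every event."""
    counts: Counter = Counter()
    for event in events:
        counts[event["country"]] += 1
        yield event["country"], counts[event["country"]]


if __name__ == "__main__":
    stream = [
        {"id": 1, "ip": "10.0.0.1"},
        {"id": 1, "ip": "10.0.0.1"},  # duplicate delivery of event 1
        {"id": 2, "ip": "10.0.0.2"},
        {"id": 3, "ip": "10.0.0.1"},
    ]
    for country, count in running_counts(enrich(dedupe(stream))):
        print(country, count)
```

Running it prints `US 1`, `KR 1`, `US 2`: the duplicate of event 1 is filtered out before it reaches the aggregation stage.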
Today’s Stream Processing
•  Jobs start as faster MapReduce jobs → end up running core
business logic on top
–  Fraudulent click detection
–  Real-time budget updates
–  Trigger-based trading
•  Your stream processing jobs are more like microservices
•  Need support for service / application management:
cluster management, monitoring, debuggability
Page 4
Introducing Concord
Concord is a distributed stream processing framework
built in C++ on top of Apache Mesos, designed for
high-performance, real-time applications that require
flexibility & control.
Page 5
Introducing Concord
Page 6
[Diagram: Data Sources → Concord → Data Sinks]
Pub / Sub Operator Model
•  Jobs are composed by declaring operator Metadata
[Diagram: operator A publishes the stream 'words'; operator B subscribes to it]
Metadata(Name='A', istreams=[], ostreams=['words'])
Metadata(Name='B', istreams=['words', StreamGrouping.GROUP_BY], ostreams=[])
Page 7
Page 8
[Diagram: operator C also subscribes to 'words']
Metadata(Name='C', istreams=['words', StreamGrouping.SHUFFLE], ostreams=[])
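In the metadata above, B reads 'words' with StreamGrouping.GROUP_BY and C reads it with StreamGrouping.SHUFFLE. A minimal sketch of how such routing might work, assuming GROUP_BY sends every record with the same key to the same operator instance (via a stable hash) and SHUFFLE spreads records evenly regardless of key (round-robin here). The `Subscriber` class and `publish` function are hypothetical and are not Concord's implementation.

```python
import itertools
import zlib
from enum import Enum
from typing import Dict, List


class StreamGrouping(Enum):
    """Local stand-in for the two groupings named on the slides."""
    GROUP_BY = "group_by"
    SHUFFLE = "shuffle"


class Subscriber:
    """An operator subscribed to a stream, running `parallelism` instances."""

    def __init__(self, name: str, grouping: StreamGrouping, parallelism: int) -> None:
        self.name = name
        self.grouping = grouping
        self.parallelism = parallelism
        self._round_robin = itertools.cycle(range(parallelism))

    def route(self, key: bytes) -> int:
        """Pick which instance of this operator receives a record with `key`."""
        if self.grouping is StreamGrouping.GROUP_BY:
            # Stable hash, so equal keys always reach the same instance
            # (Python's built-in hash() of str/bytes is randomized per process).
            return zlib.crc32(key) % self.parallelism
        # SHUFFLE: ignore the key and spread records evenly across instances.
        return next(self._round_robin)


def publish(subscribers: List[Subscriber], key: bytes) -> Dict[str, int]:
    """Fan one record out to every subscriber; return the instance chosen in each."""
    return {sub.name: sub.route(key) for sub in subscribers}


if __name__ == "__main__":
    b = Subscriber("B", StreamGrouping.GROUP_BY, parallelism=4)
    c = Subscriber("C", StreamGrouping.SHUFFLE, parallelism=4)
    for word in (b"corgi", b"corgi", b"dachshund"):
        print(word.decode(), publish([b, c], word))
```

Both "corgi" records reach the same instance of B, while C's instances take turns (0, 1, 2). Every subscriber still sees every record on 'words'; the grouping only decides which of its instances handles it, which is what lets a keyed operator like B keep per-key state locally.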
Simple API in Multiple Languages
•  ProcessRecord, ProduceRecord, ProcessTimer
•  GetState, SetState backed by RocksDB
•  API available in Python, Ruby, Go, Java/Scala, C++
Metadata(Name='B', istreams=['words', StreamGrouping.GROUP_BY], ostreams=['wordcount'])
Page 9
[Diagram: stream 'words' → B → stream 'wordcount']
Key          Value
Corgi        2
Chihuahua    4
Dachshund    5
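The API names above suggest an operator shaped roughly like the following word counter (operator B): it reads 'words', keeps a per-word count in state, and publishes the running total on 'wordcount'. This is a hypothetical local sketch. The snake_case method names only mirror ProcessRecord, ProduceRecord, GetState and SetState; the state store is a plain dict standing in for RocksDB; the real Concord client library's signatures may differ.

```python
from typing import Callable, Dict, List, Tuple


class InMemoryState:
    """Stand-in for the RocksDB-backed GetState/SetState store (a plain dict here)."""

    def __init__(self) -> None:
        self._kv: Dict[bytes, bytes] = {}

    def get_state(self, key: bytes) -> bytes:
        return self._kv.get(key, b"")

    def set_state(self, key: bytes, value: bytes) -> None:
        self._kv[key] = value


class WordCounter:
    """Hypothetical operator B: reads 'words', publishes running totals on 'wordcount'."""

    def __init__(self, state: InMemoryState,
                 produce_record: Callable[[str, bytes, bytes], None]) -> None:
        self.state = state
        self.produce_record = produce_record  # (stream, key, value), like ProduceRecord

    def process_record(self, key: bytes, value: bytes) -> None:
        """Called once per incoming record; the word is the record key."""
        raw = self.state.get_state(key)
        count = int(raw) + 1 if raw else 1
        self.state.set_state(key, str(count).encode())  # counts stored as ASCII digits
        self.produce_record("wordcount", key, str(count).encode())


if __name__ == "__main__":
    emitted: List[Tuple[str, bytes, bytes]] = []
    store = InMemoryState()
    counter = WordCounter(store, lambda stream, k, v: emitted.append((stream, k, v)))
    for word in [b"Corgi"] * 2 + [b"Chihuahua"] * 4 + [b"Dachshund"] * 5:
        counter.process_record(word, b"")
    for word in (b"Corgi", b"Chihuahua", b"Dachshund"):
        print(word.decode(), store.get_state(word).decode())
```

Feeding eleven words through the operator leaves the same counts as the example table (2, 4 and 5) in state and emits one 'wordcount' record per input. Because B subscribes with GROUP_BY, all records for a given word would land on one instance, so each instance can keep its counts in local state.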
  
Lets multiple teams consume the same
streaming data in real time
Page 10
Native Integration with Apache Mesos
Page 11
•  Dynamic resource
scheduling
•  Task Isolation
•  Task supervision
•  High Availability
Containerized Execution Environment
•  Horizontal scaling
•  Multi-tenancy
•  Hot code deployment &
dynamic topology
Page 12
[Diagram: Mesos Agent with RocksDB]
Concord is Flexible: Run-time deployment
Page 13
Concord supports Distributed Tracing
Page 17
Monitor all operator instances at a glance
Page 18
Concord supports Transparent Debugging
[2015-11-02 15:36:44.770] [dispatcher_latencies] [info] 127.0.0.1:31000:
traceId: -8816532120874703981,
parentId: 0, id: -6816766813334129096,
p50: 388179us, p95: 519668us, p99: 524812us, p999: 526425us
[2015-11-02 15:37:13.929] [principal_latencies] [info] 127.0.0.1:31001:
traceId: -4811311467074699790,
parentId: -7681059555040553620,
id: -1899872683843643522,
p50: 73355us, p95: 145626us, p99: 210345us, p999: 272018us
[2015-11-02 15:36:43.323] [incoming_throughput] [info] 12288 req in 1045515us. total: 367616 req
[2015-11-02 15:36:30.240] [outgoing_throughput] [info] 100000 req in 4804526us. total: 600000 req
Page 19
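Each throughput line reports a request count and a window length in microseconds, so the rate for that window is count / (µs / 1,000,000). A small parser, assuming exactly the line format shown above (other log formats are not handled):

```python
import re
from typing import Optional, Tuple

# Matches the throughput lines as printed on the slide; other formats are not handled.
THROUGHPUT_RE = re.compile(
    r"\[(\w+_throughput)\] \[info\] (\d+) req in (\d+)us\. total: (\d+) req"
)


def parse_throughput(line: str) -> Optional[Tuple[str, float, int]]:
    """Return (metric, requests per second in this window, running total), or None."""
    m = THROUGHPUT_RE.search(line)
    if m is None:
        return None
    metric = m.group(1)
    requests, micros, total = int(m.group(2)), int(m.group(3)), int(m.group(4))
    return metric, requests / (micros / 1_000_000), total


if __name__ == "__main__":
    lines = [
        "[2015-11-02 15:36:43.323] [incoming_throughput] [info] 12288 req in 1045515us. total: 367616 req",
        "[2015-11-02 15:36:30.240] [outgoing_throughput] [info] 100000 req in 4804526us. total: 600000 req",
    ]
    for line in lines:
        parsed = parse_throughput(line)
        if parsed:
            metric, rate, total = parsed
            print(f"{metric}: {rate:,.0f} req/s (total {total:,})")
```

For the two lines above this gives about 11,753 req/s incoming (12288 / 1.045515 s) and about 20,814 req/s outgoing (100000 / 4.804526 s).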
Concord performs well at scale
•  Word count benchmark (1.13B msgs)
–  Concord: 500K QPS/node at 10ms/event
–  Storm: 16K QPS/node at 100ms/event
–  Spark Streaming: 100K QPS/node at 1s batch window
•  Server log processing (29 GB server log, ~260M msgs)
–  4 nodes, 8 vCPU, 32GB RAM each
–  Concord: 1M – 1.8M QPS
–  Spark Streaming: 72K – 2M QPS
•  Consistent performance
Page 20
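A back-of-envelope reading of these figures, under two assumptions: the server-log QPS range is the aggregate rate of the 4-node cluster, and that rate is sustained for the whole run. Note also that the word-count numbers are quoted at different latencies (10 ms per event, 100 ms per event, 1 s batch window), so the per-node ratio is not a like-for-like comparison.

```python
def drain_seconds(messages: float, qps: float) -> float:
    """Wall-clock time to process `messages` at a sustained `qps` messages per second."""
    return messages / qps


# Server log processing: ~260M messages at the quoted Concord range
# (assumed to be the aggregate rate of the 4-node cluster).
best_case = drain_seconds(260e6, 1.8e6)   # about 144 s
worst_case = drain_seconds(260e6, 1.0e6)  # 260 s

# Word count: per-node throughput ratio, Concord vs. Storm.
concord_vs_storm = 500e3 / 16e3  # 31.25x

if __name__ == "__main__":
    print(f"{best_case:.0f}-{worst_case:.0f} s; Concord/Storm = {concord_vs_storm:.2f}x")
```

Under those assumptions the server-log job finishes in roughly 144 to 260 seconds, and Concord's per-node word-count throughput is about 31 times Storm's.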
Concord is designed for Predictability
•  As you scale, JVM reconfiguration and GC pauses are
inevitable (Framework GC vs. Application GC)
•  Cluster abstracted as CPU, Memory, Disk numbers →
easier cluster optimization & predictable overall runtime
•  Fast Compile → Test → Deploy cycle without downtime
Page 21
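Once every agent and every operator instance is described by CPU, memory and disk numbers, checking whether a topology fits the cluster becomes simple arithmetic. A hypothetical first-fit placement sketch to illustrate the idea; Mesos itself works by offering resources to frameworks, and this is neither Mesos's nor Concord's scheduling algorithm. The `Resources` type and `first_fit` function are illustrative names.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional


@dataclass
class Resources:
    cpus: float
    mem_mb: int
    disk_mb: int

    def fits(self, need: "Resources") -> bool:
        return (self.cpus >= need.cpus and self.mem_mb >= need.mem_mb
                and self.disk_mb >= need.disk_mb)

    def take(self, need: "Resources") -> None:
        self.cpus -= need.cpus
        self.mem_mb -= need.mem_mb
        self.disk_mb -= need.disk_mb


def first_fit(agents: Dict[str, Resources], instances: List[Resources]) -> Optional[List[str]]:
    """Assign each operator instance to the first agent with enough free resources.

    Returns the chosen agent name per instance, or None if any instance cannot be placed.
    The caller's `agents` are not modified; free capacity is tracked on copies.
    """
    free = {name: replace(res) for name, res in agents.items()}
    placement: List[str] = []
    for need in instances:
        for name, res in free.items():
            if res.fits(need):
                res.take(need)
                placement.append(name)
                break
        else:
            return None  # no agent had room for this instance
    return placement


if __name__ == "__main__":
    # Two agents sized like the benchmark nodes (8 vCPU, 32 GB RAM); disk is arbitrary.
    cluster = {"agent-1": Resources(8, 32768, 100_000), "agent-2": Resources(8, 32768, 100_000)}
    instance = Resources(4, 8192, 1024)
    print(first_fit(cluster, [instance] * 4))  # ['agent-1', 'agent-1', 'agent-2', 'agent-2']
    print(first_fit(cluster, [instance] * 5))  # None: the fifth instance does not fit
```

First-fit is chosen only because it is short; it answers "does this fit?" deterministically, which is the predictability point the slide makes, but it does not try to balance load across agents.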
Message Delivery Guarantees
Today: Fast > Complete or Perfect
•  Best-effort / at-most-once processing
–  When an operator or node crashes, its local cache is lost
–  The failed operator is retried automatically (the number of retries is
configurable)
–  We recommend implementing check mechanisms in operators
(e.g., the Concord Kafka consumer)
Page 22
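One common way at-most-once behaviour arises is a consumer that records its position before it processes a record: if it crashes in between, the restarted consumer resumes past that record and it is never processed. A self-contained simulation under that assumption (the slide attributes Concord's loss to the local cache disappearing on a crash; this sketch does not model Concord's internals):

```python
from typing import List, Optional


def run_at_most_once(log: List[str], crash_at: Optional[int]) -> List[str]:
    """Commit-then-process consumer; returns the records that were actually processed.

    The consumer crashes once, after committing offset `crash_at` but before processing
    it, then restarts from the committed position, so that record is never processed.
    """
    committed = 0  # next offset to read after a restart
    processed: List[str] = []
    crashed = False
    while committed < len(log):
        offset = committed
        committed = offset + 1  # position is committed before the work is done
        if offset == crash_at and not crashed:
            crashed = True  # crash here: the restarted consumer resumes at `committed`
            continue
        processed.append(log[offset])
    return processed


if __name__ == "__main__":
    print(run_at_most_once(["a", "b", "c"], crash_at=None))  # ['a', 'b', 'c']
    print(run_at_most_once(["a", "b", "c"], crash_at=1))     # ['a', 'c']: 'b' is lost
```

Nothing is ever processed twice, but a crash at the wrong moment silently drops a record, which is why the slide suggests adding checks inside operators.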
Message Delivery Guarantees
Soon: Fast + Complete > Perfect
•  At-least-once delivery with Kafka is in development
–  Kafka acts as a message bus between operators
–  Kafka replays data from the last checkpointed offset (data may be duplicated)
Eventually: Fast + Complete + Perfect
•  Transactional datastore in design phase
Page 23
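The at-least-once scheme can be simulated the same way: records are processed first and the offset is checkpointed afterwards (here every `checkpoint_every` records), so a crash replays everything after the last checkpoint and some records are processed twice. This is a hypothetical sketch that does not use the Kafka client API; `dedupe_by_offset` shows one kind of check an operator could apply to discard replayed duplicates.

```python
from typing import List, Optional, Tuple


def run_at_least_once(
    log: List[str], crash_at: Optional[int], checkpoint_every: int = 2
) -> List[Tuple[int, str]]:
    """Process-then-checkpoint consumer; returns every (offset, record) processed, in order.

    The consumer crashes once, right after processing offset `crash_at` and before
    checkpointing it, then restarts from the last checkpointed offset.
    """
    checkpoint = 0
    offset = checkpoint
    processed: List[Tuple[int, str]] = []
    crashed = False
    while offset < len(log):
        processed.append((offset, log[offset]))
        if offset == crash_at and not crashed:
            crashed = True
            offset = checkpoint  # replay from the last checkpointed offset
            continue
        offset += 1
        if offset % checkpoint_every == 0:
            checkpoint = offset  # durable position: next record to read after a restart
    return processed


def dedupe_by_offset(processed: List[Tuple[int, str]]) -> List[str]:
    """Operator-side check: ignore any record whose offset was already handled."""
    seen = set()
    result: List[str] = []
    for offset, record in processed:
        if offset not in seen:
            seen.add(offset)
            result.append(record)
    return result


if __name__ == "__main__":
    history = run_at_least_once(["a", "b", "c", "d"], crash_at=3)
    print([record for _, record in history])  # ['a', 'b', 'c', 'd', 'c', 'd']
    print(dedupe_by_offset(history))          # ['a', 'b', 'c', 'd']
```

The crash after offset 3 rewinds to the checkpoint at offset 2, so "c" and "d" are processed twice; nothing is lost, and the offset check restores a duplicate-free result. A transactional store, as in the "Eventually" item, would remove the need for such checks.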
Future plans
•  “At least once” guarantee support with Kafka
•  DC/OS integration
•  More data source / data sink connector support
•  Higher level DSL
Page 24
Concord: Simple & Flexible streaming application
framework on Apache Mesos
•  Operator model usable from multiple languages
→ Fast development and iteration for multiple
teams using the same data
•  Dynamic topology, run-time deployment and scaling
→ Decoupled development & DevOps work
•  High performance at scale
→ Predictable system for real-time applications
Page 31
•  Low-latency / Real-time applications:
–  Real-time fraud detection
–  Financial market data processing for real-time risks and triggers
–  Real-time campaign management for real-time bidding (RTB)
Thank You!
Get Started: http://concord.io
shinji@concord.io / @shinjikim

Concord: Simple & Flexible Stream Processing on Apache Mesos: Data By The Bay May 2016

  • 1. Concord: Simple & Flexible Stream Processing on Apache Mesos Shinji Kim Co-founder, Concord Systems @concord @databythebay #datagrid
  • 2. Overview •  What is Stream Processing? •  Today’s Stream Processing •  Introducing Concord 1. Concepts & API 2. Job Topology Management 3. Operations, Toolings, Performance 4. Message Delivery Guarantees •  Future Development Plans Page 2
  • 3. What is stream processing? Page 3 •  Processing Data in motion •  Sits between message queues and databases •  Used for faster: –  Data enrichment –  Aggregation –  Filtering / deduplication
  • 4. Today’s Stream Processing •  Faster MapReduce jobs à ends up running core business logic on top –  Fradulent click detection –  Real-time budget updates –  Trigger-based trading •  Your stream processing jobs are more like microservices •  Need support for services / application management: Cluster mgmt, Monitoring, Debuggability Page 4
  • 5. Introducing Concord Concord is a distributed stream processing framework built in C++ on top of Apache Mesos, designed for high-performance, real-time applications that require flexibility & control. Page 5
  • 6. Introducing Concord Page 6 Data  Sources   Data  Sinks  
  • 7–8. Pub / Sub Operator Model • Composable jobs by Metadata [Diagram: operator A publishes stream 'words'; operators B and C subscribe to it] Metadata(Name='A', istreams=[], ostreams=['words']) Metadata(Name='B', istreams=['words', StreamGrouping.GROUP_BY], ostreams=[]) Metadata(Name='C', istreams=['words', StreamGrouping.SHUFFLE], ostreams=[])
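The two groupings on these slides decide which instance of a subscribing operator receives each record. A minimal sketch of that routing, assuming GROUP_BY means "same key goes to the same instance" and sketching SHUFFLE as round-robin (a real shuffle may be random); `Grouping`, `Router` and `publish` are hypothetical names, not Concord's API.

```python
import itertools
import zlib
from enum import Enum
from typing import Dict


class Grouping(Enum):
    GROUP_BY = "group_by"  # same key -> same instance
    SHUFFLE = "shuffle"    # spread records across instances


class Router:
    """Hypothetical router for one subscribing operator with N instances."""

    def __init__(self, num_instances: int, grouping: Grouping) -> None:
        self.num_instances = num_instances
        self.grouping = grouping
        self._round_robin = itertools.cycle(range(num_instances))

    def route(self, key: bytes) -> int:
        if self.grouping is Grouping.GROUP_BY:
            # Stable hash (crc32) so a key maps to the same instance in every process.
            return zlib.crc32(key) % self.num_instances
        # SHUFFLE sketched as round-robin; the key is ignored.
        return next(self._round_robin)


def publish(key: bytes, subscribers: Dict[str, Router]) -> Dict[str, int]:
    """Fan one record on stream 'words' out to every subscribing operator.

    Returns the chosen instance index for each subscriber."""
    return {name: router.route(key) for name, router in subscribers.items()}


subscribers = {"B": Router(4, Grouping.GROUP_BY), "C": Router(4, Grouping.SHUFFLE)}
print(publish(b"corgi", subscribers))
```

Because every subscriber gets its own copy of the record, adding operator C does not change what operator B sees; this is what lets independent jobs compose through shared streams.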
  • 9. Simple API in Multiple Languages • ProcessRecord, ProduceRecord, ProcessTimer • GetState, SetState backed by RocksDB • API available in Python, Ruby, Go, Java/Scala, C++ [Diagram: stream 'words' → operator B → stream 'wordcount'] Metadata(Name='B', istreams=['words', StreamGrouping.GROUP_BY], ostreams=['wordcount']) [State table (Key: Value): Corgi: 2, Chihuahua: 4, Dachshund: 5]
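The callbacks listed on this slide fit together as a word-count operator. The sketch below is a self-contained mock, not Concord's client library: `FakeContext`, `Record`, `WordCounter` and their snake_case method names are hypothetical stand-ins for ProcessRecord, ProduceRecord, ProcessTimer, GetState and SetState, and a plain dict replaces the RocksDB-backed state.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Record:
    key: bytes
    value: bytes


@dataclass
class FakeContext:
    """Hypothetical runtime context; a dict stands in for RocksDB state."""
    state: Dict[bytes, bytes] = field(default_factory=dict)
    produced: List[Tuple[str, bytes, bytes]] = field(default_factory=list)

    def get_state(self, key: bytes) -> bytes:
        return self.state.get(key, b"")

    def set_state(self, key: bytes, value: bytes) -> None:
        self.state[key] = value

    def produce_record(self, stream: str, key: bytes, value: bytes) -> None:
        self.produced.append((stream, key, value))


class WordCounter:
    """Operator B: counts words from 'words' and emits running totals on 'wordcount'."""

    def process_record(self, ctx: FakeContext, record: Record) -> None:
        current = ctx.get_state(record.key)
        count = int(current.decode()) + 1 if current else 1
        ctx.set_state(record.key, str(count).encode())
        ctx.produce_record("wordcount", record.key, str(count).encode())

    def process_timer(self, ctx: FakeContext, key: str, time_ms: int) -> None:
        # A timer could periodically flush or expire state; unused in this sketch.
        pass


ctx = FakeContext()
op = WordCounter()
for word in [b"Corgi", b"Chihuahua", b"Corgi"]:
    op.process_record(ctx, Record(word, b""))
print(ctx.state[b"Corgi"])  # b'2'
```

GROUP_BY on the input stream matters here: it sends every occurrence of a word to the same instance, so each instance's local state holds complete counts for its keys.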
  • 10. Useful for multiple teams consuming the same streaming data in real time
  • 11. Native Integration with Apache Mesos • Dynamic resource scheduling • Task isolation • Task supervision • High availability
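Mesos presents agents to a framework as resource offers (CPU, memory, disk), and the framework decides where to launch tasks. The sketch below places operator instances with a simple first-fit rule; it is an illustration under that assumption, not Concord's actual scheduler, and `Resources` and `first_fit` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Resources:
    cpus: float
    mem_mb: int
    disk_mb: int

    def fits(self, other: "Resources") -> bool:
        return (self.cpus >= other.cpus and self.mem_mb >= other.mem_mb
                and self.disk_mb >= other.disk_mb)

    def minus(self, other: "Resources") -> "Resources":
        return Resources(self.cpus - other.cpus, self.mem_mb - other.mem_mb,
                         self.disk_mb - other.disk_mb)


def first_fit(offers: Dict[str, Resources], request: Resources,
              instances: int) -> List[Optional[str]]:
    """Assign each operator instance to the first agent whose offer still fits.

    Returns one agent id per instance, or None when no offer has room left."""
    remaining = dict(offers)
    placement: List[Optional[str]] = []
    for _ in range(instances):
        chosen = next((agent for agent, free in remaining.items()
                       if free.fits(request)), None)
        if chosen is not None:
            # Deduct what this instance uses from the chosen agent's offer.
            remaining[chosen] = remaining[chosen].minus(request)
        placement.append(chosen)
    return placement


offers = {"agent-1": Resources(2.0, 4096, 10240), "agent-2": Resources(1.0, 2048, 10240)}
print(first_fit(offers, Resources(1.0, 1024, 512), instances=4))
# ['agent-1', 'agent-1', 'agent-2', None]
```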
  • 12. Containerized Execution Environment • Horizontal scaling • Multi-tenancy • Hot code deployment & dynamic topology [Diagram labels: Mesos Agent, RocksDB]
  • 13–16. Concord is Flexible: Run-time deployment [Diagrams]
  • 17. Concord supports Distributed Tracing
  • 18. Monitor all operator instances at a glance
  • 19. Concord supports Transparent Debugging
    [2015-11-02 15:36:44.770] [dispatcher_latencies] [info] 127.0.0.1:31000: traceId: -8816532120874703981, parentId: 0, id: -6816766813334129096, p50: 388179us, p95: 519668us, p99: 524812us, p999: 526425us
    [2015-11-02 15:37:13.929] [principal_latencies] [info] 127.0.0.1:31001: traceId: -4811311467074699790, parentId: -7681059555040553620, id: -1899872683843643522, p50: 73355us, p95: 145626us, p99: 210345us, p999: 272018us
    [2015-11-02 15:36:43.323] [incoming_throughput] [info] 12288 req in 1045515us. total: 367616 req
    [2015-11-02 15:36:30.240] [outgoing_throughput] [info] 100000 req in 4804526us. total: 600000 req
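The throughput lines in these logs report a request count and an elapsed time in microseconds, so turning them into a rate is simple arithmetic. A minimal sketch that parses lines in exactly the format shown above; the regular expression and `requests_per_second` are assumptions written for this example, not a Concord tool.

```python
import re
from typing import Optional, Tuple

# Matches e.g. "[incoming_throughput] [info] 12288 req in 1045515us"
THROUGHPUT_RE = re.compile(
    r"\[(?P<metric>\w+_throughput)\] \[info\] (?P<count>\d+) req in (?P<micros>\d+)us"
)


def requests_per_second(line: str) -> Optional[Tuple[str, float]]:
    """Return (metric name, requests per second) for a throughput line, else None."""
    match = THROUGHPUT_RE.search(line)
    if match is None:
        return None
    count = int(match.group("count"))
    seconds = int(match.group("micros")) / 1_000_000
    return match.group("metric"), count / seconds


print(requests_per_second(
    "[2015-11-02 15:36:43.323] [incoming_throughput] [info] 12288 req in 1045515us. total: 367616 req"))
```

For the two sample lines this gives roughly 11,753 req/s incoming (12288 / 1.045515 s) and 20,814 req/s outgoing (100000 / 4.804526 s); the latency lines do not match and return None.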
  • 20. Concord performs well at scale • Word count benchmark (1.13B msgs): – Concord: 500K QPS/node at 10 ms/event – Storm: 16K QPS/node at 100 ms/event – Spark Streaming: 100K QPS/node at 1 s batch window • Server log processing (29 GB server log, ~260M msgs): – 4 nodes, 8 vCPU, 32 GB RAM each – Concord: 1M–1.8M QPS – Spark Streaming: 72K–2M QPS • Consistent performance
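To put the server-log numbers in perspective, a back-of-the-envelope estimate helps. This assumes the Concord figures of 1M–1.8M QPS are aggregate for the 4-node cluster (the slide states "per node" only for the word-count benchmark), so it is an interpretation rather than a reported result.

```latex
t = \frac{N}{\mathrm{QPS}}, \qquad
\frac{2.6\times10^{8}}{1.8\times10^{6}\,\mathrm{s}^{-1}} \approx 144\,\mathrm{s}, \qquad
\frac{2.6\times10^{8}}{1.0\times10^{6}\,\mathrm{s}^{-1}} = 260\,\mathrm{s}

\frac{1.0\times10^{6}}{4} = 2.5\times10^{5}\ \mathrm{QPS/node}, \qquad
\frac{1.8\times10^{6}}{4} = 4.5\times10^{5}\ \mathrm{QPS/node}
```

Under that assumption, the ~260M messages take roughly 2.4 to 4.3 minutes, and per-node throughput falls in the same range as the word-count figure.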
  • 21. Concord is designed for Predictability • As you scale, JVM reconfiguration and GC pauses are inevitable (framework GC vs. application GC) • Cluster abstracted as CPU, memory, and disk numbers → cluster optimization & overall runtime • Fast compile → test → deploy cycle without downtime
  • 22. Message Delivery Guarantees. Today: Fast > Complete or Perfect • Best-effort / at-most-once processing: – When an operator or node crashes, its local cache is lost – The failed operator is retried automatically (the number of retries is configurable) – We recommend implementing check mechanisms in operators (e.g., the Concord Kafka consumer)
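The at-most-once behaviour described above can be simulated in a few lines: an operator keeps records only in a local cache, a crash discards that cache, and a supervisor restarts the operator until a configurable retry budget runs out. This is a hypothetical model of the slide's description, not Concord code; `process_at_most_once` and `RetriesExhausted` are illustrative names.

```python
from typing import Iterable, List, Set, Tuple


class RetriesExhausted(RuntimeError):
    pass


def process_at_most_once(batches: Iterable[List[str]], crash_on: Set[int],
                         max_retries: int) -> Tuple[int, int]:
    """Simulate best-effort processing with a configurable restart budget.

    `crash_on` holds indices of batches during which the operator crashes.
    Returns (records processed, restarts used)."""
    processed = 0
    restarts = 0
    for index, batch in enumerate(batches):
        local_cache = list(batch)  # held in memory only; nothing is persisted
        if index in crash_on:
            if restarts == max_retries:
                raise RetriesExhausted(f"gave up after {restarts} restarts")
            restarts += 1
            # The restarted operator begins with an empty cache, so this
            # batch is never replayed: each record is seen at most once.
            continue
        processed += len(local_cache)
    return processed, restarts


print(process_at_most_once([["a", "b"], ["c"], ["d", "e"]], crash_on={1}, max_retries=3))
# (4, 1): record "c" was lost in the crash
```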
  • 23. Message Delivery Guarantees. Soon: Fast + Complete > Perfect • At-least-once with Kafka, in development: – Kafka acts as a message bus between operators – Kafka replays data from the last checked offset (which can duplicate data). Eventually: Fast + Complete + Perfect • Transactional datastore in design phase
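Replaying from a committed offset is what makes delivery at-least-once, and it is also where the duplicates come from. The sketch below models a Kafka-like log as a Python list. It does not use the Kafka client API, and `consume_at_least_once` and `drop_replayed` are hypothetical names. Deduplicating by offset at the sink is shown as one common mitigation, not necessarily Concord's plan.

```python
from typing import List, Optional, Tuple


def consume_at_least_once(log: List[str], commit_every: int,
                          crash_after: Optional[int] = None) -> List[Tuple[int, str]]:
    """Simulate a consumer that reads a Kafka-like log and commits offsets in batches.

    If the consumer crashes after delivering `crash_after` records, it resumes
    from the last committed offset, so uncommitted records are delivered again.
    Returns (offset, value) pairs in delivery order."""
    delivered: List[Tuple[int, str]] = []
    committed = 0  # offset to resume from after a crash
    offset = 0     # next offset to read
    crashed = False
    while offset < len(log):
        delivered.append((offset, log[offset]))  # side effect happens before commit
        offset += 1
        if offset % commit_every == 0:
            committed = offset  # durably record progress
        if not crashed and crash_after is not None and len(delivered) == crash_after:
            crashed = True
            offset = committed  # replay everything after the last commit
    return delivered


def drop_replayed(delivered: List[Tuple[int, str]]) -> List[str]:
    """Idempotent sink: ignore any offset it has already applied."""
    seen = set()
    values: List[str] = []
    for offset, value in delivered:
        if offset not in seen:
            seen.add(offset)
            values.append(value)
    return values


out = consume_at_least_once(["r0", "r1", "r2", "r3", "r4"], commit_every=2, crash_after=3)
print([o for o, _ in out])  # [0, 1, 2, 2, 3, 4]: r2 is delivered twice
print(drop_replayed(out))   # ['r0', 'r1', 'r2', 'r3', 'r4']
```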
  • 24. Future plans • “At least once” guarantee support with Kafka • DC/OS integration • More data source / data sink connectors • Higher-level DSL
  • 25–30. Concord: Simple & Flexible streaming application framework on Apache Mesos • Operator model usable from multiple languages → fast development and iteration for multiple teams using the same data • Dynamic topology, run-time deployment and scaling → decoupled development & DevOps work • High performance at scale → predictable system for real-time applications
  • 31. Concord: Simple & Flexible streaming application framework on Apache Mesos • Low-latency / real-time applications: – Real-time fraud detection – Financial market data processing for real-time risks and triggers – Real-time campaign management for real-time bidding (RTB)
  • 32. Thank You! Get Started: http://concord.io shinji@concord.io / @shinjikim @concord @databythebay #datagrid