Concord: Simple & Flexible
Stream Processing on Apache Mesos
Shinji Kim
Co-founder, Concord Systems
@concord
@databythebay #datagrid
Overview
•  What is Stream Processing?
•  Today’s Stream Processing
•  Introducing Concord
1. Concepts & API
2. Job Topology Management
3. Operations, Tooling, Performance
4. Message Delivery Guarantees
•  Future Development Plans
What is stream processing?
•  Processing Data in motion
•  Sits between message queues and databases
•  Used for faster:
–  Data enrichment
–  Aggregation
–  Filtering / deduplication
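The three uses listed above can be sketched as a small generator pipeline. This is only an illustration in plain Python, not Concord code: the event shape, the `USER_COUNTRY` lookup table, and all function names are assumptions made for the example.

```python
from collections import Counter

# Hypothetical lookup table used for enrichment (not from the talk).
USER_COUNTRY = {"u1": "US", "u2": "JP"}


def enrich(events):
    """Enrichment: attach a country to each event as it flows past."""
    for event in events:
        yield {**event, "country": USER_COUNTRY.get(event["user"], "unknown")}


def deduplicate(events):
    """Filtering / deduplication: drop events whose id was already seen."""
    seen = set()
    for event in events:
        if event["id"] not in seen:
            seen.add(event["id"])
            yield event


def count_by_country(events):
    """Aggregation: count events per country."""
    counts = Counter()
    for event in events:
        counts[event["country"]] += 1
    return counts


clicks = [
    {"id": 1, "user": "u1"},
    {"id": 2, "user": "u2"},
    {"id": 2, "user": "u2"},  # duplicate delivery of the same event
    {"id": 3, "user": "u3"},  # user missing from the lookup table
]
result = count_by_country(deduplicate(enrich(clicks)))
```

Each stage handles one event at a time, so nothing waits for a full batch before moving downstream.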
Today’s Stream Processing
•  Faster MapReduce jobs → end up running core
business logic on top
–  Fraudulent click detection
–  Real-time budget updates
–  Trigger-based trading
•  Your stream processing jobs are more like microservices
•  Need support for services / application management:
Cluster mgmt, Monitoring, Debuggability
Introducing Concord
Concord is a distributed stream processing framework
built in C++ on top of Apache Mesos, designed for
high-performance, real-time applications that require
flexibility & control.
Introducing Concord
[Diagram labels: Data Sources, Data Sinks]
Pub / Sub Operator Model
•  Composable jobs by Metadata
[Diagram: operator A → stream "words" → operator B]
Metadata(
    Name='A',
    istreams=[],
    ostreams=['words'])
Metadata(
    Name='B',
    istreams=['words',
              StreamGrouping.GROUP_BY],
    ostreams=[])
[Diagram: operator C also subscribes to "words"]
Metadata(
    Name='C',
    istreams=['words',
              StreamGrouping.SHUFFLE],
    ostreams=[])
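The difference between the two stream groupings can be sketched as two routing functions. The talk does not say how Concord routes records internally, so the CRC32 hash and the round-robin policy below are assumptions chosen for the illustration, and the function names are hypothetical.

```python
import itertools
import zlib


def group_by_route(key, num_instances):
    """GROUP_BY: the same key always lands on the same operator instance."""
    return zlib.crc32(key.encode("utf-8")) % num_instances


def make_shuffle_router(num_instances):
    """SHUFFLE, modelled here as round-robin: spread records evenly, ignoring the key."""
    instances = itertools.cycle(range(num_instances))
    return lambda key: next(instances)


words = ["corgi", "pug", "corgi", "corgi", "pug", "beagle"]

grouped = [group_by_route(word, 3) for word in words]
shuffle = make_shuffle_router(3)
shuffled = [shuffle(word) for word in words]
```

Keyed routing matters for stateful operators such as a word counter, because every occurrence of a word must reach the instance that holds its count. Shuffling suits stateless work where only even load matters.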
Simple API in Multiple Languages
•  ProcessRecord, ProduceRecord, ProcessTimer
•  GetState, SetState backed by RocksDB
•  API available in Python, Ruby, Go, Java/Scala, C++
[Diagram: stream "words" → operator B → stream "wordcount"]
Metadata(
    Name='B',
    istreams=['words',
              StreamGrouping.GROUP_BY],
    ostreams=['wordcount'])

Key          Value
Corgi        2
Chihuahua    4
Dachshund    5
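A word-count operator built from these calls might look roughly like the sketch below. It does not import the Concord client library: `InMemoryContext` is a hypothetical stand-in in which a dict replaces the RocksDB-backed state, and the method names only mirror ProcessRecord, ProduceRecord, GetState and SetState from the slide.

```python
class InMemoryContext:
    """Hypothetical stand-in for the framework context; a dict replaces RocksDB."""

    def __init__(self):
        self.state = {}
        self.produced = []

    def get_state(self, key):
        return self.state.get(key)

    def set_state(self, key, value):
        self.state[key] = value

    def produce_record(self, stream, key, value):
        self.produced.append((stream, key, value))


class WordCounter:
    """Counts words arriving on 'words' and emits running totals on 'wordcount'."""

    def process_record(self, ctx, key):
        count = (ctx.get_state(key) or 0) + 1
        ctx.set_state(key, count)
        ctx.produce_record("wordcount", key, count)


ctx = InMemoryContext()
counter = WordCounter()
for word in ["Corgi", "Chihuahua", "Corgi"]:
    counter.process_record(ctx, word)
```

After the three records, the state holds Corgi = 2 and Chihuahua = 1, and every update has been published downstream on "wordcount".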
  
Useful for multiple teams to consume the same
streaming data in real-time
Native Integration with Apache Mesos
•  Dynamic resource scheduling
•  Task Isolation
•  Task supervision
•  High Availability
Containerized Execution Environment
•  Horizontal scaling
•  Multi-tenancy
•  Hot code deployment & dynamic topology
[Diagram labels: Mesos Agent, RocksDB]
Concord is Flexible: Run-time deployment
Concord supports Distributed Tracing
Monitor all operator instances at a glance
Concord supports Transparent Debugging
[2015-11-02 15:36:44.770] [dispatcher_latencies] [info] 127.0.0.1:31000:
traceId: -8816532120874703981,
parentId: 0, id: -6816766813334129096,
p50: 388179us, p95: 519668us, p99: 524812us, p999: 526425us
[2015-11-02 15:37:13.929] [principal_latencies] [info] 127.0.0.1:31001:
traceId: -4811311467074699790,
parentId: -7681059555040553620,
id: -1899872683843643522,
p50: 73355us, p95: 145626us, p99: 210345us, p999: 272018us
[2015-11-02 15:36:43.323] [incoming_throughput] [info] 12288 req in 1045515us. total: 367616 req
[2015-11-02 15:36:30.240] [outgoing_throughput] [info] 100000 req in 4804526us. total: 600000 req
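The throughput lines above can be turned into requests per second with a few lines of parsing. The regular expression is written against the two sample lines shown here and is an assumption about the log format, not a documented Concord schema.

```python
import re

# Matches e.g. "[incoming_throughput] [info] 12288 req in 1045515us."
THROUGHPUT = re.compile(r"\[(\w+_throughput)\].*?(\d+) req in (\d+)us")


def requests_per_second(line):
    """Return (metric name, req/s) for a throughput log line, or None."""
    match = THROUGHPUT.search(line)
    if match is None:
        return None
    name = match.group(1)
    requests = int(match.group(2))
    micros = int(match.group(3))
    return name, requests / (micros / 1_000_000)


incoming = requests_per_second(
    "[2015-11-02 15:36:43.323] [incoming_throughput] [info] "
    "12288 req in 1045515us. total: 367616 req")
outgoing = requests_per_second(
    "[2015-11-02 15:36:30.240] [outgoing_throughput] [info] "
    "100000 req in 4804526us. total: 600000 req")
```

For the two sample lines this gives roughly 11.8K req/s incoming and 20.8K req/s outgoing over their respective windows.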
Concord performs well at scale
•  Word count benchmark (1.13B msgs)
–  Concord: 500K QPS/node at 10ms/event
–  Storm: 16K QPS/node at 100ms/event
–  Spark Streaming: 100K QPS/node at 1s batch window
•  Server log processing (29 GB server log, ~260M msgs)
–  4 nodes, 8 vCPU, 32GB RAM each
–  Concord: 1M – 1.8M QPS
–  Spark Streaming: 72K – 2M QPS
•  Consistent performance
Concord is designed for Predictability
•  As you scale, JVM reconfiguration and GC pauses are
inevitable (Framework GC vs. Application GC)
•  Cluster abstracted as CPU, Memory, Disk numbers →
cluster optimization & overall runtime
•  Fast Compile → Test → Deploy cycle without downtime
Message Delivery Guarantees
Today: Fast > Complete or Perfect
•  Best-effort / at-most-once processing
–  When an operator or node crashes, the local cache goes away
–  The failed operator is retried automatically (the number of retries is
configurable)
–  Recommended: implement check mechanisms in operators
(e.g., Concord Kafka consumer)
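The at-most-once behaviour described above can be sketched as a tiny supervisor loop. This is a simplified model under stated assumptions, not Concord's scheduler: `run_at_most_once`, `make_counter` and the "boom" record that triggers a crash are all hypothetical names for the example.

```python
def run_at_most_once(records, make_operator, max_restarts):
    """Deliver each record once; on a crash, drop it and restart the operator."""
    operator = make_operator()
    restarts = 0
    processed, dropped = [], []
    for record in records:
        try:
            processed.append(operator(record))
        except RuntimeError:
            dropped.append(record)      # never redelivered
            if restarts == max_restarts:
                break                   # give up after the configured limit
            restarts += 1
            operator = make_operator()  # fresh instance: local state is gone
    return processed, dropped


def make_counter():
    seen = 0  # local, in-memory state that is lost on restart

    def operator(record):
        nonlocal seen
        if record == "boom":
            raise RuntimeError("operator crashed")
        seen += 1
        return (record, seen)

    return operator


processed, dropped = run_at_most_once(
    ["a", "b", "boom", "c"], make_counter, max_restarts=3)
```

The record that was in flight during the crash is lost, and the restarted operator starts counting from 1 again. That is why the operator itself needs some way to check or restore its position.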
Message Delivery Guarantees
Soon: Fast + Complete > Perfect
•  At-least-once delivery with Kafka is in development
–  Kafka acts as a message bus between operators
–  Kafka replays data from the last checked offset (data duplication is possible)
Eventually: Fast + Complete + Perfect
•  Transactional datastore in design phase
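Why replaying from a saved offset gives at-least-once delivery with duplicates can be seen in a small simulation. No Kafka client is used: a Python list stands in for a partition, and the checkpoint interval and crash position are arbitrary values picked for the example.

```python
def consume_with_replay(log, checkpoint_every, crash_at):
    """Process a log, crash once at index crash_at, then replay from the last checkpoint."""
    delivered = []
    committed = 0   # last checkpointed offset
    offset = 0
    crashed = False
    while offset < len(log):
        if offset == crash_at and not crashed:
            crashed = True
            offset = committed  # restart: resume from the checkpoint
            continue
        delivered.append(log[offset])
        offset += 1
        if offset % checkpoint_every == 0:
            committed = offset
    return delivered


log = ["m0", "m1", "m2", "m3", "m4"]
delivered = consume_with_replay(log, checkpoint_every=2, crash_at=3)

# A downstream consumer can remove the duplicates if messages carry a unique id.
unique = list(dict.fromkeys(delivered))
```

Here "m2" is delivered twice because it was processed after the last checkpoint and before the crash. No message is lost, but consumers must tolerate or filter repeats.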
Future plans
•  “At least once” guarantee support with Kafka
•  DC/OS integration
•  More data source / data sink connector support
•  Higher-level DSL
Concord: Simple & Flexible streaming application
framework on Apache Mesos
•  Operator model that you can use from multiple languages
→ Fast development and iteration time for multiple
teams using the same data
•  Dynamic topology, run-time deployment and scaling
→ Decoupled development & DevOps work
•  High performance at scale
→ Predictable system for real-time applications
Concord: Simple & Flexible streaming application
framework on Apache Mesos
•  Low-latency / Real-time applications:
–  Real-time fraud detection
–  Financial market data processing for real-time risks and triggers
–  Real-time campaign management for real-time bidding (RTB)
Thank You!
Get Started: http://concord.io
shinji@concord.io / @shinjikim
@concord
@databythebay #datagrid

Concord: Simple & Flexible Stream Processing on Apache Mesos: Data By The Bay May 2016
