Concord: Simple & Flexible Stream Processing on Apache Mesos: Data By The Bay May 2016

If you’re trying to process financial market data, monitor IoT sensor metrics or run real-time fraud detection, you’ll be thinking of stream processing. Stream processing sounds wonderful in concept, but scaling and debugging stream processing frameworks on distributed systems can be a nightmare. In clustered environments, your logs are scattered across many different computers, making errors and strange behaviors hard to trace. On frameworks like Apache Storm, the many layers of abstraction make it difficult to predict performance and do capacity planning. In micro-batching frameworks like Spark Streaming, stateful aggregations can be a hassle. Moreover, in most existing frameworks, changing a single line of code requires a full topology redeploy, causing operational strain. Concord strives to solve all of the challenges above. In this talk, you’ll learn how Concord differs from other stream processing frameworks and how Concord can provide flexibility, simplicity, and predictable performance with help from Apache Mesos.

https://databythebay2016.sched.org/event/6EPy/concord-simple-amp-flexible-stream-processing-on-apache-mesos

  1. Concord: Simple & Flexible Stream Processing on Apache Mesos. Shinji Kim, Co-founder, Concord Systems. @concord @databythebay #datagrid
  2. Overview • What is Stream Processing? • Today’s Stream Processing • Introducing Concord: 1. Concepts & API 2. Job Topology Management 3. Operations, Tooling, Performance 4. Message Delivery Guarantees • Future Development Plans
  3. What is stream processing? • Processing data in motion • Sits between message queues and databases • Used for faster: – Data enrichment – Aggregation – Filtering / deduplication
  4. Today’s Stream Processing • Faster MapReduce jobs → end up running core business logic on top – Fraudulent click detection – Real-time budget updates – Trigger-based trading • Your stream processing jobs are more like microservices • Need support for service / application management: cluster management, monitoring, debuggability
  5. Introducing Concord. Concord is a distributed stream processing framework built in C++ on top of Apache Mesos, designed for high-performance, real-time applications that require flexibility & control.
  6. Introducing Concord (diagram labels: Data Sources, Data Sinks)
  7. Pub / Sub Operator Model • Composable jobs by Metadata. Operator A publishes stream 'words'; operator B subscribes to it. Metadata(Name='A', istreams=[], ostreams=['words']) Metadata(Name='B', istreams=['words', StreamGrouping.GROUP_BY], ostreams=[])
  8. Pub / Sub Operator Model • Composable jobs by Metadata. Operator A publishes stream 'words'; operators B and C subscribe to it. Metadata(Name='A', istreams=[], ostreams=['words']) Metadata(Name='B', istreams=['words', StreamGrouping.GROUP_BY], ostreams=[]) Metadata(Name='C', istreams=['words', StreamGrouping.SHUFFLE], ostreams=[])
  9. Simple API in Multiple Languages • ProcessRecord, ProduceRecord, ProcessTimer • GetState, SetState backed by RocksDB • API available in Python, Ruby, Go, Java/Scala, C++. Operator B consumes 'words' and produces 'wordcount': Metadata(Name='C', istreams=['words', StreamGrouping.GROUP_BY], ostreams=['wordcount']). Example state (Key: Value): Corgi: 2, Chihuahua: 4, Dachshund: 5
  10. Useful for multiple teams to consume the same streaming data in real time
  11. Native Integration with Apache Mesos • Dynamic resource scheduling • Task isolation • Task supervision • High availability
  12. Containerized Execution Environment • Horizontal scaling • Multi-tenancy • Hot code deployment & dynamic topology (diagram labels: Mesos Agent, RocksDB)
  13. Concord is Flexible: Run-time deployment
  14. Concord is Flexible: Run-time deployment
  15. Concord is Flexible: Run-time deployment
  16. Concord is Flexible: Run-time deployment
  17. Concord supports Distributed Tracing
  18. Monitor all operator instances at a glance
  19. Concord supports Transparent Debugging
      [2015-11-02 15:36:44.770] [dispatcher_latencies] [info] 127.0.0.1:31000: traceId: -8816532120874703981, parentId: 0, id: -6816766813334129096, p50: 388179us, p95: 519668us, p99: 524812us, p999: 526425us
      [2015-11-02 15:37:13.929] [principal_latencies] [info] 127.0.0.1:31001: traceId: -4811311467074699790, parentId: -7681059555040553620, id: -1899872683843643522, p50: 73355us, p95: 145626us, p99: 210345us, p999: 272018us
      [2015-11-02 15:36:43.323] [incoming_throughput] [info] 12288 req in 1045515us. total: 367616 req
      [2015-11-02 15:36:30.240] [outgoing_throughput] [info] 100000 req in 4804526us. total: 600000 req
  20. Concord performs well at scale • Word count benchmark (1.13B msgs) – Concord: 500K QPS/node at 10ms/event – Storm: 16K QPS/node at 100ms/event – Spark Streaming: 100K QPS/node at 1s batch window • Server log processing (29 GB server log, ~260M msgs) – 4 nodes, 8 vCPU, 32GB RAM each – Concord: 1M – 1.8M QPS – Spark Streaming: 72K – 2M QPS • Consistent performance
  21. Concord is designed for Predictability • As you scale, JVM reconfiguration and GC pauses are inevitable (Framework GC vs. Application GC) • Cluster abstracted as CPU, Memory, Disk numbers → cluster optimization & overall runtime • Fast Compile → Test → Deploy cycle without downtime
  22. Message Delivery Guarantees. Today: Fast > Complete or Perfect • Best-effort / at-most-once processing – When an operator or node crashes, the local cache goes away – The failed operator is retried automatically (number of retries is configurable) – Recommended: implement check mechanisms in operators (e.g., Concord Kafka consumer)
  23. Message Delivery Guarantees. Soon: Fast + Complete > Perfect • At-least-once with Kafka is in development – Kafka acts as a message bus between operators – Kafka replays data from the checked offset (data duplication). Eventually: Fast + Complete + Perfect • Transactional datastore in design phase
  24. Future plans • “At least once” guarantee support with Kafka • DC/OS integration • More data source / data sink connector support • Higher-level DSL
  25. Concord: Simple & Flexible streaming application framework on Apache Mesos • Operator model that you can use with multiple languages
  26. Concord: Simple & Flexible streaming application framework on Apache Mesos • Operator model that you can use with multiple languages → Fast development and iteration time for multiple teams using the same data
  27. Concord: Simple & Flexible streaming application framework on Apache Mesos • Operator model that you can use with multiple languages → Fast development and iteration time for multiple teams using the same data • Dynamic topology, run-time deployment and scaling
  28. Concord: Simple & Flexible streaming application framework on Apache Mesos • Operator model that you can use with multiple languages → Fast development and iteration time for multiple teams using the same data • Dynamic topology, run-time deployment and scaling → Decoupled development & DevOps work
  29. Concord: Simple & Flexible streaming application framework on Apache Mesos • Operator model that you can use with multiple languages → Fast development and iteration time for multiple teams using the same data • Dynamic topology, run-time deployment and scaling → Decoupled development & DevOps work • High performance at scale
  30. Concord: Simple & Flexible streaming application framework on Apache Mesos • Operator model that you can use with multiple languages → Fast development and iteration time for multiple teams using the same data • Dynamic topology, run-time deployment and scaling → Decoupled development & DevOps work • High performance at scale → Predictable system for real-time applications
  31. Concord: Simple & Flexible streaming application framework on Apache Mesos • Low-latency / real-time applications: – Real-time fraud detection – Financial market data processing for real-time risks and triggers – Real-time campaign management for real-time bidding (RTB)
  32. Thank You! Get Started: http://concord.io shinji@concord.io / @shinjikim @concord @databythebay #datagrid
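
The pub/sub operator model on slides 7 to 9 can be illustrated with a small, self-contained simulation. This is a sketch under stated assumptions, not Concord's client API: the `Metadata` dataclass only mirrors the fields shown on the slides, and `pick_instance`, `subscribers` and `run_word_count` are hypothetical helpers. It assumes GROUP_BY sends equal keys to the same operator instance and SHUFFLE sends a record to any instance; the CRC32 hash and the in-memory dictionaries (standing in for RocksDB-backed state) are illustrative choices.

```python
import random
import zlib
from collections import defaultdict
from dataclasses import dataclass, field

# Grouping labels named after the slides; their semantics here are assumed.
GROUP_BY = "GROUP_BY"  # equal keys always reach the same instance
SHUFFLE = "SHUFFLE"    # any instance may receive the record


@dataclass
class Metadata:
    """Hypothetical stand-in for the Metadata(...) shown on the slides."""
    name: str
    istreams: list = field(default_factory=list)  # [(stream, grouping), ...]
    ostreams: list = field(default_factory=list)  # [stream, ...]


def subscribers(stream, operators):
    """Return (operator name, grouping) for every operator reading `stream`."""
    return [(op.name, grouping)
            for op in operators
            for s, grouping in op.istreams
            if s == stream]


def pick_instance(key, grouping, n_instances, rng):
    """Choose which instance of a subscribing operator receives a record."""
    if grouping == GROUP_BY:
        # A stable hash keeps all records for one key on one instance.
        return zlib.crc32(key.encode("utf-8")) % n_instances
    return rng.randrange(n_instances)


def run_word_count(words, n_instances=2):
    """Count words with GROUP_BY routing; one dict per instance holds state."""
    state = [defaultdict(int) for _ in range(n_instances)]
    rng = random.Random(0)
    for word in words:
        state[pick_instance(word, GROUP_BY, n_instances, rng)][word] += 1
    return state


if __name__ == "__main__":
    ops = [
        Metadata("A", [], ["words"]),
        Metadata("B", [("words", GROUP_BY)], ["wordcount"]),
        Metadata("C", [("words", SHUFFLE)], []),
    ]
    print(subscribers("words", ops))
    print(run_word_count(["Corgi", "Dachshund", "Corgi"]))
```

Because GROUP_BY pins each key to one instance, the per-instance counts never need merging, which is what makes a local key-value store sufficient for this kind of aggregation.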

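
The difference between the delivery guarantees on slides 22 and 23 can be shown with a toy model. This is a sketch under assumptions, not Concord or Kafka code: a Python list stands in for a replayable log, and `process_at_most_once`, `process_at_least_once`, `checkpoint_every` and `crash_after` are hypothetical names. It models a single crash; at-most-once drops the in-flight record, while at-least-once replays from the last checkpointed offset and therefore delivers some records twice.

```python
def process_at_most_once(log, crash_at):
    """Best effort: the record in flight at the crash is never retried."""
    delivered = []
    for offset, record in enumerate(log):
        if offset == crash_at:
            continue  # lost with the crashed operator's local state
        delivered.append(record)
    return delivered


def process_at_least_once(log, checkpoint_every, crash_after):
    """Replay from the last checkpointed offset after one simulated crash."""
    delivered = []
    committed = 0    # offset to resume from after a restart
    crashed = False
    offset = 0
    while offset < len(log):
        delivered.append(log[offset])
        offset += 1
        if offset == crash_after and not crashed:
            # Crash before the next checkpoint: rewind to the committed offset.
            crashed = True
            offset = committed
            continue
        if offset % checkpoint_every == 0:
            committed = offset
    return delivered


if __name__ == "__main__":
    log = list("abcdef")
    print(process_at_most_once(log, crash_at=2))
    print(process_at_least_once(log, checkpoint_every=2, crash_after=3))
```

The duplicates produced by the replay are the "data duplication" noted on slide 23; downstream operators would need to tolerate or filter them until a transactional store is available.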