Extending the Yahoo Streaming Benchmark

This presentation describes my own benchmarking of Apache Storm and Apache Flink, based on the work started by Yahoo! It shows the incredible performance of Apache Flink.

Published in: Data & Analytics

  1. 1. Extending the Yahoo! Streaming Benchmark Jamie Grier @jamiegrier jamie@data-artisans.com
  2. 2. Who am I? • Director of Applications Engineering at data Artisans • Previously working on streaming computation at Twitter, Gnip and Boulder Imaging • Involved in various kinds of stream processing for about a decade • High-speed video, social media streaming, general frameworks for stream processing
  3. 3. Overview • Yahoo! performed a benchmark comparing Apache Flink, Storm and Spark • The benchmark never actually pushed Flink to its throughput limits but stopped at Storm's limits • I knew Flink was capable of much more, so I repeated the benchmarks myself • I wrote a follow-up blog post explaining my findings and will summarize them here
  4. 4. Yahoo! Benchmark • Count ad impressions grouped by campaign • Compute aggregates over a 10 second window • Emit current value of window aggregates to Redis every second for query • Map ads to campaigns using Redis as well
  5. 5. Any questions so far?
  6. 6. Storm Code
  7. 7. Flink Code
  8. 8. Hardware Specs • 10 Kafka brokers with 2 partitions each • 10 compute nodes (Flink / Storm) • Each machine has 1 Xeon E3-1230-V2@3.30GHz CPU • 4 cores w/ hyperthreading • 32 GB RAM (only 8GB allocated to JVMs) • 10 GigE Ethernet between compute nodes • 1 GigE Ethernet between Kafka cluster and compute nodes
  9. 9. Logical Deployment [diagram: Data Generator, Kafka, Stream Processor (Source, Filter, Project, Join, Window, Sink), Redis (x2)]
  10. 10. Apache Storm Deployment [diagram: Flink Data Generator, Kafka (x3), Source, Filter, Project, Join, Shuffle, Window, Sink, Redis (x2); legend: 10 GigE link, 1 GigE link]
  11. 11. [diagram: Flink Data Generator, Kafka (x3), Source, Filter, Project, Join, Shuffle, Window, Sink, Redis (x2); legend: 10 GigE link, 1 GigE link]
  12. 12. [diagram: Flink Data Generator, Kafka (x3), Source / Filter, Project, Join, Shuffle, Window, Sink, Redis (x2); legend: 10 GigE link, 1 GigE link]
  13. 13. [diagram: Flink Data Generator, Kafka (x3), Source / Filter / Project, Join, Shuffle, Window, Sink, Redis (x2); legend: 10 GigE link, 1 GigE link]
  14. 14. [diagram: Flink Data Generator, Kafka (x3), Source / Filter / Project / Join, Shuffle, Window, Sink, Redis (x2); legend: 10 GigE link, 1 GigE link]
  15. 15. [diagram: Flink Data Generator, Kafka (x3), Source / Filter / Project / Join, Shuffle, Window / Sink, Redis (x2); legend: 10 GigE link, 1 GigE link]
  16. 16. [diagram: Flink Data Generator, Kafka (x3), Source / Filter / Project / Join, Shuffle, Window / Sink, Redis (x2); legend: 10 GigE link, 1 GigE link]
  17. 17. Apache Flink Deployment [diagram: Flink Data Generator, Kafka (x3), Source / Filter / Project / Join, Shuffle, Window / Sink, Redis (x2); legend: 10 GigE link, 1 GigE link]
  18. 18. Processing Guarantees: Apples and Oranges • Apache Storm: at-least-once semantics, double counting after failures, lost state after failures • Apache Flink: exactly-once semantics, no double counting, no state loss
  19. 19. Benchmark: Baseline [bar chart: throughput in msgs/sec for Storm and Flink; axis from 0 to 3,750,000]
  20. 20. Bottleneck Analysis: Apache Storm [diagram: Flink Data Generator, Kafka (x3), Source, Filter, Project, Join, Shuffle, Window, Sink, Redis (x2); legend: 10 GigE link, 1 GigE link]
  21. 21. Bottleneck Analysis: Apache Storm [same diagram; bottleneck label: CPU]
  22. 22. Bottleneck Analysis: Apache Flink [diagram: Flink Data Generator, Kafka (x3), Source / Filter / Project / Join, Shuffle, Window / Sink, Redis (x2); legend: 10 GigE link, 1 GigE link]
  23. 23. Bottleneck Analysis: Apache Flink [same diagram; bottleneck label: Network]
  24. 24. Eliminate the Bottleneck [diagram: Flink Data Generator, Kafka (x3), Source / Filter / Project / Join, Shuffle, Window / Sink, Redis (x2); legend: 10 GigE link, 1 GigE link]
  25. 25. Eliminate the Bottleneck [diagram without Kafka: Flink Data Generator, Source / Filter / Project / Join, Shuffle, Window / Sink, Redis (x2); legend: 10 GigE link, 1 GigE link]
  26. 26. Eliminate the Bottleneck [diagram without Kafka: Data Generator, Source / Filter / Project / Join, Shuffle, Window / Sink, Redis (x2); legend: 10 GigE link, 1 GigE link]
  27. 27. Apache Flink Deployment Round 2 [diagram without Kafka: Data Generator, Source / Filter / Project / Join, Shuffle, Window / Sink, Redis (x2); legend: 10 GigE link, 1 GigE link]
  28. 28. Benchmark: Baseline [bar chart: throughput in msgs/sec for Storm and Flink; axis from 0 to 3,750,000]
  29. 29. Benchmark Round 2: 10 GigE end-to-end [bar chart: throughput in msgs/sec for Storm, Flink, and Flink (10 GigE); axis from 0 to 16,000,000]
  30. 30. Results • Apache Flink achieved 15 million messages / sec on Yahoo! benchmark • Much stronger processing guarantees: Exactly once • 80x higher than what was reported in the original Yahoo! benchmark on similar hardware
  31. 31. Questions?
  32. 32. Storm Compatibility • Lots of companies already have applications written using the Storm API • Flink provides a Storm compatibility layer • Run your Storm jobs on Flink with a one-line code change • Flink also allows you to reuse your existing Storm spout and bolt code from a Flink job • Give it a try!
  33. 33. Thanks!
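
The workload on slide 4 (filter, map ads to campaigns, count per campaign over a 10-second window) can be sketched in plain Python. This is only an illustration of the logic, not the Storm or Flink code from slides 6 and 7. The event tuple layout, the `"view"` event type, the function name `count_impressions`, and the dict that stands in for the Redis lookup are all assumptions made for the example. It uses fixed, non-overlapping windows and leaves out the once-per-second emit to Redis.

```python
from collections import Counter, defaultdict

WINDOW_SECONDS = 10  # window length taken from slide 4


def count_impressions(events, ad_to_campaign, window_seconds=WINDOW_SECONDS):
    """Count impressions per campaign in fixed windows.

    events: iterable of (timestamp_seconds, ad_id, event_type) tuples
            (hypothetical layout, chosen for this sketch).
    ad_to_campaign: dict standing in for the Redis ad -> campaign lookup.
    Returns {window_start_seconds: {campaign_id: count}}.
    """
    counts = defaultdict(Counter)
    for timestamp, ad_id, event_type in events:
        # Filter: keep impressions only ("view" is an assumed event type).
        if event_type != "view":
            continue
        # Join: map the ad to its campaign; skip ads with no known campaign.
        campaign = ad_to_campaign.get(ad_id)
        if campaign is None:
            continue
        # Window: bucket the event by the start of its 10-second window.
        window_start = int(timestamp // window_seconds) * window_seconds
        counts[window_start][campaign] += 1
    return {window: dict(per_campaign) for window, per_campaign in counts.items()}
```

Calling it with a handful of events spread over two windows returns one dictionary of per-campaign counts for each window start. A real deployment would keep this state inside the stream processor and write the running totals out to Redis.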

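The "double counting after failures" point on slide 18 can be illustrated with a toy simulation. This is a deliberately simplified model and does not reflect the internals of Storm or Flink: a counter processes events, crashes once, and the source replays from the last checkpointed offset. The function name `run_with_replay` and all its parameters are hypothetical. When the count is restored together with the offset, the replayed events are counted once; when only the offset is rewound, they are counted twice.

```python
def run_with_replay(num_events, fail_at, checkpoint_every, restore_state):
    """Count events with one simulated crash and a replay from the last checkpoint.

    restore_state=True : the count is rolled back along with the source offset,
                         so replayed events are not counted again.
    restore_state=False: only the source offset is rewound; the count keeps
                         what it had, so replayed events are counted twice.
    """
    count = 0
    checkpoint_offset, checkpoint_count = 0, 0
    offset = 0
    failed = False
    while offset < num_events:
        # Take a checkpoint of (offset, count) at regular intervals.
        if offset % checkpoint_every == 0:
            checkpoint_offset, checkpoint_count = offset, count
        # Simulate a single crash just before processing event `fail_at`.
        if offset == fail_at and not failed:
            failed = True
            offset = checkpoint_offset      # source replays from the checkpoint
            if restore_state:
                count = checkpoint_count    # state rolls back with the offset
            continue
        count += 1
        offset += 1
    return count
```

With 10 events, a checkpoint every 5 events, and a crash before event 7, restoring state yields a count of 10, while rewinding only the offset yields 12 because events 5 and 6 are counted twice.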