Extending The Yahoo Streaming Benchmark to Apache Apex

Extending Yahoo Streaming computation Benchmark to Apache Apex

- Application topology

- Comparison of results between Storm, Flink and Apex

- Variation of the Apex Benchmarking App with event time and 'results query' support

Published in: Technology

  1. Extending the Yahoo Streaming Benchmark for Apache Apex
     San Jose Apache Apex Meetup, May 4th 2016
     Sandesh Hegde, sandesh@apache.org
  2. Background
     • Yahoo created a benchmark to compare stream processing systems and used it to compare Storm, Flink and Spark Streaming [1]
     • dataArtisans extended the benchmark by comparing Flink and Storm under different scenarios [2]
     • No benchmark comparison of stream processing systems is complete without Apache Apex
  3. Yahoo Streaming Benchmark
     Simple advertisement application: count how many times an ad campaign has been seen in a window.
     • Read ads from Kafka
     • Deserialize the JSON string
     • Filter out unnecessary ads
     • Project fields (remove non-essential fields)
     • Join ad id with campaign id from Redis
     • Windowed count per campaign, output to Redis
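The steps listed on slide 3 can be sketched in a few lines of plain Python. This is only an illustration of the data flow, not the Apex application from the talk: the `AD_TO_CAMPAIGN` dictionary is a hypothetical stand-in for the Redis lookup, the list of strings stands in for the Kafka topic, and the 10-second window and the choice to keep only `view` events are assumptions that the slides do not state.

```python
import json
from collections import Counter

# Assumed window length in milliseconds; the slides do not give one.
WINDOW_MS = 10_000

# Hypothetical stand-in for the ad_id -> campaign_id table held in Redis.
AD_TO_CAMPAIGN = {
    "ad-1": "campaign-A",
    "ad-2": "campaign-A",
    "ad-3": "campaign-B",
}


def process(raw_messages, wanted_event_type="view"):
    """Return {(campaign_id, window_start_ms): count} for the kept events."""
    counts = Counter()
    for raw in raw_messages:
        event = json.loads(raw)                       # deserialize JSON string
        if event["event_type"] != wanted_event_type:  # filter unnecessary ads
            continue
        # Projection: keep only the two fields the later steps need.
        ad_id = event["ad_id"]
        event_time = int(event["event_time"])
        campaign_id = AD_TO_CAMPAIGN.get(ad_id)       # join ad id -> campaign id
        if campaign_id is None:                       # unknown ad: drop it
            continue
        window_start = event_time - event_time % WINDOW_MS
        counts[(campaign_id, window_start)] += 1      # windowed count
    return dict(counts)


def make_message(ad_id, event_type, event_time_ms):
    """Build a message with the same field names as the deck's sample message."""
    return json.dumps({
        "user_id": "u", "page_id": "p", "ad_id": ad_id, "ad_type": "banner78",
        "event_type": event_type, "event_time": str(event_time_ms),
        "ip_address": "1.2.3.4",
    })


SAMPLE_MESSAGES = [
    make_message("ad-1", "view", 1462374087774),
    make_message("ad-2", "view", 1462374089000),
    make_message("ad-3", "view", 1462374091000),
    make_message("ad-1", "purchase", 1462374087774),  # removed by the filter
    make_message("ad-9", "view", 1462374087774),      # no campaign: dropped
]

if __name__ == "__main__":
    for key, count in sorted(process(SAMPLE_MESSAGES).items()):
        print(key, count)
```

In the benchmark itself each step is a separate operator and the counts are written back to Redis; the single loop here only makes the order of the steps concrete.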
  4. Application - with Kafka
     [Diagram labels: Kafka, Kafka Input, Deserialize, Filter, Filter Fields, Redis Join, Redis Output, Redis]
  5. Setup
     • Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
     • 10GigE between compute nodes
     • 4 Kafka brokers (2 partitions each & 1 replica)
     • Kafka version: 0.8.2
     • Apex (3.4-SNAPSHOT & 3.3) & Flink (1.0.2)
     • YARN container size: 16GB
     • 1 ZooKeeper
     • Message size: 218 bytes
     • Sample message:
       {"user_id":"e5e0db4b-05ea-4ac5-af7a-4bba5ed27c4c","page_id":"80f60d0a-b02b-40e2-a667-5548a1120dda","ad_id":"600589859","ad_type":"banner78","event_type":"purchase","event_time":"1462374087774","ip_address":"1.2.3.4"}
  6. Apex Application
  7. Physical Plan
  8. Quick Primer on Locality
     • CONTAINER_LOCAL
       ■ Deployed in the same process, different threads
       ■ No serialization
       ■ Queue between the operators
     • THREAD_LOCAL
       ■ Same thread
       ■ No serialization
       ■ Use it only when operators do light work
     Note: [New feature] Anti Affinity is not covered here.
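The difference between the two locality modes on slide 8 can be illustrated with a small analogy in plain Python. This is not the Apex API, and the operator functions `parse` and `double` are hypothetical. `run_same_thread` mimics THREAD_LOCAL (the downstream operator is a direct call in the same thread), while `run_with_queue` mimics CONTAINER_LOCAL (same process, a second thread, and an in-memory queue between the operators, with objects handed over by reference and not serialized).

```python
import queue
import threading


def parse(text):
    """Hypothetical upstream operator: turn a string into an int."""
    return int(text)


def double(value):
    """Hypothetical downstream operator: light per-tuple work."""
    return value * 2


def run_same_thread(items):
    # THREAD_LOCAL analogue: downstream runs as a direct function call in
    # the upstream thread, so there is no queue and no handoff cost.
    return [double(parse(item)) for item in items]


def run_with_queue(items):
    # CONTAINER_LOCAL analogue: both operators live in one process but on
    # different threads, connected by an in-memory queue. Tuples are passed
    # by reference, so nothing is serialized.
    handoff = queue.Queue()
    results = []
    done = object()  # sentinel that tells the downstream thread to stop

    def downstream():
        while True:
            value = handoff.get()
            if value is done:
                break
            results.append(double(value))

    worker = threading.Thread(target=downstream)
    worker.start()
    for item in items:
        handoff.put(parse(item))
    handoff.put(done)
    worker.join()
    return results


if __name__ == "__main__":
    data = ["1", "2", "3"]
    print(run_same_thread(data))
    print(run_with_queue(data))
```

Both functions produce the same results; what changes is where the downstream work runs. The queue lets the two operators work concurrently at the cost of a handoff per tuple, which is consistent with the slide's advice to use THREAD_LOCAL only when operators do light work.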
  9. Benchmarking Against Previous Releases
     • Part of Release Certification
     • https://www.datatorrent.com/blog/blog-apex-performance-benchmark/
  10. Application - with Kafka
     • https://github.com/sandeshh/streaming-benchmarks
  11. Application - With Generator
     [Diagram labels: Generator, Kafka, Kafka Input, Deserialize, Filter, Filter Fields, Redis Join, Redis Output, Redis]
  12. Application - With Generator
     • Setup: Single Partition
     • https://github.com/sandeshh/streaming-benchmarks
  13. State of the Art & Streaming
     [Diagram labels: Generator, Filter, Filter Fields, Redis Join, Redis Output]
     • What's our recommendation to query the state?
     • In-memory key-value store in the operators?
  14. Application - State Store & Query
     [Diagram labels: Generator, Filter, Filter Fields, Redis Join, Dimensional Computation, Store (HDHT), Query, Result]
     1. Durable state (HDHT is a key-value store native to Hadoop) [4]
     2. Single system, scales with your application
     3. Easy integration with external consoles [7]
     4. Low operability cost
     5. Complex dimensional computation [5][6]
  15. Demo
  16. Q&A
  17. References
     1. https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
     2. http://data-artisans.com/extending-the-yahoo-streaming-benchmark/
     3. https://www.datatorrent.com/blog/blog-apex-performance-benchmark/
     4. https://www.datatorrent.com/blog/data-store-for-scalable-stream-processing/
     5. https://www.datatorrent.com/blog/blog-dimensions-computation-aggregate-navigator-part-1-intro/
     6. https://www.datatorrent.com/blog/dimensions-computation-aggregate-navigator-part-2-implementation/
     7. http://docs.datatorrent.com/app_data_framework/
  18. Resources (© 2016 DataTorrent)
     • Apache Apex website - http://apex.apache.org/
     • Subscribe - http://apex.apache.org/community.html
     • Download - http://apex.apache.org/downloads.html
     • Twitter - @ApacheApex; Follow - https://twitter.com/apacheapex
     • Facebook - https://www.facebook.com/ApacheApex/
     • Meetup - http://www.meetup.com/topics/apache-apex
     • Free Enterprise License for Startups - https://www.datatorrent.com/product/startup-accelerator/
  19. We Are Hiring
     • jobs@datatorrent.com
     • Developers/Architects
     • QA Automation Developers
     • Information Developers
     • Build and Release
     • Community Leaders
