
Twitter Heron. Evolution or Revolution


Twitter is a stream processing pioneer. They process some of the biggest streams on the internet. The introduction of Storm five years ago was a revolution in real-time distributed computing, but after a few years Twitter decided to replace it. What were Storm's issues? Why was Heron implemented? Why was using an existing engine (e.g. Samza, Spark, Flink) not an option? Is this a revolution in open source stream processing? I will give you a brief overview of Twitter's stream processing history with interesting technical details.

Published in: Engineering


  1. Twitter Heron: Evolution or Revolution? Analytics Conf, November 15-16, 2016
  2. Grzegorz Kolpuc @gkolpuc
  3. There are 310M monthly active users
  4. A total of 1.3 billion accounts have been created
  5. There are 500 million Tweets sent each day. That’s 6,000 Tweets every second.
  6. Enable analytics: scoring, stats, trends, recommendations, real-time reporting
  7. A long time ago...
  8. What is Storm?
  9. What is Storm? Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing (a DAG processing engine)
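The spout/bolt DAG model described on the slide above can be sketched with a toy in-memory pipeline. This is a hypothetical Python analogue for illustration only; the class names mirror Storm concepts (spout as stream source, bolt as processing node) but are not Storm's actual API.

```python
# Hypothetical in-memory analogue of Storm's spout/bolt DAG model.
# A spout emits tuples; a bolt transforms them; the topology wires them up.

class Spout:
    """Source of a stream; a finite list stands in for an unbounded stream."""
    def __init__(self, items):
        self.items = items

    def emit(self):
        yield from self.items

class SplitBolt:
    """Splits each incoming sentence tuple into individual word tuples."""
    def process(self, sentence):
        yield from sentence.split()

def run_topology(spout, bolt):
    """Wire spout -> bolt and collect the downstream tuples."""
    out = []
    for tup in spout.emit():
        out.extend(bolt.process(tup))
    return out

words = run_topology(Spout(["to be or", "not to be"]), SplitBolt())
print(words)  # ['to', 'be', 'or', 'not', 'to', 'be']
```

In real Storm the same wiring is declared with `TopologyBuilder`, and the engine distributes spout and bolt instances across worker processes.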
  10. 2011: Twitter acquires @BackType
  11. Storm at Twitter (2013): benchmarked at a million tuples processed per second; running 30 topologies on a 200-node cluster; processing 50 billion messages a day with an average complete latency under 50 ms
  12. Storm is very powerful, but...
  13. Apache Storm issues
      Performance
      ● Every worker is homogeneous, which results in inefficient utilization of allocated resources
      ● There is no backpressure mechanism
      ● Topologies that use a large amount of RAM per worker encounter GC cycles longer than a minute
      Debugging
      ● Each worker runs a mix of tasks
      ● Logs from multiple tasks are written into a single file
      ● Each tuple has to pass through four threads in the worker process from the point of entry to the point of exit
      Scheduling
      ● Multiple levels of scheduling
      ● A single failing task takes down the whole worker process
      ● Nimbus is a single point of failure
  14. Twitter’s approach: enhancing Storm would take too long, and no other system met their scaling, throughput, and latency needs. Other systems were also incompatible with Storm’s API, so adopting one would have required rewriting all topologies. The decision was to create Heron, but keep its external API compatible with Storm’s.
  15. Flying faster with Twitter Heron. Tuesday, June 2, 2015 | By Karthik Ramasamy (@karthikz), Engineering Manager
  16. Flying Faster with Twitter Heron
      Scheduler: pluggable solution, fitted to Twitter’s infrastructure (Apache Mesos + Apache Aurora)
      Back pressure: automatically slows down tuple production when queues are overloaded
      Easy debugging: moved from a typical thread-based system to a process-based system (running each task in isolation)
      Compatibility with Storm: easy migration from Storm to Heron
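The back-pressure idea on the slide above (slow the producer down when downstream queues fill up, instead of dropping tuples or growing memory without bound) can be illustrated with a bounded queue. This is a minimal Python sketch of the concept, not Heron's actual mechanism.

```python
# Back-pressure sketch: a bounded queue makes the producer block (slow down)
# whenever the consumer falls behind, so no tuples are dropped.
import queue
import threading

q = queue.Queue(maxsize=4)   # small bounded buffer between the two sides
consumed = []

def producer():
    for i in range(20):
        q.put(i)             # blocks when the queue is full: back pressure
    q.put(None)              # sentinel marking end of stream

def consumer():
    while True:
        item = q.get()
        if item is None:
            break
        consumed.append(item)

t = threading.Thread(target=consumer)
t.start()
producer()
t.join()
print(len(consumed))  # 20 -- every tuple delivered, none dropped
```

Without the `maxsize` bound, a slow consumer would simply let the queue grow, which is the unbounded-memory failure mode the old Storm suffered from.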
  17. Heron Performance. We compared the performance of Heron with Twitter’s production version of Storm, which was forked from an open source version in October 2013, using a word count topology. This topology counts the distinct words in a stream generated from a set of 150,000 words.
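A minimal sketch of that benchmark's workload, counting distinct words in a stream; this toy version uses a handful of words rather than the 150,000-word set.

```python
# Toy version of the word-count benchmark workload: tally how often each
# distinct word appears in a stream of words.
from collections import Counter

def word_count(stream):
    """Count occurrences of each distinct word in the stream."""
    counts = Counter()
    for word in stream:
        counts[word] += 1
    return counts

counts = word_count(["heron", "storm", "heron", "tuple", "storm", "heron"])
print(counts["heron"], len(counts))  # 3 occurrences of 'heron', 3 distinct words
```

In the benchmarked topology the same tally is sharded across bolt instances by field grouping on the word, so each instance counts a disjoint subset of the vocabulary.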
  18. Heron at Twitter. At Twitter, Heron is used as our primary streaming system, running hundreds of development and production topologies. Since Heron is efficient in terms of resource usage, after migrating all Twitter’s topologies to it we’ve seen an overall 3x reduction in hardware, a significant improvement in our infrastructure efficiency.
  19. [Diagram: Heron topology (Topology Master, Stream Managers, Metrics Managers, Instances I1-I4, ZK cluster) beside a Storm topology (Nimbus, ZK cluster, Supervisors, Workers W1-W4)]
  20. [Diagram: the same Heron vs. Storm topology comparison, with Heron’s Scheduler, Uploader, and Heron Tracker components added]
  21. Open Sourcing Twitter Heron. Wednesday, May 25, 2016 | By Karthik Ramasamy (@karthikz), Engineering Manager
  22. Inside Heron: written in Java & Python (~80%). The critical parts of the framework, the code that manages the topologies and the network communication, are not written in a JVM language.
  23. In the meantime...
  24. Storm has evolved. Heron’s speed improvements are measured against the Storm 0.8.x code it diverged from, not against the current version. If you have already migrated to Storm 1.0, you might not see much more improvement over your current Storm topologies, and you may run into incompatibilities between the implementations of new features, such as back-pressure support, in Storm and Heron.
  26. Storm has evolved
      ➢ Support for back pressure
      ➢ Introduced Pacemaker (a daemon for offloading heartbeat traffic from ZooKeeper, freeing larger topologies from the infamous ZooKeeper bottleneck)
      ➢ Nimbus HA
      ➢ Distributed cache
  27. Storm has evolved
      ➢ Improved debugging and profiling options
      ➢ 60 percent decrease in latency
      ➢ Up to 16x speed improvement
  28. When to use Storm?
      ➢ You want to avoid infrastructure configuration overhead (Heron is currently tied to Mesos, so if you don’t have existing Mesos infrastructure, you’ll need to set that up as well, which is no small undertaking)
      ➢ You don’t need extremely large scale
      ➢ You need DRPC (deprecated in Heron)
      ➢ You want more ready-to-use integrations
  29. When to use Heron?
      ➢ You have Mesos infrastructure
      ➢ You run at larger scale
      ➢ You run multiple clusters
  30. Evolution or Revolution?
  31. Q&A @gkolpuc