Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.
Twitter Heron
@Scale
KARTHIK RAMASAMY
@KARTHIKZ
#TwitterHeron
BEGIN
END
HERON
OVERVIEW
!
I
MICRO STREAM
ENGINE
(II
CONCLUSION
K
V
HERON
RESOURCE
USAGE
ZIV
TALK OUTLINE
STRAGGLERS
bIII
TWITTER IS REAL TIME
G
Emerging break out
trends in Twitter (in the
form #hashtags)
Ü
Real time sports
conversations relat...
HERON TERMINOLOGY
TOPOLOGY
Directed acyclic graph
Vertices=computation, and edges=streams of data tuples
SPOUTS
Sources of...
HERON TOPOLOGY
%
%
%
%
%
SPOUT 1
SPOUT 2
BOLT 1
BOLT 2
BOLT 3
BOLT 4
BOLT 5
HERON DESIGN DECISIONS
FULLY API COMPATIBLE WITH STORM
Directed acyclic graph
Topologies, spouts and bolts
USE OF MAIN STR...
HERON ARCHITECTURE
Topology 1
TOPOLOGY
SUBMISSION
Scheduler
Topology 2
Topology 3
Topology N
TOPOLOGY ARCHITECTURE
Topology
Master
ZK
CLUSTER
Stream
Manager
I1 I2 I3 I4
Stream
Manager
I1 I2 I3 I4
Logical Plan,
Physi...
S1 B2
B3
TOPOLOGY EXECUTION
Stream
Manager
Stream
Manager
Stream
Manager
Stream
Manager
S1 B2
B3 B4
S1 B2
B3
S1 B2
B3 B4
O...
HERON SAMPLE TOPOLOGIES
Large amount of data
produced every day
Large cluster Several hundred
topologies deployed
Several billion
messages every d...
HERON USE CASES
REALTIME
ETL
REAL TIME
BI
PRODUCT
SAFETY
REAL TIME
TRENDS
REALTIME
ML
REAL TIME
MEDIA
REAL TIME
OPS
COMPONENTIZING
HERON
x
9
HETEROGENOUS ECOSYSTEM
Unmanaged Schedulers
Managed Schedulers State Managers
Job Uploaders
LOOKING BACK- MICROKERNELS
HERON- MICROSTREAM ENGINE
HARDWARE
BASIC INTER/INTRA IPC
Topology
Master
Stream
Manager
Instance
Metrics
Manager
Scribe Gr...
HERON - MICROSTREAM ENGINE
PLUG AND PLAY COMPONENTS
As environment changes, core does not change
MULTI LANGUAGE INSTANCES
...
HERON ENVIRONMENT
Local Scheduler
Local State Manager
Local Uploader
Aurora/Scheduler
Zookeeper State Manager
Packer Uploa...
STRAGGLERS
AND BACK
PRESSURE
x
9
STRAGGLERS
BAD HOST EXECUTION
SKEW
INADEQUATE
PROVISIONING
b  Ñ
Stragglers are the norm in a multi-tenant distributed syst...
APPROACHES TO HANDLE STRAGGLERS
SENDERS TO STRAGGLER DROP DATA
DETECT STRAGGLERS AND RESCHEDULE THEM
!
d
" SENDERS SLOW DO...
DROP DATA STRATEGY
UNPREDICTABLE AFFECTS
ACCURACY
POOR
VISIBILITY
b  Ñ
SLOW DOWN SENDER STRATEGY
PROVIDES
PREDICTABILITY
PROCESSES
DATA AT
MAXIMUM
RATE
REDUCE
RECOVERY
TIMES
HANDLES
TEMPORARY
S...
SLOW DOWN SENDER STRATEGY
% %
S1 B2 B3
%
B4
back pressure
S1 B2
B3
SLOW DOWN SENDERS STRATEGY
Stream
Manager
Stream
Manager
Stream
Manager
Stream
Manager
S1 B2
B3 B4
S1 B2
B3
S1 B2...
BACK PRESSURE IN PRACTICE
Read pointer Write pointer
Manually restart the container
BACK PRESSURE IN PRACTICE
IN MOST SCENARIOS BACK PRESSURE RECOVERS
Without any manual intervention
SOMETIMES USER PREFER D...
DETECTING BAD/FAULTY HOSTS
HERON
RESOURCE
USAGE
x
9
HERON RESOURCE USAGE
Event Spout Aggregate Bolt
60-100M/min
Filter
8-12M/min
Flat-Map
40-60M/min
Aggregate
Cache 1 sec
Out...
RESOURCE CONSUMPTION
Cores
Requested
Cores
Used
Memory
Requested
(GB)
Memory
Used
Redis 24 2-4 48 N/A
Heron 120 30-50 200 ...
RESOURCE CONSUMPTION
7%
9%
84%
Spout Instances Bolt Instances Heron Overhead
PROFILING SPOUTS
2%7%
16%
6%
6%
63%
Deserialize Parse/Filter Mapping Kafka Iterator Kafka Fetch Rest
PROFILING BOLTS
2%4%
5%
2%
19%
68%
Write Data Serialize Deserialize Aggregation Data Transport Rest
RESOURCE CONSUMPTION - BREAKDOWN
8%
11%
21%
61%
Fetching Data User Logic Heron Usage Writing Data
CURIOUS TO LEARN MORE…
Twitter Heron: Stream Processing at Scale
Sanjeev Kulkarni, Nikunj Bhagat, Maosong Fu, Vikas Kedige...
CURIOUS TO LEARN MORE…
VLDB 2015 Tutorial
INTERESTED IN HERON?
CONTRIBUTIONS ARE WELCOME!
https://github.com/twitter/heron
http://heronstreaming.io
HERON IS OPEN SO...
#
#ThankYou
FOR LISTENING
QUESTIONS
and
ANSWERS
R
$ Go ahead. Ask away.
Upcoming SlideShare
Loading in …5
×

Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasamy, Engineering Manager - Twitter

347 views

Published on

Twitter generates billions and billions of events per day. Analyzing these events in real time presents a massive challenge. Twitter designed and deployed a new streaming system called Heron. Heron has been in production nearly 2 years and is widely used by several teams for diverse use cases. This talk looks at Twitter's operating experiences and challenges of running Heron at scale and the approaches taken to solve those challenges.

Published in: Technology
  • Be the first to comment

Big Data Day LA 2016/ Big Data Track - Twitter Heron @ Scale - Karthik Ramasamy, Engineering Manager - Twitter

  1. 1. Twitter Heron @Scale KARTHIK RAMASAMY @KARTHIKZ #TwitterHeron
  2. 2. BEGIN END HERON OVERVIEW ! I MICRO STREAM ENGINE (II CONCLUSION K V HERON RESOURCE USAGE ZIV TALK OUTLINE STRAGGLERS bIII
  3. 3. TWITTER IS REAL TIME G Emerging break out trends in Twitter (in the form #hashtags) Ü Real time sports conversations related with a topic (recent goal or touchdown) " Real time product recommendations based on your behavior & profile real time searchreal time trends real time conversations real time recommendations Real time search of tweets s ANALYZING BILLIONS OF EVENTS IN REAL TIME IS A CHALLENGE!
  4. 4. HERON TERMINOLOGY TOPOLOGY Directed acyclic graph Vertices=computation, and edges=streams of data tuples SPOUTS Sources of data tuples for the topology Examples - Kafka/Kestrel/MySQL/Postgres BOLTS Process incoming tuples and emit outgoing tuples Examples - filtering/aggregation/join/arbitrary function , %
  5. 5. HERON TOPOLOGY % % % % % SPOUT 1 SPOUT 2 BOLT 1 BOLT 2 BOLT 3 BOLT 4 BOLT 5
  6. 6. HERON DESIGN DECISIONS FULLY API COMPATIBLE WITH STORM Directed acyclic graph Topologies, spouts and bolts USE OF MAIN STREAM LANGUAGES C++/JAVA/Python ! d " TASK ISOLATION Ease of debug ability/resource isolation/profiling
  7. 7. HERON ARCHITECTURE Topology 1 TOPOLOGY SUBMISSION Scheduler Topology 2 Topology 3 Topology N
  8. 8. TOPOLOGY ARCHITECTURE Topology Master ZK CLUSTER Stream Manager I1 I2 I3 I4 Stream Manager I1 I2 I3 I4 Logical Plan, Physical Plan and Execution State Sync Physical Plan CONTAINER CONTAINER Metrics Manager Metrics Manager
  9. 9. S1 B2 B3 TOPOLOGY EXECUTION Stream Manager Stream Manager Stream Manager Stream Manager S1 B2 B3 B4 S1 B2 B3 S1 B2 B3 B4 O(n2 ) O(k2 ) B4
  10. 10. HERON SAMPLE TOPOLOGIES
  11. 11. Large amount of data produced every day Large cluster Several hundred topologies deployed Several billion messages every day HERON @TWITTER 1 stage 10 stages Heron has been in production for 2 years
  12. 12. HERON USE CASES REALTIME ETL REAL TIME BI PRODUCT SAFETY REAL TIME TRENDS REALTIME ML REAL TIME MEDIA REAL TIME OPS
  13. 13. COMPONENTIZING HERON x 9
  14. 14. HETEROGENOUS ECOSYSTEM Unmanaged Schedulers Managed Schedulers State Managers Job Uploaders
  15. 15. LOOKING BACK- MICROKERNELS
  16. 16. HERON- MICROSTREAM ENGINE HARDWARE BASIC INTER/INTRA IPC Topology Master Stream Manager Instance Metrics Manager Scribe Graphite SCHEDULERSPISTATEMANAGERSPI
  17. 17. HERON - MICROSTREAM ENGINE PLUG AND PLAY COMPONENTS As environment changes, core does not change MULTI LANGUAGE INSTANCES Support multiple language API with native instances MULTIPLE PROCESSING SEMANTICS Efficient stream managers for each semantics EASE OF DEVELOPMENT Faster development of components with little dependency ! ! ! !
  18. 18. HERON ENVIRONMENT Local Scheduler Local State Manager Local Uploader Aurora/Scheduler Zookeeper State Manager Packer Uploader Mesos/Aurora Scheduler Zookeeper State Manager HDFS Uploader Laptop/Server Cluster/ProductionCluster/Testing
  19. 19. STRAGGLERS AND BACK PRESSURE x 9
  20. 20. STRAGGLERS BAD HOST EXECUTION SKEW INADEQUATE PROVISIONING b Ñ Stragglers are the norm in a multi-tenant distributed systems
  21. 21. APPROACHES TO HANDLE STRAGGLERS SENDERS TO STRAGGLER DROP DATA DETECT STRAGGLERS AND RESCHEDULE THEM ! d " SENDERS SLOW DOWN TO THE SPEED OF STRAGGLER
  22. 22. DROP DATA STRATEGY UNPREDICTABLE AFFECTS ACCURACY POOR VISIBILITY b Ñ
  23. 23. SLOW DOWN SENDER STRATEGY PROVIDES PREDICTABILITY PROCESSES DATA AT MAXIMUM RATE REDUCE RECOVERY TIMES HANDLES TEMPORARY SPIKES /b Ñ back pressure
  24. 24. SLOW DOWN SENDER STRATEGY % % S1 B2 B3 % B4 back pressure
  25. 25. S1 B2 B3 SLOW DOWN SENDERS STRATEGY Stream Manager Stream Manager Stream Manager Stream Manager S1 B2 B3 B4 S1 B2 B3 S1 B2 B3 B4 B4 S1 S1 S1S1
  26. 26. BACK PRESSURE IN PRACTICE Read pointer Write pointer Manually restart the container
  27. 27. BACK PRESSURE IN PRACTICE IN MOST SCENARIOS BACK PRESSURE RECOVERS Without any manual intervention SOMETIMES USER PREFER DROPPING OF DATA Care about only latest data ! d " SUSTAINED BACK PRESSURE Irrecoverable GC cycles Bad or faulty host
  28. 28. DETECTING BAD/FAULTY HOSTS
  29. 29. HERON RESOURCE USAGE x 9
  30. 30. HERON RESOURCE USAGE Event Spout Aggregate Bolt 60-100M/min Filter 8-12M/min Flat-Map 40-60M/min Aggregate Cache 1 sec Output 25-42M/min Redis
  31. 31. RESOURCE CONSUMPTION Cores Requested Cores Used Memory Requested (GB) Memory Used Redis 24 2-4 48 N/A Heron 120 30-50 200 180
  32. 32. RESOURCE CONSUMPTION 7% 9% 84% Spout Instances Bolt Instances Heron Overhead
  33. 33. PROFILING SPOUTS 2%7% 16% 6% 6% 63% Deserialize Parse/Filter Mapping Kafka Iterator Kafka Fetch Rest
  34. 34. PROFILING BOLTS 2%4% 5% 2% 19% 68% Write Data Serialize Deserialize Aggregation Data Transport Rest
  35. 35. RESOURCE CONSUMPTION - BREAKDOWN 8% 11% 21% 61% Fetching Data User Logic Heron Usage Writing Data
  36. 36. CURIOUS TO LEARN MORE… Twitter Heron: Stream Processing at Scale Sanjeev Kulkarni, Nikunj Bhagat, Maosong Fu, Vikas Kedigehalli, Christopher Kellogg, Sailesh Mittal, Jignesh M. Patel*,1 , Karthik Ramasamy, Siddarth Taneja @sanjeevrk, @challenger_nik, @Louis_Fumaosong, @vikkyrk, @cckellogg, @saileshmittal, @pateljm, @karthikz, @staneja Twitter, Inc., *University of Wisconsin – Madison ABSTRACT Storm has long served as the main platform for real-time analytics at Twitter. However, as the scale of data being processed in real- time at Twitter has increased, along with an increase in the diversity and the number of use cases, many limitations of Storm have become apparent. We need a system that scales better, has better debug-ability, has better performance, and is easier to manage – all while working in a shared cluster infrastructure. We considered various alternatives to meet these needs, and in the end concluded that we needed to build a new real-time stream data processing system. This paper presents the design and implementation of this new system, called Heron. Heron is now system process, which makes debugging very challenging. Thus, we needed a cleaner mapping from the logical units of computation to each physical process. The importance of such clean mapping for debug-ability is really crucial when responding to pager alerts for a failing topology, especially if it is a topology that is critical to the underlying business model. In addition, Storm needs dedicated cluster resources, which requires special hardware allocation to run Storm topologies. This approach leads to inefficiencies in using precious cluster resources, and also limits the ability to scale on demand. We needed the ability to work in a more flexible way with popular cluster scheduling software that allows sharing the cluster resources across different types of data Storm @Twitter Ankit Toshniwal, Siddarth Taneja, Amit Shukla, Karthik Ramasamy, Jignesh M. Patel*, Sanjeev Kulkarni, Jason Jackson, Krishna Gade, Maosong Fu, Jake Donham, Nikunj Bhagat, Sailesh Mittal, Dmitriy Ryaboy @ankitoshniwal, @staneja, @amits, @karthikz, @pateljm, @sanjeevrk, @jason_j, @krishnagade, @Louis_Fumaosong, @jakedonham, @challenger_nik, @saileshmittal, @squarecog Twitter, Inc., *University of Wisconsin – Madison Streaming@Twitter Maosong Fu, Sailesh Mittal, Vikas Kedigehalli, Karthik Ramasamy, Michael Barry, Andrew Jorgensen, Christopher Kellogg, Neng Lu, Bill Graham, Jingwei Wu Twitter, Inc. Abstract Twitter generates tens of billions of events per hour when users interact with it. Analyzing these events to surface relevant content and to derive insights in real time is a challenge. To address this, we developed Heron, a new real time distributed streaming engine. In this paper, we first describe the design goals of Heron and show how the Heron architecture achieves task isolation and resource reservation to ease debugging, troubleshooting, and seamless use of shared cluster infrastructure with other critical Twitter services. We subsequently explore how a topology self adjusts using back pressure so that the pace of the topology goes as its slowest component. Finally, we outline how Heron implements at most once and at least once semantics and we describe a few operational stories based on running Heron in production. 1 Introduction
  37. 37. CURIOUS TO LEARN MORE… VLDB 2015 Tutorial
  38. 38. INTERESTED IN HERON? CONTRIBUTIONS ARE WELCOME! https://github.com/twitter/heron http://heronstreaming.io HERON IS OPEN SOURCED FOLLOW US @HERONSTREAMING
  39. 39. # #ThankYou FOR LISTENING
  40. 40. QUESTIONS and ANSWERS R $ Go ahead. Ask away.

×