DATA HIGHWAY
Petabyte Scale Event Collection, Transport & Delivery
PRESENTED BY Nilam Sharma, Huibing Yin | June 13, 2017
About Us
Nilam Sharma
Master's in CS
Huibing Yin
Ph.D. in Control
• Alumni of UIUC
• 5 years at Yahoo
• Data Highway team
• Interested in distributed systems
Agenda
• What is Data Highway?
• Why?
• How?
• Challenge stories
What is Data Highway?
A network of tubes transmitting data
What is Data Highway? - Use Case
[Figure: Tenant1 and Tenant2 publishing and consuming data across datacenters DC1, DC2, DC3 and DC4]
What is Data Highway?
[Figure: Data Highway connects the datacenters through Router, Gateway and Prism tiers]
What is Data Highway?
A platform that collects, aggregates and delivers data
• 250 billion inbound events
• 800 terabytes input
• 1.2 petabytes output
• 30K publisher hosts
• 60+ tenants in production
Why Data Highway?
• Data aggregation and delivery at large scale (30K publishing hosts)
• Multi-tenant
• Customized for Yahoo requirements and technology
• Onboarding of new tenants is:
● Quick
● Easy
● Economical
Why Data Highway?
Salient Features
Collection
• Multi-datacenter event collection (including AWS Regions)
• Flexible options to publish: web service, multi-language API and server plugins
Delivery
• Multi-datacenter delivery with business continuity support
• Batched HDFS file delivery
• Low-latency streaming delivery to Storm and Kafka
Why Data Highway?
Supporting Features
• Delivery completeness web service for audits
• Rate limits for tenant isolation
• Adaptive limits to absorb traffic spikes
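The deck does not show how tenant rate limits are enforced. As an illustration only, here is a minimal sketch assuming one token bucket per tenant, with limits expressed in bytes per second; `TokenBucket`, `TenantRateLimiter` and `admit` are hypothetical names, not Data Highway APIs.

```python
from typing import Dict, Optional, Tuple


class TokenBucket:
    """Token bucket: refills `rate` units per second and holds at most `capacity`."""

    def __init__(self, rate: float, capacity: float) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity            # start full so a tenant can burst at once
        self.last: Optional[float] = None

    def try_acquire(self, cost: float, now: float) -> bool:
        if self.last is not None:
            # Refill in proportion to elapsed time, capped at capacity.
            elapsed = max(0.0, now - self.last)
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class TenantRateLimiter:
    """Hypothetical: one bucket per tenant, so a bursting tenant drains only its own budget."""

    def __init__(self, limits: Dict[str, Tuple[float, float]]) -> None:
        # limits maps tenant -> (bytes per second, burst size in bytes)
        self.buckets = {t: TokenBucket(r, c) for t, (r, c) in limits.items()}

    def admit(self, tenant: str, nbytes: int, now: float) -> bool:
        bucket = self.buckets.get(tenant)
        return bucket is not None and bucket.try_acquire(nbytes, now)
```

Isolation comes from keeping the buckets separate: one tenant exhausting its budget leaves every other tenant's bucket untouched.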
Why Data Highway?
Supporting Features
• Customizable filters per streaming endpoint
• Partitioning based on schema/time
• Fault tolerance for network and endpoint failures
• Schema Registry
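Partitioning by schema and time could look like the sketch below. The directory layout (`/<tenant>/<schema>/YYYYMMDD/HHMM`), the 5-minute window and the function names are assumptions for illustration; the slides do not specify Data Highway's actual layout.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, List


def partition_path(tenant: str, schema: str, ts: float, window_minutes: int = 5) -> str:
    """Hypothetical layout /<tenant>/<schema>/YYYYMMDD/HHMM, with the minute
    rounded down to the start of its window (UTC)."""
    t = datetime.fromtimestamp(ts, tz=timezone.utc)
    minute = t.minute - t.minute % window_minutes
    return f"/{tenant}/{schema}/{t:%Y%m%d}/{t.hour:02d}{minute:02d}"


def group_by_partition(events: List[dict], window_minutes: int = 5) -> Dict[str, List[dict]]:
    """Bucket events so that each partition becomes one output directory."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for e in events:
        groups[partition_path(e["tenant"], e["schema"], e["ts"], window_minutes)].append(e)
    return dict(groups)
```

For example, an event stamped 10:07 UTC on June 13, 2017 lands in `/tenant1/click/20170613/1005` with the default window.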
Why Data Highway?
Additional Facts
• Data agnostic
• Event size: 1 MB max
• Accepts Avro or any blob content
• Production hardware footprint: ~500 hosts
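Because the platform is data agnostic, admission can only check what it knows about every event: its size. A minimal sketch, assuming "1 MB" means 1 MiB and using a hypothetical `validate_event` hook:

```python
MAX_EVENT_BYTES = 1024 * 1024  # "1 MB max"; assumes MB means MiB


def validate_event(payload: bytes) -> None:
    """Data agnostic: the payload (Avro or any blob) is never parsed, only sized."""
    if not isinstance(payload, (bytes, bytearray)):
        raise TypeError("event payload must be bytes")
    if len(payload) > MAX_EVENT_BYTES:
        raise ValueError(f"event is {len(payload)} bytes; the limit is {MAX_EVENT_BYTES}")
```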
Why Data Highway?
[Figure: Tenant -> Data Highway]
SLA
• 99% in < 5 minutes, 99.9999% in < 15 minutes
• 95% in < 1 second, 99.99% in < 5 seconds
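One way to check measured delivery latencies against these SLA lines, reading each line as "this fraction of events delivered in under this bound". The tier names and `meets_sla` are hypothetical; the slide does not label which delivery path each line belongs to.

```python
from typing import Iterable, List, Tuple

# Hypothetical encoding of one slide line: [(fraction of events, bound in seconds), ...]
Sla = List[Tuple[float, float]]

MINUTES_TIER: Sla = [(0.99, 5 * 60.0), (0.999999, 15 * 60.0)]
SECONDS_TIER: Sla = [(0.95, 1.0), (0.9999, 5.0)]


def meets_sla(latencies_s: Iterable[float], sla: Sla) -> bool:
    """True if, for every (fraction, bound) pair, at least `fraction` of the
    events were delivered in strictly less than `bound` seconds."""
    lat = list(latencies_s)
    if not lat:
        return True
    for fraction, bound in sla:
        within = sum(1 for x in lat if x < bound)
        if within < fraction * len(lat):
            return False
    return True
```

With fewer than a million events in the sample, the 99.9999% line effectively requires every event to meet its bound.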
How? Data Highway Architecture
Technologies
Data Highway Architecture
[Figure: Tenant1 and Tenant2 connected through Data Highway across DC1-DC4]
Data Highway Architecture - Subsystem View
[Figure: Router, Gateway and Prism subsystems spanning DC1-DC4, serving Tenant1 and Tenant2]
Data Highway Architecture - Simple Single Tenant View
[Figure: a tenant in DC1 sends data through Router, Gateway and Prism tiers (three nodes each) to a tenant in DC3]
Data Highway Architecture - Emitter
Pipeline: Tenant -> Router -> Gateway -> Prism
Customer code -> DH API (C++, Java, Perl) -> DH Emitter Daemon -> Router
Spool file: spools only on failure
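The emitter's "spools only on failure" behavior can be sketched as follows. The `send` callable, the length-prefixed spool format and the class name are assumptions for illustration, not the DH Emitter Daemon's actual design.

```python
import os
from typing import Callable, List


def _frame(event: bytes) -> bytes:
    # Length-prefixed record so events can be split apart again on replay.
    return len(event).to_bytes(4, "big") + event


class SpoolingEmitter:
    """Try the network first; write to the spool file only when sending fails."""

    def __init__(self, send: Callable[[bytes], bool], spool_path: str) -> None:
        self.send = send              # hypothetical transport: returns True on success
        self.spool_path = spool_path

    def emit(self, event: bytes) -> None:
        if not self.send(event):
            with open(self.spool_path, "ab") as f:
                f.write(_frame(event))

    def replay(self) -> int:
        """Resend spooled events in order, stopping at the first failure.
        Returns how many were delivered; undelivered ones stay spooled."""
        if not os.path.exists(self.spool_path):
            return 0
        with open(self.spool_path, "rb") as f:
            data = f.read()
        pending: List[bytes] = []
        i = 0
        while i < len(data):
            n = int.from_bytes(data[i:i + 4], "big")
            pending.append(data[i + 4:i + 4 + n])
            i += 4 + n
        sent = 0
        for event in pending:
            if not self.send(event):
                break
            sent += 1
        if sent == len(pending):
            os.remove(self.spool_path)
        else:
            with open(self.spool_path, "wb") as f:
                f.write(b"".join(_frame(e) for e in pending[sent:]))
        return sent
```

In this sketch, new events can overtake spooled ones if the network recovers before `replay()` runs; a real emitter would need an explicit ordering policy.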
Data Highway Architecture - Router
Emitter -> HTTP server -> Broker -> Forwarder -> Gateway1 / Gateway2
Spool file: spools only on failure
Data Highway Architecture - Gateway
Router -> HTTP server -> Broker -> Forwarder -> Prism
Spool file
Data Highway Architecture - Prism
Gateway -> HTTP server -> Broker -> Grid Delivery Agent (Avro files) / Streaming Delivery Agent / Kafka Delivery Agent
Spool file
Data Highway Architecture - Metadata
[Figure: Publisher, Metadata Store and Counter Service; the legend distinguishes Data Flow from Metadata Flow]
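The Counter Service and the delivery completeness web service suggest an audit that compares published and delivered counts. A minimal sketch, assuming counts are keyed by (tenant, time window); `CounterService` and its methods are hypothetical.

```python
from collections import Counter
from typing import Tuple

Key = Tuple[str, str]  # hypothetical key: (tenant, time window)


class CounterService:
    """Publishers and delivery agents each report event counts per key;
    an audit compares the two sides."""

    def __init__(self) -> None:
        self.published: Counter = Counter()
        self.delivered: Counter = Counter()

    def record_published(self, key: Key, n: int) -> None:
        self.published[key] += n

    def record_delivered(self, key: Key, n: int) -> None:
        self.delivered[key] += n

    def completeness(self, key: Key) -> float:
        """Fraction of published events delivered, capped at 1.0."""
        sent = self.published[key]
        return 1.0 if sent == 0 else min(1.0, self.delivered[key] / sent)

    def is_complete(self, key: Key) -> bool:
        return self.published[key] > 0 and self.delivered[key] >= self.published[key]
```

Counting alone cannot tell a duplicated event from a missing one in the same window, so a check like this is only as reliable as duplicate suppression.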
Data Highway Architecture - Overall
[Figure: Tenant1 and Tenant2 across DC1-DC4, connected by the Router, Gateway and Prism subsystems that together form Data Highway]
Challenges
and what we have managed to solve so far!!!
Challenge Category: Improving Throughput
Improving Throughput
[Figure: the Gateway (HTTP server -> Broker -> Forwarder, with spool file) forwards events to Prism, whose Grid Delivery Agent writes Avro files]
Grid Delivery Agent: Understanding Event Flow
[Figure: Gateway -> Prism HTTP server -> Broker -> File Spooler threads write Avro spool files to HDD -> File Uploader threads upload the files]
Improving Throughput
Challenge: Disk Contention
• Multiple threads fight for resources when reading/writing files to disk
[Figure: the Broker feeds File Spooler threads that write files while File Uploader threads read them, all on the same disk]
Improving Throughput: Disk Contention
Solution: Disk I/O Separation
[Figure: File Spooler threads and File Uploader threads work on separate disks]
Improving Throughput: Namespace Explosion
Challenge: Namespace Explosion
[Figure: each File Spooler thread writes its own Avro files, which the File Uploader then uploads]
Solution: File Merging?
• Aggregate events over a longer time, e.g., 5 mins, 10 mins, etc.
Improving Throughput: Namespace Explosion
Better Solution: Single Thread Writing
[Figure: File Spooler threads hand events to a single writer, which produces fewer Avro files for the File Uploader]
• Cuts namespace usage by 10x
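The single-writer idea can be sketched with a queue: producer threads enqueue events and one writer thread owns all file output, so each time window yields one file instead of one per thread. Here the "files" are an in-memory dict standing in for Avro files on disk, and all names are hypothetical.

```python
import queue
import threading
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

WINDOW_SECONDS = 300  # 5-minute files, one of the windows mentioned on the slide


class SingleWriterSpooler:
    """Producer threads enqueue events; one writer thread owns all file output
    and appends each event to the single file for its time window."""

    def __init__(self) -> None:
        self.q: "queue.Queue[Optional[Tuple[float, bytes]]]" = queue.Queue()
        # In-memory stand-in for Avro files: window start (epoch s) -> events.
        self.files: Dict[int, List[bytes]] = defaultdict(list)
        self.writer = threading.Thread(target=self._run, daemon=True)
        self.writer.start()

    def submit(self, ts: float, event: bytes) -> None:
        self.q.put((ts, event))          # queue.Queue is thread-safe

    def close(self) -> None:
        self.q.put(None)                 # sentinel: no more events
        self.writer.join()

    def _run(self) -> None:
        while True:
            item = self.q.get()
            if item is None:
                return
            ts, event = item
            window = int(ts // WINDOW_SECONDS) * WINDOW_SECONDS
            self.files[window].append(event)  # only this thread writes
```

With 8 producer threads spread over two 5-minute windows, this produces 2 files rather than the 16 that per-thread writers would create.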
Improving Throughput: Summary
• Improve disk I/O by disk separation
• Replace HDD with SSD for Avro file creation
• Aggregate files over time and threads in File Spooler
Throughput improved by 2x
Challenge Category: Controlling Throughput
Controlling Throughput: Challenges
• Heterogeneous publishers
• Traffic spikes
• Unbalanced receiver
[Figure: unanticipated spikes]
Controlling Throughput: Traffic Spike
[Figure: a traffic spike from a tenant reaches the Router, Gateway and Prism tiers across DC1-DC4; an overloaded node responds 429]
Solution: Rate Limiting
Controlling Throughput: Traffic Spike
Better Solution: Adaptive Rate Limiting
[Chart: data (bytes) against planned capacity; tolerated traffic sits above planned capacity, the gap being the headroom]
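A minimal sketch of one possible adaptive policy, assuming limits in bytes per second: each tenant keeps its planned capacity and shares the cluster's currently unused headroom in proportion to its plan. The function name and the policy itself are assumptions; the slides show only the planned-capacity, tolerated-traffic and headroom curves.

```python
from typing import Dict


def adaptive_limits(planned: Dict[str, float], cluster_capacity: float,
                    current: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical policy: tolerated limit = planned capacity + a share of the
    cluster's unused headroom proportional to the tenant's planned capacity.
    When the cluster is busy, limits fall back to planned capacity."""
    total_planned = sum(planned.values())
    if total_planned <= 0:
        return dict(planned)
    headroom = max(0.0, cluster_capacity - sum(current.values()))
    return {t: p + headroom * p / total_planned for t, p in planned.items()}
```

A spiking tenant can then be admitted up to its tolerated limit rather than its planned capacity, for as long as the rest of the cluster leaves room.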
Controlling Throughput: Traffic Spike
[Figure: every tier (Router, Gateway, Prism) can respond 429; the receiver responds 429, or 200 with a delay header]
Solution: Rate Limiting in All Nodes
• Instant feedback
● Returns 429 if it can't accept more
● Used across all DH nodes
• Early feedback
● Specifies a delay in the response header
● Used in Storm consumers
Solution: Rate Limiting Summary
[Figure: receiver responses: 429, or 200 but delay 20 ms]
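The two feedback modes could be combined on a receiving node as below. The header name `X-DH-Delay-Ms`, the soft/hard load thresholds and the linear delay scaling are all hypothetical; the slides say only that a node returns 429 when it is full, or 200 with a delay in a response header.

```python
from typing import Dict, Tuple

DELAY_HEADER = "X-DH-Delay-Ms"  # hypothetical header name


def respond(load: float, soft_limit: float, hard_limit: float,
            max_delay_ms: int = 100) -> Tuple[int, Dict[str, str]]:
    """Receiver side. Instant feedback: 429 once the node can't accept more.
    Early feedback: above the soft limit, accept (200) but ask the sender to
    wait, with the delay growing linearly toward the hard limit."""
    if load >= hard_limit:
        return 429, {}
    if load > soft_limit:
        frac = (load - soft_limit) / (hard_limit - soft_limit)
        return 200, {DELAY_HEADER: str(max(1, round(frac * max_delay_ms)))}
    return 200, {}


def pause_before_next_send(status: int, headers: Dict[str, str],
                           backoff_s: float = 1.0) -> float:
    """Sender side: back off after a 429, otherwise honor any requested delay."""
    if status == 429:
        return backoff_s
    return int(headers.get(DELAY_HEADER, "0")) / 1000.0
```

For instance, a 200 carrying a delay of "20" makes the sender pause 0.02 s, the "200 but delay 20 ms" case on the slide.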
Controlling Throughput
Challenge: Unbalanced Receiver
[Figure: Prism's Streaming Delivery Agent (SDA) sending to several Storm Spouts]
• Round-robin leads to more connections and higher latency for some Spouts
Controlling Throughput: Unbalanced Receiver
Solution: Penalty-Based Load Balancing
• Penalty = # active connections x latency weight
[Figure: Prism SDA spreading connections across Spouts by penalty]
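The slide's penalty formula translates directly into a least-penalty choice. In this sketch `latency_weight` is an assumed input (for example, recent latency relative to a baseline), and `Spout`/`pick_spout` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Spout:
    name: str
    active_connections: int = 0
    latency_weight: float = 1.0  # assumed input, e.g. recent latency / baseline

    @property
    def penalty(self) -> float:
        # Penalty = # active connections x latency weight (from the slide)
        return self.active_connections * self.latency_weight


def pick_spout(spouts: List[Spout]) -> Spout:
    """Choose the spout with the lowest penalty; min() keeps the first on ties."""
    return min(spouts, key=lambda s: s.penalty)
```

Because the penalty grows with both load and latency, a slow or busy spout receives fewer new connections than round-robin would give it.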
Controlling Throughput
Challenge: Traffic Flood During Recovery
[Figure: during recovery, extra traffic from the broker spool floods the downstream tiers across DC1-DC4]
Controlling Throughput
[Figure: Router (Emitter -> HTTP server -> Broker -> Forwarder -> Gateway, with spool file) with an added Rate Controller]
Solution: Rate Control
• Additive-Increase Multiplicative-Decrease (AIMD)-like rate control
Controller
• Success Rate Estimator (S)
• Throttle Rate Estimator (T)
• Request Rate Limiter (tokens, rate R)
• "Can I send?": yes only if a token is available
• On 200: update S, then increase R by a fixed amount
• On 429: update T, then decrease R by multiplying it by S/(S+T)
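The controller can be sketched as below, following the slide: S is updated on 200, T on 429, R grows by a fixed step on success and is multiplied by S/(S+T) on a throttle. Treating S and T as exponentially decayed response counts, and the parameter defaults, are assumptions; `AimdRateController` is a hypothetical name.

```python
from typing import Optional


class AimdRateController:
    """S: success-rate estimate (updated on 200); T: throttle-rate estimate
    (updated on 429); R: allowed request rate (requests/s) for a token limiter.
    Assumption: S and T are exponentially decayed counts of recent responses."""

    def __init__(self, initial_rate: float = 100.0, increase: float = 10.0,
                 min_rate: float = 1.0, decay: float = 0.9) -> None:
        self.S = 0.0
        self.T = 0.0
        self.R = initial_rate
        self.increase = increase      # additive step ("increase R by a fixed amount")
        self.min_rate = min_rate      # keep >= 1 so a token can always accumulate
        self.decay = decay
        self.tokens = 1.0
        self.last: Optional[float] = None

    def can_send(self, now: float) -> bool:
        """'Can I send?': refill tokens at rate R (burst capped at R), take one."""
        if self.last is not None:
            self.tokens = min(self.R, self.tokens + (now - self.last) * self.R)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

    def on_response(self, status: int) -> None:
        # Age both estimators so S/(S+T) reflects recent traffic only.
        self.S *= self.decay
        self.T *= self.decay
        if status == 200:
            self.S += 1.0
            self.R += self.increase                                   # additive increase
        elif status == 429:
            self.T += 1.0
            self.R = max(self.min_rate, self.R * self.S / (self.S + self.T))  # multiplicative decrease
```

Multiplying by S/(S+T) scales the cut to how often the sender has recently been throttled: a rare 429 trims R slightly, while a run of 429s cuts it sharply.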
Controlling Throughput: Traffic Flood
Solution: Rate Control
[Figure: the same Router pipeline with the Rate Controller in place]
• Reduces 429 responses drastically
Controlling Throughput - Summary
• Adaptive rate limit
• Load balancing using penalty
• AIMD-like rate control in sender
Challenge: More
• Event duplication
• Zero-downtime deployments
• Schema Registry
• And more...
Questions

Data Highway Rainbow - Petabyte Scale Event Collection, Transport & Delivery at Yahoo