[Diagram: Twitter's experimentation pipeline. Mobile devices, Service A, Service B, and other data sources emit interaction events to an Event Ingest Service; a Feature Switch Service supplies experiment config; events flow through realtime bucketing into counts stored in an RDBMS, which backs DDG results. Source: https://blog.twitter.com/engineering/en_us/a/2015/twitter-experimentation-technical-overview.html]
Word Count in Scalding:

import com.twitter.scalding._
import com.twitter.scalding.source.TypedText

TypedPipe.from(TextLine(inputPath))
  .flatMap { line => tokenize(line) }
  .groupBy { word => word } // word as key
  .size                     // in each group, get the size
  .write(TypedText.tsv[(String, Long)](outputPath))

// Split a line into lowercase words, dropping punctuation.
def tokenize(text: String): Array[String] =
  text.toLowerCase.replaceAll("[^a-zA-Z0-9\\s]", "").split("\\s+")
● Each Scalding flow has multiple steps in a DAG
● Steps are Hadoop MR jobs
● Users typically configure settings per flow (see the sketch below)
● Each production flow has a unique ID
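A minimal sketch of what per-flow configuration looks like; the flow name and memory values here are hypothetical, and the point is that every step in the flow inherits the same numbers:

import com.twitter.scalding._

class MyProductionFlow(args: Args) extends Job(args) {
  // One flow-wide setting: every MR step gets these containers,
  // whether it needs them or not.
  override def config: Map[AnyRef, AnyRef] =
    super.config ++ Map(
      "mapreduce.map.memory.mb"    -> "4096",
      "mapreduce.map.java.opts"    -> "-Xmx3276m",
      "mapreduce.reduce.memory.mb" -> "8192",
      "mapreduce.reduce.java.opts" -> "-Xmx6553m"
    )

  // ... pipeline definition elided ...
}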
● Different steps can have different memory pressures:
● myPipe.hashJoin(someRHS) // loads someRHS in memory on every mapper
● myPipe.sumByKey(...) // performs map-side aggregation
● myPipe.groupBy(...).join(pipe2) // ends up caching data in memory during accumulation
● Within the same step, data skew can change the memory footprint
● A single flow-wide setting must accommodate steps with very different needs
● Users are unsure of the right settings, so they cargo-cult them from other jobs
● Settings get bumped in reaction to oncall pain at 3am - fix and forget
○ Overestimation: allocated megabytes grow
○ Underestimation: GC pain, so runtime milliseconds grow
○ Either way, the job's MB*millis cost keeps growing (e.g. a 4 GB container held for 10 minutes costs 4096 MB * 600,000 ms, needed or not)
[Diagram: a Scalding job submits its steps as jobs to the Hadoop cluster]
● Store prior runs and counters - HRaven
● Scalding already does Reducer estimation
● Can we leverage this?
● Fetch counters from the last N prior runs, per step and per map/reduce phase
● Compute max(committed_heap_bytes)
● Apply exponential smoothing across runs to derive heap sizes
● Round container sizes up to the YARN allocation boundary
● Set configurations per step (see the sketch below)!
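A minimal sketch of the per-step estimation, under stated assumptions: the input ordering, smoothing factor, headroom factor, and YARN increment below are hypothetical stand-ins for what HRaven and the cluster actually provide, not the real API from the PR:

// Observed max(committed_heap_bytes) per prior run, ordered oldest to newest.
def smoothedHeapBytes(observed: Seq[Long], alpha: Double = 0.5): Double =
  // Classic exponential smoothing: s = alpha * x + (1 - alpha) * s_prev.
  observed.foldLeft(observed.head.toDouble) { (s, x) =>
    alpha * x + (1 - alpha) * s
  }

def containerSizeMb(heapBytes: Double,
                    yarnIncrementMb: Int = 512,    // cluster allocation boundary
                    headroom: Double = 1.2): Int = { // non-heap overhead (assumed)
  val wantedMb = (heapBytes * headroom / (1024 * 1024)).ceil.toInt
  // Round up to the next YARN allocation boundary.
  ((wantedMb + yarnIncrementMb - 1) / yarnIncrementMb) * yarnIncrementMb
}

// These values would then be written into per-step configuration,
// e.g. mapreduce.map.memory.mb and the matching -Xmx.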
● How do we fall back on failures - history lookup unavailable, or the memory estimate inaccurate?
● Jobs change all the time
● How do we catch underestimation that never OOMs?
○ Watch GC_Millis / CPU_Millis (see the sketch below)
● Scalding PR: #1667 - contributions / suggestions welcome!
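A minimal sketch of that signal, assuming per-step GC and CPU time counters are available from history; the type, names, and the 10% threshold are hypothetical:

case class StepCounters(gcMillis: Long, cpuMillis: Long)

// A heap can be undersized without ever OOMing: tasks just burn an
// outsized share of CPU time in garbage collection instead.
def looksUnderProvisioned(c: StepCounters, maxGcFraction: Double = 0.1): Boolean =
  c.cpuMillis > 0 && c.gcMillis.toDouble / c.cpuMillis > maxGcFraction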
[Diagram: competing settings within one mapper heap - io.sort.mb vs. map-side aggregation buffers]
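A toy illustration of why these compete, with hypothetical numbers: the sort buffer and the map-side aggregation buffers both live inside the same mapper heap, so budgeting more for one leaves less for the other:

// MB of mapper heap left for map-side aggregation buffers and user code
// after reserving the MR sort buffer (io.sort.mb) and framework overhead.
def remainingForAggregation(xmxMb: Int, ioSortMb: Int, frameworkMb: Int): Int =
  xmxMb - ioSortMb - frameworkMb

// e.g. remainingForAggregation(2048, 512, 256) == 1280;
// raising io.sort.mb to 1024 drops this to 768.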
● Tuning is hard for users!
● Lots of potential resource usage savings
● Automation is great but needs to be fail-safe
#SlimScalding - Less Memory is More Capacity