Hadoop 2 @Twitter, Elephant Scale. Presented at

Lohit VijayaRenu (@lohitvijayarenu), Gera Shegalov (@gerashegalov)
@TwitterHadoop
1 / 29 v1.0

About this talk
Share @twitterhadoop’s efforts, experience and lessons learned in moving thousands of users and multi-petabyte workloads from Hadoop 1 to Hadoop 2

Use cases
Personalization
Graph analysis, Recommendations, Trends, User/topic modeling
Analytics
A/B testing, user behavior analysis, API analytics
Growth
Network Digest, People Recommendations, Email
Revenue
Engagement prediction, Ad targeting, ads analytics, marketplace optimization
Nielsen Twitter TV Rating
Tweet impressions processing
Backups & Scribe Logs
MySQL backups, Manhattan backups, FrontEnd scribe logs
Many more...

Hadoop and Data pipeline
[Diagram: Hadoop clusters (real time, processing, warehouse, cold, backups, hbase, tst) exchanging data with TFE, Search, Ads, etc., Partners, MySQL, Vertica, Manhattan, SVN, Git, ...]

Elephant Scale
➔ Tens of thousands of Hadoop servers (mix of hardware)
➔ Hundreds of thousands of disk drives
➔ A few hundred PB of data stored in HDFS
➔ Hundreds of thousands of daily Hadoop jobs
➔ Tens of millions of daily Hadoop tasks
Individual Cluster Stats
➔ More than 3500 nodes
➔ 30-50+ PB of data stored in HDFS
➔ 35K RPC/second on NNs
➔ 30K+ jobs per day
➔ 10M+ tasks per day
➔ 6+ PB of data crunched per day

Hadoop 1 Challenges (Q4-2012)
Growth:
Supporting Twitter growth, requests for new features on an older branch, new Java versions
Scalability:
NameNode files/blocks, NN operations, GC pauses, checkpointing; JobTracker GC pauses, task assignment
Reliability:
SPOF NN and JT, NameNode restart delays
Efficiency:
Slot utilization, QoS, multi-tenancy, new features & frameworks
Maintenance:
Old codebase, numerous issues fixed in later versions, dev branch

Hadoop 2 Configuration (Q1-2013)
[Diagram: worker nodes each running a NodeManager and a DataNode under a YARN ResourceManager; JournalNodes (JN); cluster tooling: ViewFS, HDFS Balancer, Admin tools, hRaven, Metrics/Alerts; HDFS namespaces (logs, user, tmp) with Trash]

Hadoop 2 Migration (Q2-Q4 2013)
Phase 1: Testing
➔ Apache 2.0.3 branch
➔ New hardware*, new OS and JVM
➔ Benchmarks and user jobs (lots of them…)
➔ Dependent component updates
➔ Data movement between different versions
➔ Metrics, alerts and tools
Phase 2: Semi-production
➔ Production use cases running in 2 clusters in parallel
➔ Tuning/parameter updates and learnings
➔ Started contributing fixes back to the community
➔ Educating users about the new version and changes
➔ Benefits of Hadoop 2
Phase 3: Production
➔ Stable Apache 2.0.5 release with many fixes and backports
➔ Multiple internal releases
➔ Template for new clusters
➔ Ready to roll Apache 2.3 release
*http://www.slideshare.net/Hadoop_Summit/hadoop-hardware-twitter-size-does-matter

CPU Utilization
[Charts: Hadoop 1 CPU utilization for one day (45% peaks); Hadoop 2 CPU utilization for one day (85% peaks)]

Memory Utilization
[Charts: Hadoop 1 memory utilization for one day (68% peaks); Hadoop 2 memory utilization for one day (96% peaks)]

Migration Challenge: web-based FS
Need a web-based FS to deal with H1/H2 interactions
● Hftp based on cross-DC LogMover experience
● Apps broke because no FileNotFoundException (FNF) was thrown on non-existent paths: HDFS-6143
● Faced challenges with cross-version checksums

Migration Challenge: hard-coded FS
Thousands of occurrences of hdfs://${NN}/path and absolute URIs
● For cluster1 dial hdfs://hadoop-cluster1-nn.dc CNAME
● For cluster2 dial …
Ideal: use logical paths and viewfs as defaultFS
More realistic and faster:
● HDFSCompatibleViewFS HADOOP-9985
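
To make the “ideal” option concrete: with ViewFS as the default FS, jobs name only logical paths such as /user/..., and a client-side mount table (configured in real deployments through fs.viewfs.mounttable.* properties) maps each prefix to a physical namespace. The Python model below sketches that lookup under stated assumptions: the mount points and NameNode host names are hypothetical, and Hadoop's actual ViewFS resolver is Java code, not this function.

```python
from typing import Dict

# Hypothetical mount table: logical path prefix -> physical HDFS location.
MOUNT_TABLE: Dict[str, str] = {
    "/user": "hdfs://nn-user.cluster1.example/user",
    "/logs": "hdfs://nn-logs.cluster1.example/logs",
    "/tmp": "hdfs://nn-tmp.cluster1.example/tmp",
}


def resolve(logical_path: str, table: Dict[str, str] = MOUNT_TABLE) -> str:
    """Translate a logical path into a physical URI via the longest matching mount point."""
    best = ""
    for mount_point in table:
        # A mount point covers itself and everything beneath it, but "/user"
        # must not match "/userdata", hence the explicit "/" check.
        covers = logical_path == mount_point or logical_path.startswith(mount_point + "/")
        if covers and len(mount_point) > len(best):
            best = mount_point
    if not best:
        raise FileNotFoundError(f"no mount point covers {logical_path}")
    return table[best] + logical_path[len(best):]


if __name__ == "__main__":
    # Jobs only name the logical path; the mount table decides which NameNode serves it.
    print(resolve("/user/alice/part-00000"))
    # -> hdfs://nn-user.cluster1.example/user/alice/part-00000
```

The design point: moving /user to a different cluster means editing one mount-table entry rather than thousands of hard-coded hdfs:// URIs.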

Migration Challenge: Interoperability
Migration in progress: H1 job requires input from H2
● hftp://OMGwhatNN/has/my/path problem
● Ideal: use viewfs on H1, resolving to the correct H2 NN
● Realistic: see “hard-coded FS” above
● Even if you know OMGwhatNN, is it active?

[Diagram: an H1 client reaches the cluster through a CNAME; behind it, each namespace has an Active and a Standby NameNode]
Load the client-side mount table on the server side:
1. redirect to the right namespace
2. redirect to the active NameNode within that namespace
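
The two redirect steps can be modeled as follows: the server-side component holds a copy of the mount table, picks the namespace whose prefix covers the requested path, then points the H1 client at whichever NameNode of that namespace's HA pair is currently active. This is a sketch of the idea only; the `Namespace` class, `redirect_target` helper, host names and HA-state map are all hypothetical, not Twitter's implementation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Namespace:
    """One federated namespace served by an HA pair of NameNodes (hypothetical model)."""
    name: str
    namenodes: List[str]      # "host:port" of each NameNode in the HA pair
    states: Dict[str, str]    # NameNode -> "active" or "standby", as last observed

    def active(self) -> str:
        for nn in self.namenodes:
            if self.states.get(nn) == "active":
                return nn
        raise RuntimeError(f"no active NameNode in namespace {self.name}")


def redirect_target(path: str, mounts: Dict[str, Namespace]) -> str:
    """Server-side redirect for an H1 client asking for `path` over Hftp."""
    # Step 1: the server-side copy of the mount table picks the namespace.
    matches = [m for m in mounts if path == m or path.startswith(m + "/")]
    if not matches:
        raise FileNotFoundError(path)
    namespace = mounts[max(matches, key=len)]
    # Step 2: within that namespace, send the client to the active NameNode.
    return f"hftp://{namespace.active()}{path}"
```

The client only ever dials the cluster CNAME, so a failover inside a namespace changes the redirect target without any client-side change.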

Migration: Tools and Ecosystem
● Port/recompile/package:
o Data Access Layer/HCatalog
o Pig
o Cascading/Scalding
o ElephantBird
o hadoop-lzo
● PIG-3913 (local mode counters)
● Analytics team fixed PIG-2888 (performance)
● hRaven fixes:
o translation between slot_millis and mb_millis
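
The hRaven translation is essentially a unit conversion: Hadoop 1 accounts for task occupancy in slot-milliseconds, while Hadoop 2 reports memory-milliseconds (MB_MILLIS_MAPS / MB_MILLIS_REDUCES counters). A minimal sketch, assuming each Hadoop 1 slot is charged at one fixed, cluster-wide memory size; the 1024 MB used below is a hypothetical value, not Twitter's setting, and this is not hRaven's code.

```python
def slot_millis_to_mb_millis(slot_millis: int, slot_size_mb: int) -> int:
    """Convert Hadoop 1 slot-milliseconds into Hadoop 2-style MB-milliseconds.

    Assumes every slot is charged at a fixed memory size (slot_size_mb).
    """
    if slot_millis < 0 or slot_size_mb <= 0:
        raise ValueError("slot_millis must be >= 0 and slot_size_mb > 0")
    return slot_millis * slot_size_mb


def mb_millis_to_slot_millis(mb_millis: int, slot_size_mb: int) -> float:
    """Inverse conversion, so H1 and H2 jobs can be compared in a single unit."""
    if slot_size_mb <= 0:
        raise ValueError("slot_size_mb must be > 0")
    return mb_millis / slot_size_mb


if __name__ == "__main__":
    # A map task that held one (assumed) 1024 MB slot for 60 seconds:
    print(slot_millis_to_mb_millis(60_000, 1024))  # 61440000
```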

HadOops found and fixed
● ViewFS can’t be used for public DistributedCache (DC)
o HADOOP-10191, YARN-1542
● getFileStatus RPC storm on public DC:
o YARN-1771
● No user-specified progress string in the MR-AM task UI
o MAPREDUCE-5550
● Uberized jobs are great for scheduling small jobs, but ...
o can you kill them? MAPREDUCE-5841
o are they sized correctly for map-only jobs? YARN-1190
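
For context, the MR-AM runs a job “uberized” (all tasks inside the AM's own JVM) only when the job is small enough. The sketch below mirrors the documented knobs mapreduce.job.ubertask.enable, ubertask.maxmaps (default 9), ubertask.maxreduces (default 1) and ubertask.maxbytes (which defaults to the DFS block size, taken here as the stock Hadoop 2 value of 128 MB). The real check also verifies that task memory and CPU fit inside the AM container; that part is reduced to a hypothetical `fits_in_am` flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UberLimits:
    enabled: bool = False                  # mapreduce.job.ubertask.enable
    max_maps: int = 9                      # mapreduce.job.ubertask.maxmaps
    max_reduces: int = 1                   # mapreduce.job.ubertask.maxreduces
    max_bytes: int = 128 * 1024 * 1024     # mapreduce.job.ubertask.maxbytes (defaults to dfs.blocksize)


def is_uber_candidate(num_maps: int, num_reduces: int, input_bytes: int,
                      limits: UberLimits, fits_in_am: bool = True) -> bool:
    """Rough model of the MR-AM's "is this job small enough to uberize?" decision.

    fits_in_am is a hypothetical stand-in for the real check that the tasks'
    memory/CPU needs fit inside the AM container.
    """
    return (limits.enabled
            and num_maps <= limits.max_maps
            and num_reduces <= limits.max_reduces
            and input_bytes <= limits.max_bytes
            and fits_in_am)
```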

More HadOops
Incident: a job blacklists nodes by logging terabytes
● Logs need capping, but userlog.limit.kb loses the valuable log tail
● RollingFileAppender for MR-AM/tasks: MAPREDUCE-5672
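
The trade-off can be illustrated with Python's logging.handlers.RotatingFileHandler, which plays the same role as log4j's RollingFileAppender: disk usage stays below roughly (backupCount + 1) × maxBytes, and because the oldest file is discarded first, the most recent output (the tail, which usually holds the error) is always on disk. This is a Python analogy, not the MR-AM/task code from MAPREDUCE-5672; the file name and sizes are arbitrary.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler


def write_capped_log(directory: str, lines: int, max_bytes: int = 1024, backups: int = 2) -> str:
    """Write `lines` records through a size-capped rolling handler; return the live log path."""
    path = os.path.join(directory, "syslog")
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups)
    logger = logging.getLogger(f"capped-{directory}")
    logger.propagate = False          # keep the demo output off the root logger
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    try:
        for i in range(lines):
            logger.info("record %06d", i)
    finally:
        logger.removeHandler(handler)
        handler.close()
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        log = write_capped_log(d, lines=10_000)
        # Only syslog, syslog.1 and syslog.2 remain, each under max_bytes.
        print(sorted(os.listdir(d)))
        with open(log) as f:
            print(f.read().splitlines()[-1])  # record 009999: the tail survives
```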

Diagnostics improvement
App/Job/Task kill:
● DAG processors/users can say why
o MAPREDUCE-5648, YARN-1551
● MR-AM: “speculation”, “reducer preemption”
o MAPREDUCE-5692, MAPREDUCE-5825
● Thread Dumps
o On task timeout: MAPREDUCE-5044
o On demand from CLI/UI: MAPREDUCE-5784, ...

UX/UI improvements
● NameNode state and cluster stats
● App size in MB on RM Apps Page
● RM Scheduler UI improvements: queue descriptions, min/max resource calculation bug fixes
● Task Attempt state filtering in MR-AM
HDFS-5928, YARN-1945, HDFS-5296...

YARN reliability improvements
● Unhealthy nodes / positive feedback
o drain containers instead of killing: YARN-1996
o don’t rerun maps when all reduces committed: MAPREDUCE-5817
● RM crash JIRAs, fixed either internally only or publicly
o YARN-351, YARN-502

MapReduce usability
● memory.mb as a single tunable: Xmx and sort.mb are auto-set
o memory.mb is optimized on a case-by-case basis
o MAPREDUCE-5785
● Users want newer artifacts such as Guava: job.classloader
o MAPREDUCE-5146 / 5751 / 5813 / 5814
● Help users debug
o thread dump on timeout, and on demand via UI
o educate users about heap dumps on OOM and Java profiling
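
With memory.mb as the one knob users set, the other settings can be derived from it. The sketch below shows the idea with hypothetical ratios: the heap (-Xmx) as a fixed fraction of the container size (0.8 here, an assumed value) and mapreduce.task.io.sort.mb as a further fraction of the heap (0.5 here, purely illustrative and not a Hadoop default), capped at 2047 MB because MapTask rejects larger sort buffers. It illustrates the approach, not the exact formula of MAPREDUCE-5785.

```python
from dataclasses import dataclass

HEAP_RATIO = 0.8        # assumed fraction of the container given to the JVM heap
SORT_RATIO = 0.5        # hypothetical fraction of the heap used for the map-side sort buffer
MAX_SORT_MB = 2047      # MapTask rejects mapreduce.task.io.sort.mb values above 2047


@dataclass(frozen=True)
class TaskMemory:
    container_mb: int   # mapreduce.map.memory.mb: the single knob users set
    xmx_mb: int         # derived heap size for mapreduce.map.java.opts
    sort_mb: int        # derived mapreduce.task.io.sort.mb


def derive(container_mb: int) -> TaskMemory:
    """Derive heap and sort-buffer sizes from the container size alone."""
    if container_mb <= 0:
        raise ValueError("container_mb must be positive")
    xmx = int(container_mb * HEAP_RATIO)
    sort = min(int(xmx * SORT_RATIO), MAX_SORT_MB)
    return TaskMemory(container_mb, xmx, sort)


def java_opts(mem: TaskMemory) -> str:
    return f"-Xmx{mem.xmx_mb}m"


if __name__ == "__main__":
    mem = derive(2048)
    print(mem, java_opts(mem))  # xmx_mb=1638, sort_mb=819, -Xmx1638m
```

Keeping the ratios in one place means a user who bumps memory.mb for a heavy job gets a consistent heap and sort buffer without editing three properties by hand.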

Multi-DC environment
MR clients across latency boundaries. Submit fast:
● moving split calculation to MR-AM: MAPREDUCE-207
DSCP bit coloring for DataXfer
● HDFS-5175
● Hftp (switched to Apache Commons HttpClient)
DataXfer throttling (client RW)
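
Client-side read/write throttling can be sketched as a token bucket: each transfer of n bytes spends n tokens, tokens refill at a fixed rate, and a transfer that overdraws the bucket waits until the deficit is earned back. The `Throttler` class below is a hypothetical illustration (clock and sleep are injectable so it can be tested deterministically); HDFS's own throttling code is Java and differs in detail.

```python
import time
from typing import Callable


class Throttler:
    """Token bucket: `rate` bytes/s on average, with bursts of up to `burst` bytes."""

    def __init__(self, rate: float, burst: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        if rate <= 0 or burst <= 0:
            raise ValueError("rate and burst must be positive")
        self.rate = rate
        self.burst = burst
        self._clock = clock
        self._sleep = sleep
        self._tokens = burst          # start with a full bucket
        self._last = clock()

    def _refill(self) -> None:
        now = self._clock()
        self._tokens = min(self.burst, self._tokens + (now - self._last) * self.rate)
        self._last = now

    def throttle(self, nbytes: int) -> float:
        """Call before moving `nbytes`; sleeps if over budget and returns the seconds slept."""
        self._refill()
        self._tokens -= nbytes
        if self._tokens >= 0:
            return 0.0
        # Wait exactly long enough for the deficit to be earned back.
        wait = -self._tokens / self.rate
        self._sleep(wait)
        self._refill()
        return wait
```

With rate=1000 and burst=1000, moving 3500 bytes costs (3500 - 1000) / 1000 = 2.5 s of waiting: the first 1000 bytes ride on the initial burst.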

YARN: Beyond Java & MapReduce
● MR-AM and other REST APIs across the stack for easy integration with non-JVM tools
● Vowpal Wabbit (production)
o no extra spanning tree step
● Spark (semi-production)
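
As an example of non-JVM integration, a Python tool can read job state from the MR-AM REST API through the ResourceManager's web proxy at /proxy/{appid}/ws/v1/mapreduce/jobs. The sketch assumes a hypothetical RM host and port, and keeps URL building and JSON parsing separate from the HTTP call so both can be checked offline. The fields used (jobs.job[].id, state, mapProgress) follow the MapReduce Application Master REST API documentation, but verify them against your Hadoop version.

```python
import json
import urllib.request
from typing import List, Tuple

RM_WEB = "http://rm.example.com:8088"   # hypothetical ResourceManager web address


def jobs_url(app_id: str, rm_web: str = RM_WEB) -> str:
    # The RM web proxy forwards /proxy/<appid>/... to the running MR-AM.
    return f"{rm_web}/proxy/{app_id}/ws/v1/mapreduce/jobs"


def parse_jobs(payload: str) -> List[Tuple[str, str, float]]:
    """Return (job id, state, map progress %) for each job in a /jobs response."""
    jobs = json.loads(payload).get("jobs") or {}
    return [(j["id"], j["state"], float(j.get("mapProgress", 0.0)))
            for j in jobs.get("job", [])]


def fetch_jobs(app_id: str, timeout: float = 10.0) -> List[Tuple[str, str, float]]:
    """Live call against the proxy (needs network access to the RM)."""
    req = urllib.request.Request(jobs_url(app_id), headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_jobs(resp.read().decode("utf-8"))
```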

Ongoing Project: Shared Cache
MapReduce function shipping: computation->data
● Teams have jobs with hundreds of jars uploaded via libjars
o Ideal: manage a jar repo on HDFS
o Reference jars via DistributedCache instead of uploading
o Real: currently hard to coordinate
● YARN-1492: Manage artifacts cache transparently
● Measure it:
o YARN-1529: Localization overhead/cache hits NM metrics
o MAPREDUCE-5696: Job localization counters
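
The idea behind a transparent artifact cache can be sketched as content addressing: identify each jar by a checksum of its bytes, upload it only if no jar with that checksum is already cached, and count hits and misses (the kind of numbers the localization metrics and counters above aim to expose). In this sketch an in-memory dict stands in for the HDFS-backed cache, `upload` is a hypothetical callback, and SHA-256 is an assumed choice of checksum; it is not the YARN-1492 implementation.

```python
import hashlib
from typing import Callable, Dict


class SharedJarCache:
    """Toy content-addressed cache: checksum of jar bytes -> cached location."""

    def __init__(self, upload: Callable[[str, bytes], str]) -> None:
        self._upload = upload              # hypothetical: stores bytes, returns a path
        self._entries: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def use(self, jar_bytes: bytes) -> str:
        """Return a cached path for the jar, uploading it only on a cache miss."""
        key = hashlib.sha256(jar_bytes).hexdigest()
        if key in self._entries:
            self.hits += 1
        else:
            self.misses += 1
            self._entries[key] = self._upload(key, jar_bytes)
        return self._entries[key]
```

With hundreds of -libjars per job, only jars with new content cost an upload; identical jars submitted by different teams share one cached copy.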

Upcoming Challenges
● Reduce ops complexity:
o grow to 10K+ node clusters
o try to avoid adding more clusters
● Scalability limits for NN, RM
● NN heap sizes: large Java heap vs namespace splitting
● RPC QoS Issues
● NN startup: long initial block report processing
● Integrating non-MR frameworks with hRaven

Future Work Ideas
● Productize RM HA and work-preserving restart
● HDFS Readable Standby NN
● Whole DAG in a single NN namespace
● Contribute to HDFS-5477: dedicated block manager (BM) service
● NN SLA: fairshare for RPC queues: HADOOP-10598
● Finer lock granularity in NN
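
To illustrate the fair-share RPC queue idea: instead of one FIFO call queue, calls go into several priority levels, users with a large share of recent calls are demoted to lower levels, and handlers drain the levels in weighted round-robin so one heavy user cannot starve everyone else. This is a simplified model in the spirit of the FairCallQueue that later shipped in Apache Hadoop; the class name, thresholds and weights are hypothetical.

```python
from collections import Counter, deque
from typing import Deque, List, Optional, Sequence, Tuple

Call = Tuple[str, str]  # (user, RPC method): a hypothetical minimal call record


class FairCallQueueSketch:
    """Multi-level call queue: heavy users are demoted, levels drained by weighted round-robin."""

    def __init__(self, thresholds: Sequence[float] = (0.25, 0.5),
                 weights: Sequence[int] = (4, 2, 1)) -> None:
        if len(thresholds) != len(weights) - 1 or min(weights) < 1:
            raise ValueError("need one threshold per demotion level and weights >= 1")
        self._thresholds = sorted(thresholds)
        self._weights = list(weights)
        self._queues: List[Deque[Call]] = [deque() for _ in weights]
        self._usage: Counter = Counter()
        self._total = 0.0
        self._cur = 0                       # level currently being served
        self._left = self._weights[0]       # calls this level may still take in its turn

    def put(self, user: str, method: str) -> None:
        # Level = how many thresholds the user's share of recent calls reaches (0 = top priority).
        self._usage[user] += 1
        self._total += 1
        share = self._usage[user] / self._total
        level = sum(share >= t for t in self._thresholds)
        self._queues[level].append((user, method))

    def decay(self, factor: float = 0.5) -> None:
        """Age out old usage so a formerly heavy user can regain priority."""
        for user in self._usage:
            self._usage[user] *= factor
        self._total *= factor

    def take(self) -> Optional[Call]:
        """Next call in weighted round-robin order, or None if every level is empty."""
        if not any(self._queues):
            return None
        while True:
            queue = self._queues[self._cur]
            if queue and self._left > 0:
                self._left -= 1
                return queue.popleft()
            # This level is empty or has used its turn: move on to the next one.
            self._cur = (self._cur + 1) % len(self._queues)
            self._left = self._weights[self._cur]
```

For example, if one user floods the queue with eight calls and a second user then sends one, the second user's call is served first because its small share keeps it at the top level.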

Summary: Hadoop 2 @ Twitter
● No JT bottleneck: Lightweight RM + MR-AM
● High compute density with flexible slots
● Reduced NN bottleneck using Federation
● HDFS HA removes the anxiety of trying out new NN configs
● Much closer to upstream to consume/contribute fixes
o Development on 2.3 branch
● Adopting new frameworks on YARN

Conclusion
Migrating 1000+ users/use cases is anything but trivial
… however,
● Hadoop 2 made it worthwhile
● Hadoop 2 contributions:
o 40+ patches committed
o ~40 in review

Thank you! Questions?
@JoinTheFlock about.twitter.com/careers
@TwitterHadoop
Catch up with us in person
@LohitVijayaRenu
@GeraShegalov