1. Hadoop 2 @Twitter, Elephant Scale
Lohit VijayaRenu, Gera Shegalov
@lohitvijayarenu, @gerashegalov, @TwitterHadoop
2. About this talk
Share @twitterhadoop's efforts, experience, and lessons learned in moving thousands of users and multi-petabyte workloads from Hadoop 1 to Hadoop 2.
3. Use cases
● Personalization: graph analysis, recommendations, trends, user/topic modeling
● Analytics: A/B testing, user behavior analysis, API analytics
● Growth: Network Digest, people recommendations, email
● Revenue: engagement prediction, ad targeting, ads analytics, marketplace optimization
● Nielsen Twitter TV Rating: Tweet impressions processing
● Backups & Scribe logs: MySQL backups, Manhattan backups, front-end Scribe logs
● Many more...
4. Hadoop and Data pipeline
[Diagram: data flows from TFE and front-end systems (Search, Ads, etc.), partners, MySQL, and SVN/Git through dedicated Hadoop clusters — real-time, processing, warehouse, cold, backups, and test — alongside HBase, Vertica, and Manhattan.]
5. Elephant Scale
➔ Tens of thousands of Hadoop servers (mix of hardware)
➔ Hundreds of thousands of disk drives
➔ A few hundred PB of data stored in HDFS
➔ Hundreds of thousands of daily Hadoop jobs
➔ Tens of millions of daily Hadoop tasks
Individual cluster stats:
➔ More than 3500 nodes
➔ 30-50+ PB of data stored in HDFS
➔ 35K RPC/second on NNs
➔ 30K+ jobs per day
➔ 10M+ tasks per day
➔ 6PB+ of data crunched per day
6. Hadoop 1 Challenges (Q4 2012)
● Growth: supporting Twitter's growth; requests for new features on an older branch; new Java
● Scalability: NameNode files/blocks, NN operations, GC pauses, checkpointing; JobTracker GC pauses, task assignment
● Reliability: SPOF NN and JT; NameNode restart delays
● Efficiency: slot utilization, QoS, multi-tenancy, new features & frameworks
● Maintenance: old codebase; numerous issues fixed in later versions; dev branch
7. Hadoop 2 Configuration (Q1 2013)
[Diagram: worker nodes each run a NodeManager and a DataNode, under a YARN ResourceManager and a set of JournalNodes (JN); the HDFS namespace is split (logs, user, tmp), each with its own Trash; supporting pieces include ViewFS, the HDFS Balancer, admin tools, hRaven, and metrics/alerts.]
8. Hadoop 2 Migration (Q2-Q4 2013)
Phase 1: Testing
➔ Apache 2.0.3 branch
➔ New hardware*, new OS and JVM
➔ Benchmarks and user jobs (lots of them...)
➔ Dependent component updates
➔ Data movement between different versions
➔ Metrics, alerts, and tools
Phase 2: Semi-production
➔ Production use cases running in 2 clusters in parallel
➔ Tuning/parameter updates and learnings
➔ Started contributing fixes back to the community
➔ Educating users about the new version and changes
➔ Benefits of Hadoop 2
Phase 3: Production
➔ Stable Apache 2.0.5 release with many fixes and backports
➔ Multiple internal releases
➔ Template for new clusters
➔ Ready to roll out the Apache 2.3 release
*http://www.slideshare.net/Hadoop_Summit/hadoop-hardware-twitter-size-does-matter
9. CPU Utilization
[Charts: Hadoop 1 CPU utilization for one day (45% peaks) vs. Hadoop 2 CPU utilization for one day (85% peaks).]
10. Memory Utilization
[Charts: Hadoop 1 memory utilization for one day (68% peaks) vs. Hadoop 2 memory utilization for one day (96% peaks).]
11. Migration Challenge: web-based FS
Needed a web-based FS to deal with H1/H2 interactions:
● Hftp, building on cross-DC LogMover experience (see the sketch below)
● Apps broke because no FileNotFoundException was raised for non-existent paths (HDFS-6143)
● Faced challenges with cross-version checksums
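For illustration, a minimal Java sketch of the kind of cross-version read hftp enables: it serves data over HTTP, so a client needs no RPC/protocol compatibility with the serving cluster. The NameNode host/port and the path are placeholders, not Twitter's actual layout.

```java
import java.io.InputStream;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HftpReadExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // hftp is read-only and HTTP-based, so an H1 client can pull data
    // from an H2 cluster. Host, port, and path are illustrative.
    FileSystem fs =
        FileSystem.get(URI.create("hftp://h2-namenode.dc:50070/"), conf);
    // Before HDFS-6143, probing a non-existent path over hftp did not
    // surface a clean FileNotFoundException, which broke apps.
    try (InputStream in = fs.open(new Path("/logs/part-00000"))) {
      byte[] buf = new byte[8192];
      for (int n; (n = in.read(buf)) > 0; ) {
        System.out.write(buf, 0, n);
      }
      System.out.flush();
    }
  }
}
```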
12. Migration Challenge: hard-coded FS
1000's of occurrences of hdfs://${NN}/path and absolute URIs:
● For cluster1, dial the hdfs://hadoop-cluster1-nn.dc CNAME
● For cluster2, dial ...
Ideal: use logical paths and viewfs as the defaultFS (sketched below).
More realistic and faster:
● HDFSCompatibleViewFS (HADOOP-9985)
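A minimal sketch of the "ideal" option: viewfs as the defaultFS with a client-side mount table. The logical cluster name, mount points, and NN CNAMEs are illustrative placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ViewFsDefaultFs {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Client-side mount table: logical paths map to per-namespace NNs.
    conf.set("fs.defaultFS", "viewfs://cluster1/");
    conf.set("fs.viewfs.mounttable.cluster1.link./user",
        "hdfs://hadoop-cluster1-nn.dc/user");
    conf.set("fs.viewfs.mounttable.cluster1.link./logs",
        "hdfs://hadoop-cluster1-nn2.dc/logs");
    // Jobs now address /user/... and /logs/... logically; viewfs routes
    // each path to the owning namespace, with no hard-coded NN in code.
    FileSystem fs = FileSystem.get(conf);
    System.out.println(fs.makeQualified(new Path("/user/alice")));
  }
}
```

The HDFSCompatibleViewFS route (HADOOP-9985) was the pragmatic alternative: it keeps existing hdfs:// URIs working, avoiding a mass rewrite of hard-coded paths.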
13. Migration Challenge: Interoperability
Migration in progress: an H1 job requires input from H2.
● The hftp://OMGwhatNN/has/my/path problem
● Ideal: use viewfs on H1, resolving to the correct H2 NN
● Realistic: see "hard-coded FS" above
● Even if you know OMGwhatNN, is it active?
14. [Diagram: an H1 client points at a cluster CNAME fronting two HA namespaces, each with an active and a standby NN.]
Load the client-side mount table on the server side (sketched below):
1. redirect to the right namespace
2. redirect to the active NN within the namespace
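A self-contained sketch of the two-step redirect above, with hypothetical namespaces and NN addresses; a real service would load the actual client-side mount table and probe each HA pair for the active NN.

```java
import java.util.HashMap;
import java.util.Map;

public class CnameRedirectSketch {
  // Step 1 data: client-side mount table loaded on the server side
  // (path prefix -> namespace); entries are hypothetical.
  static final Map<String, String> MOUNTS = new HashMap<>();
  // Step 2 data: currently active NN per namespace, kept fresh by
  // probing each HA pair.
  static final Map<String, String> ACTIVE_NN = new HashMap<>();
  static {
    MOUNTS.put("/user", "ns-user");
    MOUNTS.put("/logs", "ns-logs");
    ACTIVE_NN.put("ns-user", "nn1-active.dc:50070");
    ACTIVE_NN.put("ns-logs", "nn2-active.dc:50070");
  }

  static String redirect(String path) {
    // 1. Longest-prefix match picks the namespace owning the path.
    String ns = null;
    int best = -1;
    for (Map.Entry<String, String> e : MOUNTS.entrySet()) {
      if (path.startsWith(e.getKey()) && e.getKey().length() > best) {
        best = e.getKey().length();
        ns = e.getValue();
      }
    }
    // 2. Within that namespace, forward to the active NN.
    return "http://" + ACTIVE_NN.get(ns) + path;
  }

  public static void main(String[] args) {
    System.out.println(redirect("/logs/2013/12/01/part-00000"));
  }
}
```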
15. Migration: Tools and Ecosystem
● Port/recompile/package:
  o Data Access Layer/HCatalog
  o Pig
  o Cascading/Scalding
  o ElephantBird
  o hadoop-lzo
● PIG-3913 (local-mode counters)
● Analytics team fixed PIG-2888 (performance)
● hRaven fixes:
  o translation between slot_millis and mb_millis
16. HadOops found and fixed
● ViewFS can't be used for the public DistributedCache (DC)
  o HADOOP-10191, YARN-1542
● getFileStatus RPC storm on the public DC:
  o YARN-1771
● No user-specified progress string in the MR-AM task UI
  o MAPREDUCE-5550
● Uberized jobs are great for scheduling small jobs (settings sketched below), but...
  o can you kill them? MAPREDUCE-5841
  o sized correctly for map-only? YARN-1190
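For context, uberization runs a small job inside the MR-AM's own container instead of scheduling separate task containers. A minimal configuration sketch with the standard MR2 knobs; the thresholds shown are illustrative values, not recommendations.

```java
import org.apache.hadoop.conf.Configuration;

public class UberJobSettings {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Let small jobs run "uberized" inside the MR-AM container.
    conf.setBoolean("mapreduce.job.ubertask.enable", true);
    // Only jobs at or below these bounds qualify (illustrative values).
    conf.setInt("mapreduce.job.ubertask.maxmaps", 9);
    conf.setInt("mapreduce.job.ubertask.maxreduces", 1);
  }
}
```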
17. More HadOops
Incident: a job blacklisted nodes by logging terabytes.
● Capping is needed, but userlog.limit.kb loses the valuable log tail
● RollingFileAppender for MR-AM/tasks: MAPREDUCE-5672 (settings sketched below)
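A configuration sketch of the capped, rolling task logs described above. The property names are those we understand later 2.x releases to ship for MAPREDUCE-5672; verify against your Hadoop version, and treat the sizes as illustrative.

```java
import org.apache.hadoop.conf.Configuration;

public class RollingTaskLogs {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Cap per-container log volume so one noisy job can't fill disks
    // and get its nodes blacklisted (illustrative sizes).
    conf.setInt("mapreduce.task.userlog.limit.kb", 10 * 1024);
    conf.setInt("yarn.app.mapreduce.am.container.log.limit.kb", 10 * 1024);
    // With MAPREDUCE-5672, a positive backup count switches syslog to a
    // rolling appender, so the capped log keeps its most recent tail
    // instead of being truncated.
    conf.setInt("yarn.app.mapreduce.task.container.log.backups", 3);
    conf.setInt("yarn.app.mapreduce.am.container.log.backups", 3);
  }
}
```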
18. Diagnostics improvements
App/Job/Task kill:
● DAG processors/users can say why
  o MAPREDUCE-5648, YARN-1551
● MR-AM: "speculation", "reducer preemption"
  o MAPREDUCE-5692, MAPREDUCE-5825
● Thread dumps
  o on task timeout: MAPREDUCE-5044
  o on demand from CLI/UI: MAPREDUCE-5784, ...
19. UX/UI improvements
● NameNode state and cluster stats
● App size in MB on the RM Apps page
● RM Scheduler UI improvements: queue descriptions, bugs in min/max resource calculation
● Task attempt state filtering in the MR-AM
HDFS-5928, YARN-1945, HDFS-5296, ...
20. YARN reliability improvements
● Unhealthy nodes / positive feedback
  o drain containers instead of killing them: YARN-1996
  o don't rerun maps when all reducers have committed: MAPREDUCE-5817
● RM crash JIRAs, fixed either just internally or upstream
  o YARN-351, YARN-502
21. MapReduce usability
● memory.mb as a single tunable: Xmx and sort.mb auto-set (sketched below)
  o mb is optimized on a case-by-case basis
  o MAPREDUCE-5785
● Users want newer artifacts like guava: job.classloader
  o MAPREDUCE-5146 / 5751 / 5813 / 5814
● Help users debug
  o thread dump on timeout, and on demand via the UI
  o educate users about heap dumps on OOM and Java profiling
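A minimal sketch of the single-tunable idea: with MAPREDUCE-5785, setting the container size alone suffices, since the task JVM's -Xmx (and sort.mb in turn) can be derived from it rather than kept in sync by hand. Values are illustrative.

```java
import org.apache.hadoop.conf.Configuration;

public class SingleMemoryKnob {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // One knob per task type; Xmx and io.sort.mb follow from it
    // (illustrative sizes, not recommendations).
    conf.setInt("mapreduce.map.memory.mb", 2048);
    conf.setInt("mapreduce.reduce.memory.mb", 4096);
    // Prefer user-supplied artifacts (e.g. a newer guava) over the
    // versions on Hadoop's classpath.
    conf.setBoolean("mapreduce.job.classloader", true);
  }
}
```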
22. Multi-DC environment
MR clients across latency boundaries. Submit fast:
● moving split calculation to the MR-AM: MAPREDUCE-207
DSCP bit coloring for DataXfer:
● HDFS-5175
● Hftp (switched to Apache Commons HttpClient)
DataXfer throttling (client RW)
23. YARN: Beyond Java & MapReduce
● MR-AM and other REST APIs across the stack, for easy integration in non-JVM tools
● Vowpal Wabbit (production)
  o no extra spanning-tree step
● Spark (semi-production)
24. Ongoing Project: Shared Cache
MapReduce function shipping: computation -> data
● Teams have jobs with 100's of jars uploaded via -libjars
  o Ideal: manage a jar repo on HDFS and reference jars via the DistributedCache instead of uploading (see the sketch below)
  o Real: currently hard to coordinate
● YARN-1492: manage the artifact cache transparently
● Measure it:
  o YARN-1529: localization overhead / cache-hit NM metrics
  o MAPREDUCE-5696: job localization counters
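A minimal sketch of the "ideal" pattern above: referencing versioned jars already living in an HDFS repo instead of re-uploading hundreds of jars on every submission. The repo paths and jar names are illustrative placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class SharedJarRepoJob {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "shared-jar-repo");
    // Jars already in HDFS go onto the task classpath via the
    // DistributedCache; NM-local copies can then be reused across jobs
    // that reference the same paths (paths are placeholders).
    job.addFileToClassPath(new Path("/repo/jars/scalding-core-0.9.0.jar"));
    job.addFileToClassPath(new Path("/repo/jars/elephant-bird-core-4.4.jar"));
  }
}
```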
25. Upcoming Challenges
● Reduce ops complexity:
  o grow to 10K+ node clusters
  o try to avoid adding more clusters
● Scalability limits for the NN and RM
● NN heap sizes: a large Java heap vs. namespace splitting
● RPC QoS issues
● NN startup: long initial block report processing
● Integrating non-MR frameworks with hRaven
26. Future Work Ideas
● Productize RM HA and work-preserving restart
● HDFS readable standby NN
● Whole DAG in a single NN namespace
● Contribute to HDFS-5477 (dedicated Block Manager service)
● NN SLA: fair-share for RPC queues (HADOOP-10598)
● Finer lock granularity in the NN
27. Summary: Hadoop 2 @Twitter
● No JT bottleneck: lightweight RM + MR-AM
● High compute density with flexible slots
● Reduced NN bottleneck using Federation
● HDFS HA removes the angst of trying out new NN configs
● Much closer to upstream, to consume and contribute fixes
  o development on the 2.3 branch
● Adopting new frameworks on YARN
28. Conclusion
Migrating 1000+ users/use cases is anything but trivial...
However,
● Hadoop 2 made it worthwhile
● Hadoop 2 contributions:
  o 40+ patches committed
  o ~40 in review
29. Thank you! Questions?
@JoinTheFlock about.twitter.com/careers
@TwitterHadoop
Catch up with us in person: @LohitVijayaRenu, @GeraShegalov
