How we lose etu hadoop competition

Our experience joining a Hadoop deployment competition in Taiwan.

Published in: Software, Technology

  1. 1. How We Lose Etu Hadoop Competition Evans Ye 2014.6.16 6/19/2014 Confidential | Copyright 2013 TrendMicro Inc. 1
  2. 2. This April, a Hadoop Competition hosted by Etu was announced
  3. 3. It’s about hadoop deployment
  4. 4. I have a dream… to win that 150 grand
  5. 5. Our Team • Fann Wu, Mammi Chang – Solid hardware-related knowledge – Know well how to tune performance on Hadoop clusters • Evans Ye – Has some experience developing an automatic Hadoop deployment tool
  6. 6. Agenda • The preliminary – Winning criteria – What we’ve prepared • The final – Winning criteria – What we’ve prepared • Why we lost the competition • Lesson learned
  7. 7. The preliminary • Deploy an all-in-one Hadoop EC2 instance
  8. 8. Criteria to win the preliminary • NameNode daemon exists • put a 100MB file up to HDFS • YARN daemons exist • run a pi job • ZooKeeper daemon exists • HBase daemon exists • run HBase put and scan • run a Pig script • run a Hive query
  9. 9. And the most important one: finish time
  10. 10. Prepare for the fight
  11. 11. What we prepared to do • To achieve the fastest finish time, we needed to practice over and over. – Vagrant-based scripts to simulate the AWS environment – A shell script that automatically provisions an all-in-one Hadoop
  12. 12. Vagrant • An open source command-line VM provisioning tool – http://www.vagrantup.com/ • Supports VirtualBox, VMware, AWS and more as VM providers • Supports shell, Puppet, Chef for provisioning • previous sharing
  13. 13. Vagrant-aws plugin • https://github.com/mitchellh/vagrant-aws • Vagrantfile
  14. 14. Provision script • Jazz Wang already leaked the script to provision an all-in-one Hadoop on Ubuntu at OSDC.TW – package-based deployment (you can also start from tarballs)
  15. 15. Our hack #1 • Use our own S3 clone of the repos instead of the worldwide public repos – avoids a SPOF – co-located in the Singapore region to speed up network transfer
  16. 16. Our hack #2 • the evil /usr/lib/hadoop/libexec/init-hdfs.sh
  17. 17. Our hack #2 • /usr/lib/hadoop/libexec/init-hdfs.sh – An HDFS directory bootstrap script • /user/hbase, /tmp, /var/log/hadoop-yarn/apps… – Executes lots of hadoop shell commands • HELL SLOW! – BIGTOP-952 attempts to solve it by calling the HDFS API directly using Groovy – Our hack is to concatenate similar commands into one command • hadoop fs -mkdir -p /tmp /var/log /tmp/hadoop-yarn • 50 → 15 calls
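The batching idea in hack #2 can be sketched as follows. Each `hadoop fs` call starts its own JVM, so passing several paths to a single `-mkdir -p` saves most of that startup cost. The `batch_mkdir` helper and the `HADOOP_CMD` variable are hypothetical names introduced here for illustration; `HADOOP_CMD` defaults to a dry run that only prints the command, so the sketch can be tried on a machine without Hadoop.

```shell
# Dry-run by default: prints the command instead of running it.
# Set HADOOP_CMD=hadoop to execute against a real cluster (assumption:
# the hadoop client is on PATH).
HADOOP_CMD="${HADOOP_CMD:-echo hadoop}"

# Slow pattern, one JVM start per directory:
#   hadoop fs -mkdir -p /tmp
#   hadoop fs -mkdir -p /var/log
#   hadoop fs -mkdir -p /tmp/hadoop-yarn

# Batched pattern: all directories in a single invocation.
batch_mkdir() {
  # HADOOP_CMD is intentionally unquoted so "echo hadoop" splits into words.
  $HADOOP_CMD fs -mkdir -p "$@"
}

batch_mkdir /tmp /var/log /tmp/hadoop-yarn
```

The same grouping applies to other per-path commands in a bootstrap script, as long as the grouped paths share the same options.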
  18. 18. Our hack #3 • run the hdfs, hbase, pig, hive test cases in parallel – (hdfs test case here) & – (hbase test case here) & – (…) & – wait – send my score
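A minimal sketch of the background-and-wait pattern from hack #3. The `run_case` function is a placeholder (a one-second sleep standing in for a real check), and both function names are hypothetical; the actual test cases are not shown on the slide. Unlike a bare `wait`, waiting on each PID lets the script notice when any single case fails.

```shell
# Placeholder for a real test case (assumption: each case is independent).
run_case() {
  sleep 1
  echo "$1 done"
}

run_all_parallel() {
  pids=""
  for name in hdfs hbase pig hive; do
    run_case "$name" &        # start the case in the background
    pids="$pids $!"           # remember its PID
  done
  failed=0
  for pid in $pids; do
    wait "$pid" || failed=1   # collect each exit status
  done
  return "$failed"
}

# Total wall time is roughly that of the slowest case, not the sum.
run_all_parallel && echo "all test cases finished"
```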
  19. 19. Pretty good result on the preliminary
  20. 20. The Final
  21. 21. Evans: GJ, let’s get some rest • 2 weeks gone
  22. 22. The Final
  23. 23. Criteria to win the final • held on 5/31 at Etu’s building
  24. 24. Criteria to win the final • Deployment completeness (20%) – ZooKeeper, HDFS, YARN deployed • High-availability verification (20%) – NameNode HA using JournalNodes • System security verification (10%) – Kerberos enabled • Runtime performance (30%) – DFSIO (write throughput) – Terasort (sort speed) – HBaseEvaluation (HBase write throughput)
  25. 25. Environment • Hardware • Software
  26. 26. Summarize things we need to do • This time, finish time doesn’t matter. We need to focus on correctness and performance – Choose a Hadoop deployment tool which supports • NameNode HA • Kerberos • YARN – Figure out how to get the best performance on YARN and VirtualBox
  27. 27. Choosing the deployment tool • Cloudera Manager – You need to install/configure Kerberos by yourself • Ambari – Claims to support Kerberos, but enabling it failed for us • Bigtop – Does have Kerberos and NameNode HA Puppet recipes, but they are currently kind of buggy • Hadooppet – YARN deployment still needs to be implemented
  28. 28. Cloudera Manager … Kerberos installation/configuration is on your own
  29. 29. Ambari has great UI design, but…
  30. 30. Comparison
      | Deployment tool | Namenode HA | Kerberos | YARN | Hadoop distro | Troubleshooting |
      |---|---|---|---|---|---|
      | Cloudera Manager | YES | NO | YES | Hadoop 2.3.0 (CDH5) | HARD |
      | Ambari | YES | NO (enable failed) | YES | Hadoop 2.4.0 (HDP2.1) | HARD |
      | Bigtop | NO (NFS) | NO (buggy) | YES | Hadoop 2.0.6-alpha (bigtop-0.7.0) | MIDDLE |
      | Hadooppet | YES | YES | NO | Hadoop 2.3.0 (CDH5) | EASY |
      Two entries are stamped 勝 ("win") on the slide.
  31. 31. Getting our deployment tool ready
  32. 32. Trap #1 • Got "connection refused" from the JournalNodes while formatting the NameNodes • The root cause – When a hostname is defined in the Vagrantfile, Vagrant sets the VM’s hostname AND writes it into /etc/hosts – This makes the JournalNodes listen on 127.0.0.1, which results in the connection refused error while formatting the NameNodes • The fix – cat /dev/null > /etc/hosts
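One way to spot trap #1 before formatting the NameNodes is to check whether the VM's hostname is mapped to a loopback address in /etc/hosts. This is only a sketch: `maps_to_loopback` is a hypothetical helper name, and it assumes the usual hosts-file layout of an address followed by one or more names.

```shell
# Returns 0 if the given hostname appears on a 127.x.x.x line of the file.
maps_to_loopback() {
  hosts_file=$1
  host=$2
  awk -v h="$host" '
    /^[[:space:]]*#/ { next }                 # skip comment lines
    $1 ~ /^127\./ {
      for (i = 2; i <= NF; i++) if ($i == h) found = 1
    }
    END { exit (found ? 0 : 1) }
  ' "$hosts_file"
}

if maps_to_loopback /etc/hosts "$(hostname)"; then
  echo "warning: $(hostname) resolves to loopback; daemons may bind to 127.x"
fi
```

If the warning fires, daemons that bind by hostname will only be reachable from inside the VM, which matches the connection refused symptom described above.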
  33. 33. Trap #2 • Kerberos database initialization failed because a timeout was exceeded • The root cause – VirtualBox has poor entropy performance (Ticket #11297) – Kerberos DB init cannot get enough random data – Entropy is often collected from hardware sources for use in cryptography
  34. 34. Trap #2 • A quick test to get entropy – A Xen VM – A VirtualBox VM • The fix – Set up the havege package, which improves entropy performance • havege official site, Installation
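A quick entropy check along the lines of trap #2 can be sketched like this. It assumes a Linux guest that exposes `/proc/sys/kernel/random/entropy_avail`; the `entropy_status` name and the threshold of 1000 bits are illustrative assumptions, not values from our measurements.

```shell
# Prints the available entropy and returns 1 when it is below the threshold.
entropy_status() {
  file=${1:-/proc/sys/kernel/random/entropy_avail}
  threshold=${2:-1000}          # assumed cut-off, tune for your setup
  avail=$(cat "$file") || return 2
  if [ "$avail" -lt "$threshold" ]; then
    echo "low entropy: $avail"
    return 1
  fi
  echo "entropy ok: $avail"
}

# Usage on a guest VM, before and after installing havege:
#   entropy_status
```

Running it on both hypervisors before the Kerberos setup gives an early hint whether the database initialization is likely to stall.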
  35. 35. Performance Tuning
  36. 36. OS tuning • Disabling Transparent Huge Page compaction (THP) – echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled – impact reported • Hadoop, Oracle Linux and Splunk… • set vm.swappiness to zero – sysctl -w vm.swappiness=0 – keeps processes from being swapped out while free memory is still available
  37. 37. VirtualBox tuning • Raw hard disk access – access host disks directly from the guest VM – create a VMDK file to represent the disk/partition – attach it to the guest through the VirtualBox GUI – fdisk the newly added disk in the guest VM
  38. 38. YARN tuning • HDFS cache for reads (available since 2.3.0) • YARN: – yarn.nodemanager.resource.memory-mb • Mapreduce: – io.sort.mb – mapreduce.map.memory.mb – mapreduce.map.java.opts – mapreduce.map.speculative – … – Most properties are job specific
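Because most of these MapReduce properties are job specific, they can be passed per job with Hadoop's generic `-D` option instead of being set cluster-wide. The sketch below only prints such a command line. The jar path variable, the HDFS paths, the 2048 MB container size, and the 80% heap-to-container ratio are all illustrative assumptions, not settings from our runs.

```shell
# Hypothetical location of the MapReduce examples jar; adjust for your distro.
EXAMPLES_JAR="${EXAMPLES_JAR:-hadoop-mapreduce-examples.jar}"

# Derive a JVM heap flag from the container size in MB (assumed rule of
# thumb: heap = 80% of the container, leaving room for non-heap memory).
heap_opt() {
  printf '%s\n' "-Xmx$(( $1 * 80 / 100 ))m"
}

# Print (not run) a terasort invocation with per-job overrides.
terasort_cmd() {
  map_mb=$1
  input=$2
  output=$3
  printf '%s\n' "hadoop jar $EXAMPLES_JAR terasort -Dmapreduce.map.memory.mb=$map_mb -Dmapreduce.map.java.opts=$(heap_opt "$map_mb") -Dmapreduce.map.speculative=false $input $output"
}

terasort_cmd 2048 /benchmarks/tera-in /benchmarks/tera-out
```

Keeping the heap flag derived from the container size avoids the two values drifting apart when one of them is tuned.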
  39. 39. Deployment Architecture
  40. 40. VMs configuration
      | VM | RAM | CPU | DISK | daemons |
      |---|---|---|---|---|
      | VM1 | 7G | 3 vcpus | Local disk | Namenode, Resourcemanager |
      | VM2 | 7G | 3 vcpus | Local disk | Namenode, Resourcemanager |
      | VM3 | 15G | 8 vcpus | 1T raw disk *2 | Datanode, Nodemanager |
      | VM4 | 15G | 8 vcpus | 1T raw disk *2 | Datanode, Nodemanager |
      | total | 44G | 22 vcpus | 4T for hdfs | - |
  41. 41. 5/31 The Day
  42. 42. The check we’re so eager to win
  43. 43. And the result
  44. 44. WE LOST
  45. 45. The reason we lost • VirtualBox performs sluggishly with hyper-threading • To avoid that: – Disable hyper-threading – set an equal number of cores for host and guest • VMs != physical machines – We all assumed that hyper-threading helps performance a lot; at least it does so on our Hadoop cluster
  46. 46. Poor support for multi-cores • VMs with multiple vCPUs require that all allocated cores be free before processing can begin – Do not configure too many vCPUs for a single VM – A big VM will not perform as well as you expect
  47. 47. The better architecture
      | VM | RAM | CPU | DISK | daemons |
      |---|---|---|---|---|
      | VM1 | 10G | 4 vcpus | 1T raw disk *1 | Namenode, Resourcemanager, Datanode, Nodemanager |
      | VM2 | 10G | 4 vcpus | 1T raw disk *1 | Namenode, Resourcemanager, Datanode, Nodemanager |
      | VM3 | 10G | 4 vcpus | 1T raw disk *1 | Datanode, Nodemanager |
      | VM4 | 10G | 4 vcpus | 1T raw disk *1 | Datanode, Nodemanager |
      | total | 40G | 16 vcpus (equal to physical cores) | 4T for hdfs | - |
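The sizing rule behind the better architecture (total vCPUs equal to the 16 physical cores) is simple arithmetic, sketched here as a sanity check one could run before building the VMs. The `vcpu_budget` helper is a hypothetical name; its first argument is the physical core count and the rest are the per-VM vCPU counts.

```shell
# Succeeds only when the vCPUs handed to guests add up to the physical cores.
vcpu_budget() {
  cores=$1
  shift
  total=0
  for n in "$@"; do
    total=$(( total + n ))
  done
  if [ "$total" -eq "$cores" ]; then
    echo "ok: $total vCPUs on $cores cores"
  else
    echo "mismatch: $total vCPUs on $cores cores"
    return 1
  fi
}

vcpu_budget 16 3 3 8 8 || true   # our layout on 5/31: mismatch, 22 vCPUs
vcpu_budget 16 4 4 4 4           # the better layout: ok, 16 vCPUs
```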
  48. 48. How about Hadoop performance tuning? • Pretty much everybody used the defaults, including the team that won the competition • …
  49. 49. Lesson learned • Don't judge too soon • Don’t stay up for a week. If you do, you can’t make decisions wisely • We need better project management – We spent too much time on tuning our deployment tool – We didn’t run many tests on different deployment architectures
  50. 50. Acknowledgments • Thanks to Fann for sorting out those trivial works – packaging the box – cloning repositories – preparing the testing environment • Thanks to Mammi for the great presentation on that day
  51. 51. Q&A