Building Hadoop Based Big Data Environment

  1. 1. Building Hadoop Based Big Data Environment Evans Ye @ TWHUG 2013/12/14
  2. 2. Who am I • Evans Ye @ • Dumbo Team • http://dumbointaiwan.blogspot.tw/ 12/14/2013 Copyright 2013 Trend Micro Inc.
  3. 3. Agenda • Building your own Hadoop version • Hadoop Deployment • Hadoop release engineering • The development environment • Bigtop puppet
  4. 4. Why Build our own version • Add your own patch at any time – The community has to take care of backward compatibility, which takes much more time and effort. • Fetch official patches into the currently adopted version – You may not upgrade your Hadoop version frequently, but there may be a specific need for a given patch. • Flexibility, business-needed features
  5. 5. As a Beginner
  6. 6. What’s your work? Build Hadoop Infrastructure
  7. 7. …. I thought you just need to yum install Hadoop.
  8. 8. Brute force • git clone • Make some changes • Build binary tarball • How to do version control? core-site.xml, hdfs-site.xml, mapred-site.xml, …
  9. 9. Bigtop
  10. 10. How bigtop helps you • Apache Hadoop App developers: – Run pseudo-distributed Hadoop cluster to test your code on. • Vendors: – Build your own Apache Hadoop distribution, customized from Apache Bigtop bits. • Packaging, Deployment, Integration Testing
  11. 11. Supported Linux Distro • Ubuntu 10.10 • CentOS 5/6 • Fedora 18 • Mageia 1 • openSUSE 12.2
  12. 12. Build • Build hadoop-common (see BUILDING.txt) – hadoop-common$ mvn package -Pdist,docs,src,native -Dtar • Prepare your src tar in bigtop • bigtop$ make hadoop-rpm
  13. 13. Hadoop Deployment
  14. 14. Configuration files • Hadoop related config – core-site.xml – hdfs-site.xml – mapred-site.xml – log4j.properties – hadoop-env.sh – fair-scheduler.xml – rack-topology – hadoop-metrics.properties – taskcontroller.cfg
  15. 15. Local Directories • Hadoop related files and directories – Namenode metadata • /name/1, /name/2 – Datanode • /data/1, /data/2, /data/3, /data/4 – Tasktracker • /mapred/1/local, /mapred/2/local – …
  16. 16. More Hadoop ecosystem
  17. 17. Problems to solve • Lots of nodes need to be configured • Less human involvement means fewer mistakes • Configuration changes quite often – adjust the fair scheduler – enable/disable short circuit – try more performance-improving configurations
  18. 18. Hadooppet
  19. 19. What is puppet? • An IT automation tool that helps system administrators automate many repetitive tasks • You only need to define the desired state
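The "define the desired state" idea on slide 19 can be illustrated with a minimal sketch. This is not how Puppet is implemented and uses no Puppet API; the resource names and the `plan` and `converge` helpers are hypothetical, chosen only to show that a desired-state tool compares what you declared with what exists and changes only the difference.

```python
# Hypothetical illustration of desired-state convergence.
# This is NOT Puppet code; all names here are made up for the example.

def plan(desired, actual):
    """Return (resource, current, wanted) for every resource that differs."""
    changes = []
    for name, want in sorted(desired.items()):
        have = actual.get(name)
        if have != want:
            changes.append((name, have, want))
    return changes


def converge(desired, actual):
    """Apply the planned changes to a copy of `actual` and return the copy."""
    result = dict(actual)
    for name, _have, want in plan(desired, actual):
        result[name] = want
    return result


if __name__ == "__main__":
    desired = {"package:hadoop": "installed", "service:datanode": "running"}
    actual = {"package:hadoop": "installed", "service:datanode": "stopped"}
    print(plan(desired, actual))
    # A second run after converging finds nothing left to change.
    print(plan(desired, converge(desired, actual)))
```

Running the same plan twice is harmless because a converged node produces an empty change list, which is the property that makes repeated deployment runs safe.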
  20. 20. What is Hadooppet? • A general Hadoop cluster deployment tool based on puppet • Kerberos / LDAP auto-configured • A set of Hadoop / Kerberos management tools • A set of sanity check scripts for Trend Hadoop related services • Manage configuration on the puppetmaster
  21. 21. Design • Abstract environment specific configurations in a single configuration file • setup.sh – namenode_fqdns=("dev1.example.com" "dev2.example.com") – namenode_dirs=("/name/1" "/name/2") – namenode_heap=32g – map_slots=5 – reduce_slots=3 – …
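As a rough sketch of the design on slide 21, the snippet below keeps environment-specific values in one place and derives Hadoop site files from them. It is an assumption about how such a mapping could look, not Hadooppet's actual code: the `SETTINGS` dict mirrors the setup.sh variable names, the helper functions are hypothetical, and the property names are the Hadoop 1 ones (`dfs.name.dir` and the tasktracker slot maximums).

```python
import xml.etree.ElementTree as ET

# Environment-specific values, mirroring the variable names on the slide.
SETTINGS = {
    "namenode_dirs": ["/name/1", "/name/2"],
    "map_slots": 5,
    "reduce_slots": 3,
}


def render_site_xml(properties):
    """Render a dict of properties as a Hadoop *-site.xml document string."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


def hdfs_site(settings):
    """hdfs-site.xml properties (assumed mapping, Hadoop 1 names)."""
    # dfs.name.dir takes a comma-separated list of directories.
    return {"dfs.name.dir": ",".join(settings["namenode_dirs"])}


def mapred_site(settings):
    """mapred-site.xml properties (assumed mapping, Hadoop 1 names)."""
    return {
        "mapred.tasktracker.map.tasks.maximum": settings["map_slots"],
        "mapred.tasktracker.reduce.tasks.maximum": settings["reduce_slots"],
    }


if __name__ == "__main__":
    print(render_site_xml(hdfs_site(SETTINGS)))
    print(render_site_xml(mapred_site(SETTINGS)))
```

With this shape, moving to a different cluster means editing only the settings, while the rendering logic stays the same for every environment.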
  22. 22. Benefits • Can be used to set up any kind of Hadoop cluster • Minimizes downtime during a major version upgrade – hadoop1 → hadoop2 (Namenode, Secondarynamenode → Active/Standby Namenode, Journalnodes, ZKFC)
  23. 23. Release Engineering
  24. 24. Manually • Build src tarball in hadoop-common • Build rpms in bigtop • Submit build to release yum repo • yum update on Hadoop cluster…
  25. 25. Continuous Integration • Set up hadoop-common daily build • Set up Bigtop release build – should be manually triggered • Set up Hadooppet daily build – Run sanity checks on a REAL CLUSTER
  26. 26. Virtualization • Build a Xen Server Cluster
  27. 27.
  28. 28. give-me-vm • PyCon 2012 – Small Python Tools for Software Release Engineering • An automation tool to manage the VM lifecycle • Uses the Python XenAPI • Create temporary VMs for testing by self service • Destroy them when the testing is finished
  29. 29. Build auto deployment on Hadooppet • ./give_me_vm.py • Set up passphraseless ssh between each VM • Set hostname • Install Hadooppet on master • Run deployment • Run sanity checks • ./destroy_vm.py
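The sequence on slide 29 could be driven by a small runner like the sketch below. It is only an illustration under stated assumptions: `run_pipeline` and its callable arguments are hypothetical, and the real give_me_vm.py and destroy_vm.py command lines are not shown in the slides, so they are represented here by plain Python callables rather than guessed invocations.

```python
# Hypothetical pipeline runner; the callables stand in for the real scripts
# (give_me_vm.py, the deployment, sanity checks, destroy_vm.py).

def run_pipeline(create_vms, steps, destroy_vms):
    """Create VMs, run each step in order, and always destroy the VMs."""
    vms = create_vms()
    try:
        for step in steps:
            step(vms)
    finally:
        # Clean up the temporary VMs even if a step raised an exception.
        destroy_vms(vms)


if __name__ == "__main__":
    log = []
    run_pipeline(
        create_vms=lambda: ["vm1", "vm2"],
        steps=[
            lambda vms: log.append("setup ssh"),
            lambda vms: log.append("set hostname"),
            lambda vms: log.append("install Hadooppet"),
            lambda vms: log.append("deploy"),
            lambda vms: log.append("sanity checks"),
        ],
        destroy_vms=lambda vms: log.append("destroy"),
    )
    print(log)
```

Putting the cleanup in `finally` reflects the "destroy it when the testing is finished" point: a failed sanity check still releases the VMs, while the exception continues to propagate so the build is marked as failed.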
  30. 30.
  31. 31. Development Environment
  32. 32. For Hadoop service developers… • Not enough Hadoop clients for every developer • Developers cannot reach the server side while developing Hadoop related services • Cannot experiment with new technologies like Impala, Spark, Flume • CI on Hadoop related services
  33. 33. give-me-vm + Hadoop all-in-one VM • Use Hadooppet to set up a pseudo-distributed Hadoop VM as a XenServer template • Get a Hadoop all-in-one VM via give-me-vm • Services integrate their CI tests with the Hadoop all-in-one VM
  34. 34. Bigtop puppet
  35. 35. Bigtop puppet • Bigtop also has a set of puppet scripts to deploy the Hadoop ecosystem
  36. 36. Bigtop puppet • Preparation: – A VM with jdk, puppet installed – mkdir -p /data/{1,2} – git clone https://github.com/apache/bigtop.git
  37. 37. Conclusion • Many great deployment tools exist – Ambari, CM, ETU appliance – Choose a suitable distribution based on your business needs • If you want to do it by yourself – Bigtop can do packaging for you easily – Leverage the bigtop puppet module for your deployment
  38. 38. Questions?
  39. 39. Thank you !
