Building hadoop based big data environment
Transcript

  • 1. Building Hadoop Based Big Data Environment Evans Ye @ TWHUG 2013/12/14
  • 2. Who am I • Evans Ye @ • Dumbo Team • http://dumbointaiwan.blogspot.tw/ 12/14/2013 Copyright 2013 Trend Micro Inc.
  • 3. Agenda • Building your own Hadoop version • Hadoop Deployment • Hadoop release engineering • The development environment • Bigtop puppet
  • 4. Why Build Our Own Version • Add your own patches at any time – The community has to maintain backward compatibility, so getting a change accepted upstream takes much more time and effort. • Backport official patches into the version you currently run – You may not upgrade your Hadoop version often, but you may need one specific patch. • Flexibility: features your business needs
  • 5. As a Beginner
  • 6. Build Hadoop Infrastructure • What’s your work?
  • 7. … I thought you just needed to yum install Hadoop.
  • 8. Brute force • git clone • Make some changes • Build a binary tarball • How do you version-control core-site.xml, hdfs-site.xml, mapred-site.xml, …?
  • 9. Bigtop
  • 10. How Bigtop helps you • Apache Hadoop app developers: – Run a pseudo-distributed Hadoop cluster to test your code on. • Vendors: – Build your own Apache Hadoop distribution, customized from Apache Bigtop bits. • Packaging, deployment, integration testing
  • 11. Supported Linux Distro • Ubuntu 10.10 • CentOS 5/6 • Fedora 18 • Mageia 1 • openSUSE 12.2
  • 12. Build • Build hadoop-common (see BUILDING.txt) – hadoop-common$ mvn package -Pdist,docs,src,native -Dtar • Prepare your src tarball in Bigtop • bigtop$ make hadoop-rpm
  • 13. Hadoop Deployment
  • 14. Configuration files • Hadoop-related config – core-site.xml – hdfs-site.xml – mapred-site.xml – log4j.properties – hadoop-env.sh – fair-scheduler.xml – rack-topology – hadoop-metrics.properties – taskcontroller.cfg
  • 15. Local Directories • Hadoop-related files and directories – Namenode metadata: /name/1, /name/2 – Datanode: /data/1, /data/2, /data/3, /data/4 – Tasktracker: /mapred/1/local, /mapred/2/local – …
  • 16. More of the Hadoop ecosystem
  • 17. Problems to solve • Lots of nodes need to be configured • Less human involvement, fewer mistakes • Configuration changes quite often – adjust the fair scheduler – enable/disable short-circuit reads – try more performance-tuning configurations
  • 18. Hadooppet
  • 19. What is Puppet? • An IT automation tool that helps system administrators automate many repetitive tasks • You only need to define the desired state
  • 20. What is Hadooppet? • A general Hadoop cluster deployment tool based on Puppet • Kerberos/LDAP configured automatically • A set of Hadoop/Kerberos management tools • A set of sanity-check scripts for Trend Micro’s Hadoop-related services • Configuration managed on the Puppet master
  • 21. Design • Abstract environment-specific configuration into a single configuration file • setup.sh – namenode_fqdns=("dev1.example.com" "dev2.example.com") – namenode_dirs=("/name/1" "/name/2") – namenode_heap=32g – map_slots=5 – reduce_slots=3 – …
  • 22. Benefits • Can be used to set up any kind of Hadoop cluster • Minimizes downtime during a major version upgrade – hadoop1 → hadoop2: Namenode + Secondarynamenode → Active/Standby Namenode + Journalnodes + ZKFC
  • 23. Release Engineering
  • 24. Manually • Build the src tarball in hadoop-common • Build RPMs in Bigtop • Submit the build to the release yum repo • yum update on the Hadoop cluster…
  • 25. Continuous Integration • Set up a hadoop-common daily build • Set up a Bigtop release build – triggered manually • Set up a Hadooppet daily build – Run sanity checks on a REAL CLUSTER
  • 26. Virtualization • Build a XenServer cluster
  • 27.
  • 28. give-me-vm • PyCon 2012 – Small Python Tools for Software Release Engineering • An automation tool to manage the VM lifecycle • Uses the Python XenAPI • Self-service creation of temporary VMs for testing • Destroys the VM when testing is finished
  • 29. Build automated deployment on Hadooppet • ./give_me_vm.py • Set up passphraseless SSH between the VMs • Set hostnames • Install Hadooppet on the master • Run the deployment • Run sanity checks • ./destroy_vm.py
  • 30.
  • 31. Development Environment
  • 32. For Hadoop service developers… • Not enough Hadoop clients for every developer • Developers cannot reach the server side while developing Hadoop-related services • No way to experiment with new technologies like Impala, Spark, Flume • CI for Hadoop-related services
  • 33. give-me-vm + Hadoop all-in-one VM • Use Hadooppet to set up a pseudo-distributed Hadoop VM as a XenServer template • Get a Hadoop all-in-one VM via give-me-vm • Services integrate their CI tests with the Hadoop all-in-one VM
  • 34. Bigtop puppet
  • 35. Bigtop puppet • Bigtop also has a set of Puppet scripts to deploy the Hadoop ecosystem
  • 36. Bigtop puppet • Preparation: – A VM with the JDK and Puppet installed – mkdir -p /data/{1,2} – git clone https://github.com/apache/bigtop.git
  • 37. Conclusion • There are many great deployment tools – Ambari, CM, ETU Appliance – Choose a distribution that suits your business needs • If you want to do it yourself – Bigtop can easily do the packaging for you – Leverage the Bigtop Puppet modules for your deployment
  • 38. Questions?
  • 39. Thank you!
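Slide 12 (Build) lists three steps: build hadoop-common with Maven, hand the source tarball to Bigtop, and run `make hadoop-rpm`. The sketch below strings them together with a dry-run switch. It is an illustration under stated assumptions, not the talk's actual build script: the checkout paths, the tarball location under `hadoop-dist/target/`, and the idea that Bigtop reuses a tarball already placed in its `dl/` directory (with a version matching its own configuration) are all assumptions to verify against your Hadoop and Bigtop versions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the slide-12 flow: Hadoop source tarball -> Bigtop RPMs.
# ASSUMPTIONS: checkout paths, the tarball path under hadoop-dist/target/, and
# Bigtop picking up a tarball copied into its dl/ directory.
set -euo pipefail

HADOOP_SRC_DIR="${HADOOP_SRC_DIR:-./hadoop-common}"  # hypothetical checkout path
BIGTOP_DIR="${BIGTOP_DIR:-./bigtop}"                 # hypothetical checkout path
DRY_RUN="${DRY_RUN:-1}"  # 1 = only print each command; 0 = actually run it

run() {
  if [[ "$DRY_RUN" == 1 ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

build_hadoop_rpm() {
  local version="${1:?usage: build_hadoop_rpm <hadoop-version>}"
  # Step 1: Maven build with the profiles shown on the slide (see BUILDING.txt).
  run mvn -f "$HADOOP_SRC_DIR/pom.xml" package -Pdist,docs,src,native -Dtar
  # Step 2: hand the source tarball to Bigtop (location is an assumption).
  run cp "$HADOOP_SRC_DIR/hadoop-dist/target/hadoop-$version-src.tar.gz" "$BIGTOP_DIR/dl/"
  # Step 3: let Bigtop build the RPMs.
  run make -C "$BIGTOP_DIR" hadoop-rpm
}
```

Calling `build_hadoop_rpm 2.2.0` only prints the three commands; `DRY_RUN=0 build_hadoop_rpm 2.2.0` executes them. Keeping a dry-run mode makes it easy to review the exact steps before wiring them into a CI job.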
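Slide 14 lists the Hadoop configuration files a deployment has to manage. The `*-site.xml` files share one format, a `<configuration>` element holding `<property>` entries with a `<name>` and a `<value>`. The two helper functions below are hypothetical (not part of Hadoop or Hadooppet) and show a minimal way to render such a file from `name=value` pairs; they do not escape XML special characters, so values are assumed to be plain text.

```shell
#!/usr/bin/env bash
# Hypothetical helpers that render a Hadoop *-site.xml file from name=value pairs.
# ASSUMPTION: values contain no XML special characters (&, <, >), since nothing is escaped.
set -euo pipefail

hadoop_property() {
  local name="$1" value="$2"
  printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' \
    "$name" "$value"
}

render_site_xml() {
  echo '<?xml version="1.0"?>'
  echo '<configuration>'
  local pair
  for pair in "$@"; do
    # Split on the first '=' only, so values may themselves contain '='.
    hadoop_property "${pair%%=*}" "${pair#*=}"
  done
  echo '</configuration>'
}
```

For example, `render_site_xml dfs.replication=3 > hdfs-site.xml` writes a one-property hdfs-site.xml. A deployment tool such as Puppet typically does the same thing with templates, which is what lets the configuration live in one place instead of on every node.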
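Slide 15 shows the local directory layout: Namenode metadata under /name/N, Datanode storage under /data/N, and Tasktracker local dirs under /mapred/N/local. A minimal sketch that creates this layout follows. The optional root prefix exists only so it can be tried in a scratch directory, and the ownership (hdfs and mapred users in a hadoop group) is an assumption based on common packaged-Hadoop conventions, applied only when those accounts exist.

```shell
#!/usr/bin/env bash
# Sketch: create the slide-15 directory layout under an optional root prefix.
# ASSUMPTION: hdfs/mapred users and a hadoop group, as in common packaged installs.
set -euo pipefail

create_hadoop_dirs() {
  local root="${1:-}"  # empty = real filesystem root
  # Namenode metadata, Datanode storage, Tasktracker local dirs (layout from the slide).
  mkdir -p "$root"/name/{1,2} "$root"/data/{1,2,3,4} "$root"/mapred/{1,2}/local
  # Change ownership only when running as root and the service accounts exist.
  if [[ $EUID -eq 0 ]] && id -u hdfs >/dev/null 2>&1 && id -u mapred >/dev/null 2>&1 \
     && getent group hadoop >/dev/null 2>&1; then
    chown -R hdfs:hadoop "$root"/name "$root"/data
    chown -R mapred:hadoop "$root"/mapred
  fi
}
```

On a real node, `create_hadoop_dirs` with no argument creates the directories at the root; `create_hadoop_dirs /tmp/try` is a harmless trial run.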
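Slide 19 says that with Puppet "you only need to define the desired state". The shell analogy below is not Puppet itself. It illustrates the idea behind it: each hypothetical `ensure_*` function first checks the current state and changes it only when it differs, so running the same script twice is a no-op the second time. In Puppet the equivalent would be declaring `file` resources.

```shell
#!/usr/bin/env bash
# Shell analogy (not Puppet) for desired-state, idempotent configuration.
# ensure_directory / ensure_file_content are hypothetical illustration helpers.
set -euo pipefail

ensure_directory() {
  local path="$1"
  if [[ -d "$path" ]]; then
    echo "unchanged: $path"
  else
    mkdir -p "$path"
    echo "created: $path"
  fi
}

ensure_file_content() {
  local path="$1" content="$2"
  # Compare the desired content with what is on disk before touching the file.
  if [[ -f "$path" ]] && [[ "$(cat "$path")" == "$content" ]]; then
    echo "unchanged: $path"
  else
    printf '%s\n' "$content" > "$path"
    echo "updated: $path"
  fi
}
```

Because every step reports "unchanged" when the node already matches, the same script can be re-run after each configuration change (slide 17) without side effects on nodes that are already correct.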
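Slide 21 keeps all environment-specific values in a single setup.sh of shell variables. The sketch below shows one way a generator could turn those variables into Hadoop settings. Only the variable names and example values come from the slide. The mapping to property keys is an assumption: it uses the Hadoop 1 names (`dfs.name.dir`, `mapred.tasktracker.*.tasks.maximum`) because the slides talk about map/reduce slots, and in practice the variables would be loaded with `source setup.sh` rather than defined inline.

```shell
#!/usr/bin/env bash
# Sketch: derive Hadoop settings from slide-21 style setup.sh variables.
# ASSUMPTIONS: the property-key mapping (Hadoop 1 names); variables inlined here
# stand in for `source setup.sh`.
set -euo pipefail

join_by() {
  local sep="$1"; shift
  local IFS="$sep"
  echo "$*"  # "$*" joins arguments with the first character of IFS
}

# Example values from the slide.
namenode_fqdns=("dev1.example.com" "dev2.example.com")
namenode_dirs=("/name/1" "/name/2")
namenode_heap=32g
map_slots=5
reduce_slots=3

derived_settings() {
  echo "dfs.name.dir=$(join_by , "${namenode_dirs[@]}")"
  echo "mapred.tasktracker.map.tasks.maximum=$map_slots"
  echo "mapred.tasktracker.reduce.tasks.maximum=$reduce_slots"
  echo "HADOOP_NAMENODE_OPTS=-Xmx$namenode_heap"
  echo "namenodes=$(join_by ' ' "${namenode_fqdns[@]}")"
}
```

The design point is that the generator never hardcodes hosts or paths, so the same templates serve a development cluster and a production cluster; only setup.sh differs.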
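Slide 29 lists the automated deployment test: create VMs, prepare them, deploy with Hadooppet, run sanity checks, and destroy the VMs. One detail worth making explicit in such a pipeline is that teardown must happen even when a step fails, or failed CI runs leak VMs. The sketch below uses an `EXIT` trap in a subshell for that. The step functions are hypothetical placeholders that only echo; `./give_me_vm.py` and `./destroy_vm.py` are the names on the slide, but how they are invoked is an assumption.

```shell
#!/usr/bin/env bash
# Sketch of the slide-29 flow with guaranteed teardown via an EXIT trap.
# Step functions are hypothetical placeholders; replace their bodies with real commands.

provision()    { echo "./give_me_vm.py"; }
setup_ssh()    { echo "setup passphraseless ssh"; }
set_hostname() { echo "set hostname"; }
install_pet()  { echo "install Hadooppet on master"; }
deploy()       { echo "run deployment"; }
sanity_check() { echo "run sanity checks"; }
teardown()     { echo "./destroy_vm.py"; }

run_pipeline() (
  # The body runs in a subshell, so the trap fires when this pipeline exits,
  # whether every step succeeded or one of them failed.
  trap teardown EXIT
  for step in provision setup_ssh set_hostname install_pet deploy sanity_check; do
    "$step" || { echo "step failed: $step" >&2; exit 1; }
  done
)
```

Each step is checked explicitly instead of relying on `set -e`, because `set -e` is silently ignored when a function is called from an `if` or `||` context, which is exactly how a CI wrapper tends to call it.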
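Slide 36 lists the preparation for trying Bigtop's Puppet modules: a VM with the JDK and Puppet, the /data/{1,2} directories, and a Bigtop checkout. The sketch below wraps those steps so they can be re-run safely. `DATA_ROOT`, `BIGTOP_CHECKOUT`, and `SKIP_CLONE` are hypothetical knobs added for illustration (a scratch root, a checkout path, and a way to skip the network step); how to then apply the manifests depends on the Bigtop version, so follow the README in the checkout for that part.

```shell
#!/usr/bin/env bash
# Sketch of the slide-36 preparation steps, made safe to re-run.
# DATA_ROOT, BIGTOP_CHECKOUT and SKIP_CLONE are hypothetical illustration knobs.
set -euo pipefail

prepare_bigtop_puppet() {
  local data_root="${DATA_ROOT:-}"             # empty = real /data
  local checkout="${BIGTOP_CHECKOUT:-bigtop}"  # where to clone Bigtop
  local tool
  # The slide assumes the JDK and Puppet are already installed; only warn here.
  for tool in java puppet; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "warning: $tool not found on PATH" >&2
    fi
  done
  mkdir -p "$data_root"/data/{1,2}
  # Clone only once; a second run leaves an existing checkout alone.
  if [[ "${SKIP_CLONE:-0}" != 1 && ! -d "$checkout/.git" ]]; then
    git clone https://github.com/apache/bigtop.git "$checkout"
  fi
}
```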