Building hadoop based big data environment

Transcript

  • 1. Building Hadoop Based Big Data Environment Evans Ye @ TWHUG 2013/12/14
  • 2. Who am I • Evans Ye @ • Dumbo Team • http://dumbointaiwan.blogspot.tw/ 12/14/2013 Copyright 2013 Trend Micro Inc.
  • 3. Agenda • Building your own Hadoop version • Hadoop Deployment • Hadoop release engineering • The development environment • Bigtop puppet
  • 4. Why build our own version • Add your own patches at any time – The community has to maintain backward compatibility, which takes much more time and effort. • Backport official patches into the version currently in use – You may not upgrade your Hadoop version frequently, but sometimes you need one specific patch. • Flexibility, business-required features
  • 5. As a Beginner
  • 6. Build Hadoop Infrastructure What’s your work?
  • 7. …. I thought you just need to yum install Hadoop.
  • 8. Brute force • git clone • Make some changes • Build binary tarball • How to do version control? core-site.xml hdfs-site.xml mapred-site.xml …
  • 9. Bigtop
  • 10. How Bigtop helps you • Apache Hadoop app developers: – Run a pseudo-distributed Hadoop cluster to test your code on. • Vendors: – Build your own Apache Hadoop distribution, customized from Apache Bigtop bits. • Packaging, deployment, integration testing
  • 11. Supported Linux Distros • Ubuntu 10.10 • CentOS 5/6 • Fedora 18 • Mageia 1 • openSUSE 12.2
  • 12. Build • Build hadoop-common (see BUILDING.txt) – hadoop-common$ mvn package -Pdist,docs,src,native -Dtar • Prepare your src tar in Bigtop • bigtop$ make hadoop-rpm
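
The two commands on slide 12 could be chained in a small script. This is a minimal sketch, assuming the hadoop-common and bigtop checkouts sit side by side under one workspace directory. The directory layout and the function names are assumptions, not part of the talk, and the "prepare your src tar" step is left out because the slide does not say where the tarball goes.

```python
import subprocess
from pathlib import Path

# Commands as shown on the slide; the checkout directory names are assumptions.
BUILD_STEPS = [
    ("hadoop-common", ["mvn", "package", "-Pdist,docs,src,native", "-Dtar"]),
    ("bigtop", ["make", "hadoop-rpm"]),
]


def plan_build(workspace):
    """Return (working directory, command) pairs without running anything."""
    root = Path(workspace)
    return [(root / subdir, cmd) for subdir, cmd in BUILD_STEPS]


def run_build(workspace):
    """Run each step in order; check=True stops at the first failing command."""
    for cwd, cmd in plan_build(workspace):
        subprocess.run(cmd, cwd=cwd, check=True)
```

Calling `plan_build("/path/to/workspace")` only lists what would run, which is handy for a dry run before `run_build` is wired into a CI job.
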
  • 13. Hadoop Deployment
  • 14. Configuration files • Hadoop related config – core-site.xml – hdfs-site.xml – mapred-site.xml – log4j.properties – hadoop-env.sh – fair-scheduler.xml – rack-topology – hadoop-metrics.properties – taskcontroller.cfg
  • 15. Local Directories • Hadoop related files and directories – Namenode metadata • /name/1, /name/2 – Datanode • /data/1, /data/2, /data/3, /data/4 – Tasktracker • /mapred/1/local, /mapred/2/local – …
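
To show how the local directories end up in a configuration file, here is a minimal sketch that renders an hdfs-site.xml from the paths on slide 15. The talk does not name the properties, so mapping the directories to `dfs.namenode.name.dir` and `dfs.datanode.data.dir` (the Hadoop 2 names) is an assumption, and `render_site_xml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET


def render_site_xml(properties):
    """Render a dict of properties as a Hadoop *-site.xml document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Hadoop 2 property names; Hadoop 1 uses dfs.name.dir and dfs.data.dir instead.
hdfs_site = render_site_xml({
    "dfs.namenode.name.dir": ",".join(["/name/1", "/name/2"]),
    "dfs.datanode.data.dir": ",".join(f"/data/{i}" for i in range(1, 5)),
})
```

Generating the file from data rather than editing it by hand is what makes it practical to keep many nodes consistent.
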
  • 16. More Hadoop ecosystem
  • 17. Problems to solve • Lots of nodes need to be configured • The less human involvement, the fewer mistakes • Configuration changes quite often – adjust the fair scheduler – enable/disable short-circuit reads – try more performance-tuning configurations
  • 18. Hadooppet
  • 19. What is Puppet? • An IT automation tool that helps system administrators automate many repetitive tasks • You only need to define the desired state
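
The "desired state" idea can be illustrated outside Puppet. The sketch below is plain Python, not Puppet's own language, and `ensure_directory` is a hypothetical helper. It describes the end state (a directory with a given mode) and changes the system only when it differs, so running it twice is harmless. It assumes a POSIX file system.

```python
import os


def ensure_directory(path, mode=0o755):
    """Converge `path` to "exists as a directory with `mode`".

    Returns True if something had to change, False if the path was
    already in the desired state.
    """
    changed = False
    if not os.path.isdir(path):
        os.makedirs(path)
        changed = True
    # makedirs is subject to the umask, so compare and fix the mode explicitly.
    if (os.stat(path).st_mode & 0o777) != mode:
        os.chmod(path, mode)
        changed = True
    return changed
```

The first call on a missing path reports a change; a second call with the same arguments reports none, which is the property that makes repeated runs across many nodes safe.
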
  • 20. What is Hadooppet? • A general Hadoop cluster deployment tool based on Puppet • Kerberos / LDAP configured automatically • A set of Hadoop / Kerberos management tools • A set of sanity check scripts for Trend's Hadoop related services • Configuration managed on the puppetmaster
  • 21. Design • Abstract environment-specific configuration into a single configuration file • setup.sh – namenode_fqdns=("dev1.example.com" "dev2.example.com") – namenode_dirs=("/name/1" "/name/2") – namenode_heap=32g – map_slots=5 – reduce_slots=3 – …
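
To make the single-file design concrete, here is a minimal sketch that reads the setup.sh assignments from slide 21 into a Python dictionary. How Hadooppet itself consumes setup.sh is not described in the talk, so `parse_setup` is a hypothetical helper. It handles only the simple one-line forms shown on the slide, not full bash syntax.

```python
import shlex


def parse_setup(text):
    """Parse bash-style `key=value` and `key=("a" "b")` assignments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        if raw.startswith("(") and raw.endswith(")"):
            settings[key] = shlex.split(raw[1:-1])  # bash array -> list of strings
        else:
            parts = shlex.split(raw)
            settings[key] = parts[0] if parts else ""
    return settings


EXAMPLE = '''
namenode_fqdns=("dev1.example.com" "dev2.example.com")
namenode_dirs=("/name/1" "/name/2")
namenode_heap=32g
map_slots=5
reduce_slots=3
'''
settings = parse_setup(EXAMPLE)
```

All values stay strings (for example `settings["map_slots"]` is `"5"`), which leaves type conversion to whatever template consumes them.
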
  • 22. Benefits • Can be used to set up any kind of Hadoop cluster • Minimizes downtime during a major version upgrade – hadoop1 → hadoop2: Namenode + Secondarynamenode → Active/Standby Namenode + Journalnodes + ZKFC
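
The move to Active/Standby Namenodes with Journalnodes and ZKFC mostly shows up as extra hdfs-site.xml properties. The following minimal sketch derives some of them from a list of Namenode hosts. The property names are the standard HDFS HA ones, while the nameservice name, the Journalnode host names, the default ports 8020 and 8485, and the `ha_properties` helper are assumptions. A working setup needs further settings, such as the ZooKeeper quorum and the client failover proxy provider.

```python
def ha_properties(nameservice, namenodes, journalnodes):
    """Build a partial set of HDFS HA (quorum journal) properties."""
    ids = [f"nn{i}" for i in range(1, len(namenodes) + 1)]
    journal_uri = (
        "qjournal://"
        + ";".join(f"{host}:8485" for host in journalnodes)
        + f"/{nameservice}"
    )
    props = {
        "dfs.nameservices": nameservice,
        f"dfs.ha.namenodes.{nameservice}": ",".join(ids),
        "dfs.namenode.shared.edits.dir": journal_uri,
        "dfs.ha.automatic-failover.enabled": "true",  # failover driven by ZKFC
    }
    for nn_id, host in zip(ids, namenodes):
        props[f"dfs.namenode.rpc-address.{nameservice}.{nn_id}"] = f"{host}:8020"
    return props
```

Because the Namenode hosts are already listed once in the environment file, the hadoop1 and hadoop2 layouts can be generated from the same input.
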
  • 23. Release Engineering
  • 24. Manually • Build the src tarball in hadoop-common • Build RPMs in Bigtop • Submit the build to the release yum repo • yum update on the Hadoop cluster…
  • 25. Continuous Integration • Set up a hadoop-common daily build • Set up a Bigtop release build – should be manually triggered • Set up a Hadooppet daily build – Run sanity checks on a REAL CLUSTER
  • 26. Virtualization • Build a XenServer cluster
  • 27.
  • 28. give-me-vm • PyCon 2012 – Small Python Tools for Software Release Engineering • An automation tool to manage the VM lifecycle • Uses the Python XenAPI • Create temporary VMs for testing via self-service • Destroy them when testing is finished
  • 29. Build auto deployment on Hadooppet • ./give_me_vm.py • Set up passphraseless SSH between the VMs • Set hostnames • Install Hadooppet on the master • Run deployment • Run sanity checks • ./destroy_vm.py
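
The sequence on slide 29 has one important property: the VMs should be destroyed even when deployment or a sanity check fails. Here is a minimal sketch of that lifecycle, assuming `create` and `destroy` are callables that wrap give_me_vm.py and destroy_vm.py. The real interfaces of those scripts are not shown in the talk, so every name here is hypothetical.

```python
from contextlib import contextmanager


@contextmanager
def temporary_vms(create, destroy, count):
    """Yield `count` VMs from create() and always destroy them afterwards."""
    vms = []
    try:
        for _ in range(count):
            vms.append(create())
        yield vms
    finally:
        for vm in vms:
            destroy(vm)


def deploy_and_check(create, destroy, steps, count=3):
    """Run each step(vms) in order on fresh VMs; clean up even on failure."""
    with temporary_vms(create, destroy, count) as vms:
        for step in steps:
            step(vms)
```

Each entry in `steps` would correspond to one bullet (SSH setup, hostnames, Hadooppet install, deployment, sanity checks) and signals failure by raising an exception.
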
  • 30.
  • 31. Development Environment
  • 32. For Hadoop service developers… • Not enough Hadoop clients for every developer • Developers cannot reach the server side while developing Hadoop related services • Cannot experiment with new technologies like Impala, Spark, Flume • CI for Hadoop related services
  • 33. give-me-vm + Hadoop all-in-one VM • Use Hadooppet to set up a pseudo-distributed Hadoop VM as a XenServer template • Get a Hadoop all-in-one VM via give-me-vm • Services integrate their CI tests with the Hadoop all-in-one VM
  • 34. Bigtop puppet
  • 35. Bigtop puppet • Bigtop also has a set of Puppet scripts to deploy the Hadoop ecosystem
  • 36. Bigtop puppet • Preparation: – A VM with JDK and Puppet installed – mkdir -p /data/{1,2} – git clone https://github.com/apache/bigtop.git
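
The preparation steps can be checked before running anything else. Here is a minimal sketch, assuming that finding `java`, `puppet` and `git` on PATH is a good enough proxy for "installed"; both helper names are hypothetical.

```python
import os
import shutil


def missing_tools(tools=("java", "puppet", "git")):
    """Return the required commands that are not on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def make_data_dirs(base="/data", names=("1", "2")):
    """Python equivalent of `mkdir -p /data/{1,2}`."""
    paths = [os.path.join(base, name) for name in names]
    for path in paths:
        os.makedirs(path, exist_ok=True)
    return paths
```

An empty list from `missing_tools()` means the VM has the commands the slide asks for. `make_data_dirs()` with its defaults needs permission to write under `/`.
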
  • 37. Conclusion • Many great deployment tools exist – Ambari, CM, ETU appliance – Choose a suitable distribution based on your business needs • If you want to do it yourself – Bigtop can easily do the packaging for you – Leverage the Bigtop puppet module for your deployment
  • 38. Questions?
  • 39. Thank you !