Open Source Recipes for Chef Deployments of Hadoop

Published in: Technology, Self Improvement

  • What is Chef?
    Will get to that
    Chef recipes – infrastructure-as-code (you get what Bloomberg runs internally)
    BCPC – Bloomberg Clustered Private Cloud – run what we run!

    Bigtop distributions:
    Apache
    Hortonworks
    CDH

    Composable, in that one can build anything from just HDFS, to HBase, to a full-stack system with ease
  • Developers spend time on airplanes, in coffee shops, who knows where; this lets one take our stack along on a laptop or deploy it in Jenkins
    Vendors need to know how to integrate
    Application teams always want to tinker and see if they could do better
    Bare Metal (the PXE to Application stack means performance testing & entire system testing is much faster end-to-end)

    Gold master can be a Vagrant VM!
  • Complexity much? Maybe you wanted to then actually deploy an application too?
    Note this is only six Apache Hadoop components (and there are more than 20!)
  • Ambari supports more components indeed; the Chef model allows us today to deploy for applications and re-tune very quickly across the entire cluster stack (not solely the Hadoop ecosystem)
  • Thanks to the question mark author: http://commons.wikimedia.org/wiki/File:Circle-question-blue.svg
  • Transcript

    • 1. Open source recipes for Chef deployments of Hadoop
        Clay Baenziger (cbaenziger@bloomberg.net)
        Hadoop Infrastructure, Bloomberg
        http://bloomberg.github.io/chef-bcpc/hadoop
    • 2. Why Open Source?
        • Bloomberg’s Culture
            – Bringing transparency to our Hadoop
            – Reference implementation
            – Third party integration
            – Hiring
        • Everyone Here!
    • 3. Agenda
        • What is Provided?
        • Installation and Configuration Management
        • Customization
        • High Availability
        • Integration with the Cloud
    • 4. What Is Provided?
        • Open source Chef recipes – on GitHub
        • Installation for Bigtop-based distributions
            – Supporting infrastructure too
        • Same deployment on
            – VirtualBox
            – OpenStack
            – Bare Metal
        • PXE to application deployment
        • Composable components
    • 5. Motivation at Bloomberg: Multiple Clusters
        • Development
        • Data Lake
        • Low Latency
        • Data Sensitivity
    • 6. Deployment Consistency
        • Developers
        • Vendors
        • Application Teams
        • Bare Metal
            – Vagrant VM for bootstrap node
    • 7. Install & Config Management
        • Hadoop Ecosystem: 20+ Apache projects!
        • Bloomberg Chef-BCPC:
            – 2,500 lines of XML configurations
                • Everything-site.xml
            – 1,000 lines of Hadoop setup Ruby
                • 10+ Apache Hadoop components set up for you
                • Core Hadoop, HBase, Hive, HCatalog, Kafka, Mahout, Oozie, Pig, Sqoop, Zookeeper and more
            – 1,600 lines of base setup Ruby
                • Networking, MySQL Galera, Graphite, Zabbix, …
            – 550 lines of shell scripts
    • 8. Hadoop Complexity
        [Diagram: "Depends On" relationships between Hadoop services and their daemons]
        • Shown: HBase, HDFS, YARN, Hive, Oozie, MapReduce, Hive Metastore, Hive Server2, Datanodes, Region Servers, Master(s), Zookeeper, Resource Manager(s), Node Managers, History Server, HCatalog, WebHCat, JournalNodes, Namenode(s), WebHDFS, HTTPFS, ZKFC
        • Not shown: Operating System, Relational Database, Authentication/Security, Load Balancers, Monitoring, …
    • 9. Simple Chef
        • Deployment and configuration management
        • Ruby code describes the deployment process
        • Similar to Puppet, Ansible, Salt, CFEngine, etc.
        • Broader (and more raw) than Ambari, today
        • Leverages open source Chef Server
    • 10. More Chef
        • Mature Ecosystem:
            – Community cookbooks
            – Ruby Gems
        • Cluster Features:
            – Node attributes
            – Searching by roles, recipes, …
        • Procedural setup via high-level actions
        • Provides error handling
    • 11. Chef Idioms – Node Definitions
        {
          "name": "cluster1-r5n16.example.com",
          "chef_environment": "cluster1",
          "normal": {
            "bcpc": {
              "management": {
                "ip": "10.0.1.23",
                "interface": "eth1",
                "netmask": "255.255.240.0",
                "cidr": "10.0.1.0/20",
                "gateway": "10.0.1.1"
              },
              "floating": {
                "ip": "10.0.0.23",
                . . .
                "gateway": "10.0.0.1"
              },
              "hadoop": {
                "disks": [ "sdb", "sdc", "sdd", "sde" ]
              }
            }
          },
          "run_list": [ "role[Namenode]", "role[HBaseMaster]" ]
        }
    • 12. Chef Idioms – Nodes By Recipe
        Implementation:
          def get_nodes_for(recipe, cookbook = "bcpc")
            # Find every node in this environment that runs the given recipe
            results = search(:node, "recipes:#{cookbook}::#{recipe} AND " +
                                    "chef_environment:#{node.chef_environment}")
            # Substitute the local node object for its own search result
            results.map! { |x| x['hostname'] == node[:hostname] ? node : x }
            return results.sort
          end
        Usage:
          zk_hosts = get_nodes_for("zookeeper_server", "bcpc-hadoop")
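The lookup on slide 12 can be tried without a Chef server. The sketch below is plain Ruby and is not the chef-bcpc implementation: `FakeNode`, `build_query`, and `nodes_for` are hypothetical names, and an in-memory array stands in for Chef's `search`. It illustrates two details of the helper: the query needs a space between `AND` and the next term, and the caller's own node object replaces its search result.

```ruby
# Hypothetical stand-in for the node objects a Chef search returns.
FakeNode = Struct.new(:hostname, :environment, :recipes)

# Build the search string used by the helper (note the space after AND).
def build_query(recipe, cookbook, environment)
  "recipes:#{cookbook}::#{recipe} AND chef_environment:#{environment}"
end

# In-memory imitation of the helper: filter by recipe and environment,
# swap in the caller's own node object, and return a stable ordering.
def nodes_for(all_nodes, current, recipe, cookbook = "bcpc")
  wanted = "#{cookbook}::#{recipe}"
  matches = all_nodes.select do |n|
    n.environment == current.environment && n.recipes.include?(wanted)
  end
  swapped = matches.map { |n| n.hostname == current.hostname ? current : n }
  swapped.sort_by(&:hostname)
end

current = FakeNode.new("r5n16", "cluster1", ["bcpc-hadoop::zookeeper_server"])
indexed = [
  FakeNode.new("r5n17", "cluster1", ["bcpc-hadoop::zookeeper_server"]),
  FakeNode.new("r5n16", "cluster1", ["bcpc-hadoop::zookeeper_server"]), # indexed copy of current
  FakeNode.new("r1n01", "cluster2", ["bcpc-hadoop::zookeeper_server"])  # other environment
]

zk_hosts = nodes_for(indexed, current, "zookeeper_server", "bcpc-hadoop")
puts build_query("zookeeper_server", "bcpc-hadoop", current.environment)
puts zk_hosts.map(&:hostname).inspect
```

Sorting the result gives every node the same host ordering, which matters when the list is rendered into a configuration file that should not change between runs.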
    • 13. Chef Idioms – Service Verification
        ruby_block "Check Oozie Up" do
          iter = 0
          block do
            status = `oozie admin -status 2>&1`
            # Poll loop: repeat until the Oozie API reports NORMAL mode
            while status !~ /NORMAL/
              status = `oozie admin -status 2>&1`
              if $?.exitstatus == 0
                Chef::Log.debug("Oozie status is not failing - #{status}")
              elsif iter < 10
                sleep(0.5)
                iter += 1
                Chef::Log.debug("Oozie is down - #{status}")
              else
                Chef::Application.fatal!("Oozie is reported as down " +
                                         "for more than 5 seconds -- #{status}")
              end
            end
            Chef::Log.debug("Oozie is up - #{status}")
          end
          # Guard attribute: skip the block when the status command already succeeds
          not_if "oozie admin -status"
        end
        Callouts: Poll Loop, API Verification, Guard Attribute
    • 14. Chef Idioms – Service Verification
        Oozie Down Output:
          * ruby_block[Check Oozie Up] action run
          [2014-05-27T16:20:31-04:00] INFO: Processing ruby_block[Check Oozie Up] action run (bcpc-hadoop::oozie line 132)
          [2014-05-27T16:20:31-04:00] DEBUG: Platform ubuntu version 12.04 found
          [2014-05-27T16:20:32-04:00] DEBUG: Oozie is down - Error: IO_ERROR : java.net.ConnectException: Connection refused
          [2014-05-27T16:20:36-04:00] DEBUG: Oozie status is not failing - System mode: NORMAL
          [2014-05-27T16:20:36-04:00] DEBUG: Oozie is up - System mode: NORMAL
          [2014-05-27T16:20:36-04:00] INFO: ruby_block[Check Oozie Up] called - execute the ruby block
        Callouts: Poll Loop, Guard, Verification
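The poll loop from the service verification slides can be factored into a small reusable helper. This is a minimal sketch in plain Ruby, not code from the chef-bcpc cookbooks: `wait_until` and its parameters are hypothetical names. The defaults (10 retries, 0.5 seconds apart) mirror the roughly 5 second budget on the slide.

```ruby
# Hypothetical helper: call the block until it returns a truthy value,
# sleeping between attempts, and give up after max_attempts retries.
# Returns the number of retries that were needed.
def wait_until(max_attempts: 10, delay: 0.5)
  attempts = 0
  loop do
    return attempts if yield
    raise "condition not met after #{max_attempts} retries" if attempts >= max_attempts
    attempts += 1
    sleep(delay)
  end
end

# Simulated service that only reports healthy on the third check.
checks = 0
retries = wait_until(max_attempts: 5, delay: 0) do
  checks += 1
  checks >= 3
end
puts "service came up after #{retries} retries"

# Inside a ruby_block the condition could instead shell out, for example:
#   wait_until { `oozie admin -status 2>&1` =~ /NORMAL/ }
```

Raising an exception rather than exiting the process leaves the decision about how to fail to the caller, which also makes the helper easy to exercise outside of a Chef run.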
    • 15. Maintenance
        • Full developer model
        • Git
            – Can use all standard developer tools
            – App teams can merge in their deployment code
        • Jenkins for static analysis
        • Can move relatively stateless Chef server
            – IP Migration
            – We build on Vagrant boxes – simply move the VM
    • 16. Customization
        • Ad-Hoc Cluster Work
            – Roll-out xfs_recover steps on machines with administrative damage
            – sudo rules for debugging
        • Pick-and-choose application versions
            – HBase 0.98 on a vendor cluster
        • Application group deployments
            – HDFS Artifacts
            – Kafka Topics
            – HBase Tables
    • 17. High Availability Control Points
        • Networking
            – NIC Bonding
            – Virtual IPs
        • MySQL
            – Galera
        • HDFS
            – Journal Nodes & ZKFC
        • MapReduce
        • HBase
        • Oozie
        • Hive
    • 18. HDFS HA Steps
        1. Have a Zookeeper Cluster
        2. Stop HDFS Services
            – Namenode
            – Datanodes
            – Anything using HDFS
        3. Change Configuration Files
            – hdfs-site.xml (on all nodes)
        4. Start JournalNodes
        5. Format Zookeeper
        6. Initialize Shared Edits
        7. Start Primary Namenode
        8. Bootstrap Standby Namenode
        9. Start Standby Namenode
        10. Start Zookeeper Failover Controller (ZKFC)
        11. Start Datanodes
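Steps 4 through 11 of the HDFS HA procedure map onto an ordered list of commands, which is the kind of sequence a recipe has to enforce. The sketch below only prints a plan and runs nothing. The `hdfs zkfc -formatZK`, `hdfs namenode -initializeSharedEdits`, and `hdfs namenode -bootstrapStandby` subcommands are part of the standard HDFS CLI; the `service hadoop-hdfs-*` names are an assumption based on Bigtop-style packaging, and `HA_STEPS` and `plan` are hypothetical names, not taken from the chef-bcpc recipes.

```ruby
# Ordered HA bring-up steps (slide steps 4-11). Each entry names the host
# group it applies to and the command an operator or recipe would run.
# Service names are assumed from Bigtop-style packages.
HA_STEPS = [
  { on: :journalnodes,     command: "service hadoop-hdfs-journalnode start" },
  { on: :primary_namenode, command: "hdfs zkfc -formatZK" },
  { on: :primary_namenode, command: "hdfs namenode -initializeSharedEdits" },
  { on: :primary_namenode, command: "service hadoop-hdfs-namenode start" },
  { on: :standby_namenode, command: "hdfs namenode -bootstrapStandby" },
  { on: :standby_namenode, command: "service hadoop-hdfs-namenode start" },
  { on: :namenodes,        command: "service hadoop-hdfs-zkfc start" },
  { on: :datanodes,        command: "service hadoop-hdfs-datanode start" }
].freeze

# Render the steps as numbered lines, starting at the slide's step 4.
def plan(steps, first_number = 4)
  steps.each_with_index.map do |step, index|
    "#{first_number + index}. [#{step[:on]}] #{step[:command]}"
  end
end

puts plan(HA_STEPS)
```

Keeping the order in data rather than in prose makes it straightforward to check that, for instance, the standby bootstrap never runs before the primary Namenode has started.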
    • 19. Cloud Integration
        • Multi-Tenancy
            – Virtual Machines Provide Isolation
            – Hadoop Ecosystem Provides Multi-Tenancy
            – Why Run Hadoop on top of VMs?
        • Distributed Filesystem
            – Provides VM Resiliency
            – Provides Hadoop Resiliency
        • Scheduler
            – Picks Optimal Hypervisor per Policy
            – Picks Optimal Node per Policy
    • 20. We Are Hiring
    • 21. Questions/Comments?
        http://bloomberg.github.io/chef-bcpc/hadoop
        Clay Baenziger (cbaenziger@bloomberg.net)