Deploying Hadoop-Based Bigdata Environments

Roman Shaposhnik of Cloudera and the Apache Software Foundation presents "Deploying Hadoop-Based Bigdata Environments: [Tall] Tales from the Frontier" at Puppet Camp Silicon Valley 2012.

Published in: Technology

Transcript

  • 1. Deploying Hadoop-Based Bigdata Environments: “[Tall] Tales From The Frontier”
      • Roman Shaposhnik, rvs@apache.org, Cloudera Inc.
  • 2. $ whoami
      • An open source software developer
          • Linux kernel, C/C++ compilers, FFmpeg, Plan9
      • A Hadoop and all around UNIX guy
      • root@cloudera
          • Member of the “Kitchen” team
      • Apache Software Foundation Incubator PMC
          • [Bigtop], Hadoop Development Tools, Celix, Helix
      • VP of Apache Bigtop
  • 3. ZooKeeper (coordination), HUE (web based UI), Pig (DQL), Hive (SQL), Impala (SQL), HBase, YARN/MR1, Oozie, HDFS (filesystem)
  • 5. It is a jungle out there
      • ZooKeeper, Hadoop (HDFS, YARN, MR1, HTTPFS), HBase, Pig, Hive, Impala
      • Sqoop, Oozie, Whirr, Mahout, Flume, Giraph, Hama, Hue, Solr, Crunch
      • JDK/JRE, Kerberos, Ganglia, Nagios, JSVC, Tomcat, Utils, Postgres, HTTPD
  • 6. And the answer is: Puppet[forge]
  • 7. One way of using Apache software
        $ wget http://apache.org/httpd.tar.gz
        $ tar xzvf httpd.tar.gz
        $ cd httpd
        $ ./configure ; make
        $ make install
        ERROR: can't write to /usr/local/bin
        $ sudo make install
  • 8. A different way
        $ sudo apt-get install httpd
        Would you like to also upgrade your conf?
  • 9. Is there apt-get install hadoop?
      • Hadoop is still in very active development
      • Hadoop is Java based
      • Hadoop is a distributed application
      • Hadoop is way more than HDFS + MR
  • 10. Project-by-project approach
      • “Passively” maintained code
          • Packaging, OS-level (init.d)
      • Developer-centric view
          • Edit-compile-debug cycle vs. deployment
          • Lack of integration testing
      • Differences in distributions/packaging
          • Where is this valid: /usr/libexec ?
      • Combinatoric explosion of dependencies
  • 11. Dependencies Inferno
      • Hive 0.8.1
      • HBase (0.92, 0.90)
      • Hadoop (1.0, 0.22, 0.23)
      • A million dollar question:
          $ tar xzvf hive-0.8.1.tar.gz
          $ ls hive-0.8.1/lib
  • 12. Dependencies Inferno
      • Hive 0.8.1
      • HBase (0.92, 0.90)
      • Hadoop (1.0, 0.22, 0.23)
      • A million dollar question:
          $ tar xzvf hive-0.8.1.tar.gz
          $ ls hive-0.8.1/lib
          hbase-0.89.jar log4j-1.2.15.jar log4j-1.2.16.jar
  • 13. Remember what Debian did to Linux?
      • GNU Software
      • Linux kernel
  • 14. Bigtop is trying to do it with Hadoop
      • Hadoop Ecosystem (Pig, Hive, Mahout)
      • Hadoop (HDFS + MR)
      • Linux kernel
      • CDH4 beta 1
  • 15. What's there in Bigtop
      • Build/Packaging infrastructure
          • RPM, DEB, (tarballs, homebrew/MacPorts)
          • VirtualBox, VMWare and KVM VMs
          • Fedora, OpenSUSE, Mageia, CentOS, Ubuntu
      • Puppet deployment infrastructure
      • Integration test infrastructure (iTest)
      • Bigtop Jenkins: http://bigtop01.cloudera.org:8080
  • 16. And the answer is: Puppet[Bigtop]
  • 17. System software deployment
      • Packages vs. Puppet code
          • package/file/service
      • What is packaging?
          • dependency tracking
          • build encapsulation
          • java packaging
          • file layout
          • user creation
          • service registration
  • 18. Does it really work?
      • Java packaging
          • maven/ivy integration
      • file layout
          • side-by-side installations of the same package
      • user creation
          • LDAP/AD provisioning
      • service registration
          • start on install vs. start on reboot
  • 19. Petascale distributed systems
      • Scale
          • Yahoo! ~5000 nodes
      • Deployment orchestration
          • Kerberos::Host_keytab <| title == "hdfs" |> -> Service["hadoop-hdfs-datanode"]
      • Highly coordinated distributed system
          • It ain't HTTPD/loadbalancer
          • Rolling upgrades/asynchronous rollbacks
  • 20. Back to tarballs and shell?
      • What's better for Puppet: fpm or rpm?
      • What is the role of Puppet?
          • coordinating the entire system: lack of DSL
          • converging an isolated node: will it ever work?
          • a building block for an agent-based system
      • One agent to rule them all?
          • there's no spoon^H^H^H^H^H agent: Whirr
          • MCollective
          • Cloudera Manager, Ambari
  • 21. Evolution, not perfection!
      • Minimalistic, highly consistent packages
          • /usr/lib/hadoop, /etc/hadoop/conf (alternative)
          • fail gracefully: .... || : )
          • Java packaging is not solved [yet]: symlinks
      • Minimalistic Puppet code
          • package/file/service
          • masterless (most of the time)
          • integration with Whirr, BoxGrinder
  • 22. The road ahead
      • New kind of configuration management
          • /etc/hadoop vs ZooKeeper
      • New kinds of system packaging
          • Parcels (tarballs + metadata)
          • HPS (Hadoop Packaging System)
      • Orchestration: to puppet or not to puppet?
          • Cloudera Manager
          • Apache Ambari (incubating)
          • Reactor 8: http://reactor8.com
  • 23. Java Packaging
      • Fate of Java
          • OpenJDK
      • OSGi
          • Hadoop's view: MAPREDUCE-1700 https://issues.apache.org/jira/browse/MAPREDUCE-1700
      • Project Jigsaw
          • Language tie-ins? Really?
      • Linux vendors getting their act together
  • 24. Integration testing
      • Clean room provisioning
          • Those ain't unit tests: they trash the system
      • Cluster topology and cluster state discovery
          • How can puppet help us?
      • Cluster state manipulation
          • Test-driven orchestration
          • Chaos Monkey
      • How to be successful in OS co-opetition
          • Make everything pluggable (and subvert ;-))
  • 25. Anatomy of iTest
      • Versioned, JVM-based test/data artifacts
      • Dependency between test artifacts
      • Matching stack of integration tests
      • Implementation
          • Maven artifacts, pom files
          • JUnit test-execution entry point
          • Groovy for scripting
  • 26. Who's the target audience
      • End users
          • YOU!
      • ASF Projects/Bigdata developers
          • from Avro to ZooKeeper
      • Bigdata solutions vendors
          • Cloudera, EMC, Hortonworks, Karmasphere
      • DevOps
          • eBay, Yahoo, Facebook, LinkedIn
  • 27. Who's on-board?
      • Cloudera
          • CDH4 is 100% based on Bigtop (hadoop v2)
          • Available @cloudera.com
      • Canonical
          • Ubuntu Server: Hadoop and Bigdata blueprint https://blueprints.launchpad.net/ubuntu/+spec/servercloud-p-hdp-hadoop
      • TrendMicro
      • Hortonworks (partially)
      • EMC, eBay (early stages of prototyping)
  • 28. What's happening?
      • A special release: Bigtop 0.3.0-incubating
          • Hadoop 1.0.1
      • Last stable release: Bigtop 0.5.0
          • Hadoop 2.0.2-alpha
      • Next stable release: Bigtop 0.6.0
          • End of Mar 2013 release
          • Hadoop 2.0.3-beta
          • Major focus on developers
  • 29. What does Bigtop need from you?
      • More of you!
          • Meetup: “Silicon Valley Hands-on Programming” http://www.meetup.com/HandsOnProgrammingEvents/
      • More infrastructure for build/test
          • EC2, Supercell, EMC magic cluster, CloudStack
      • More integration tests
      • Convince your bosses to commit to Bigtop
      • Validate upstream release using Bigtop
  • 30. Contact
      • Bigtop home @Apache: http://incubator.apache.org/bigtop/
      • Hangout places:
          • {dev,user}@bigtop.apache.org
          • #bigtop on Freenode
      • Roman Shaposhnik
          • rvs@apache.org, rvs@cloudera.com
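
The "Dependencies Inferno" slide (12) shows a Hive tarball bundling two log4j versions side by side. A minimal sketch of how such a clash could be spotted is below. The function name `find_duplicate_jars` is hypothetical and not part of Hive or Bigtop, and it assumes jars are named `<artifact>-<version>.jar` with the version starting with a digit.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hive or Bigtop): print every artifact name
# that occurs in a lib directory under more than one version.
# Assumption: jars follow the <artifact>-<version>.jar naming pattern.
find_duplicate_jars() {
    lib_dir=$1
    for jar in "$lib_dir"/*.jar; do
        [ -e "$jar" ] || continue          # no jars: the glob stayed literal
        name=$(basename "$jar" .jar)
        # Drop the trailing "-<version>" to keep only the artifact name
        printf '%s\n' "$name" | sed 's/-[0-9][0-9A-Za-z._-]*$//'
    done | sort | uniq -d                  # uniq -d keeps names seen 2+ times
}
```

Given only the three jars listed on the slide, `find_duplicate_jars hive-0.8.1/lib` would print `log4j`. It only detects duplicates inside one directory; it says nothing about whether the bundled hbase-0.89.jar matches the HBase version actually deployed, which is the harder half of the problem the slide points at.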

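
Slide 21 mentions failing gracefully with `.... || :` but elides the actual command. The sketch below shows the general shell idiom only: `optional_step` is a hypothetical stand-in for whatever a package scriptlet might run, and `run_scriptlet` is likewise an illustrative name.

```shell
#!/bin/sh
# Hypothetical stand-in for a step that may legitimately fail during package
# installation (the slide does not say which command is meant).
optional_step() {
    return 1    # simulate a failure
}

# The body runs in a subshell so that `set -e` stays local to the scriptlet.
run_scriptlet() (
    set -e                 # abort on the first unguarded failing command
    optional_step || :     # ':' is the shell no-op and always returns 0
    echo "scriptlet finished"
)
```

Calling `run_scriptlet` prints `scriptlet finished` even though `optional_step` fails, because the `|| :` guard turns the failure into a success before `set -e` can act on it. The trade-off is that the failure is swallowed silently, so the idiom fits steps whose failure should not block an install rather than steps whose result matters.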