Deploying Hadoop-Based Bigdata Environments

Roman Shaposhnik of Cloudera and the Apache Software Foundation talks on "Deploying Hadoop-Based Bigdata Environments: [Tall] Tales from the Frontier" at Puppet Camp Silicon Valley 2012.

  1. Deploying Hadoop-Based Bigdata Environments: “[Tall] Tales From The Frontier”. Roman Shaposhnik, rvs@apache.org, Cloudera Inc.
  2. $ whoami. An open source software developer: Linux kernel, C/C++ compilers, FFmpeg, Plan9. A Hadoop and all-around UNIX guy: root@cloudera, member of the “Kitchen” team. Apache Software Foundation: Incubator PMC ([Bigtop], Hadoop Development Tools, Celix, Helix), VP of Apache Bigtop.
  3. Stack diagram: ZooKeeper (coordination); HUE (web based UI); Pig (DQL), Hive (SQL), Impala (SQL); HBase; YARN/MR1; Oozie; HDFS (filesystem).
  4. Stack diagram (repeated): ZooKeeper (coordination); HUE (web based UI); Pig (DQL), Hive (SQL), Impala (SQL); HBase; YARN/MR1; Oozie; HDFS (filesystem).
  5. It is a jungle out there: Zookeeper, Hadoop, HDFS, YARN, MR1, HTTPFS, HBase, Pig, Hive, Impala, Sqoop, Oozie, Whirr, Mahout, Flume, Giraph, Hama, Hue, Solr, Crunch, JDK/JRE, Kerberos, Ganglia, Nagios, JSVC, Tomcat, Utils, Postgres, HTTPD.
  6. And the answer is: Puppet[forge]
  7. One way of using Apache software:
     $ wget http://apache.org/httpd.tar.gz
     $ tar xzvf httpd.tar.gz
     $ cd httpd
     $ ./configure ; make
     $ make install
     ERROR: can't write to /usr/local/bin
     $ sudo make install
  8. A different way:
     $ sudo apt-get install httpd
     Would you like to also upgrade your conf?
  9. Is there apt-get install hadoop? Hadoop is still in very active development. Hadoop is Java based. Hadoop is a distributed application. Hadoop is way more than HDFS + MR.
  10. Project-by-project approach. “Passively” maintained code: packaging, OS-level (init.d). Developer-centric view: edit-compile-debug cycle vs. deployment; lack of integration testing. Differences in distributions/packaging: where is this valid: /usr/libexec? Combinatorial explosion of dependencies.
  11. Dependencies Inferno: Hive 0.8.1; HBase (0.92, 0.90); Hadoop (1.0, 0.22, 0.23). A million dollar question:
      $ tar xzvf hive-0.8.1.tar.gz
      $ ls hive-0.8.1/lib
  12. Dependencies Inferno: Hive 0.8.1; HBase (0.92, 0.90); Hadoop (1.0, 0.22, 0.23). A million dollar question:
      $ tar xzvf hive-0.8.1.tar.gz
      $ ls hive-0.8.1/lib
      hbase-0.89.jar log4j-1.2.15.jar log4j-1.2.16.jar
  13. Remember what Debian did to Linux? (Diagram labels: GNU Software; Linux kernel.)
  14. Bigtop is trying to do it with Hadoop. (Diagram labels: Hadoop Ecosystem (Pig, Hive, Mahout); Hadoop (HDFS + MR); Linux kernel; CDH4 beta 1.)
  15. What's there in Bigtop. Build/packaging infrastructure: RPM, DEB, (tarballs, homebrew/MacPorts); VirtualBox, VMware and KVM VMs; Fedora, OpenSUSE, Mageia, CentOS, Ubuntu. Puppet deployment infrastructure. Integration test infrastructure (iTest). Bigtop Jenkins: http://bigtop01.cloudera.org:8080
  16. And the answer is: Puppet[Bigtop]
  17. System software deployment. Packages vs. Puppet code: package/file/service. What is packaging? Dependency tracking, build encapsulation, Java packaging, file layout, user creation, service registration.
  18. Does it really work? Java packaging: maven/ivy integration. File layout: side-by-side installations of the same package. User creation: LDAP/AD provisioning. Service registration: start on install vs. start on reboot.
  19. Petascale distributed systems. Scale: Yahoo! ~5000 nodes. Deployment orchestration: Kerberos::Host_keytab <| title == "hdfs" |> -> Service["hadoop-hdfs-datanode"]. Highly coordinated distributed system: it ain't HTTPD/loadbalancer; rolling upgrades/asynchronous rollbacks.
  20. Back to tarballs and shell? What's better for Puppet: fpm or rpm? What is the role of Puppet? Coordinating the entire system: lack of DSL; converging an isolated node: will it ever work?; a building block for an agent-based system. One agent to rule them all? There's no spoon^H^H^H^H^H agent: Whirr; MCollective; Cloudera Manager, Ambari.
  21. Evolution, not perfection! Minimalistic, highly consistent packages: /usr/lib/hadoop, /etc/hadoop/conf (alternative); fail gracefully: .... || : ); Java packaging is not solved [yet]: symlinks. Minimalistic Puppet code: package/file/service; masterless (most of the time); integration with Whirr, BoxGrinder.
  22. The road ahead. New kind of configuration management: /etc/hadoop vs Zookeeper. New kinds of system packaging: Parcels (tarballs + metadata); HPS (Hadoop Packaging System). Orchestration: to puppet or not to puppet? Cloudera Manager; Apache Ambari (incubating); Reactor 8: http://reactor8.com
  23. Java Packaging. Fate of Java: OpenJDK. OSGi: Hadoop's view: MAPREDUCE-1700, https://issues.apache.org/jira/browse/MAPREDUCE-1700. Project Jigsaw: language tie-ins? Really? Linux vendors getting their act together.
  24. Integration testing. Clean room provisioning: those ain't unit tests, they trash the system. Cluster topology and cluster state discovery: how can Puppet help us? Cluster state manipulation: test-driven orchestration; Chaos Monkey. How to be successful in OS co-opetition: make everything pluggable (and subvert ;-)).
  25. Anatomy of iTest. Versioned, JVM-based test/data artifacts. Dependency between test artifacts. Matching stack of integration tests. Implementation: Maven artifacts, pom files; JUnit test-execution entry point; Groovy for scripting.
  26. Who's the target audience? End users: YOU! ASF projects/Bigdata developers: from Avro to Zookeeper. Bigdata solutions vendors: Cloudera, EMC, Hortonworks, Karmasphere. DevOps: eBay, Yahoo, Facebook, LinkedIn.
  27. Who's on board? Cloudera: CDH4 is 100% based on Bigtop (hadoop v2); available @cloudera.com. Canonical: Ubuntu Server: Hadoop and Bigdata blueprint, https://blueprints.launchpad.net/ubuntu/+spec/servercloud-p-hdp-hadoop. TrendMicro. Hortonworks (partially). EMC, eBay (early stages of prototyping).
  28. What's happening? A special release: Bigtop 0.3.0-incubating (Hadoop 1.0.1). Last stable release: Bigtop 0.5.0 (Hadoop 2.0.2-alpha). Next stable release: Bigtop 0.6.0 (end of Mar 2013 release; Hadoop 2.0.3-beta; major focus on developers).
  29. What does Bigtop need from you? More of you! Meetup: “Silicon Valley Hands-on Programming”, http://www.meetup.com/HandsOnProgrammingEvents/. More infrastructure for build/test: EC2, Supercell, EMC magic cluster, CloudStack. More integration tests. Convince your bosses to commit to Bigtop. Validate upstream releases using Bigtop.
  30. Contact. Bigtop home @Apache: http://incubator.apache.org/bigtop/. Hangout places: {dev,user}@bigtop.apache.org; #bigtop on Freenode. Roman Shaposhnik: rvs@apache.org, rvs@cloudera.com
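
The "Dependencies Inferno" slides show a lib directory that carries two log4j versions side by side. A minimal sketch of how such a listing could be checked automatically follows. It is not part of Bigtop or the talk: the function name `find_version_conflicts` and the `<name>-<version>.jar` naming assumption are illustrative only, and real jars do not always follow that convention.

```python
import re
from collections import defaultdict

# Assumption: jars are named "<name>-<version>.jar" and the version starts
# with a digit. Jars that do not follow this convention are ignored.
JAR_PATTERN = re.compile(r"^(?P<name>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def find_version_conflicts(jar_names):
    """Return {artifact: sorted versions} for artifacts seen in more than one version."""
    versions = defaultdict(set)
    for jar in jar_names:
        match = JAR_PATTERN.match(jar)
        if match:
            versions[match.group("name")].add(match.group("version"))
    # Keep only artifacts that appear with at least two distinct versions.
    return {name: sorted(found) for name, found in versions.items() if len(found) > 1}


# The listing from the slide: output of `ls hive-0.8.1/lib`.
listing = ["hbase-0.89.jar", "log4j-1.2.15.jar", "log4j-1.2.16.jar"]
conflicts = find_version_conflicts(listing)
print(conflicts)
```

On the slide's listing this flags log4j, which is present as both 1.2.15 and 1.2.16. It does not catch the other problem the slide hints at, a bundled hbase-0.89 jar that differs from the HBase version installed on the cluster, because that requires knowledge of the cluster, not just the directory.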
