Roman Shaposhnik of Cloudera and the Apache Software Foundation speaks on "Deploying Hadoop-Based Bigdata Environments: [Tall] Tales from the Frontier" at Puppet Camp Silicon Valley 2012.
1. Deploying Hadoop-Based Bigdata Environments
"[Tall] Tales From The Frontier"
Roman Shaposhnik
rvs@apache.org, Cloudera Inc.
2. $ whoami
• An open source software developer
  • Linux kernel, C/C++ compilers, FFmpeg, Plan9
• A Hadoop and all-around UNIX guy
• root@cloudera
  • Member of the "Kitchen" team
• Apache Software Foundation Incubator PMC
  • [Bigtop], Hadoop Development Tools, Celix, Helix
• VP of Apache Bigtop
7. One way of using Apache software
$ wget http://apache.org/httpd.tar.gz
$ tar xzvf httpd.tar.gz
$ cd httpd
$ ./configure ; make
$ make install
ERROR: can't write to /usr/local/bin
$ sudo make install
8. A different way
$ sudo apt-get install httpd
Would you like to also upgrade your conf?
9. Is there apt-get install hadoop ?
• Hadoop is still under very active development
• Hadoop is Java-based
• Hadoop is a distributed application
• Hadoop is way more than HDFS + MR
10. Project-by-project approach
• "Passively" maintained code
  • Packaging, OS-level (init.d)
• Developer-centric view
  • Edit-compile-debug cycle vs. deployment
  • Lack of integration testing
• Differences in distributions/packaging
  • Where is this valid: /usr/libexec ?
  • Combinatoric explosion of dependencies
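The /usr/libexec question is easy to make concrete: whether that directory even exists differs by distribution (Fedora-style layouts use it; Debian's layout historically does not), so portable packaging has to probe rather than assume. A minimal sketch, with an illustrative (not exhaustive) candidate list:

```shell
# Probe candidate helper/library directories instead of hard-coding one.
for d in /usr/libexec /usr/lib /usr/lib64; do
  if [ -d "$d" ]; then
    echo "present: $d"
  else
    echo "absent:  $d"
  fi
done
```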
11. Dependencies Inferno: Hive 0.8.1
[diagram: Hive 0.8.1 pulling in HBase (0.92, 0.90) and Hadoop (1.0, 0.22, 0.23)]
A million dollar question:
$ tar xzvf hive-0.8.1.tar.gz
$ ls hive-0.8.1/lib
12. Dependencies Inferno: Hive 0.8.1
[diagram: Hive 0.8.1 pulling in HBase (0.92, 0.90) and Hadoop (1.0, 0.22, 0.23)]
A million dollar question:
$ tar xzvf hive-0.8.1.tar.gz
$ ls hive-0.8.1/lib
hbase-0.89.jar log4j-1.2.15.jar log4j-1.2.16.jar
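The jar clash on the slide can be caught mechanically. A small sketch; the lib/ contents are faked with `touch` so it is self-contained, but in practice you would point it at a real unpacked tarball:

```shell
# Recreate the lib/ listing from the slide with empty stand-in files.
mkdir -p hive-0.8.1/lib
touch hive-0.8.1/lib/hbase-0.89.jar \
      hive-0.8.1/lib/log4j-1.2.15.jar \
      hive-0.8.1/lib/log4j-1.2.16.jar

# Strip the "-<version>.jar" suffix and report artifacts shipped twice:
ls hive-0.8.1/lib | sed 's/-[0-9][0-9.]*\.jar$//' | sort | uniq -d
# prints: log4j
```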
17. System software deployment
• Packages vs. Puppet code
  • package/file/service
• What is packaging?
  • dependency tracking
  • build encapsulation
  • java packaging
  • file layout
  • user creation
  • service registration
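In package terms, "user creation" and "service registration" live in a maintainer (post-install) script. A sketch of the shape such a script takes; the user and service names are illustrative, and the privileged commands are echoed rather than executed so the sketch runs unprivileged:

```shell
svc_user=hdfs
svc_name=hadoop-hdfs-datanode

# User creation: only if the system user is not already there.
if getent passwd "$svc_user" >/dev/null 2>&1; then
  echo "user $svc_user already exists"
else
  echo "would run: useradd --system $svc_user"
fi

# Service registration (init.d era, as on the slide):
echo "would run: update-rc.d $svc_name defaults"
```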
18. Does it really work?
• Java packaging
  • maven/ivy integration
• file layout
  • side-by-side installations of the same package
• user creation
  • LDAP/AD provisioning
• service registration
  • start on install vs. start on reboot
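The "start on install vs. start on reboot" question is controllable on Debian-family systems via policy-rc.d: maintainer scripts consult it, and exit code 101 means "do not start the service now". A sketch that exercises the mechanism in a scratch directory instead of touching /usr/sbin:

```shell
fake_root=$(mktemp -d)
mkdir -p "$fake_root/usr/sbin"

# 101 = "action forbidden": package installs must not start services.
cat > "$fake_root/usr/sbin/policy-rc.d" <<'EOF'
#!/bin/sh
exit 101
EOF
chmod +x "$fake_root/usr/sbin/policy-rc.d"

if "$fake_root/usr/sbin/policy-rc.d"; then
  echo "services may start on install"
else
  echo "service start on install blocked (exit $?)"
fi
# prints: service start on install blocked (exit 101)
```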
19. Petascale distributed systems
• Scale
  • Yahoo! ~5000 nodes
• Deployment orchestration
  • Kerberos::Host_keytab <| title == "hdfs" |> -> Service["hadoop-hdfs-datanode"]
• Highly coordinated distributed system
  • It ain't HTTPD/loadbalancer
  • Rolling upgrades/asynchronous rollbacks
20. Back to tarballs and shell?
• What's better for Puppet: fpm or rpm?
• What is the role of Puppet?
  • coordinating the entire system: lack of DSL
  • converging an isolated node: will it ever work?
  • a building block for an agent-based system
• One agent to rule them all?
  • there's no spoon^H^H^H^H^H agent: Whirr
  • MCollective
  • Cloudera Manager, Ambari
21. Evolution, not perfection!
• Minimalistic, highly consistent packages
  • /usr/lib/hadoop, /etc/hadoop/conf (alternative)
  • fail gracefully: .... || : )
  • Java packaging is not solved [yet]: symlinks
• Minimalistic Puppet code
  • package/file/service
  • masterless (most of the time)
• Integration with Whirr
• BoxGrinder
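The "/etc/hadoop/conf (alternative)" bullet refers to the Linux alternatives mechanism: the config directory is a managed symlink, so pointing a node at a different configuration is a single atomic flip. A sketch of the underlying symlink dance in a scratch directory; on a real system the packages do this bookkeeping with `update-alternatives`, which needs root:

```shell
scratch=$(mktemp -d)
mkdir -p "$scratch/etc/hadoop/conf.empty" "$scratch/etc/hadoop/conf.my_cluster"

# Package default: conf points at the stock (empty) configuration.
ln -s conf.empty "$scratch/etc/hadoop/conf"

# Switching the node to a cluster-specific config is one symlink flip:
ln -sfn conf.my_cluster "$scratch/etc/hadoop/conf"
readlink "$scratch/etc/hadoop/conf"
# prints: conf.my_cluster
```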
22. The road ahead
• New kind of configuration management
  • /etc/hadoop vs. Zookeeper
• New kinds of system packaging
  • Parcels (tarballs + metadata)
  • HPS (Hadoop Packaging System)
• Orchestration: to puppet or not to puppet?
  • Cloudera Manager
  • Apache Ambari (incubating)
  • Reactor 8: http://reactor8.com
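A parcel in the "tarballs + metadata" sense is an archive that carries its own descriptor alongside the bits. The layout below is an illustrative sketch of that idea, not the exact parcel spec:

```shell
work=$(mktemp -d)
cd "$work"

# A relocatable tree plus a metadata file describing it.
mkdir -p HADOOP-2.0.2/meta HADOOP-2.0.2/lib
cat > HADOOP-2.0.2/meta/parcel.json <<'EOF'
{ "name": "HADOOP", "version": "2.0.2" }
EOF

tar czf HADOOP-2.0.2.parcel HADOOP-2.0.2
# The metadata travels inside the same artifact as the bits:
tar tzf HADOOP-2.0.2.parcel | grep meta/parcel.json
# prints: HADOOP-2.0.2/meta/parcel.json
```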
23. Java Packaging
• Fate of Java
  • OpenJDK
  • OSGi
• Hadoop's view: MAPREDUCE-1700
  • https://issues.apache.org/jira/browse/MAPREDUCE-1700
• Project Jigsaw
  • Language tie-ins? Really?
• Linux vendors getting their act together
24. Integration testing
• Clean-room provisioning
  • Those ain't unit tests: they trash the system
• Cluster topology and cluster state discovery
  • How can Puppet help us?
• Cluster state manipulation
  • Test-driven orchestration
  • Chaos Monkey
• How to be successful in OS co-opetition
  • Make everything pluggable (and subvert ;-))
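"They trash the system" is a checkable property: fingerprint the state a test is allowed to touch before and after the run, and fail if the fingerprints differ. A self-contained sketch with a fake config tree; the paths and the "test" in the middle are stand-ins:

```shell
sandbox=$(mktemp -d)
mkdir -p "$sandbox/etc/hadoop"
echo "fs.default.name=hdfs://nn:8020" > "$sandbox/etc/hadoop/core-site.xml"

# Hash every file, then hash the sorted list: one fingerprint per tree.
fingerprint() {
  (cd "$sandbox" && find . -type f -exec md5sum {} + | sort | md5sum)
}

before=$(fingerprint)
echo "dfs.replication=1" >> "$sandbox/etc/hadoop/core-site.xml"  # the "test" trashes state
sed -i '$d' "$sandbox/etc/hadoop/core-site.xml"                  # ...and restores it
after=$(fingerprint)

if [ "$before" = "$after" ]; then
  echo "clean room preserved"
else
  echo "test leaked state"
fi
# prints: clean room preserved
```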
25. Anatomy of iTest
• Versioned, JVM-based test/data artifacts
• Dependency between test artifacts
• Matching stack of integration tests
• Implementation
  • Maven artifacts, pom files
  • JUnit test-execution entry point
  • Groovy for scripting
26. Who's the target audience?
• End users
  • YOU!
• ASF projects / Bigdata developers
  • from Avro to Zookeeper
• Bigdata solutions vendors
  • Cloudera, EMC, Hortonworks, Karmasphere
• DevOps
  • eBay, Yahoo, Facebook, LinkedIn
27. Who's on board?
• Cloudera
  • CDH4 is 100% based on Bigtop (Hadoop v2)
  • Available @cloudera.com
• Canonical
  • Ubuntu Server: Hadoop and Bigdata blueprint
  • https://blueprints.launchpad.net/ubuntu/+spec/servercloud-p-hdp-hadoop
• TrendMicro
• Hortonworks (partially)
• EMC, eBay (early stages of prototyping)
28. What's happening?
• A special release: Bigtop 0.3.0-incubating
  • Hadoop 1.0.1
• Last stable release: Bigtop 0.5.0
  • Hadoop 2.0.2-alpha
• Next stable release: Bigtop 0.6.0
  • end of March 2013
  • Hadoop 2.0.3-beta
  • major focus on developers
29. What does Bigtop need from you?
• More of you!
  • Meetup: "Silicon Valley Hands-on Programming"
  • http://www.meetup.com/HandsOnProgrammingEvents/
• More infrastructure for build/test
  • EC2, Supercell, EMC magic cluster, CloudStack
• More integration tests
• Convince your bosses to commit to Bigtop
• Validate upstream releases using Bigtop
30. Contact
• Bigtop home @Apache:
  • http://incubator.apache.org/bigtop/
• Hangout places:
  • {dev,user}@bigtop.apache.org
  • #bigtop on Freenode
• Roman Shaposhnik
  • rvs@apache.org, rvs@cloudera.com