Apache Bigtop: a crash course in
deploying a Hadoop platform
Apache Bigtop: a crash course in
deploying a Hadoop platform
“3 + 7 + 9 + 1”
The complexity of the stack
“3 + 7 + 9 + 1”
The complexity of the stack
Hadoop ecosystem
• Many components
• Gazillions of versions
• Lot of patches if you like it hot and dirty
Presenters
• Dr. Konstantin Boudnik, Roman Shaposhnik
• Initial co-inventors of Apache Bigtop
• Active contributors, commi...
Hadoop ≠ HDFS + MR
• Today's Hadoop is more than storage + MR
• Baby elephant outgrew his cradle
• Hbase
• SQL frontends
•...
Complexity in the extreme
“One to bring them all”
Bigtop is a simple answer
“One to bring them all”
Bigtop is a simple answer
Bigtop stack: take it & go
• Modify a stack BOM
• Build
• Deploy
• Configure w/ Puppet (included)
• Test (scenarios are pr...
Rinse and repeat
Field case studies:
Pivotal
WANdisco
Field case studies:
Pivotal
WANdisco
From 0 to full stack in 28 days:
WANdisco case study
From 0 to full stack in 28 days:
WANdisco case study
New distro in 4 weeks
• Major challenges for a new player:
• Need to have a stable development
platform
• Offering Apache ...
Bigtop to help
• Define component versions in the BOM
• Make component changes if needed
• Run Bigtop build
• Deploy the c...
Hadoop in the cloud
Pivotal case study
Hadoop in the cloud
Pivotal case study
Hadoop @Pivotal: history
• Started at Greenplum (GPHD)
• A stand-alone Hadoop distribution
• Based on a fork of Bigtop 0.3...
Lessons for Pivotal
• Think of Bigtop as “Fedora”
• Become part of the community
• Say “no” to forking
• Work on custom re...
Lessons for Bigtop
• Promote common build and RE infrastructure
• “Codifying” it
• Avoid broken windows syndrome
• Bigtop ...
Road aheadRoad ahead
Packaging/deployment
• Deployment environments:
• vanilla servers == packages ?
• vanilla Vms/containers == JeOS/baking ?
...
Validation
• Investing in iTest
• Easier test management and execution
• Cluster discovery and management
framework
• More...
Growing the ecosystem
• 1st Hadoop distribution including Apache
Spark
• But there's more:
• GridGain
• HBase indexer
• Li...
Growth in last 6 months
• Apache Spark: In-memory analytics
• Phoenix: Hbase SQL frontend
• Groovy
• Unification of user-f...
Community
• 23 committers
• Over 100 contributors
• Quarterly releases
• Hackathons and more
Demo (worth 1k words)
https://bigtop.apache.org/
https://blogs.apache.org/bigtop
https://cwiki.apache.org/confluence/display/BIGTOP
dev@bigtop.a...
Q & AQ & A
Thank you
@c0sin
@rhatr
Thank you
@c0sin
@rhatr
Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform
Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform
Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform
Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform
Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform
Upcoming SlideShare
Loading in …5
×

Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform

2,068 views

Published on

A long time ago in a galaxy far, far away only the chosen few could deploy and operate a fully functional Hadoop cluster. Vendors were taking pride in rationalizing this experience to their customers by creating various distributions including Apache Hadoop. It all changed when Cloudera decided to support Apache Bigtop as the first 100% community driven bigdata management distribution of Apache Hadoop. Today, most major commercial distribution of Apache Hadoop are based on Bigtop. Bigtop has won the Hadoop distributions war and is offering a superset of packaged components. In this talk we will focus on practical advice of how to deploy and start operating a Hadoop cluster using Bigtop’s packages and deployment code. We will dive into the details of using packages of Hadoop ecosystem provided by Bigtop and how to build data management pipelines in support your enterprise applications.

Published in: Software, Technology
0 Comments
3 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
2,068
On SlideShare
0
From Embeds
0
Number of Embeds
13
Actions
Shares
0
Downloads
38
Comments
0
Likes
3
Embeds 0
No embeds

No notes for slide

Apache Bigtop: a crash course in deploying a Hadoop bigdata management platform

  1. 1. Apache Bigtop: a crash course in deploying a Hadoop platform Apache Bigtop: a crash course in deploying a Hadoop platform
  2. 2. “3 + 7 + 9 + 1” The complexity of the stack “3 + 7 + 9 + 1” The complexity of the stack
  3. 3. Hadoop ecosystem • Many components • Gazillions of versions • Lot of patches if you like it hot and dirty
  4. 4. Presenters • Dr. Konstantin Boudnik, Roman Shaposhnik • Initial co-inventors of Apache Bigtop • Active contributors, committers, and PMCs on multiple Apache TLPs in Hadoop ecosystem • Expertise in • Compilers • Operating systems, Virtual machines, JVM • Distributed systems • Complex software stacks
  5. 5. Hadoop ≠ HDFS + MR • Today's Hadoop is more than storage + MR • Baby elephant outgrew his cradle • Hbase • SQL frontends • In-memory processing • Storage caching • Connectors • DSL languages • <your name is here>
  6. 6. Complexity in the extreme
  7. 7. “One to bring them all” Bigtop is a simple answer “One to bring them all” Bigtop is a simple answer
  8. 8. Bigtop stack: take it & go • Modify a stack BOM • Build • Deploy • Configure w/ Puppet (included) • Test (scenarios are provided / easy to add) • Grab an appliance if short of hardware
  9. 9. Rinse and repeat
  10. 10. Field case studies: Pivotal WANdisco Field case studies: Pivotal WANdisco
  11. 11. From 0 to full stack in 28 days: WANdisco case study From 0 to full stack in 28 days: WANdisco case study
  12. 12. New distro in 4 weeks • Major challenges for a new player: • Need to have a stable development platform • Offering Apache Hadoop certified binaries • Team with no prior expertise in the field • Very complex Hadoop ecosystem landscape
  13. 13. Bigtop to help • Define component versions in the BOM • Make component changes if needed • Run Bigtop build • Deploy the cluster w/ provided puppet • Test the cluster with integration suite • Rinse and repeat as needed • Seamless integration into CI • Easy provisioning and incremental updates
  14. 14. Hadoop in the cloud Pivotal case study Hadoop in the cloud Pivotal case study
  15. 15. Hadoop @Pivotal: history • Started at Greenplum (GPHD) • A stand-alone Hadoop distribution • Based on a fork of Bigtop 0.3-incubating • Hadoop 1.0.1 based ecosystem • No community interaction • Graduated as a PHD at Pivotal • A PivotalONE vision • Based on a fork of Bigtop 0.4 • Hadoop 2.x based ecosystem • Integrated with HAWQ
  16. 16. Lessons for Pivotal • Think of Bigtop as “Fedora” • Become part of the community • Say “no” to forking • Work on custom requirements upstream • Participate in release planning • Leverage Bigtop's infrastructure internally
  17. 17. Lessons for Bigtop • Promote common build and RE infrastructure • “Codifying” it • Avoid broken windows syndrome • Bigtop releases don't patch, but vendors do • Make tests easier to use • Invest in documentation • Whitepapers, demos, etc.
  18. 18. Road aheadRoad ahead
  19. 19. Packaging/deployment • Deployment environments: • vanilla servers == packages ? • vanilla Vms/containers == JeOS/baking ? • specializes VMs == Osv • Baking vs frying • Rolling upgrades • side-by-side install
  20. 20. Validation • Investing in iTest • Easier test management and execution • Cluster discovery and management framework • More test cases • Lowering entry-level barriers • Less rigid user-facing interfaces: Gradle • Mix-n-match built-in integration tests • Becoming a TCK for Hadoop ecosystem • Engaging more into trunk testing
  21. 21. Growing the ecosystem • 1st Hadoop distribution including Apache Spark • But there's more: • GridGain • HBase indexer • Lipstick (not for Pig) • Ambrose • Launch it all to the Stratosphere?
  22. 22. Growth in last 6 months • Apache Spark: In-memory analytics • Phoenix: Hbase SQL frontend • Groovy • Unification of user-facing interfaces • Gradle build system • Received proposal in include GridGain in-memory platform
  23. 23. Community • 23 committers • Over 100 contributors • Quarterly releases • Hackathons and more
  24. 24. Demo (worth 1k words)
  25. 25. https://bigtop.apache.org/ https://blogs.apache.org/bigtop https://cwiki.apache.org/confluence/display/BIGTOP dev@bigtop.apache.org user@bigtop.apache.org @ASFbigtop Come & join us
  26. 26. Q & AQ & A
  27. 27. Thank you @c0sin @rhatr Thank you @c0sin @rhatr

×