
HKG18-220 - State of Big Data on AArch64. - Apache BigTop


Session ID: HKG18-220
Session Name: HKG18-220 - State of Big Data on AArch64. - Apache BigTop
Speaker: Ganesh Raju,Naresh Bhat,Jun He
Track: Enterprise


★ Session Summary ★
AArch64 is now a first-class citizen in the Apache Bigtop community. Apache Bigtop is a project for comprehensive packaging, testing, and configuration of the leading open source big data components. It is what every distro uses to do its build (ODPi, Hortonworks, IBM, Pivotal, etc.). This talk gives an overview of the patches submitted to the community and insights into bootstrapping and automating the packaging process, deploying into Docker containers using provisioners, and the newly introduced Sandbox feature. We will touch on the challenges we faced in porting Bigtop to AArch64 and running smoke tests. Jun He & Ganesh Raju: Bigtop patches, common areas of code fixes, challenges of porting, and a walk-through of the setup and installation process. Naresh Bhat: Sandbox and smoke tests.
---------------------------------------------------
★ Resources ★
Event Page: http://connect.linaro.org/resource/hkg18/hkg18-220/
Presentation: http://connect.linaro.org.s3.amazonaws.com/hkg18/presentations/hkg18-220.pdf
Video: http://connect.linaro.org.s3.amazonaws.com/hkg18/videos/hkg18-220.mp4
---------------------------------------------------
★ Event Details ★
Linaro Connect Hong Kong 2018 (HKG18)
19-23 March 2018
Regal Airport Hotel Hong Kong

---------------------------------------------------
Keyword: Enterprise
http://www.linaro.org
http://connect.linaro.org
---------------------------------------------------
Follow us on Social Media
https://www.facebook.com/LinaroOrg
https://www.youtube.com/user/linaroorg?sub_confirmation=1
https://www.linkedin.com/company/1026961



  1. HKG18-220: State of Big Data on AArch64 - Apache BigTop
     Ganesh Raju, Jun He & Naresh Bhat
     Big Data Team, LEG
  2. Agenda
     ● Why and What is Bigtop
     ● Bigtop patches for AArch64
     ● Challenges of porting
     ● Walk through of setup and installation process
     ● Demo on Provisioning and Smoke Tests
  3. Why Apache Bigtop?
     ● Hadoop is a collection of many components
       ○ Numerous versions (dependency hell)
       ○ Lots of patches
       ○ No stable development environment with certified binaries
       ○ No proper integrated tests
     ● With Bigtop: Build, Deploy in cluster with Puppet, Configure, Install and Test
     ● Juju orchestration
     ● Blueprints
     ● Seamless integration into CI
  4. What is in Apache Bigtop?
     ● Output
       ○ A set of binaries (deb and rpm) just like HDP, ODPi, CDH, etc.
       ○ Docker images
       ○ Docker sandbox images
     ● Integration code, Packaging code, Deployment code, Orchestration code
     ● Validation code
       ○ Integration tests
         ■ Clean slate provisioning
         ■ Dependency integration artifacts
         ■ Versioned test artifacts
         ■ Plug and play artifacts
         ■ JVM-based artifacts
       ○ Packaging tests
       ○ Smoke tests
     ● Continuous Integration
  5. Consumers of Bigtop
     Some of the consumers of Bigtop:
     ● ODPi
     ● Hortonworks
     ● Amazon
     ● Canonical
     ● EMC
     ● Pivotal
     ● Infosys
     ● Capgemini
     ● eBay
     ● Intel
     ● TrendMicro
     ● WANdisco
  6. Bigtop v1.2
     A few highlights of the v1.2 series release include:
     ● 6 distros and 2 architectures (x86 and ppc64le) supported
       ○ ARM support with v1.3
     ● A newly introduced Bigtop Sandbox feature
     ● A faster Docker Provisioner, rewritten to fully embrace the Docker ecosystem
     ● OpenJDK 8 support
     ● Hadoop 2.7.3, Spark 2.1.1, HBase 1.1.9, and Zeppelin 0.72 are used
     ● Many upgrades of the ecosystem projects (Apex, Crunch, Flume, Ignite, Mahout, Oozie, Phoenix, and many others)
  7. Components of Bigtop
     Component                Version
     alluxio                  v1.0.1
     Greenplum gpdb           v5.0.0-alpha.0
     Apache pig               v0.15.0
     Apache ambari            v2.5.0
     Apache hadoop            v2.7.3
     Quantcast qfs            v1.1.4
     Apache apex              v3.5.0
     Apache hama              v0.7.0
     Apache solr              v4.10.4
     groovy                   v2.4.10
     Apache hbase             v1.1.3
     Apache spark 1.1         v1.6.2
     Apache commons - jsvc    v1.0.15
     Apache hive              v1.2.1
     Apache spark 2.0         v2.1.1
     Apache tomcat            v6.0.45
     Apache hue               v3.11.0
     Apache sqoop v1          v1.4.6
     bigtop_utils             v1.2.0
     Apache ignite            v1.9.0
     Apache sqoop v2          v1.99.4
     Apache crunch            v0.14.0
     Apache kafka             v0.10.1.1
     Apache tajo              v0.11.1
     Pig UDF datafu           v1.3.0
     kite                     v1.1.0
     Apache tez               v0.6.2
     Apache flink             v1.1.3
     Apache mahout            v0.12.2
     ycsb                     v0.4.0
     Apache flume             v1.7.0
     Apache oozie             v4.3.0
     Apache zeppelin          v0.7.0
     Apache giraph            v1.1.0
     Apache phoenix           v4.9.0-HBase-1.1
     Apache zookeeper         v3.4.6
  8. Contributions from ARM Ecosystem
     ● AArch64 CI nodes are running on Linaro DevCloud
       ○ 3 distros are supported: Debian-9, Fedora-26, Ubuntu-16.04
     https://ci.bigtop.apache.org/job/Bigtop-trunk-packages/
  9. Contributions from ARM Ecosystem
     ● Patches to enable components on AArch64
       ○ Build
         ■ Hadoop, Solr, HBase, Ignite, …
       ○ Package
         ■ Hama, Solr, Oozie, Hue, …
       ○ Deploy
         ■ Service scripts, Automation scripts, Dockerfiles, …
       ○ Test
         ■ SmokeTests and Provisioner settings
  10. Contributions from ARM Ecosystem
      ● Lessons learned
        ○ Dependency issues
          ■ Native binaries: protobuf, phantomjs, …
          ■ Jars: leveldb-jni, ignite-shmem, jffi, …
          ■ Version mismatch: slf4j, log4j, log4j2, …
        ○ Repos
          ■ Official release did not support aarch64
            ● Had to create a private/local repo
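One way to spot the jar-level dependency issues listed above is to check which native libraries a jar actually bundles. The following Python sketch is an illustration only and is not part of Bigtop: the function names, the demo jar, and its `jni/x86_64-Linux/` path are hypothetical. Projects lay out their native libraries under differing directory names, so the substring check for `aarch64` or `arm64` is a heuristic, not a reliable test.

```python
import io
import zipfile
from collections import defaultdict

NATIVE_SUFFIXES = (".so", ".dylib", ".dll")


def native_libs_by_dir(jar):
    """Map each directory inside the jar to the native libraries it holds.

    `jar` may be a filesystem path or a binary file-like object.
    """
    found = defaultdict(list)
    with zipfile.ZipFile(jar) as zf:
        for name in zf.namelist():
            if name.endswith(NATIVE_SUFFIXES):
                directory, _, filename = name.rpartition("/")
                found[directory].append(filename)
    return dict(found)


def bundles_aarch64(libs_by_dir, tokens=("aarch64", "arm64")):
    """Heuristic: does any directory name mention an AArch64 token?"""
    return any(tok in d.lower() for d in libs_by_dir for tok in tokens)


def make_demo_jar(paths):
    """Build an in-memory jar with empty entries (hypothetical contents)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for path in paths:
            zf.writestr(path, b"")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    jar = make_demo_jar(["META-INF/MANIFEST.MF", "jni/x86_64-Linux/libdemo.so"])
    libs = native_libs_by_dir(jar)
    print(libs, "aarch64 bundled:", bundles_aarch64(libs))
```

Pointing `native_libs_by_dir` at a real jar path from a local Maven cache gives a quick view of which platforms a dependency ships natives for before a build fails on them.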
  11. Challenges
      ● Cyclic references take a lot of effort to fix
      ● Though most big data companies use Bigtop, contributions have not been coming in from them
      ● With founders moving out at the end of last year, and lead committers changing focus, the project has lost momentum
  12. Roadmap
      ● Make Bigtop a first-class citizen on Kubernetes
      ● Test out the Puppet deployment code in a variety of scenarios, including developers spinning up test clusters via the Docker deployer
      ● Improve the Docker deployer to be more developer friendly and hook it back into Gradle
      ● Provide predefined sample stacks for specific use cases. For example:
        ○ Machine Learning: Hadoop+Spark+Zeppelin
        ○ Streaming: Hadoop+Kafka+Flink
      ● Create more tests
      ● Enable Ambari to install the Bigtop stack. Utilize the work done for ODPi
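The predefined sample stacks mentioned in the roadmap could be represented as a simple mapping from use case to components. This is a hypothetical Python sketch: the `SAMPLE_STACKS` table, its key names, and `components_for` are illustrative and do not describe an existing Bigtop feature. Only the two stack compositions are taken from the slide.

```python
# Hypothetical table; the two stack compositions come from the roadmap slide.
SAMPLE_STACKS = {
    "machine-learning": ["hadoop", "spark", "zeppelin"],
    "streaming": ["hadoop", "kafka", "flink"],
}


def components_for(*use_cases):
    """Return the components needed for the given use cases.

    Components shared between stacks appear once, in first-seen order.
    """
    selected = []
    for use_case in use_cases:
        if use_case not in SAMPLE_STACKS:
            raise ValueError(f"unknown use case: {use_case!r}")
        for component in SAMPLE_STACKS[use_case]:
            if component not in selected:
                selected.append(component)
    return selected


if __name__ == "__main__":
    print(components_for("machine-learning", "streaming"))
```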
  13. Sandbox
      ● What is Sandbox? A tool to build and run a big data pseudo-cluster using Docker
      ● Command to generate a sandbox image:
        $ ./build.sh -a bigtop -o centos-7 -c "hdfs, yarn, spark, ignite"
      ● You can do a dry run using the --dryrun option
      ● How to run the sandbox image:
        $ docker run -d -p 50070:50070 bigtop/sandbox:centos-7_hdfs
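The sandbox build command above can also be assembled programmatically, for example when scripting several images. The Python sketch below only constructs the argument list and runs nothing. It assumes, from the slide's example alone, that `-a` takes the account name, `-o` the OS, and `-c` a comma-separated component list; the function name is hypothetical.

```python
import shlex


def sandbox_build_argv(account, os_name, components, dry_run=False):
    """Build the argument list for the sandbox build.sh call shown on the slide.

    Flag meanings are inferred from the slide's example and are assumptions.
    """
    if not components:
        raise ValueError("at least one component is required")
    argv = [
        "./build.sh",
        "-a", account,
        "-o", os_name,
        "-c", ", ".join(components),  # same spacing as the slide's example
    ]
    if dry_run:
        argv.append("--dryrun")
    return argv


if __name__ == "__main__":
    argv = sandbox_build_argv(
        "bigtop", "centos-7", ["hdfs", "yarn", "spark", "ignite"], dry_run=True
    )
    # shlex.join produces a shell-safe string for logging or copy/paste.
    print(shlex.join(argv))
```

Keeping the command as a list rather than a string avoids quoting problems with the space-separated component list if it is later passed to `subprocess.run`.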
  14. Smoke Tests
      ● Configured through a YAML file located under <BIGTOP_SRC_TOP>/provisioner/docker/
      ● Components to configure
        ○ docker image
        ○ distro type
        ○ components to install
        ○ components to test
        ○ JDK
      ● Environment check:
        $ ./docker-hadoop.sh -E
      ● Execution:
        $ cd provisioner/docker
        $ ./docker-hadoop.sh -C <smoke_test_cfg_yaml> -c <node_count> -s -d
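The smoke test steps above can be sketched as a small pre-flight check plus a command builder. This Python sketch is illustrative only: the setting names in `REQUIRED_SETTINGS` are placeholders for the five items listed on the slide and may not match the keys in the real YAML file, the function names and the file name `my-smoke-test.yaml` are hypothetical, and the `-C`, `-c`, `-s`, and `-d` flags are copied verbatim from the slide without interpreting them further.

```python
# Placeholder names for the five settings on the slide; the real YAML
# keys may be spelled differently.
REQUIRED_SETTINGS = (
    "docker_image",
    "distro",
    "components",
    "smoke_test_components",
    "jdk",
)


def missing_settings(config):
    """Return the required settings that are absent or empty in `config`."""
    return [key for key in REQUIRED_SETTINGS if not config.get(key)]


def smoke_test_argv(cfg_yaml, node_count):
    """Build the docker-hadoop.sh argument list shown on the slide."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return ["./docker-hadoop.sh", "-C", cfg_yaml, "-c", str(node_count), "-s", "-d"]


if __name__ == "__main__":
    config = {
        "docker_image": "example/image:tag",  # hypothetical value
        "distro": "centos",
        "components": ["hdfs", "yarn"],
        "smoke_test_components": ["hdfs"],
        # "jdk" intentionally left out to show the check
    }
    print("missing:", missing_settings(config))
    print(smoke_test_argv("my-smoke-test.yaml", 3))
```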
  15. Glossary
      ● Linaro Collaborate page
      ● Bigtop wiki page
      ● Smoke Test Collaborate page
      ● Smoke Test Results
  16. Thank You
      #HKG18
      HKG18 keynotes and videos on: connect.linaro.org
      For further information: www.linaro.org
