Gerrit jenkins-big data-continuous-delivery

See how to use Gerrit and Jenkins together to implement a BigData Continuous Delivery pipeline.

Published in: Technology

  1. Gerrit and Jenkins for Big Data Continuous Delivery (London, UK, June 2015)
  2. About GerritForge (www.gerritforge.com, #jenkinsconf)
     • Founded in 2009 in London
     • Committed to OpenSource
  3. The Team
     • Luca Milanesio
       – Co-founder and Director of GerritForge
       – Over 20 years in Agile Development and ALM
       – OpenSource contributor to many projects (BigData, Continuous Integration, Git/Gerrit)
     • Antonios Chalkiopoulos
       – Author of Programming MapReduce with Scalding
       – Open Source contributor to many BigData projects
       – Working on the "land-of-Hadoop" (landoop.com)
  4. The Team (2)
     • Tiago Palma
       – Data Warehouse & Big Data Development
       – Senior Data Modeler
       – Big Data infrastructure specialist
     • Stefano Galarraga
       – 20 years of Agile Development
       – Middleware, Big Data, Reactive Distributed Systems
       – Open Source contributor to many BigData projects
  5. Agenda
     • Why continuous deployment on BigData?
     • Our Development Lifecycle ingredients
       – Gerrit, Jenkins, Mesos, Marathon, CDH / Spark
     • Topics to address in BigData development
       – Type of tests (Unit vs. Integration)
       – Testing the "real thing" (aka the Cluster)
     • Our BigData virtualised infrastructure
       – Marathon, Mesos and Dockers all around
     • Live (minimised) Demo
  6. WHY?
     • Early BigData had no process at all, so it could fail at any time
     • Mature BigData is a mission-critical decision maker
     • Need for more stable software-engineering methodologies:
       – Test-Driven Development (Stefano's ScaldingUnit)
       – Continuous Integration with Jenkins
       – Integration & Performance testing
       – Code review and validation
  7. Code-Review BigData Lifecycle (1)
     • Git used by distributed teams (UK, Israel, India)
     • Topics and Code Review
     • Jenkins build on every patch-set
     • Commits reviewed / approved via Gerrit Submit
  8. Code-Review BigData Lifecycle (2)
  9. Code-Review BigData Lifecycle (3)
     • Submitting a Topic automatically:
       – merges all its patch-sets (semi-atomically)
       – triggers a longer chain of CI steps
       – promotes an RC if everything passes
     • Jenkins automation via Gerrit Trigger Plugin
  10. Ingredients: Gerrit
     • Git-based Code Review system
     • Pre-commit review
     • Allows multiple validation steps (pipeline)
     • Validation + Integration flags
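To make the "flags" on the Gerrit slide concrete, here is a minimal sketch of how a CI step could vote on a patch-set through Gerrit's "Set Review" REST endpoint. The slides do not show this; in the pipeline described, the Gerrit Trigger plugin presumably casts the votes for Jenkins. The host name, the change number and any label other than Gerrit's default `Verified` are assumptions for illustration only.

```python
import json
from urllib.parse import quote


def review_endpoint(base_url: str, change_id: str, revision: str = "current") -> str:
    """Build the URL of Gerrit's authenticated 'Set Review' endpoint for one patch-set."""
    change = quote(change_id, safe="")  # change ids may contain '~' or '/'
    return f"{base_url.rstrip('/')}/a/changes/{change}/revisions/{revision}/review"


def review_input(passed: bool, label: str = "Verified", message: str = "") -> str:
    """Build the JSON body (a ReviewInput) that votes +1 or -1 on a label."""
    body = {"message": message, "labels": {label: 1 if passed else -1}}
    return json.dumps(body, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical host and change number, for illustration only.
    print(review_endpoint("https://gerrit.example.com", "4247"))
    print(review_input(True, message="Unit tests passed"))
```

The body would be sent with an authenticated HTTP POST. A second, custom label (for example one reserved for integration results) would be passed through the `label` parameter, provided the Gerrit project defines it.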
  11. Ingredients: Jenkins
     • Plugins:
       – Gerrit trigger
       – Docker build step
       – Post-build script plugin
  12. Fitting CDH Into this Picture
     • Integration Test
       – Running integration tests in a CDH-enabled Docker container
       – Hadoop/local and Spark/standalone are not enough
       – Need to test class serialisation
       – Validate packaged fat-jars (library conflicts with CDH)
       – Performance on a real cluster
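One possible way to approach the fat-jar validation mentioned above is to look for class files that appear both in the job's fat-jar and in jars the cluster already provides. The sketch below is an assumption about how such a check might look, not the speakers' method; overlapping class names only indicate a potential conflict, and all jar names are hypothetical.

```python
import zipfile


def class_entries(jar_path):
    """Return the set of .class entries inside a jar (a jar is a zip archive)."""
    with zipfile.ZipFile(jar_path) as jar:
        return {name for name in jar.namelist() if name.endswith(".class")}


def conflicting_classes(fat_jar, provided_jars):
    """Map each provided jar to the sorted class entries it shares with the fat-jar."""
    ours = class_entries(fat_jar)
    conflicts = {}
    for jar in provided_jars:
        overlap = ours & class_entries(jar)
        if overlap:
            conflicts[str(jar)] = sorted(overlap)
    return conflicts


if __name__ == "__main__":
    import sys

    # Usage (hypothetical paths): python check_jars.py etl-assembly.jar /path/to/cdh/jars/*.jar
    if len(sys.argv) < 3:
        sys.exit("usage: check_jars.py FAT_JAR PROVIDED_JAR...")
    for jar, classes in conflicting_classes(sys.argv[1], sys.argv[2:]).items():
        print(f"{jar}: {len(classes)} overlapping classes, e.g. {classes[0]}")
```

A check like this could run as a cheap build step before the container-based integration tests, since it needs no cluster.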
  13. Fitting CDH Into this Picture
     • Acceptance / performance tests with short-lived CDH clusters
     • Solution: Mesos, Marathon and Docker:
       – Ephemeral clusters with defined capacity
       – Automatic cluster-config
       – All controlled via Docker/Mesos
  14. Mesos + Marathon
     • Apache Mesos
       – Abstracts CPU, memory, storage, other compute resources away from machines
     • Marathon Framework
       – Runs on top of Mesos
       – Guarantees that long-running applications never stop
       – REST API for managing and scaling services
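As an illustration of the Marathon REST API mentioned above, the following sketch builds an application definition for a Docker container and a scaling request. It follows the general shape of Marathon's `/v2/apps` resource as commonly documented; the application id, image name and resource figures are hypothetical, and the exact fields accepted depend on the Marathon version in use.

```python
import json


def marathon_app(app_id, image, instances, cpus, mem_mb):
    """Build an app definition to POST to Marathon's /v2/apps endpoint."""
    return {
        "id": app_id,
        "instances": instances,  # number of containers to keep running
        "cpus": cpus,            # CPU share per instance
        "mem": mem_mb,           # memory per instance, in MB
        "container": {
            "type": "DOCKER",
            "docker": {"image": image, "network": "BRIDGE"},
        },
    }


def scale_request(app_id, instances):
    """Return (method, path, body) for changing the instance count of an app."""
    path = "/v2/apps/" + app_id.strip("/")
    return "PUT", path, json.dumps({"instances": instances})


if __name__ == "__main__":
    # Hypothetical id and image, for illustration only.
    app = marathon_app("/cdh/agent", "registry.example.com/cdh-agent:5.4.1", 3, 2.0, 4096)
    print(json.dumps(app, indent=2))
    print(scale_request("/cdh/agent", 5))
```

Keeping the definition as plain data makes it easy to version it alongside the job that uses it and to send it from a Jenkins build step.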
  15. CDH Components
     • CDH 5.4.1 distribution
       – Apache Spark
       – Hadoop HDFS
       – YARN
  16. Integration Test Flow on CDH Cluster
     • Components: Jenkins Master, Mesos Master, Marathon, Private Docker Registry, Slave Host (Mesos Slave, Docker)
     • POST to Marathon REST API to start 1 Docker container with Cloudera Manager and N Docker containers with Cloudera agents
     • Marathon Framework receives resource offers from Mesos Master and submits the tasks
     • The task is sent to the Mesos Slave
     • Mesos Slave starts the Docker container
     • Docker image is fetched from the Docker registry if not present on the Slave host
     • States: Waiting for Dockers, Dockers UP
     • Install Cloudera packages via Cloudera Manager API using Python
     • Deploy the ETL, run the ETL and the Integration Tests
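The "Waiting for Dockers" to "Dockers UP" transition in the flow above could be implemented by polling Marathon until every application reports all of its instances as running. The sketch below is one assumed way to do it, not taken from the demo sources: the function names, timeout values and application ids are hypothetical, and it relies on the `instances` and `tasksRunning` fields that Marathon reports for an app.

```python
import json
import time
from urllib.request import urlopen


def fetch_app(marathon_url, app_id):
    """GET one application from Marathon and return its 'app' object."""
    url = f"{marathon_url.rstrip('/')}/v2/apps/{app_id.strip('/')}"
    with urlopen(url) as resp:
        return json.load(resp)["app"]


def wait_for_dockers(app_ids, fetch, timeout_s=600.0, poll_s=5.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll until every app has all its instances running; False on timeout.

    `fetch` takes an app id and returns a dict with 'instances' and
    'tasksRunning'; it is injected so the loop can be tested offline.
    """
    deadline = clock() + timeout_s
    pending = set(app_ids)
    while pending:
        for app_id in sorted(pending):  # sorted() copies, so discarding is safe
            app = fetch(app_id)
            if app.get("tasksRunning", 0) >= app.get("instances", 1):
                pending.discard(app_id)
        if not pending:
            break
        if clock() >= deadline:
            return False
        sleep(poll_s)
    return True


if __name__ == "__main__":
    # Hypothetical Marathon address and app ids, for illustration only.
    ok = wait_for_dockers(
        ["/cdh/manager", "/cdh/agent"],
        lambda app_id: fetch_app("http://marathon.example.com:8080", app_id),
    )
    print("Dockers UP" if ok else "Timed out waiting for Dockers")
```

Only once this returns `True` would the job move on to installing the Cloudera packages and running the ETL and integration tests.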
  17. Unit and Integration Tests sample
     • Test project:
       – Test Spark project
       – ETL from Oracle to HDFS
     • Unit-test directly on Spark logic
     • Integration tests for every patch-set:
       – VERY small dataset just for this demo
       – CDH and Oracle Docker Images
  18. Unit and Integration Tests
     • [Diagram] Components: Jenkins Build Job, Hadoop pseudo-distributed mode, Spark Standalone
     • [Diagram] Steps: init, Submit job, Init/read HDFS
  19. DEMO: small-scale BigData Delivery Pipeline
  20. References
     • Demo sources: https://github.com/GerritForge
     • Blog: https://gitenterprise.me
     • Twitter: @GerritReview @GitEnterprise @GerritForge
     • Learn Gerrit Code Review book: GerritHub.io/book
     • Get in touch with GerritForge: info@GerritForge.com
