Gerrit + Jenkins = Continuous Delivery For Big Data

Stefano Galarraga
Lead Developer - BigData Expert at Crowdmix
Mountain View, CA, November 2015
Stefano Galarraga
GerritForge
stefano@gerritforge.com
http://www.gerritforge.com
Real-life case study and future developments
The Team
Luca Milanesio
• Co-founder and Director of GerritForge
• Over 20 years in Agile Development and ALM
• Open Source contributor to many projects
(BigData, Continuous Integration, Git/Gerrit)
Antonios Chalkiopulos
• Author of Programming MapReduce with Scalding
• Open source contributor to many BigData projects
• Working on the "land-of-Hadoop" (landoop.com)
Tiago Palma
• Data Warehouse & Big Data Development
• Senior Data Modeler
• Big Data infrastructure specialist
Stefano Galarraga
• 20 years of Agile Development
• Middleware, Big Data, Reactive Distributed Systems.
• Open Source contributor to BigData projects.
Agenda
• What’s special in Big Data
– General lack of support for Unit/Integration testing
– Testing the "real thing" (aka the Cluster)
• Why Gerrit for continuous deployment on BigData?
• Our Development Lifecycle ingredients
– Gerrit, Jenkins, Mesos, Marathon, CDH / Spark
• Gerrit Role and Components
– What did we use, why, what we would like to have
• New developments
– Using Topics with microservices for “atomic” multi-service changes
• Live (minimised) Demo
• Open points and discussion
WHY Gerrit?
• Fast Paced
• Distributed team
• A relatively “niche” technology
– A lot of “junior” developers
– Need for strong ownership
– Validation rules
– CD => We need to have a green build and consistent code
quality
Code-Review Lifecycle
• Git used by distributed teams (UK, Israel, India)
• Topics and Code Review
• Jenkins build on every patch-set
• Commits reviewed / approved via Gerrit Submit
• Submitting a Topic automatically:
– merges all patch-sets (semi-atomically)
– triggers a longer chain of CI steps
– promotes an RC if everything passes
• Jenkins automation via Gerrit Trigger Plugin
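The lifecycle above can be sketched as a small router over Gerrit stream events. This is only an illustration, not how the Gerrit Trigger Plugin is implemented: the event types `patchset-created` and `change-merged` are the ones Gerrit emits on its event stream, while the job names `quick-build` and `release-chain` are hypothetical.

```python
import json

# Hypothetical Jenkins job names, used only for illustration.
QUICK_BUILD = "quick-build"      # short build run on every patch-set
RELEASE_CHAIN = "release-chain"  # longer CI chain run after submit


def job_for_event(raw_line):
    """Map one JSON line from the Gerrit event stream to a job.

    Returns a (job, project, topic) tuple, or None when the event
    should not trigger anything in this sketch.
    """
    event = json.loads(raw_line)
    change = event.get("change", {})
    event_type = event.get("type")
    if event_type == "patchset-created":
        job = QUICK_BUILD
    elif event_type == "change-merged":
        job = RELEASE_CHAIN
    else:
        return None  # comments, ref updates and so on are ignored here
    return job, change.get("project"), change.get("topic")
```

Each line read from the event stream would be passed to `job_for_event`, and the returned tuple handed to whatever mechanism actually starts the Jenkins job.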
Build Steps and Solutions
• Unit tests abstracting from dependencies
• Integration Tests:
– Using Docker to run dependencies on the CI
• “Micro” Hadoop cluster or other dependencies (DBs,
messaging) => Jenkins docker plugin
• When possible “dockerizing” just the required
components and driving them from the test framework
• Performance/Acceptance required a real cluster
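As a rough sketch of driving a "dockerized" dependency from the test framework, the helpers below only build the `docker` command lines for starting and stopping a throwaway container. The container name and image used in the usage note are hypothetical, and the slides do not say how the team actually wired this up.

```python
def docker_run_command(image, name, ports=None, env=None):
    """Build the argv that starts a disposable dependency container.

    ports maps host port -> container port; env maps variable -> value.
    """
    cmd = ["docker", "run", "--detach", "--rm", "--name", name]
    for host_port, container_port in (ports or {}).items():
        cmd += ["--publish", f"{host_port}:{container_port}"]
    for key, value in (env or {}).items():
        cmd += ["--env", f"{key}={value}"]
    cmd.append(image)
    return cmd


def docker_stop_command(name):
    """Build the argv that stops (and, because of --rm, removes) the container."""
    return ["docker", "stop", name]
```

A test fixture could pass these lists to `subprocess.run(cmd, check=True)` in its set-up and tear-down steps, for example with `docker_run_command("example/hadoop-pseudo", "it-hadoop", ports={8020: 8020})`.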
Fitting CDH Into this Picture
• Acceptance / performance tests with short-lived CDH clusters
• Solution: Mesos, Marathon and Docker:
– Ephemeral clusters with defined capacity
– Automatic cluster-config
– All controlled via Docker/Mesos
• This was quite a long process!!
– mostly because of CDH cluster configuration
Mesos + Marathon
• Apache Mesos
– Abstracts CPU, memory, storage, other compute
resources away from machines
• Marathon Framework
– Runs on top of Mesos
– Guarantees that long-running applications never
stop
– REST API for managing and scaling services
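To make the REST API point concrete, here is a minimal sketch of preparing a request that asks Marathon to run a Docker-based application. It assumes Marathon's `/v2/apps` endpoint and a basic app definition (`id`, `instances`, `cpus`, `mem`, `container`); the host name, app id and image in the usage note are hypothetical, and the request is only built, not sent.

```python
import json
import urllib.request


def marathon_app(app_id, image, instances, cpus, mem_mb):
    """Return a basic Marathon app definition for a Docker container."""
    return {
        "id": app_id,
        "instances": instances,
        "cpus": cpus,
        "mem": mem_mb,
        "container": {
            "type": "DOCKER",
            "docker": {"image": image},
        },
    }


def create_app_request(marathon_url, app):
    """Build (but do not send) the POST request that creates the app."""
    return urllib.request.Request(
        url=marathon_url.rstrip("/") + "/v2/apps",
        data=json.dumps(app).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending it would be a matter of `urllib.request.urlopen(create_app_request("http://marathon.example:8080", marathon_app("/cdh/agent", "registry.example/cdh-agent", 3, 2.0, 4096)))`, with error handling added as needed.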
CDH Components
• CDH 5.4.1 distribution
– Apache Spark
– Hadoop HDFS
– YARN
Integration/Performance Test Flow on CDH Cluster
• Components (diagram): Jenkins Master, Mesos Master, Marathon, Private Docker Registry, Mesos Slave, Docker (on the Slave Host)
• Flow:
– POST to Marathon REST API to start 1 Docker container with Cloudera Manager and N Docker containers with Cloudera agents
– Marathon Framework receives resource offers from Mesos Master and submits the tasks
– The task is sent to the Mesos Slave
– Mesos Slave starts the Docker container
– Docker image is fetched from the Docker registry if not present on the Slave host
– Waiting for Dockers
– Dockers UP
– Install Cloudera packages via Cloudera Manager API using Python
– Deploy the ETL, run the ETL and the Acceptance Tests
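The "waiting for Dockers" step in this flow could be sketched as a polling loop. The sketch assumes the caller supplies `fetch_app`, a hypothetical function returning the app status as a dict with a `tasksRunning` count (as Marathon reports for an app); the timeout and interval defaults are arbitrary.

```python
import time


def wait_until_running(fetch_app, expected, timeout_s=600, interval_s=5,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll until at least `expected` tasks are running.

    Returns True when the target is reached, False when the timeout
    expires first. `sleep` and `clock` are injectable for testing.
    """
    deadline = clock() + timeout_s
    while True:
        app = fetch_app()
        if app.get("tasksRunning", 0) >= expected:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Only once this returns True would the pipeline move on to installing the Cloudera packages and deploying the ETL.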
Unit and Integration Tests sample
• Test project:
– Test Spark project
– ETL from Oracle to HDFS
• Unit-test directly on Spark logic
• Integration tests for every patch-set:
– VERY small dataset just for this demo
– CDH and Oracle Docker Images
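One way to unit-test the logic directly, sketched here in plain Python rather than the team's actual Spark code, is to keep the row-level transformation in pure functions that need no cluster. The column names `ID` and `NAME` are hypothetical.

```python
import unittest


def is_valid(row):
    """A source row is usable when it has a non-empty ID and NAME."""
    return row.get("ID") not in (None, "") and bool((row.get("NAME") or "").strip())


def clean_row(row):
    """Normalise one source row (a dict) before it is written out."""
    return {"id": int(row["ID"]), "name": row["NAME"].strip().lower()}


def transform(rows):
    """Cluster-independent core of the ETL.

    In a Spark job the same two functions could be passed to
    filter and map; here they run over a plain list.
    """
    return [clean_row(r) for r in rows if is_valid(r)]


class TransformTest(unittest.TestCase):
    def test_drops_invalid_rows_and_normalises(self):
        rows = [{"ID": "7", "NAME": "  Alice "}, {"ID": "", "NAME": "Bob"}]
        self.assertEqual(transform(rows), [{"id": 7, "name": "alice"}])
```

Because nothing here touches Oracle or HDFS, such tests can run on every patch-set in seconds, leaving the Docker-based integration tests to cover the real connections.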
Unit and Integration Tests
• Components (diagram): Hadoop Pseudo-distributed mode, Spark Standalone, Jenkins Build Job
• Steps (diagram): init, Submit job, Init/read HDFS
DEMO
Open Points and Discussion
• Topic based build of multiple artifacts
– Demo implementation is naïve and difficult to maintain
– Race conditions on build of dependent artifacts
• Need a more advanced triggering system (Zuul might fit)
– Race condition on submit of topic
• Stream event: “topic-submitted” instead of, or in addition to,
many “patch-submitted” events
• Gerrit Trigger plugin should listen to this event to
coordinate
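The proposed “topic-submitted” event does not exist in the material shown here, so the following is purely a sketch of the idea: a coordinator that collapses per-change merge events into a single notification per topic. It assumes events shaped like Gerrit's `change-merged` stream events (with `change.topic` and `change.number`) and that the set of changes in each topic is already known, for example from a prior query; the class and method names are hypothetical.

```python
from collections import defaultdict


class TopicCoordinator:
    """Emit one notification per topic once all its changes are merged."""

    def __init__(self, expected_changes):
        # expected_changes: topic name -> iterable of change numbers in it
        self._expected = {t: set(c) for t, c in expected_changes.items()}
        self._merged = defaultdict(set)

    def on_event(self, event):
        """Return the topic name when its last change merges, else None."""
        if event.get("type") != "change-merged":
            return None
        change = event.get("change", {})
        topic = change.get("topic")
        if topic not in self._expected:
            return None  # no topic, unknown topic, or already reported
        self._merged[topic].add(change.get("number"))
        if self._merged[topic] >= self._expected[topic]:
            # Forget the topic so that it is reported only once.
            del self._expected[topic]
            del self._merged[topic]
            return topic
        return None
```

A trigger that listens to the returned topic name rather than to each individual merge would start the dependent builds once, which is the kind of coordination the race conditions above call for.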
Questions?