Gerrit + Jenkins =
Continuous Delivery for Big Data
Mountain View, CA, November 2015
Stefano Galarraga
GerritForge
stefano@gerritforge.com
http://www.gerritforge.com
Real-life case study and future developments
The Team
Luca Milanesio
• Co-founder and Director of GerritForge
• over 20 years in Agile Development and ALM
• OpenSource contributor to many projects
(BigData, Continuous Integration, Git/Gerrit)
Antonios Chalkiopulos
• Author of Programming MapReduce with Scalding
• Open source contributor to many BigData projects
• Working on the "land-of-Hadoop" (landoop.com)
Tiago Palma
• Data Warehouse & Big Data Development
• Senior Data Modeler
• Big Data infrastructure specialist
Stefano Galarraga
• 20 years of Agile Development
• Middleware, Big Data, Reactive Distributed Systems.
• Open Source contributor to BigData projects.
Agenda
• What’s special in Big Data
– General lack of support for Unit/Integration testing
– Testing the "real thing" (aka the Cluster)
• Why Gerrit for continuous deployment on BigData?
• Our Development Lifecycle ingredients
– Gerrit, Jenkins, Mesos, Marathon, CDH / Spark
• Gerrit Role and Components
– What we used, why, and what we would like to have
• New developments
– Using Topics with microservices for “atomic” multi-service changes
• Live (minimised) Demo
• Open points and discussion
WHY Gerrit?
• Fast Paced
• Distributed team
• A relatively “niche” technology
– A lot of “junior” developers
– Need for strong ownership
– Validation rules
– CD => We need to have green builds and consistent code quality
Code-Review Lifecycle
• Git used by distributed teams (UK, Israel, India)
• Topics and Code Review
• Jenkins build on every patch-set
• Commits reviewed / approved via Gerrit Submit
• Submitting a Topic automatically:
– merges all patch-sets (semi-atomically)
– triggers a longer chain of CI steps
– promotes an RC if everything passes
• Jenkins automation via Gerrit Trigger Plugin
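A CI job that acts on a whole Topic first needs the list of changes that belong to it. The sketch below is one possible way to ask Gerrit's REST API for that list; it is not taken from the talk. The host name and topic name are hypothetical, and the only Gerrit behaviour relied on is the `/changes/?q=...` query endpoint and the `)]}'` prefix that Gerrit puts in front of its JSON responses.

```python
import json
from urllib.parse import quote

# Gerrit prepends this marker to JSON responses to prevent XSSI attacks.
XSSI_PREFIX = ")]}'"


def topic_query_url(base_url: str, topic: str) -> str:
    """Build the REST URL listing the open changes of one topic."""
    query = quote(f"topic:{topic} status:open", safe=":")
    return f"{base_url.rstrip('/')}/changes/?q={query}"


def parse_gerrit_json(body: str):
    """Strip the XSSI prefix (if present) and decode the JSON payload."""
    if body.startswith(XSSI_PREFIX):
        body = body[len(XSSI_PREFIX):]
    return json.loads(body)


def change_numbers(changes) -> list:
    """Extract the numeric change ids from a list of ChangeInfo entries."""
    return [change["_number"] for change in changes]
```

The HTTP call itself is left out on purpose: any client (for example `urllib.request.urlopen`) can fetch `topic_query_url(...)` and hand the response text to `parse_gerrit_json`.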
Build Steps and Solutions
• Unit tests abstracting from dependencies
• Integration Tests:
– Using Docker to run dependencies on the CI
• “Micro” Hadoop cluster or other dependencies (DBs,
messaging) => Jenkins docker plugin
• When possible “dockerizing” just the required
components and driving them from the test framework
• Performance/Acceptance required a real cluster
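Driving a "dockerized" dependency from the test framework can be as small as building the `docker run` and `docker stop` command lines in the test set-up and tear-down. The following is a minimal sketch under that assumption, not the team's actual harness; the image name, container name and port numbers in the usage note are hypothetical.

```python
from typing import Dict, List, Optional


def docker_run_command(
    image: str,
    name: str,
    ports: Optional[Dict[int, int]] = None,
    env: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Build a `docker run` command that starts a throwaway dependency."""
    # --detach returns immediately; --rm removes the container once it stops.
    cmd = ["docker", "run", "--detach", "--rm", "--name", name]
    for host_port, container_port in sorted((ports or {}).items()):
        cmd += ["--publish", f"{host_port}:{container_port}"]
    for key, value in sorted((env or {}).items()):
        cmd += ["--env", f"{key}={value}"]
    cmd.append(image)
    return cmd


def docker_stop_command(name: str) -> List[str]:
    """Build the matching command that stops the container by name."""
    return ["docker", "stop", name]
```

In a test fixture, the set-up step would pass `docker_run_command("example/hdfs-micro", "it-hdfs", ports={8020: 8020})` to `subprocess.run(..., check=True)`, and the tear-down step would do the same with `docker_stop_command("it-hdfs")`.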
Fitting CDH Into this Picture
• Acceptance / performance tests with short-lived CDH clusters
• Solution: Mesos, Marathon and Docker:
– Ephemeral clusters with defined capacity
– Automatic cluster-config
– All controlled via Docker/Mesos
• This was quite a long process!!
– mostly because of CDH cluster configuration
Mesos + Marathon
• Apache Mesos
– Abstracts CPU, memory, storage, other compute
resources away from machines
• Marathon Framework
– Runs on top of Mesos
– Keeps long-running applications running by restarting them when they fail
– REST API for managing and scaling services
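To make the REST API bullet concrete, here is a rough sketch of how a Docker-based application could be described and submitted to Marathon with a `POST` to `/v2/apps`. The app id, image name, resource figures and Marathon URL are hypothetical. The `container.docker.network` field follows the app-definition layout of Marathon releases from the period of this talk; newer releases describe networking differently, so check the layout against the version in use.

```python
import json
import urllib.request


def marathon_app(app_id: str, image: str, cpus: float, mem_mb: int, instances: int = 1) -> dict:
    """Describe a Docker container as a Marathon application definition."""
    return {
        "id": app_id,
        "cpus": cpus,
        "mem": mem_mb,
        "instances": instances,
        "container": {
            "type": "DOCKER",
            # Assumption: older-style Docker networking field (see lead-in).
            "docker": {"image": image, "network": "BRIDGE"},
        },
    }


def build_create_request(marathon_url: str, app: dict) -> urllib.request.Request:
    """Prepare (but do not send) the POST that asks Marathon to start the app."""
    return urllib.request.Request(
        url=f"{marathon_url.rstrip('/')}/v2/apps",
        data=json.dumps(app).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the request is a single `urllib.request.urlopen(request)` call. Keeping the payload builder separate from the network call makes the definition easy to unit-test without a running Marathon.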
CDH Components
• CDH 5.4.1 distribution
– Apache Spark
– Hadoop HDFS
– YARN
Integration/Performance Test Flow on CDH Cluster
• Components: Jenkins Master, Mesos Master, Marathon, Private Docker Registry, Mesos Slave (Slave Host), Docker
• Flow:
– POST to Marathon REST API to start 1 Docker container with Cloudera Manager and N Docker containers with Cloudera agents
– Marathon Framework receives resource offers from Mesos Master and submits the tasks
– The task is sent to the Mesos Slave
– Mesos Slave starts the Docker container
– Docker image is fetched from the Docker registry if not present on the Slave host
– Waiting for Dockers, then Dockers UP
– Install Cloudera packages via Cloudera Manager API using Python
– Deploy the ETL, run the ETL and the Acceptance Tests
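The "waiting for Dockers" step in the flow above amounts to polling until every requested container is running. The sketch below shows one way to write that wait; it is not the pipeline's actual script. `fetch_app` is a hypothetical callable that returns the application status as a dictionary, and the `instances` and `tasksRunning` keys are assumed to match what Marathon reports for an application.

```python
import time
from typing import Callable, Dict


def wait_until_running(
    fetch_app: Callable[[], Dict],
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until all requested instances are running, or the timeout expires."""
    deadline = clock() + timeout_s
    while True:
        app = fetch_app()
        wanted = app.get("instances", 0)
        # Assumption: "tasksRunning" counts the containers that are up.
        if wanted > 0 and app.get("tasksRunning", 0) >= wanted:
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

Passing `sleep` and `clock` as parameters keeps the function testable without real waiting; a build step would call it with the defaults and fail the job when it returns `False`.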
Unit and Integration Tests sample
• Test project:
– Test Spark project
– ETL from Oracle to HDFS
• Unit-test directly on Spark logic
• Integration tests for every patch-set:
– VERY small dataset just for this demo
– CDH and Oracle Docker Images
Unit and Integration Tests
[Diagram: Jenkins Build Job, Hadoop in pseudo-distributed mode, Spark Standalone; arrows labelled "init", "Submit job" and "Init/read HDFS"]
DEMO
Open Points and Discussion
• Topic based build of multiple artifacts
– Demo implementation is naïve and difficult to maintain
– Race conditions on build of dependent artifacts
• Need more advanced triggering system (zuul might fit)
– Race condition on submit of topic
• Stream event: “topic-submitted” instead of, or in addition to, many “patch-submitted” events
• Gerrit Trigger plugin should listen to this event to
coordinate
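Until a single topic-level event exists, a consumer of Gerrit's stream events can approximate it by counting per-change events. The sketch below is a hypothetical workaround rather than anything shown in the talk: it reads JSON lines as produced by `gerrit stream-events`, looks only at `change-merged` events, and reports a topic once every expected change number has been seen. The set of expected change numbers per topic must be supplied by the caller.

```python
import json
from typing import Dict, Optional, Set


class TopicTracker:
    """Report a topic as complete once all of its changes have merged."""

    def __init__(self, expected: Dict[str, Set[int]]):
        # Copy the sets so the caller's data is never mutated.
        self._pending = {topic: set(numbers) for topic, numbers in expected.items()}

    def on_event(self, line: str) -> Optional[str]:
        """Feed one stream-events JSON line; return the topic if it just completed."""
        event = json.loads(line)
        if event.get("type") != "change-merged":
            return None
        change = event.get("change", {})
        topic = change.get("topic")
        if topic not in self._pending:
            return None
        # Change numbers may arrive as strings or integers depending on version.
        self._pending[topic].discard(int(change["number"]))
        if self._pending[topic]:
            return None
        del self._pending[topic]
        return topic
```

A trigger built on this would start the multi-artifact build only when `on_event` returns a topic name, which avoids launching one build per merged change. It does not remove the race on the Gerrit side if the submit itself is only semi-atomic.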
Questions?
