SlideShare a Scribd company logo
1 of 45
Download to read offline
Continuous Integration
for Spark Apps
Hi, I’m Sean!
© 2015 Uncharted Software Inc.
It’s hard to test Spark Apps :(
© 2015 Uncharted Software Inc.
Case Study: Uncharted Spark Pipeline
© 2015 Uncharted Software Inc.
Case Study: Uncharted Spark Pipeline
Some key issues:
● Ensure reliability
● Prevent regressions
● Maintain compatibility with multiple versions of Spark
● Open-source - need a quick and easy way to evaluate PRs
© 2015 Uncharted Software Inc.
What is Continuous Integration?
© 2015 Uncharted Software Inc.
“Continuous Integration (CI) is a development practice that requires developers to integrate
code into a shared repository several times a day. Each check-in is then verified by an
automated build, allowing teams to detect problems early.”
-- ThoughtWorks
© 2015 Uncharted Software Inc.
“Continuous Integration (CI) is a development practice that is pretty damned
important for writing quality software.”
-- Me
© 2015 Uncharted Software Inc.
So, What is Continuous Integration?
© 2015 Uncharted Software Inc.
Best Practices, courtesy of Wikipedia
1. Maintain a code repository (Git)
© 2015 Uncharted Software Inc.
Best Practices, courtesy of Wikipedia
1. Maintain a code repository (Git)
2. Automate the build (Gradle)
© 2015 Uncharted Software Inc.
Best Practices, courtesy of Wikipedia
1. Maintain a code repository (Git)
2. Automate the build (Gradle)
3. Tests should be part of the build (ScalaTest)
© 2015 Uncharted Software Inc.
Best Practices, courtesy of Wikipedia
1. Maintain a code repository (Git)
2. Automate the build (Gradle)
3. Tests should be part of the build (ScalaTest)
4. Commit/push feature branches often
© 2015 Uncharted Software Inc.
Best Practices, courtesy of Wikipedia
1. Maintain a code repository (Git)
2. Automate the build (Gradle)
3. Tests should be part of the build (ScalaTest)
4. Commit/push feature branches often
5. Build (and test) All The Branches
© 2015 Uncharted Software Inc.
Best Practices, courtesy of Wikipedia
1. Maintain a code repository (Git)
2. Automate the build (Gradle)
3. Tests should be part of the build (ScalaTest)
4. Commit/push feature branches often
5. Build (and test) All The Branches
6. Test in a clone of the production environment
© 2015 Uncharted Software Inc.
Best Practices, courtesy of Wikipedia
1. Maintain a code repository (Git)
2. Automate the build (Gradle)
3. Tests should be part of the build (ScalaTest)
4. Commit/push feature branches often
5. Build (and test) All The Branches
6. Test in a clone of the production environment
7. Keep the build fast
© 2015 Uncharted Software Inc.
Best Practices, courtesy of Wikipedia
1. Maintain a code repository (Git)
2. Automate the build (Gradle)
3. Tests should be part of the build (ScalaTest)
4. Commit/push feature branches often
5. Build (and test) All The Branches
6. Test in a clone of the production environment
7. Keep the build fast
8. Everyone can see the results of builds
© 2015 Uncharted Software Inc.
Best Practices, courtesy of Wikipedia
1. Maintain a code repository (Git)
2. Automate the build (Gradle)
3. Tests should be part of the build (ScalaTest)
4. Commit/push feature branches often
5. Build (and test) All The Branches
6. Test in a clone of the production environment
7. Keep the build fast
8. Everyone can see the results of builds
}duh.
© 2015 Uncharted Software Inc.
Best Practices, courtesy of Wikipedia
1. Maintain a code repository (Git)
2. Automate the build (Gradle)
3. Tests should be part of the build (ScalaTest)
4. Commit/push feature branches often
5. Build (and test) All The Branches
6. Test in a clone of the production environment
7. Keep the build fast
8. Everyone can see the results of builds
}...less duh.
© 2015 Uncharted Software Inc.
Why are these difficult with Apache Spark?
5. Build (and test) All The Branches
6. Test in a clone of the production
environment
7. Keep the build fast
8. Everyone can see the results of builds
© 2015 Uncharted Software Inc.
What is a Spark App?
© 2015 Uncharted Software Inc.
What is a Spark app?
Source
JAR
Spark ?
This thing.
JAR
© 2015 Uncharted Software Inc.
And...
Source
JAR
Spark ?
We need to test this
JAR
© 2015 Uncharted Software Inc.
But...
Source
JAR
ScalaTest
Scala RE
By default, we have this
JAR
(boom)
© 2015 Uncharted Software Inc.
v1: Squish Spark inside ScalaTest
Source
JAR
ScalaTest
with
SparkContext
So, we try this
JAR
it works!
(sort of)
© 2015 Uncharted Software Inc.
it works!
(sort of)
© 2015 Uncharted Software Inc.
6. Test in a clone of the production environment
© 2015 Uncharted Software Inc.
v2: Squish ScalaTest into Spark
Source
Test
JAR
Tests Main.scala
Spark
JAR
Test
JAR
Test
Output
JAR
© 2015 Uncharted Software Inc.
Main.scala
© 2015 Uncharted Software Inc.
6. Test in a clone of the production environment
© 2015 Uncharted Software Inc.
Progress?
5. Build (and test) All The Branches
6. Test in a clone of the production environment
7. Keep the build fast
8. Everyone can see the results of builds
© 2015 Uncharted Software Inc.
What now?
5. Build (and test) All The Branches
6. Test in a clone of the production environment
7. Keep the build fast
8. Everyone can see the results of builds
© 2015 Uncharted Software Inc.
Docker Container
(uncharted/sparklet)
v3: Squish Spark and Test JAR into Docker
Test
Output
Source
Test
JAR
Tests Main.scala
Spark
JAR
JAR
Test
JAR
© 2015 Uncharted Software Inc.
test.sh
© 2015 Uncharted Software Inc.
build.gradle (excerpt)
© 2015 Uncharted Software Inc.
Progress?
5. Build (and test) All The Branches
6. Test in a clone of the production environment
7. Keep the build fast
8. Everyone can see the results of builds
© 2015 Uncharted Software Inc.
Travis CI VM
Docker Container
v4: Squish Docker into Travis CI
Test
Output
Source
Test
JAR
Tests Main.scala
Spark
JAR
JAR
Test
JAR
© 2015 Uncharted Software Inc.
.travis.yml
© 2015 Uncharted Software Inc.
Voilà!
© 2015 Uncharted Software Inc.
Progress?
5. Build (and test) All The Branches
6. Test in a clone of the production environment
7. Keep the build fast
8. Everyone can see the results of builds
© 2015 Uncharted Software Inc.
© 2015 Uncharted Software Inc.
© 2015 Uncharted Software Inc.
All done!
5. Build (and test) All The Branches
6. Test in a clone of the production environment
7. Keep the build fast
8. Everyone can see the results of builds
© 2015 Uncharted Software Inc.
Next Steps?
Alpine Linux
docker-compose
Windows (dev environment) support
python
© 2015 Uncharted Software Inc.
Questions?
https://github.com/unchartedsoftware/sparkpipe-core
https://github.com/Ghnuberath
@Ghnuberath
https://hub.docker.com/r/uncharted/sparklet/
smcintyre@unchartedsoftware.com

More Related Content

What's hot

TDC 2016 Floripa - Testando APIs REST com Supertest e Promises
TDC 2016 Floripa - Testando APIs REST com Supertest e PromisesTDC 2016 Floripa - Testando APIs REST com Supertest e Promises
TDC 2016 Floripa - Testando APIs REST com Supertest e PromisesStefan Teixeira
 
Fundamentals of DevOps and CI/CD
Fundamentals of DevOps and CI/CDFundamentals of DevOps and CI/CD
Fundamentals of DevOps and CI/CDBatyr Nuryyev
 
Ágiles 2016 - Using open source tools to support Continuous Delivery
Ágiles 2016 - Using open source tools to support Continuous DeliveryÁgiles 2016 - Using open source tools to support Continuous Delivery
Ágiles 2016 - Using open source tools to support Continuous DeliveryStefan Teixeira
 
Latinoware 2016 - Continuous Delivery com ferramentas open source
Latinoware 2016 - Continuous Delivery com ferramentas open sourceLatinoware 2016 - Continuous Delivery com ferramentas open source
Latinoware 2016 - Continuous Delivery com ferramentas open sourceStefan Teixeira
 
TDC 2016 SP - Continuous Delivery para aplicações Java com ferramentas open-s...
TDC 2016 SP - Continuous Delivery para aplicações Java com ferramentas open-s...TDC 2016 SP - Continuous Delivery para aplicações Java com ferramentas open-s...
TDC 2016 SP - Continuous Delivery para aplicações Java com ferramentas open-s...Stefan Teixeira
 
Continuous Integration With Jenkins Docker SQL Server
Continuous Integration With Jenkins Docker SQL ServerContinuous Integration With Jenkins Docker SQL Server
Continuous Integration With Jenkins Docker SQL ServerChris Adkin
 
Setting up Notifications, Alerts & Webhooks with Flux v2 by Alison Dowdney
Setting up Notifications, Alerts & Webhooks with Flux v2 by Alison DowdneySetting up Notifications, Alerts & Webhooks with Flux v2 by Alison Dowdney
Setting up Notifications, Alerts & Webhooks with Flux v2 by Alison DowdneyWeaveworks
 
Scrum Gathering Portugal 2016 - Containerizing Tests with Docker
Scrum Gathering Portugal 2016 - Containerizing Tests with DockerScrum Gathering Portugal 2016 - Containerizing Tests with Docker
Scrum Gathering Portugal 2016 - Containerizing Tests with DockerStefan Teixeira
 
DevOps Transformation in Technical
DevOps Transformation in TechnicalDevOps Transformation in Technical
DevOps Transformation in TechnicalOpsta
 
The Seven Habits of Highly Effective Puppet Users - PuppetConf 2014
The Seven Habits of Highly Effective Puppet Users - PuppetConf 2014The Seven Habits of Highly Effective Puppet Users - PuppetConf 2014
The Seven Habits of Highly Effective Puppet Users - PuppetConf 2014Puppet
 
6º Encontro do Grupo de Testes Carioca - Testes em um contexto de Continuous ...
6º Encontro do Grupo de Testes Carioca - Testes em um contexto de Continuous ...6º Encontro do Grupo de Testes Carioca - Testes em um contexto de Continuous ...
6º Encontro do Grupo de Testes Carioca - Testes em um contexto de Continuous ...Stefan Teixeira
 
Hands-on GitOps Patterns for Helm Users
Hands-on GitOps Patterns for Helm UsersHands-on GitOps Patterns for Helm Users
Hands-on GitOps Patterns for Helm UsersWeaveworks
 
Unit Testing in R with Testthat - HRUG
Unit Testing in R with Testthat - HRUGUnit Testing in R with Testthat - HRUG
Unit Testing in R with Testthat - HRUGegoodwintx
 
Git with t for teams
Git with t for teamsGit with t for teams
Git with t for teamsSven Peters
 
20170807 - How to Fail Your TDD Rollout - A Train Wreck Story
20170807 - How to Fail Your TDD Rollout - A Train Wreck Story20170807 - How to Fail Your TDD Rollout - A Train Wreck Story
20170807 - How to Fail Your TDD Rollout - A Train Wreck StoryChris Edwards, P.Eng.
 
Achieving Continuous Delivery with Puppet
Achieving Continuous Delivery with PuppetAchieving Continuous Delivery with Puppet
Achieving Continuous Delivery with PuppetDevoteam Revolve
 
DevOps 及 TDD 開發流程哲學
DevOps 及 TDD 開發流程哲學DevOps 及 TDD 開發流程哲學
DevOps 及 TDD 開發流程哲學謝 宗穎
 
Scaling your CI Pipeline with Docker and Concourse
Scaling your CI Pipeline with Docker and ConcourseScaling your CI Pipeline with Docker and Concourse
Scaling your CI Pipeline with Docker and ConcourseChris Edwards, P.Eng.
 

What's hot (20)

TDC 2016 Floripa - Testando APIs REST com Supertest e Promises
TDC 2016 Floripa - Testando APIs REST com Supertest e PromisesTDC 2016 Floripa - Testando APIs REST com Supertest e Promises
TDC 2016 Floripa - Testando APIs REST com Supertest e Promises
 
Fundamentals of DevOps and CI/CD
Fundamentals of DevOps and CI/CDFundamentals of DevOps and CI/CD
Fundamentals of DevOps and CI/CD
 
Ágiles 2016 - Using open source tools to support Continuous Delivery
Ágiles 2016 - Using open source tools to support Continuous DeliveryÁgiles 2016 - Using open source tools to support Continuous Delivery
Ágiles 2016 - Using open source tools to support Continuous Delivery
 
Latinoware 2016 - Continuous Delivery com ferramentas open source
Latinoware 2016 - Continuous Delivery com ferramentas open sourceLatinoware 2016 - Continuous Delivery com ferramentas open source
Latinoware 2016 - Continuous Delivery com ferramentas open source
 
TDC 2016 SP - Continuous Delivery para aplicações Java com ferramentas open-s...
TDC 2016 SP - Continuous Delivery para aplicações Java com ferramentas open-s...TDC 2016 SP - Continuous Delivery para aplicações Java com ferramentas open-s...
TDC 2016 SP - Continuous Delivery para aplicações Java com ferramentas open-s...
 
Continuous Integration With Jenkins Docker SQL Server
Continuous Integration With Jenkins Docker SQL ServerContinuous Integration With Jenkins Docker SQL Server
Continuous Integration With Jenkins Docker SQL Server
 
True Git
True Git True Git
True Git
 
Setting up Notifications, Alerts & Webhooks with Flux v2 by Alison Dowdney
Setting up Notifications, Alerts & Webhooks with Flux v2 by Alison DowdneySetting up Notifications, Alerts & Webhooks with Flux v2 by Alison Dowdney
Setting up Notifications, Alerts & Webhooks with Flux v2 by Alison Dowdney
 
Scrum Gathering Portugal 2016 - Containerizing Tests with Docker
Scrum Gathering Portugal 2016 - Containerizing Tests with DockerScrum Gathering Portugal 2016 - Containerizing Tests with Docker
Scrum Gathering Portugal 2016 - Containerizing Tests with Docker
 
DevOps Transformation in Technical
DevOps Transformation in TechnicalDevOps Transformation in Technical
DevOps Transformation in Technical
 
The Seven Habits of Highly Effective Puppet Users - PuppetConf 2014
The Seven Habits of Highly Effective Puppet Users - PuppetConf 2014The Seven Habits of Highly Effective Puppet Users - PuppetConf 2014
The Seven Habits of Highly Effective Puppet Users - PuppetConf 2014
 
6º Encontro do Grupo de Testes Carioca - Testes em um contexto de Continuous ...
6º Encontro do Grupo de Testes Carioca - Testes em um contexto de Continuous ...6º Encontro do Grupo de Testes Carioca - Testes em um contexto de Continuous ...
6º Encontro do Grupo de Testes Carioca - Testes em um contexto de Continuous ...
 
Hands-on GitOps Patterns for Helm Users
Hands-on GitOps Patterns for Helm UsersHands-on GitOps Patterns for Helm Users
Hands-on GitOps Patterns for Helm Users
 
Unit Testing in R with Testthat - HRUG
Unit Testing in R with Testthat - HRUGUnit Testing in R with Testthat - HRUG
Unit Testing in R with Testthat - HRUG
 
Git Power Routines
Git Power RoutinesGit Power Routines
Git Power Routines
 
Git with t for teams
Git with t for teamsGit with t for teams
Git with t for teams
 
20170807 - How to Fail Your TDD Rollout - A Train Wreck Story
20170807 - How to Fail Your TDD Rollout - A Train Wreck Story20170807 - How to Fail Your TDD Rollout - A Train Wreck Story
20170807 - How to Fail Your TDD Rollout - A Train Wreck Story
 
Achieving Continuous Delivery with Puppet
Achieving Continuous Delivery with PuppetAchieving Continuous Delivery with Puppet
Achieving Continuous Delivery with Puppet
 
DevOps 及 TDD 開發流程哲學
DevOps 及 TDD 開發流程哲學DevOps 及 TDD 開發流程哲學
DevOps 及 TDD 開發流程哲學
 
Scaling your CI Pipeline with Docker and Concourse
Scaling your CI Pipeline with Docker and ConcourseScaling your CI Pipeline with Docker and Concourse
Scaling your CI Pipeline with Docker and Concourse
 

Similar to Continuous Integration for Spark Apps by Sean McIntyre

Apigee deploy grunt plugin.1.0
Apigee deploy grunt plugin.1.0Apigee deploy grunt plugin.1.0
Apigee deploy grunt plugin.1.0Diego Zuluaga
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldRoberto Pérez Alcolea
 
Our DevOps Journey: 6 Month Waterfalls to 1 Hour Code Deploys
Our DevOps Journey: 6 Month Waterfalls to 1 Hour Code DeploysOur DevOps Journey: 6 Month Waterfalls to 1 Hour Code Deploys
Our DevOps Journey: 6 Month Waterfalls to 1 Hour Code DeploysDynatrace
 
Continuous Load Testing with CloudTest and Jenkins
Continuous Load Testing with CloudTest and JenkinsContinuous Load Testing with CloudTest and Jenkins
Continuous Load Testing with CloudTest and JenkinsSOASTA
 
Continuous integration and delivery for java based web applications
Continuous integration and delivery for java based web applicationsContinuous integration and delivery for java based web applications
Continuous integration and delivery for java based web applicationsSunil Dalal
 
Software Factory - Overview
Software Factory - OverviewSoftware Factory - Overview
Software Factory - Overviewslides_teltools
 
Continous integration and delivery for single page applications
Continous integration and delivery for single page applicationsContinous integration and delivery for single page applications
Continous integration and delivery for single page applicationsSunil Dalal
 
Continuous Delivery for IT Operations Teams
Continuous Delivery for IT Operations TeamsContinuous Delivery for IT Operations Teams
Continuous Delivery for IT Operations TeamsMark Rendell
 
From 0 to DevOps in 80 Days [Webinar Replay]
From 0 to DevOps in 80 Days [Webinar Replay]From 0 to DevOps in 80 Days [Webinar Replay]
From 0 to DevOps in 80 Days [Webinar Replay]Dynatrace
 
Emulators as an Emerging Best Practice for API Providers
Emulators as an Emerging Best Practice for API ProvidersEmulators as an Emerging Best Practice for API Providers
Emulators as an Emerging Best Practice for API ProvidersCisco DevNet
 
Continuous delivery is more than dev ops
Continuous delivery is more than dev opsContinuous delivery is more than dev ops
Continuous delivery is more than dev opsAgile Montréal
 
Mobile App Quality Roadmap for DevTest Teams
Mobile App Quality Roadmap for DevTest TeamsMobile App Quality Roadmap for DevTest Teams
Mobile App Quality Roadmap for DevTest TeamsPerfecto by Perforce
 
Continuous Delivery with a PaaS Application
Continuous Delivery with a PaaS ApplicationContinuous Delivery with a PaaS Application
Continuous Delivery with a PaaS ApplicationMark Rendell
 
Code Coverage for Total Security in Application Migrations
Code Coverage for Total Security in Application MigrationsCode Coverage for Total Security in Application Migrations
Code Coverage for Total Security in Application MigrationsDana Luther
 
Adrian marinica continuous integration in the visual studio world
Adrian marinica   continuous integration in the visual studio worldAdrian marinica   continuous integration in the visual studio world
Adrian marinica continuous integration in the visual studio worldCodecamp Romania
 
Continuous Load Testing with CloudTest and Jenkins
Continuous Load Testing with CloudTest and JenkinsContinuous Load Testing with CloudTest and Jenkins
Continuous Load Testing with CloudTest and JenkinsSOASTA
 
.Net OSS Ci & CD with Jenkins - JUC ISRAEL 2013
.Net OSS Ci & CD with Jenkins - JUC ISRAEL 2013 .Net OSS Ci & CD with Jenkins - JUC ISRAEL 2013
.Net OSS Ci & CD with Jenkins - JUC ISRAEL 2013 Tikal Knowledge
 
SAPUI5/OpenUI5 - Continuous Integration
SAPUI5/OpenUI5 - Continuous IntegrationSAPUI5/OpenUI5 - Continuous Integration
SAPUI5/OpenUI5 - Continuous IntegrationPeter Muessig
 
Unit Tests and Test Seams for abap Hamburg June 2017 presented
Unit Tests and Test Seams for abap Hamburg June 2017   presentedUnit Tests and Test Seams for abap Hamburg June 2017   presented
Unit Tests and Test Seams for abap Hamburg June 2017 presentedRainer Winkler
 

Similar to Continuous Integration for Spark Apps by Sean McIntyre (20)

Apigee deploy grunt plugin.1.0
Apigee deploy grunt plugin.1.0Apigee deploy grunt plugin.1.0
Apigee deploy grunt plugin.1.0
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository world
 
Our DevOps Journey: 6 Month Waterfalls to 1 Hour Code Deploys
Our DevOps Journey: 6 Month Waterfalls to 1 Hour Code DeploysOur DevOps Journey: 6 Month Waterfalls to 1 Hour Code Deploys
Our DevOps Journey: 6 Month Waterfalls to 1 Hour Code Deploys
 
Continuous Load Testing with CloudTest and Jenkins
Continuous Load Testing with CloudTest and JenkinsContinuous Load Testing with CloudTest and Jenkins
Continuous Load Testing with CloudTest and Jenkins
 
Continuous integration and delivery for java based web applications
Continuous integration and delivery for java based web applicationsContinuous integration and delivery for java based web applications
Continuous integration and delivery for java based web applications
 
Software Factory - Overview
Software Factory - OverviewSoftware Factory - Overview
Software Factory - Overview
 
Continous integration and delivery for single page applications
Continous integration and delivery for single page applicationsContinous integration and delivery for single page applications
Continous integration and delivery for single page applications
 
Continuous Delivery for IT Operations Teams
Continuous Delivery for IT Operations TeamsContinuous Delivery for IT Operations Teams
Continuous Delivery for IT Operations Teams
 
From 0 to DevOps in 80 Days [Webinar Replay]
From 0 to DevOps in 80 Days [Webinar Replay]From 0 to DevOps in 80 Days [Webinar Replay]
From 0 to DevOps in 80 Days [Webinar Replay]
 
Emulators as an Emerging Best Practice for API Providers
Emulators as an Emerging Best Practice for API ProvidersEmulators as an Emerging Best Practice for API Providers
Emulators as an Emerging Best Practice for API Providers
 
Continuous delivery is more than dev ops
Continuous delivery is more than dev opsContinuous delivery is more than dev ops
Continuous delivery is more than dev ops
 
Mobile App Quality Roadmap for DevTest Teams
Mobile App Quality Roadmap for DevTest TeamsMobile App Quality Roadmap for DevTest Teams
Mobile App Quality Roadmap for DevTest Teams
 
Continuous Delivery with a PaaS Application
Continuous Delivery with a PaaS ApplicationContinuous Delivery with a PaaS Application
Continuous Delivery with a PaaS Application
 
Code Coverage for Total Security in Application Migrations
Code Coverage for Total Security in Application MigrationsCode Coverage for Total Security in Application Migrations
Code Coverage for Total Security in Application Migrations
 
Adrian marinica continuous integration in the visual studio world
Adrian marinica   continuous integration in the visual studio worldAdrian marinica   continuous integration in the visual studio world
Adrian marinica continuous integration in the visual studio world
 
Continuous Load Testing with CloudTest and Jenkins
Continuous Load Testing with CloudTest and JenkinsContinuous Load Testing with CloudTest and Jenkins
Continuous Load Testing with CloudTest and Jenkins
 
.Net OSS Ci & CD with Jenkins - JUC ISRAEL 2013
.Net OSS Ci & CD with Jenkins - JUC ISRAEL 2013 .Net OSS Ci & CD with Jenkins - JUC ISRAEL 2013
.Net OSS Ci & CD with Jenkins - JUC ISRAEL 2013
 
Spring Boot
Spring BootSpring Boot
Spring Boot
 
SAPUI5/OpenUI5 - Continuous Integration
SAPUI5/OpenUI5 - Continuous IntegrationSAPUI5/OpenUI5 - Continuous Integration
SAPUI5/OpenUI5 - Continuous Integration
 
Unit Tests and Test Seams for abap Hamburg June 2017 presented
Unit Tests and Test Seams for abap Hamburg June 2017   presentedUnit Tests and Test Seams for abap Hamburg June 2017   presented
Unit Tests and Test Seams for abap Hamburg June 2017 presented
 

More from Spark Summit

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang Spark Summit
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...Spark Summit
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang WuSpark Summit
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya RaghavendraSpark Summit
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...Spark Summit
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...Spark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingSpark Summit
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingSpark Summit
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...Spark Summit
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakSpark Summit
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimSpark Summit
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraSpark Summit
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Spark Summit
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...Spark Summit
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spark Summit
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovSpark Summit
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Spark Summit
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkSpark Summit
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Spark Summit
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...Spark Summit
 

More from Spark Summit (20)

FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
FPGA-Based Acceleration Architecture for Spark SQL Qi Xie and Quanfu Wang
 
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
VEGAS: The Missing Matplotlib for Scala/Apache Spark with DB Tsai and Roger M...
 
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang WuApache Spark Structured Streaming Helps Smart Manufacturing with  Xiaochang Wu
Apache Spark Structured Streaming Helps Smart Manufacturing with Xiaochang Wu
 
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data  with Ramya RaghavendraImproving Traffic Prediction Using Weather Data  with Ramya Raghavendra
Improving Traffic Prediction Using Weather Data with Ramya Raghavendra
 
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop OLAP Artem...
 
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
No More Cumbersomeness: Automatic Predictive Modeling on Apache Spark Marcin ...
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
Apache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim DowlingApache Spark and Tensorflow as a Service with Jim Dowling
Apache Spark and Tensorflow as a Service with Jim Dowling
 
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
MMLSpark: Lessons from Building a SparkML-Compatible Machine Learning Library...
 
Next CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub WozniakNext CERN Accelerator Logging Service with Jakub Wozniak
Next CERN Accelerator Logging Service with Jakub Wozniak
 
Powering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin KimPowering a Startup with Apache Spark with Kevin Kim
Powering a Startup with Apache Spark with Kevin Kim
 
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya RaghavendraImproving Traffic Prediction Using Weather Datawith Ramya Raghavendra
Improving Traffic Prediction Using Weather Datawith Ramya Raghavendra
 
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
Hiding Apache Spark Complexity for Fast Prototyping of Big Data Applications—...
 
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...How Nielsen Utilized Databricks for Large-Scale Research and Development with...
How Nielsen Utilized Databricks for Large-Scale Research and Development with...
 
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
Spline: Apache Spark Lineage not Only for the Banking Industry with Marek Nov...
 
Goal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim SimeonovGoal Based Data Production with Sim Simeonov
Goal Based Data Production with Sim Simeonov
 
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
Preventing Revenue Leakage and Monitoring Distributed Systems with Machine Le...
 
Getting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir VolkGetting Ready to Use Redis with Apache Spark with Dvir Volk
Getting Ready to Use Redis with Apache Spark with Dvir Volk
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
 
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
MatFast: In-Memory Distributed Matrix Computation Processing and Optimization...
 

Recently uploaded

modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxaleedritatuxx
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degreeyuu sss
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhYasamin16
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
Vision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxVision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxellehsormae
 
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGILLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGIThomas Poetter
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryJeremy Anderson
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max PrincetonTimothy Spann
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 

Recently uploaded (20)

modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhhThiophen Mechanism khhjjjjjjjhhhhhhhhhhh
Thiophen Mechanism khhjjjjjjjhhhhhhhhhhh
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
Vision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptxVision, Mission, Goals and Objectives ppt..pptx
Vision, Mission, Goals and Objectives ppt..pptx
 
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGILLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
 
Defining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data StoryDefining Constituents, Data Vizzes and Telling a Data Story
Defining Constituents, Data Vizzes and Telling a Data Story
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 

Continuous Integration for Spark Apps by Sean McIntyre

  • 2. Hi, I’m Sean! © 2015 Uncharted Software Inc.
  • 3. It’s hard to test Spark Apps :( © 2015 Uncharted Software Inc.
  • 4. Case Study: Uncharted Spark Pipeline © 2015 Uncharted Software Inc.
  • 5. Case Study: Uncharted Spark Pipeline Some key issues: ● Ensure reliability ● Prevent regressions ● Maintain compatibility with multiple versions of Spark ● Open-source - need a quick and easy way to evaluate PRs © 2015 Uncharted Software Inc.
  • 6. What is Continuous Integration? © 2015 Uncharted Software Inc.
  • 7. “Continuous Integration (CI) is a development practice that requires developers to integrate code into a shared repository several times a day. Each check-in is then verified by an automated build, allowing teams to detect problems early.” -- ThoughtWorks © 2015 Uncharted Software Inc.
  • 8. “Continuous Integration (CI) is a development practice that is pretty damned important for writing quality software.” -- Me © 2015 Uncharted Software Inc.
  • 9. So, What is Continuous Integration? © 2015 Uncharted Software Inc.
  • 10. Best Practices, courtesy of Wikipedia 1. Maintain a code repository (Git) © 2015 Uncharted Software Inc.
  • 11. Best Practices, courtesy of Wikipedia 1. Maintain a code repository (Git) 2. Automate the build (Gradle) © 2015 Uncharted Software Inc.
  • 12. Best Practices, courtesy of Wikipedia 1. Maintain a code repository (Git) 2. Automate the build (Gradle) 3. Tests should be part of the build (ScalaTest) © 2015 Uncharted Software Inc.
  • 13. Best Practices, courtesy of Wikipedia 1. Maintain a code repository (Git) 2. Automate the build (Gradle) 3. Tests should be part of the build (ScalaTest) 4. Commit/push feature branches often © 2015 Uncharted Software Inc.
  • 14. Best Practices, courtesy of Wikipedia 1. Maintain a code repository (Git) 2. Automate the build (Gradle) 3. Tests should be part of the build (ScalaTest) 4. Commit/push feature branches often 5. Build (and test) All The Branches © 2015 Uncharted Software Inc.
  • 15. Best Practices, courtesy of Wikipedia 1. Maintain a code repository (Git) 2. Automate the build (Gradle) 3. Tests should be part of the build (ScalaTest) 4. Commit/push feature branches often 5. Build (and test) All The Branches 6. Test in a clone of the production environment © 2015 Uncharted Software Inc.
  • 16. Best Practices, courtesy of Wikipedia 1. Maintain a code repository (Git) 2. Automate the build (Gradle) 3. Tests should be part of the build (ScalaTest) 4. Commit/push feature branches often 5. Build (and test) All The Branches 6. Test in a clone of the production environment 7. Keep the build fast © 2015 Uncharted Software Inc.
  • 17. Best Practices, courtesy of Wikipedia 1. Maintain a code repository (Git) 2. Automate the build (Gradle) 3. Tests should be part of the build (ScalaTest) 4. Commit/push feature branches often 5. Build (and test) All The Branches 6. Test in a clone of the production environment 7. Keep the build fast 8. Everyone can see the results of builds © 2015 Uncharted Software Inc.
  • 18. Best Practices, courtesy of Wikipedia 1. Maintain a code repository (Git) 2. Automate the build (Gradle) 3. Tests should be part of the build (ScalaTest) 4. Commit/push feature branches often 5. Build (and test) All The Branches 6. Test in a clone of the production environment 7. Keep the build fast 8. Everyone can see the results of builds }duh. © 2015 Uncharted Software Inc.
  • 19. Best Practices, courtesy of Wikipedia 1. Maintain a code repository (Git) 2. Automate the build (Gradle) 3. Tests should be part of the build (ScalaTest) 4. Commit/push feature branches often 5. Build (and test) All The Branches 6. Test in a clone of the production environment 7. Keep the build fast 8. Everyone can see the results of builds }...less duh. © 2015 Uncharted Software Inc.
  • 20. Why are these difficult with Apache Spark? 5. Build (and test) All The Branches 6. Test in a clone of the production environment 7. Keep the build fast 8. Everyone can see the results of builds © 2015 Uncharted Software Inc.
  • 21. What is a Spark App? © 2015 Uncharted Software Inc.
  • 22. What is a Spark app? Source JAR Spark ? This thing. JAR © 2015 Uncharted Software Inc.
  • 23. And... Source JAR Spark ? We need to test this JAR © 2015 Uncharted Software Inc.
  • 24. But... Source JAR ScalaTest Scala RE By default, we have this JAR (boom) © 2015 Uncharted Software Inc.
  • 25. v1: Squish Spark inside ScalaTest Source JAR ScalaTest with SparkContext So, we try this JAR it works! (sort of) © 2015 Uncharted Software Inc.
  • 26. it works! (sort of) © 2015 Uncharted Software Inc.
  • 27. 6. Test in a clone of the production environment © 2015 Uncharted Software Inc.
  • 28. v2: Squish ScalaTest into Spark Source Test JAR Tests Main.scala Spark JAR Test JAR Test Output JAR © 2015 Uncharted Software Inc.
  • 30. 6. Test in a clone of the production environment © 2015 Uncharted Software Inc.
  • 31. Progress? 5. Build (and test) All The Branches 6. Test in a clone of the production environment 7. Keep the build fast 8. Everyone can see the results of builds © 2015 Uncharted Software Inc.
  • 32. What now? 5. Build (and test) All The Branches 6. Test in a clone of the production environment 7. Keep the build fast 8. Everyone can see the results of builds © 2015 Uncharted Software Inc.
  • 33. Docker Container (uncharted/sparklet) v3: Squish Spark and Test JAR into Docker Test Output Source Test JAR Tests Main.scala Spark JAR JAR Test JAR © 2015 Uncharted Software Inc.
  • 34. test.sh © 2015 Uncharted Software Inc.
  • 35. build.gradle (excerpt) © 2015 Uncharted Software Inc.
  • 36. Progress? 5. Build (and test) All The Branches 6. Test in a clone of the production environment 7. Keep the build fast 8. Everyone can see the results of builds © 2015 Uncharted Software Inc.
  • 37. Travis CI VM Docker Container v4: Squish Docker into Travis CI Test Output Source Test JAR Tests Main.scala Spark JAR JAR Test JAR © 2015 Uncharted Software Inc.
  • 39. Voilà! © 2015 Uncharted Software Inc.
  • 40. Progress? 5. Build (and test) All The Branches 6. Test in a clone of the production environment 7. Keep the build fast 8. Everyone can see the results of builds © 2015 Uncharted Software Inc.
  • 41. © 2015 Uncharted Software Inc.
  • 42. © 2015 Uncharted Software Inc.
  • 43. All done! 5. Build (and test) All The Branches 6. Test in a clone of the production environment 7. Keep the build fast 8. Everyone can see the results of builds © 2015 Uncharted Software Inc.
  • 44. Next Steps? Alpine Linux docker-compose Windows (dev environment) support python © 2015 Uncharted Software Inc.