SlideShare a Scribd company logo
1 of 15
Download to read offline
Building Machine Learning
applications locally with Spark
21/06/2017
Joel Pinho Lucas
Agenda
• Problems and Motivation
• Spark and MLlib overview
• Launching applications in a Spark cluster
• Simulating a Spark cluster using Docker
• Demo: deploying a Spark cluster in a local machine
• Unit tests for Spark jobs
2
3
• How to setup a Spark cluster (infra + configuration)?
• Test and/or Debug a Spark job
• All team should have the same environment
4
• Lightweight cluster
• One machine
• Same environment for all team
• Deployed easily in any platform
Run Spark Locally with docker
5
• Easy to develop (API in Java, Scala, Python, R)
• High Quality algorithms
http://spark.apache.org/mllib/
• Fast to run
• Lazy evaluation
• In memory Storage
6
http://spark.apache.org/docs/2.1.0/cluster-overview.html
Spark Execution Model
Cluster Types
• Standalone
• Apache Mesos
• HadoopYarn
7
8
Starting a Cluster Manually
Manually Submitting an Application
Choose your Docker Image
(or build your own and share)
9
Some available Spark Docker
Images
10
• https://github.com/big-data-europe/docker-spark
• https://hub.docker.com/r/internavenue/centos-spark/
• https://github.com/sequenceiq/docker-spark
• https://github.com/epahomov/docker-spark
• https://www.anchormen.nl/spark-docker/
• https://github.com/gettyimages/docker-spark
• https://hub.docker.com/r/bigdatauniversity/spark/
http://github.com/joelplucas/docker-spark 11
Example to Run
• MLlib's FP-Growth algorithm
• Data from the digital publishing domain
• Problem: to find frequent patterns from navigation profiles
• Write results in MongoDB
http://github.com/joelplucas/fpgrowth-spark-example
12
The Dataset
13
Unit Testing using Spark Testing Base
• Launched in Strata NYC 2015 by Holden Karau (and maintained by the community)
• Supports unit tests in Java, Scala and Python
14
Q&A - Contact
‣ Linkedin: http://br.linkedin.com/in/joelplucas/
‣ Email: joelpl@gmail.com
15

More Related Content

What's hot

"Walk in a distributed systems park with Orleans" Евгений Бобров
"Walk in a distributed systems park with Orleans" Евгений Бобров"Walk in a distributed systems park with Orleans" Евгений Бобров
"Walk in a distributed systems park with Orleans" Евгений Бобров
Fwdays
 

What's hot (20)

Massively Scaleable .NET Web Services with Project Orleans
Massively Scaleable .NET Web Services with Project OrleansMassively Scaleable .NET Web Services with Project Orleans
Massively Scaleable .NET Web Services with Project Orleans
 
Deploying Personalized Learning Labs using Docker Swarm by Nate Aune and Bria...
Deploying Personalized Learning Labs using Docker Swarm by Nate Aune and Bria...Deploying Personalized Learning Labs using Docker Swarm by Nate Aune and Bria...
Deploying Personalized Learning Labs using Docker Swarm by Nate Aune and Bria...
 
Actors Set the Stage for Project Orleans
Actors Set the Stage for Project OrleansActors Set the Stage for Project Orleans
Actors Set the Stage for Project Orleans
 
"Walk in a distributed systems park with Orleans" Евгений Бобров
"Walk in a distributed systems park with Orleans" Евгений Бобров"Walk in a distributed systems park with Orleans" Евгений Бобров
"Walk in a distributed systems park with Orleans" Евгений Бобров
 
Docker in the Cloud
Docker in the CloudDocker in the Cloud
Docker in the Cloud
 
Splunk user group - automating Splunk with Ansible
Splunk user group - automating Splunk with AnsibleSplunk user group - automating Splunk with Ansible
Splunk user group - automating Splunk with Ansible
 
Sas 2015 event_driven
Sas 2015 event_drivenSas 2015 event_driven
Sas 2015 event_driven
 
Fast Deployments to Multiple Golang Lambda Functions
Fast Deployments to Multiple Golang Lambda FunctionsFast Deployments to Multiple Golang Lambda Functions
Fast Deployments to Multiple Golang Lambda Functions
 
Hashicorp Products Overview
Hashicorp Products OverviewHashicorp Products Overview
Hashicorp Products Overview
 
Microsoft ASP.NET 5 - The new kid on the block
Microsoft ASP.NET 5 - The new kid on the block Microsoft ASP.NET 5 - The new kid on the block
Microsoft ASP.NET 5 - The new kid on the block
 
So Easy, A Ten Year Old Can Do It by Zeph Gardler
So Easy, A Ten Year Old Can Do It by Zeph GardlerSo Easy, A Ten Year Old Can Do It by Zeph Gardler
So Easy, A Ten Year Old Can Do It by Zeph Gardler
 
iSense Java Summit 2017 - Microservices in action at the Dutch National Police
iSense Java Summit 2017 - Microservices in action at the Dutch National PoliceiSense Java Summit 2017 - Microservices in action at the Dutch National Police
iSense Java Summit 2017 - Microservices in action at the Dutch National Police
 
In
InIn
In
 
Introducing .NET Core Open Source
Introducing .NET Core Open SourceIntroducing .NET Core Open Source
Introducing .NET Core Open Source
 
DevOpsCon Cloud Workshop
DevOpsCon Cloud Workshop DevOpsCon Cloud Workshop
DevOpsCon Cloud Workshop
 
7 steps to simplifying your AI workflows
7 steps to simplifying your AI workflows7 steps to simplifying your AI workflows
7 steps to simplifying your AI workflows
 
Get There meetup March 2018 - Microservices in action at the Dutch National P...
Get There meetup March 2018 - Microservices in action at the Dutch National P...Get There meetup March 2018 - Microservices in action at the Dutch National P...
Get There meetup March 2018 - Microservices in action at the Dutch National P...
 
Dublin JUG February 2018 - Microservices in action at the Dutch National Police
Dublin JUG February 2018 - Microservices in action at the Dutch National PoliceDublin JUG February 2018 - Microservices in action at the Dutch National Police
Dublin JUG February 2018 - Microservices in action at the Dutch National Police
 
Actors evolved- Rotem Hermon
Actors evolved- Rotem HermonActors evolved- Rotem Hermon
Actors evolved- Rotem Hermon
 
The Architect Way
The Architect WayThe Architect Way
The Architect Way
 

Similar to Building machine learning applications locally with spark

Learning Oracle with Oracle VM VirtualBox
Learning Oracle with Oracle VM VirtualBoxLearning Oracle with Oracle VM VirtualBox
Learning Oracle with Oracle VM VirtualBox
Leighton Nelson
 

Similar to Building machine learning applications locally with spark (20)

Productionizing Spark and the REST Job Server- Evan Chan
Productionizing Spark and the REST Job Server- Evan ChanProductionizing Spark and the REST Job Server- Evan Chan
Productionizing Spark and the REST Job Server- Evan Chan
 
Productionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job ServerProductionizing Spark and the Spark Job Server
Productionizing Spark and the Spark Job Server
 
Getting started with SparkSQL - Desert Code Camp 2016
Getting started with SparkSQL  - Desert Code Camp 2016Getting started with SparkSQL  - Desert Code Camp 2016
Getting started with SparkSQL - Desert Code Camp 2016
 
Java in a world of containers
Java in a world of containersJava in a world of containers
Java in a world of containers
 
Java in a World of Containers - DockerCon 2018
Java in a World of Containers - DockerCon 2018Java in a World of Containers - DockerCon 2018
Java in a World of Containers - DockerCon 2018
 
Java script nirvana in netbeans [con5679]
Java script nirvana in netbeans [con5679]Java script nirvana in netbeans [con5679]
Java script nirvana in netbeans [con5679]
 
Why to docker
Why to dockerWhy to docker
Why to docker
 
Docker Introduction
Docker IntroductionDocker Introduction
Docker Introduction
 
Learning Oracle with Oracle VM VirtualBox
Learning Oracle with Oracle VM VirtualBoxLearning Oracle with Oracle VM VirtualBox
Learning Oracle with Oracle VM VirtualBox
 
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
Spark as a Platform to Support Multi-Tenancy and Many Kinds of Data Applicati...
 
Webinar: Creating an Effective Docker Build Pipeline for Java Apps
Webinar: Creating an Effective Docker Build Pipeline for Java AppsWebinar: Creating an Effective Docker Build Pipeline for Java Apps
Webinar: Creating an Effective Docker Build Pipeline for Java Apps
 
Spark SQL & Machine Learning - A Practical Demonstration
Spark SQL & Machine Learning - A Practical DemonstrationSpark SQL & Machine Learning - A Practical Demonstration
Spark SQL & Machine Learning - A Practical Demonstration
 
Lessons Learned From Running Spark On Docker
Lessons Learned From Running Spark On DockerLessons Learned From Running Spark On Docker
Lessons Learned From Running Spark On Docker
 
Getting started with Apache Spark
Getting started with Apache SparkGetting started with Apache Spark
Getting started with Apache Spark
 
Run your Java code on Cloud Foundry
Run your Java code on Cloud FoundryRun your Java code on Cloud Foundry
Run your Java code on Cloud Foundry
 
Apache Spark - Lightning Fast Cluster Computing - Hyderabad Scalability Meetup
Apache Spark - Lightning Fast Cluster Computing - Hyderabad Scalability MeetupApache Spark - Lightning Fast Cluster Computing - Hyderabad Scalability Meetup
Apache Spark - Lightning Fast Cluster Computing - Hyderabad Scalability Meetup
 
Dec6 meetup spark presentation
Dec6 meetup spark presentationDec6 meetup spark presentation
Dec6 meetup spark presentation
 
Spring to Image
Spring to ImageSpring to Image
Spring to Image
 
Using the Splunk Java SDK
Using the Splunk Java SDKUsing the Splunk Java SDK
Using the Splunk Java SDK
 
Autoscaling Spark on AWS EC2 - 11th Spark London meetup
Autoscaling Spark on AWS EC2 - 11th Spark London meetupAutoscaling Spark on AWS EC2 - 11th Spark London meetup
Autoscaling Spark on AWS EC2 - 11th Spark London meetup
 

More from Joel Pinho Lucas

Engajando usuários nos portais de conteúdo digital da Globo através de Sistem...
Engajando usuários nos portais de conteúdo digital da Globo através de Sistem...Engajando usuários nos portais de conteúdo digital da Globo através de Sistem...
Engajando usuários nos portais de conteúdo digital da Globo através de Sistem...
Joel Pinho Lucas
 
Boas práticas de desenvolvimento para Jupyter Notebooks
Boas práticas de desenvolvimento para Jupyter NotebooksBoas práticas de desenvolvimento para Jupyter Notebooks
Boas práticas de desenvolvimento para Jupyter Notebooks
Joel Pinho Lucas
 
Discovering Lookalike audiences at scale for digital publishing with Spark MLlib
Discovering Lookalike audiences at scale for digital publishing with Spark MLlibDiscovering Lookalike audiences at scale for digital publishing with Spark MLlib
Discovering Lookalike audiences at scale for digital publishing with Spark MLlib
Joel Pinho Lucas
 

More from Joel Pinho Lucas (7)

Engajando usuários nos portais de conteúdo digital da Globo através de Sistem...
Engajando usuários nos portais de conteúdo digital da Globo através de Sistem...Engajando usuários nos portais de conteúdo digital da Globo através de Sistem...
Engajando usuários nos portais de conteúdo digital da Globo através de Sistem...
 
Boas práticas de desenvolvimento para Jupyter Notebooks
Boas práticas de desenvolvimento para Jupyter NotebooksBoas práticas de desenvolvimento para Jupyter Notebooks
Boas práticas de desenvolvimento para Jupyter Notebooks
 
Discovering Lookalike audiences at scale for digital publishing with Spark MLlib
Discovering Lookalike audiences at scale for digital publishing with Spark MLlibDiscovering Lookalike audiences at scale for digital publishing with Spark MLlib
Discovering Lookalike audiences at scale for digital publishing with Spark MLlib
 
Casos de Uso de Big Data e Ciência de Dados no Mercado
 Casos de Uso de Big Data e Ciência de Dados no Mercado Casos de Uso de Big Data e Ciência de Dados no Mercado
Casos de Uso de Big Data e Ciência de Dados no Mercado
 
Utilizando Machine Learning e Java para classificar o conteúdo de páginas Web
Utilizando Machine Learning e Java para classificar o conteúdo de páginas WebUtilizando Machine Learning e Java para classificar o conteúdo de páginas Web
Utilizando Machine Learning e Java para classificar o conteúdo de páginas Web
 
Conceitos e práticas em Sistemas de Recomendação
Conceitos e práticas em Sistemas de RecomendaçãoConceitos e práticas em Sistemas de Recomendação
Conceitos e práticas em Sistemas de Recomendação
 
Bigdata gameverse
Bigdata gameverseBigdata gameverse
Bigdata gameverse
 

Recently uploaded

FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
MarinCaroMartnezBerg
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
shivangimorya083
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
Lars Albertsson
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 

Recently uploaded (20)

FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 

Building machine learning applications locally with spark