Building a Distributed Data Pipeline

Tom Lous
Freelance Big Data & Machine Learning Software Engineer
BUILDING A DISTRIBUTED DATA PIPELINE
MACHINE LEARNING AT SCALE
BACKGROUND
DATA
▸Data is everywhere
▸Data, unapplied, is useless
▸How can we turn high volume & velocity data into value?
BACKGROUND
PIPELINE
▸Process the data continuously
▸Apply several processing steps
COLLECT → MODEL → DEPLOY → INTEGRATE
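The four steps can be sketched as chained generator stages, so that each record flows through the whole pipeline as soon as it arrives. This is a minimal illustration only, not the implementation from the talk: the stage bodies, the `symbol,price` line format, the `run_pipeline` helper, and the sample quotes are all assumptions made for the example.

```python
from functools import reduce
from typing import Callable, Iterable, Iterator

# A stage consumes a stream and lazily produces a new stream.
Stage = Callable[[Iterable], Iterator]


def collect(raw_lines: Iterable[str]) -> Iterator[tuple]:
    """COLLECT: parse hypothetical 'symbol,price' lines, skipping malformed ones."""
    for line in raw_lines:
        symbol, _, price = line.partition(",")
        try:
            yield symbol.strip(), float(price)
        except ValueError:
            continue  # drop records that do not parse


def model(ticks: Iterable[tuple]) -> Iterator[tuple]:
    """MODEL: stand-in for a real model; compares each price to the previous one."""
    last_price = {}
    for symbol, price in ticks:
        previous = last_price.get(symbol)
        last_price[symbol] = price
        if previous is not None:
            # "down" also covers an unchanged price in this toy version
            yield symbol, price, "up" if price > previous else "down"


def deploy(predictions: Iterable[tuple]) -> Iterator[dict]:
    """DEPLOY: expose results as plain records that a service could return."""
    for symbol, price, direction in predictions:
        yield {"symbol": symbol, "price": price, "direction": direction}


def integrate(records: Iterable[dict]) -> Iterator[str]:
    """INTEGRATE: format records for a downstream consumer."""
    for record in records:
        yield f"{record['symbol']} {record['direction']} @ {record['price']:.2f}"


def run_pipeline(source: Iterable, stages: list) -> Iterator:
    """Feed the source through every stage in order."""
    return reduce(lambda stream, stage: stage(stream), stages, source)


if __name__ == "__main__":
    # Made-up sample input, for illustration only
    sample = ["AAPL,100.0", "AAPL,101.5", "bad line", "AAPL,101.0"]
    for message in run_pipeline(sample, [collect, model, deploy, integrate]):
        print(message)
```

Because every stage is a generator, nothing is buffered between steps; swapping the list for an unbounded source would let the same code process data continuously.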
SOLUTION
ANALYSE THE STOCK MARKET
▸YAHOO.COM
▸YAHOO.COM (PREFETCHED)
▸COLLECTOR
▸MESSAGE BROKER
▸STREAMING STORAGE
▸MODEL
▸MACHINE LEARNING
▸MLlib
▸WEBSERVICE
▸USER / CLIENTS
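The flow between these components can be mimicked in a single process to show how they hand data to each other. The sketch below is an assumption-laden toy, not the architecture's real code: a `queue.Queue` stands in for the message broker, a Python list for storage, a per-symbol mean for the MLlib model, and a plain function for the webservice. All function names and the sample quotes are hypothetical.

```python
import queue
import threading
from statistics import fmean

SENTINEL = None  # marks the end of the stream in this toy example


def collector(quotes, broker):
    """Collector: publish each (symbol, price) quote to the broker."""
    for quote in quotes:
        broker.put(quote)
    broker.put(SENTINEL)


def stream_to_storage(broker, storage):
    """Streaming job: consume quotes from the broker and persist them."""
    while True:
        quote = broker.get()
        if quote is SENTINEL:
            break
        storage.append(quote)


def train(storage):
    """Model: stand-in for an MLlib model, here the mean price per symbol."""
    prices_by_symbol = {}
    for symbol, price in storage:
        prices_by_symbol.setdefault(symbol, []).append(price)
    return {symbol: fmean(prices) for symbol, prices in prices_by_symbol.items()}


def webservice(model, symbol):
    """Webservice: answer a client request using the trained model."""
    return {"symbol": symbol, "expected_price": model.get(symbol)}


def run(quotes):
    """Wire the components together: collector -> broker -> storage -> model."""
    broker = queue.Queue()
    storage = []
    producer = threading.Thread(target=collector, args=(quotes, broker))
    consumer = threading.Thread(target=stream_to_storage, args=(broker, storage))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return train(storage)


if __name__ == "__main__":
    # Made-up quotes, for illustration only
    model = run([("ACME", 10.0), ("ACME", 12.0), ("XYZ", 5.0)])
    print(webservice(model, "ACME"))
```

The broker decouples the collector from the consumer: each runs in its own thread and only shares the queue, which is the property a real message broker provides across machines.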
DEMO
DEMO (FINGERS CROSSED)
DONE
QUESTIONS?
▸?