GM/DM 1:1 1
IN FOUR SIMPLE STEPS, ETL CLICKSTREAM TO DATA PRODUCTS (NO ENGINEER NEEDED!)
SENIOR DATA SCIENTIST|JOSH JANZEN
JOSH JANZEN
SENIOR DATA SCIENTIST
Degrees from:
Data Science Tools:
About: Life Time champions a healthy and happy life for its members across 138 destinations in 38 major markets in the U.S. and Canada.
1. DATA FEED
- FTP to S3 w/ bucket credentials
2. EXPLORE
- Sample data and explore
- Find columns of interest
3. ETL/ML
- ETL columns of interest
- Apply ML algorithms
4. DEPLOY
- Create web APIs with Azure ML
- Interactive Web Apps
STEP 1. DATA FEED
- FTP to S3 w/ bucket credentials
- Start off as batch (nightly)
[Chart: effort vs. progress]
STEP 2. EXPLORE
- Sample data and explore
- Find columns of interest
STEP 2. EXPLORE
Func RemoveNullColumns:
    for column in dataframe:
        if column is null:
            remove column

Int threshold = 2
Func RemoveLowVariationColumns:
    for column in dataframe:
        if count(distinct values) in column < threshold:
            remove column
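
The two pruning functions above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the presenter's code: a dict mapping column names to lists of values stands in for the dataframe, `None` is treated as null, nulls are not counted as a distinct value, and every function name, the `referrer` column, and the sample values are hypothetical.

```python
from typing import Any, Dict, List

# Hypothetical stand-in for a dataframe: column name -> list of values.
Table = Dict[str, List[Any]]


def remove_null_columns(table: Table) -> Table:
    """Keep only columns that contain at least one non-null value."""
    return {
        name: values
        for name, values in table.items()
        if any(v is not None for v in values)
    }


def remove_low_variation_columns(table: Table, threshold: int = 2) -> Table:
    """Drop columns with fewer than `threshold` distinct non-null values."""
    kept = {}
    for name, values in table.items():
        distinct = {v for v in values if v is not None}
        if len(distinct) >= threshold:
            kept[name] = values
    return kept


# Illustrative sample only; these values are not from the talk.
sample = {
    "user_id": ["u_345", "u_346", "u_347"],
    "os": ["Android", "Android", "Android"],  # one distinct value: low variation
    "referrer": [None, None, None],           # entirely null
}

pruned = remove_low_variation_columns(remove_null_columns(sample))
print(list(pruned))
```

With the threshold of 2 from the slide, a column survives only if it holds at least two different values, so constant columns such as `os` in this sample are dropped along with the all-null one.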
STEP 3. ETL/ML
- ETL columns of interest
- Apply ML algorithms
STEP 3. ETL/ML
Auto-scaling of Cluster Size
STEP 3. ETL/ML
event_date_time   user_id  action      page_name      os
11/15/18 7:25AM   u_345    Menu_click  Home           Android
11/15/18 7:26AM   u_345    NULL        ScheduleClass  Android
Array files_etl_complete = ['raw_clicks_12_01_18', 'raw_clicks_12_02_18' …]
Func DetectNewDataTMS:
    for file in raw_clicks_bucket:
        if file NOT EXISTS in files_etl_complete:
            PerformETL(file)
            files_etl_complete.append(file)
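
The new-file check above can be sketched in Python as follows. This is a rough sketch under stated assumptions: the bucket listing is passed in as a plain list of file names rather than read from S3, the completed-file log is an in-memory set, `perform_etl` is a placeholder callback, and `raw_clicks_12_03_18` is an illustrative file name that does not appear in the talk.

```python
from typing import Callable, Iterable, List, Set


def detect_new_files(
    bucket_files: Iterable[str],
    files_etl_complete: Set[str],
    perform_etl: Callable[[str], None],
) -> List[str]:
    """Run ETL on every file not yet processed and record it as complete."""
    processed = []
    for name in sorted(bucket_files):  # sorted only to make the order deterministic
        if name not in files_etl_complete:
            perform_etl(name)
            files_etl_complete.add(name)  # recorded only after ETL returns
            processed.append(name)
    return processed


# Illustrative state: two files already processed, one new arrival.
done = {"raw_clicks_12_01_18", "raw_clicks_12_02_18"}
bucket = ["raw_clicks_12_01_18", "raw_clicks_12_02_18", "raw_clicks_12_03_18"]

newly_processed = detect_new_files(bucket, done, perform_etl=lambda name: None)
print(newly_processed)
```

Because a file is added to the completed set only after its ETL call returns, a run that fails partway leaves the unprocessed files to be picked up by the next nightly batch.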
STEP 3. ETL/ML
Images may be subject to copyright
source: https://johnolamendy.wordpress.com/2015/10/14/collaborative-filtering-in-apache-spark/
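
The source credited above covers collaborative filtering, and one simple way to apply that idea to clickstream rows is item-to-item co-occurrence counting. The sketch below is a plain-Python stand-in chosen for brevity, not the Spark approach from the linked article and not necessarily what the presenter ran. The function names, the user ids, and the `Cafe` and `Spa` page names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations
from typing import Dict, Iterable, List, Set, Tuple


def cooccurrence_counts(events: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count, for each page, how many users also visited each other page."""
    pages_by_user: Dict[str, Set[str]] = defaultdict(set)
    for user_id, page_name in events:
        pages_by_user[user_id].add(page_name)

    counts: Dict[str, Counter] = defaultdict(Counter)
    for pages in pages_by_user.values():
        for a, b in permutations(sorted(pages), 2):
            counts[a][b] += 1
    return counts


def recommend(counts: Dict[str, Counter], seen: Set[str], top_n: int = 3) -> List[str]:
    """Rank unseen pages by how often they co-occur with pages already seen."""
    scores: Counter = Counter()
    for page in seen:
        for other, n in counts.get(page, {}).items():
            if other not in seen:
                scores[other] += n
    # Highest score first; ties broken by name so the result is deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [page for page, _ in ranked][:top_n]


# Illustrative (user_id, page_name) pairs; not data from the talk.
events = [
    ("u_1", "Home"), ("u_1", "ScheduleClass"), ("u_1", "Cafe"),
    ("u_2", "Home"), ("u_2", "ScheduleClass"),
    ("u_3", "Home"), ("u_3", "Cafe"),
    ("u_4", "ScheduleClass"), ("u_4", "Spa"),
]
counts = cooccurrence_counts(events)
print(recommend(counts, seen={"Cafe"}))
```

Here a user who has only viewed `Cafe` is pointed to `Home` first, because two other users visited both pages, and to `ScheduleClass` second.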
STEP 4. DEPLOY
- Create web APIs with Azure ML
- Interactive Web Apps
STEP 4. DEPLOY
Images may be subject to copyright
source: https://wikiazure.com/artificial-intelligence/predict-temperature-using-azure-machine-learning/
TIPS/TRICKS
Images may be subject to copyright
source: https://gifer.com/en/7kRO
