SlideShare a Scribd company logo
1 of 27
Download to read offline
Live predictions with
schemaless data at scale
Ondrej Brichta
CONFIDENTIAL
WE ARE HIRING
Our problem at Exponea
What we have:
● Terabytes of customer data
● Means to process all the data fast
● Customers with business understanding (or our consultants)
What we needed:
● Probability classificator
○ For any chosen event or attribute
○ Able to process any given data and return a reasonable output without human interaction
Our problem at Exponea
One specific problem:
● How to choose which is the best banner to show right now?
● Web page personalization?
Solution?
In session prediction
● Predict probability of buying for each customer live when they are online
What steps do we need to do and automate?
1. Business understanding
2. Data understanding
3. Data preparation
4. Modeling
5. Evaluation
6. Deployment
1. Business understanding
Key questions:
● What problem do we have?
● What are our use cases?
● Whom are we creating this model for?
1. Business understanding
Key questions:
● What problem do we have? - We want to optimize our campaigns, ...
● What are our use cases? - Should I show this banner?
○ Translate this use case into machine learning - predict whether customer will buy something
● Whom are we creating this model for? - NOT FOR US, our customers!
1. Business understanding
Key questions:
● What problem do we have? - We want to optimize our campaigns, ...
● What are our use cases? - Should I show this banner? ...send this email?
○ Translate this use case into machine learning - predict whether customer will buy something
● Whom are we creating this model for? - NOT FOR US, our customers!
Hard or impossible to automate, but!
We can delegate this to customers or consultants as easier problems.
● Create few templates to choose from
○ At least few easy to use and few advanced to modify the model
2. Data understanding
● Closely related to business understanding
● Most important part of machine learning!
● Key questions
○ What data do i have?
○ Is the data valuable?
○ What do I want to predict?
○ Should I train on all data or only subset? <- crucial
● Changes here can influence outcome by the largest amount
2. Data understanding
Automation?
Yes, but difficult...
2. Data understanding
● Have pre sets for each template
○ We have in session predictions just for this problem
● Experiment with variables (date ranges, data filters,...)
○ We have variables set to best values for all projects. Can be manually adjusted by user
● Have a model providing unselected variables
3. Data preparation
● Cleaning the data - have some common structure for easy selection
● Derive attributes - have general easy aggregates (count, first, last,...)
● Dataset balancing - easy rules to determine which technique to use
● Aggregates - have a model which selects which complex aggregates to use
● Reducing dimensionality - feature selection
● Mapping data to “clean” values - most frequent items, vector indexer
● Casting data to vectors - vector assembler
3. Data preparation
3. Data preparation
● Customers connect from different platforms
● Customers log in after few events
3. Data preparation
3. Data preparation
4. Modeling
4. Modeling
5. Evaluation
● How good is my model?
● ROC, F1, Lift curve
5. Evaluation
6. Deployment
How do you want to use it? Must it be fast? Online or Offline?
6. Deployment
For us, Fast and Online!
6. Deployment
6. Deployment
Other deployments:
● Fast and Offline -> pre computed results for each customer
● Slow and Online -> improved results with slower algorithms
● Slow and Offline -> you messed up. Rework your solution.
App overview
Ask me anything
ondrej.brichta@exponea.com
CONFIDENTIAL
WE ARE HIRING

More Related Content

Similar to Live predictions with schemaless data at scale. MLMU Kosice, Exponea

Moving from BI to AI : For decision makers
Moving from BI to AI : For decision makersMoving from BI to AI : For decision makers
Moving from BI to AI : For decision makerszekeLabs Technologies
 
"What we learned from 5 years of building a data science software that actual...
"What we learned from 5 years of building a data science software that actual..."What we learned from 5 years of building a data science software that actual...
"What we learned from 5 years of building a data science software that actual...Dataconomy Media
 
A journey of ai driven analytics insights engine
A journey of  ai driven analytics insights engineA journey of  ai driven analytics insights engine
A journey of ai driven analytics insights engineSomya Anand
 
Using Data Science to Build an End-to-End Recommendation System
Using Data Science to Build an End-to-End Recommendation SystemUsing Data Science to Build an End-to-End Recommendation System
Using Data Science to Build an End-to-End Recommendation SystemVMware Tanzu
 
Using machine learning for customer service (Data Talks Club)
Using machine learning for customer service (Data Talks Club)Using machine learning for customer service (Data Talks Club)
Using machine learning for customer service (Data Talks Club)Neal Lathia
 
Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016
Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016
Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016MLconf
 
Key Tactics for a Successful Product Launch by Kespry Senior PM
Key Tactics for a Successful Product Launch by Kespry Senior PMKey Tactics for a Successful Product Launch by Kespry Senior PM
Key Tactics for a Successful Product Launch by Kespry Senior PMProduct School
 
Making better use of Data and AI in Industry 4.0
Making better use of Data and AI in Industry 4.0Making better use of Data and AI in Industry 4.0
Making better use of Data and AI in Industry 4.0Albert Y. C. Chen
 
AI hype or reality
AI  hype or realityAI  hype or reality
AI hype or realityAwantik Das
 
Better Living Through Analytics - Louis Cialdella Product School
Better Living Through Analytics - Louis Cialdella Product SchoolBetter Living Through Analytics - Louis Cialdella Product School
Better Living Through Analytics - Louis Cialdella Product SchoolLouis Cialdella
 
What Are the Basics of Product Manager Interviews by Google PM
What Are the Basics of Product Manager Interviews by Google PMWhat Are the Basics of Product Manager Interviews by Google PM
What Are the Basics of Product Manager Interviews by Google PMProduct School
 
Data science Applications in the Enterprise
Data science Applications in the EnterpriseData science Applications in the Enterprise
Data science Applications in the EnterpriseSrinath Perera
 
How to Use Visual Graphics for User Stories by Sony Crackle PM
How to Use Visual Graphics for User Stories by Sony Crackle PMHow to Use Visual Graphics for User Stories by Sony Crackle PM
How to Use Visual Graphics for User Stories by Sony Crackle PMProduct School
 
Data Integration and Marketing Attribution
Data Integration and Marketing Attribution Data Integration and Marketing Attribution
Data Integration and Marketing Attribution ROIVENUE™
 
Panaseer Sr. PM on How to Use Data as a Product Manager
Panaseer Sr. PM on How to Use Data as a Product ManagerPanaseer Sr. PM on How to Use Data as a Product Manager
Panaseer Sr. PM on How to Use Data as a Product ManagerProduct School
 
How to Use Data for Product Decisions by YouTube Product Manager
How to Use Data for Product Decisions by YouTube Product ManagerHow to Use Data for Product Decisions by YouTube Product Manager
How to Use Data for Product Decisions by YouTube Product ManagerProduct School
 

Similar to Live predictions with schemaless data at scale. MLMU Kosice, Exponea (20)

Moving from BI to AI : For decision makers
Moving from BI to AI : For decision makersMoving from BI to AI : For decision makers
Moving from BI to AI : For decision makers
 
"What we learned from 5 years of building a data science software that actual...
"What we learned from 5 years of building a data science software that actual..."What we learned from 5 years of building a data science software that actual...
"What we learned from 5 years of building a data science software that actual...
 
A journey of ai driven analytics insights engine
A journey of  ai driven analytics insights engineA journey of  ai driven analytics insights engine
A journey of ai driven analytics insights engine
 
Using Data Science to Build an End-to-End Recommendation System
Using Data Science to Build an End-to-End Recommendation SystemUsing Data Science to Build an End-to-End Recommendation System
Using Data Science to Build an End-to-End Recommendation System
 
Using machine learning for customer service (Data Talks Club)
Using machine learning for customer service (Data Talks Club)Using machine learning for customer service (Data Talks Club)
Using machine learning for customer service (Data Talks Club)
 
Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016
Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016
Elena Grewal, Data Science Manager, Airbnb at MLconf SF 2016
 
Key Tactics for a Successful Product Launch by Kespry Senior PM
Key Tactics for a Successful Product Launch by Kespry Senior PMKey Tactics for a Successful Product Launch by Kespry Senior PM
Key Tactics for a Successful Product Launch by Kespry Senior PM
 
DS Life Cycle
DS Life CycleDS Life Cycle
DS Life Cycle
 
DS Life Cycle
DS Life CycleDS Life Cycle
DS Life Cycle
 
Making better use of Data and AI in Industry 4.0
Making better use of Data and AI in Industry 4.0Making better use of Data and AI in Industry 4.0
Making better use of Data and AI in Industry 4.0
 
AI hype or reality
AI  hype or realityAI  hype or reality
AI hype or reality
 
Better Living Through Analytics - Louis Cialdella Product School
Better Living Through Analytics - Louis Cialdella Product SchoolBetter Living Through Analytics - Louis Cialdella Product School
Better Living Through Analytics - Louis Cialdella Product School
 
What Are the Basics of Product Manager Interviews by Google PM
What Are the Basics of Product Manager Interviews by Google PMWhat Are the Basics of Product Manager Interviews by Google PM
What Are the Basics of Product Manager Interviews by Google PM
 
Data science Applications in the Enterprise
Data science Applications in the EnterpriseData science Applications in the Enterprise
Data science Applications in the Enterprise
 
How to Use Visual Graphics for User Stories by Sony Crackle PM
How to Use Visual Graphics for User Stories by Sony Crackle PMHow to Use Visual Graphics for User Stories by Sony Crackle PM
How to Use Visual Graphics for User Stories by Sony Crackle PM
 
Data Integration and Marketing Attribution
Data Integration and Marketing Attribution Data Integration and Marketing Attribution
Data Integration and Marketing Attribution
 
MLOps.pptx
MLOps.pptxMLOps.pptx
MLOps.pptx
 
3 types of monitoring for 2020
3 types of monitoring for 20203 types of monitoring for 2020
3 types of monitoring for 2020
 
Panaseer Sr. PM on How to Use Data as a Product Manager
Panaseer Sr. PM on How to Use Data as a Product ManagerPanaseer Sr. PM on How to Use Data as a Product Manager
Panaseer Sr. PM on How to Use Data as a Product Manager
 
How to Use Data for Product Decisions by YouTube Product Manager
How to Use Data for Product Decisions by YouTube Product ManagerHow to Use Data for Product Decisions by YouTube Product Manager
How to Use Data for Product Decisions by YouTube Product Manager
 

Recently uploaded

Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...ThinkInnovation
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Bookvip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Bookmanojkuma9823
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxFurkanTasci3
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAbdelrhman abooda
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 

Recently uploaded (20)

Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Bookvip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
vip Sarai Rohilla Call Girls 9999965857 Call or WhatsApp Now Book
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Call Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort ServiceCall Girls in Saket 99530🔝 56974 Escort Service
Call Girls in Saket 99530🔝 56974 Escort Service
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptx
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptxAmazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
Amazon TQM (2) Amazon TQM (2)Amazon TQM (2).pptx
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 

Live predictions with schemaless data at scale. MLMU Kosice, Exponea

  • 1. Live predictions with schemaless data at scale Ondrej Brichta
  • 3. Our problem at Exponea What we have: ● Terabytes of customer data ● Means to process all the data fast ● Customers with business understanding (or our consultants) What we needed: ● Probability classificator ○ For any chosen event or attribute ○ Able to process any given data and return a reasonable output without human interaction
  • 4. Our problem at Exponea One specific problem: ● How to choose which is the best banner to show right now? ● Web page personalization? Solution? In session prediction ● Predict probability of buying for each customer live when they are online
  • 5. What steps do we need to do and automate? 1. Business understanding 2. Data understanding 3. Data preparation 4. Modeling 5. Evaluation 6. Deployment
  • 6. 1. Business understanding Key questions: ● What problem do we have? ● What are our use cases? ● Whom are we creating this model for?
  • 7. 1. Business understanding Key questions: ● What problem do we have? - We want to optimize our campaigns, ... ● What are our use cases? - Should I show this banner? ○ Translate this use case into machine learning - predict whether customer will buy something ● Whom are we creating this model for? - NOT FOR US, our customers!
  • 8. 1. Business understanding Key questions: ● What problem do we have? - We want to optimize our campaigns, ... ● What are our use cases? - Should I show this banner? ...send this email? ○ Translate this use case into machine learning - predict whether customer will buy something ● Whom are we creating this model for? - NOT FOR US, our customers! Hard or impossible to automate, but! We can delegate this to customers or consultants as easier problems. ● Create few templates to choose from ○ At least few easy to use and few advanced to modify the model
  • 9. 2. Data understanding ● Closely related to business understanding ● Most important part of machine learning! ● Key questions ○ What data do i have? ○ Is the data valuable? ○ What do I want to predict? ○ Should I train on all data or only subset? <- crucial ● Changes here can influence outcome by the largest amount
  • 11. 2. Data understanding ● Have pre sets for each template ○ We have in session predictions just for this problem ● Experiment with variables (date ranges, data filters,...) ○ We have variables set to best values for all projects. Can be manually adjusted by user ● Have a model providing unselected variables
  • 12. 3. Data preparation ● Cleaning the data - have some common structure for easy selection ● Derive attributes - have general easy aggregates (count, first, last,...) ● Dataset balancing - easy rules to determine which technique to use ● Aggregates - have a model which selects which complex aggregates to use ● Reducing dimensionality - feature selection ● Mapping data to “clean” values - most frequent items, vector indexer ● Casting data to vectors - vector assembler
  • 15. ● Customers connect from different platforms ● Customers log in after few events 3. Data preparation
  • 19. 5. Evaluation ● How good is my model? ● ROC, F1, Lift curve
  • 21. 6. Deployment How do you want to use it? Must it be fast? Online or Offline?
  • 22. 6. Deployment For us, Fast and Online!
  • 24. 6. Deployment Other deployments: ● Fast and Offline -> pre computed results for each customer ● Slow and Online -> improved results with slower algorithms ● Slow and Offline -> you messed up. Rework your solution.