SlideShare a Scribd company logo
Customer segmentation
an excuse to use Machine Learning ;-)
● Julio Martinez
● Web developer since 2001
● 2 years working at Ulabox
● Machine Learning hobbyist
● Find me: @liopic
Who am I?
1. docker pull jupyter/scipy-notebook
2. git clone git@github.com:ulabox/datasets
3. git clone git@github.com:liopic/scbcn17-customer-segmentation
4. cp datasets/data/*.csv scbcn17-customer-segmentation/
Preparing the workshop
My 2017 objective: M.L.
● Motivation
○ It’s the new hot thing
○ AlphaGo beat Lee Sedol, March 2016
● Some background, but need to learn more
1. Choose the way
○ Coursera’s vs. books vs. workshops vs. posts
2. Find an excuse to apply it
○ @work is better than @home
Learning about Machine Learning
Customer clusters @work, aka “the excuse”
● There is a non-programmer Business Analysis Department
● Groups of customers based on periodicity + amount spent
○ Example: people that buy once per month, 100€ ticket
○ Useful for business reports
○ Not so useful for UX, CRM
● Groups by behavior? Clustering orders!
Boring!
1. With past data -> make a ML model
○ clean data
○ choose a ML algorithm/s
○ tune the algorithm, with testing
2. With new data -> use model to predict (or give new info)
○ deploy pipeline
○ update model
101 Machine Learning: the method
● Supervised
○ data + labels(result)
● Unsupervised
○ just data
● Reinforcement
○ function to optimize
101 Machine Learning: type of problems
Supervised learning
TRAINING SET
cat cat person
TEST SET
???
Unsupervised learning
TRAINING SET
TEST SET
There is NO test
● Try to extract features (information, shapes): similar and different
● Uses:
○ Clustering
○ Anomaly detection (it doesn’t look “normal”)
○ Dimensional reduction
○ Transfer features, projections ...
Unsupervised learning
● Use:
○ grouping
○ quantization
● Algorithms:
○ k-means
○ DBSCAN
Clustering
● need: how many clusters
k-means
● need: how many samples at minimum, tune other params
DBSCAN: Density-based spatial clustering of applications with noise
So, ready to hack?
But wait a moment!
● Data preparation
○ Keep same order of magnitude, usually [0,1]
○ Remove noise
○ Other processes
■ Binarize data, categorical features
● weekday, ex. 4 -> 0, 0, 0, 1, 0, 0, 0
■ Process missing data
Before algorithms: data!
● Explore the data
○ Images are richer than numbers
■ “We get more orders at 22h” vs.
● Ask domain experts
○ Understand normal & border cases
■ The step at 14h is the web cutoff time
Before algorithms: data!
● Explore and optimize the data
○ Features that count, feature engineering
○ Avoid the “curse of dimensionality”
● Start small, understandable, useful
● Find excuses to try it, and sell it!
Lessons learned
Now, let’s hack!
1. docker pull jupyter/scipy-notebook
2. git clone git@github.com:ulabox/datasets
3. git clone git@github.com:liopic/scbcn17-customer-segmentation
4. cp datasets/data/*.csv scbcn17-customer-segmentation/
5. cd scbcn17-customer-segmentation
6. ./jupyter.sh
7. Open the link in your browser and open the Workshop.ipynb file
Let’s hack
Thank you!

More Related Content

Similar to Customer segmentation scbcn17

Scaling Recommendations at Quora (RecSys talk 9/16/2016)
Scaling Recommendations at Quora (RecSys talk 9/16/2016)Scaling Recommendations at Quora (RecSys talk 9/16/2016)
Scaling Recommendations at Quora (RecSys talk 9/16/2016)
Nikhil Dandekar
 
Production-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to heroProduction-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to hero
Daniel Marcous
 
Clean Code
Clean CodeClean Code
Clean Code
NeeleEilers
 
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
Strata 2016 -  Lessons Learned from building real-life Machine Learning SystemsStrata 2016 -  Lessons Learned from building real-life Machine Learning Systems
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
Xavier Amatriain
 
Analyzing workflows and improving communication across departments
Analyzing workflows and improving communication across departments Analyzing workflows and improving communication across departments
Analyzing workflows and improving communication across departments
NASIG
 
Embedded based retrieval in modern search ranking system
Embedded based retrieval in modern search ranking systemEmbedded based retrieval in modern search ranking system
Embedded based retrieval in modern search ranking system
Marsan Ma
 
What Are the Basics of Product Manager Interviews by Google PM
What Are the Basics of Product Manager Interviews by Google PMWhat Are the Basics of Product Manager Interviews by Google PM
What Are the Basics of Product Manager Interviews by Google PM
Product School
 
Introduction to Scrum
Introduction to ScrumIntroduction to Scrum
Introduction to Scrum
Bixlabs
 
Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...
Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...
Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...
Aaron Saray
 
Bimbo Final Project Presentation
Bimbo Final Project PresentationBimbo Final Project Presentation
Bimbo Final Project Presentation
Can Köklü
 
Machine Learning - Startup weekend UCSB 2018
Machine Learning - Startup weekend UCSB 2018Machine Learning - Startup weekend UCSB 2018
Machine Learning - Startup weekend UCSB 2018
Raul Eulogio
 
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
MLconf
 
10 more lessons learned from building Machine Learning systems
10 more lessons learned from building Machine Learning systems10 more lessons learned from building Machine Learning systems
10 more lessons learned from building Machine Learning systems
Xavier Amatriain
 
10 more lessons learned from building Machine Learning systems - MLConf
10 more lessons learned from building Machine Learning systems - MLConf10 more lessons learned from building Machine Learning systems - MLConf
10 more lessons learned from building Machine Learning systems - MLConf
Xavier Amatriain
 
Search@flipkart
Search@flipkartSearch@flipkart
Search@flipkart
Umesh Prasad
 
Winning Data Science Competitions (Owen Zhang) - 2014 Boston Data Festival
Winning Data Science Competitions (Owen Zhang)  - 2014 Boston Data FestivalWinning Data Science Competitions (Owen Zhang)  - 2014 Boston Data Festival
Winning Data Science Competitions (Owen Zhang) - 2014 Boston Data Festival
freshdatabos
 
Winning data science competitions
Winning data science competitionsWinning data science competitions
Winning data science competitions
Owen Zhang
 
Software Engineering Primer
Software Engineering PrimerSoftware Engineering Primer
Software Engineering Primer
Georg Buske
 
Big data @ uber vu (1)
Big data @ uber vu (1)Big data @ uber vu (1)
Big data @ uber vu (1)
Mihnea Giurgea
 
Ledingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @LendingkartLedingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @Lendingkart
Mukesh Singh
 

Similar to Customer segmentation scbcn17 (20)

Scaling Recommendations at Quora (RecSys talk 9/16/2016)
Scaling Recommendations at Quora (RecSys talk 9/16/2016)Scaling Recommendations at Quora (RecSys talk 9/16/2016)
Scaling Recommendations at Quora (RecSys talk 9/16/2016)
 
Production-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to heroProduction-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to hero
 
Clean Code
Clean CodeClean Code
Clean Code
 
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
Strata 2016 -  Lessons Learned from building real-life Machine Learning SystemsStrata 2016 -  Lessons Learned from building real-life Machine Learning Systems
Strata 2016 - Lessons Learned from building real-life Machine Learning Systems
 
Analyzing workflows and improving communication across departments
Analyzing workflows and improving communication across departments Analyzing workflows and improving communication across departments
Analyzing workflows and improving communication across departments
 
Embedded based retrieval in modern search ranking system
Embedded based retrieval in modern search ranking systemEmbedded based retrieval in modern search ranking system
Embedded based retrieval in modern search ranking system
 
What Are the Basics of Product Manager Interviews by Google PM
What Are the Basics of Product Manager Interviews by Google PMWhat Are the Basics of Product Manager Interviews by Google PM
What Are the Basics of Product Manager Interviews by Google PM
 
Introduction to Scrum
Introduction to ScrumIntroduction to Scrum
Introduction to Scrum
 
Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...
Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...
Enterprise PHP Architecture through Design Patterns and Modularization (Midwe...
 
Bimbo Final Project Presentation
Bimbo Final Project PresentationBimbo Final Project Presentation
Bimbo Final Project Presentation
 
Machine Learning - Startup weekend UCSB 2018
Machine Learning - Startup weekend UCSB 2018Machine Learning - Startup weekend UCSB 2018
Machine Learning - Startup weekend UCSB 2018
 
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
Xavier Amatriain, VP of Engineering, Quora at MLconf SF - 11/13/15
 
10 more lessons learned from building Machine Learning systems
10 more lessons learned from building Machine Learning systems10 more lessons learned from building Machine Learning systems
10 more lessons learned from building Machine Learning systems
 
10 more lessons learned from building Machine Learning systems - MLConf
10 more lessons learned from building Machine Learning systems - MLConf10 more lessons learned from building Machine Learning systems - MLConf
10 more lessons learned from building Machine Learning systems - MLConf
 
Search@flipkart
Search@flipkartSearch@flipkart
Search@flipkart
 
Winning Data Science Competitions (Owen Zhang) - 2014 Boston Data Festival
Winning Data Science Competitions (Owen Zhang)  - 2014 Boston Data FestivalWinning Data Science Competitions (Owen Zhang)  - 2014 Boston Data Festival
Winning Data Science Competitions (Owen Zhang) - 2014 Boston Data Festival
 
Winning data science competitions
Winning data science competitionsWinning data science competitions
Winning data science competitions
 
Software Engineering Primer
Software Engineering PrimerSoftware Engineering Primer
Software Engineering Primer
 
Big data @ uber vu (1)
Big data @ uber vu (1)Big data @ uber vu (1)
Big data @ uber vu (1)
 
Ledingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @LendingkartLedingkart Meetup #2: Scaling Search @Lendingkart
Ledingkart Meetup #2: Scaling Search @Lendingkart
 

More from Julio Martinez

Buscando un trabajo en un pajar
Buscando un trabajo en un pajarBuscando un trabajo en un pajar
Buscando un trabajo en un pajar
Julio Martinez
 
Remote working effectively
Remote working effectivelyRemote working effectively
Remote working effectively
Julio Martinez
 
Conclusion of the Seminary UPC 2017
Conclusion of the Seminary UPC 2017Conclusion of the Seminary UPC 2017
Conclusion of the Seminary UPC 2017
Julio Martinez
 
Introduction to Docker
Introduction to DockerIntroduction to Docker
Introduction to Docker
Julio Martinez
 
Some OOP paradigms & SOLID
Some OOP paradigms & SOLIDSome OOP paradigms & SOLID
Some OOP paradigms & SOLID
Julio Martinez
 
Introduction to Clean Code
Introduction to Clean CodeIntroduction to Clean Code
Introduction to Clean Code
Julio Martinez
 
Professional development
Professional developmentProfessional development
Professional development
Julio Martinez
 
Code metrics in PHP
Code metrics in PHPCode metrics in PHP
Code metrics in PHP
Julio Martinez
 

More from Julio Martinez (8)

Buscando un trabajo en un pajar
Buscando un trabajo en un pajarBuscando un trabajo en un pajar
Buscando un trabajo en un pajar
 
Remote working effectively
Remote working effectivelyRemote working effectively
Remote working effectively
 
Conclusion of the Seminary UPC 2017
Conclusion of the Seminary UPC 2017Conclusion of the Seminary UPC 2017
Conclusion of the Seminary UPC 2017
 
Introduction to Docker
Introduction to DockerIntroduction to Docker
Introduction to Docker
 
Some OOP paradigms & SOLID
Some OOP paradigms & SOLIDSome OOP paradigms & SOLID
Some OOP paradigms & SOLID
 
Introduction to Clean Code
Introduction to Clean CodeIntroduction to Clean Code
Introduction to Clean Code
 
Professional development
Professional developmentProfessional development
Professional development
 
Code metrics in PHP
Code metrics in PHPCode metrics in PHP
Code metrics in PHP
 

Recently uploaded

How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Tatiana Kojar
 
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdfNunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
flufftailshop
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
saastr
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
Wouter Lemaire
 
A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024
Intelisync
 
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStrDeep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
saastr
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
alexjohnson7307
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
Operating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptxOperating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptx
Pravash Chandra Das
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Safe Software
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
Hiroshi SHIBATA
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Alpen-Adria-Universität
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 

Recently uploaded (20)

How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
 
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdfNunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
Nunit vs XUnit vs MSTest Differences Between These Unit Testing Frameworks.pdf
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
 
A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024A Comprehensive Guide to DeFi Development Services in 2024
A Comprehensive Guide to DeFi Development Services in 2024
 
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStrDeep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
Deep Dive: Getting Funded with Jason Jason Lemkin Founder & CEO @ SaaStr
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
Operating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptxOperating System Used by Users in day-to-day life.pptx
Operating System Used by Users in day-to-day life.pptx
 
Driving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success StoryDriving Business Innovation: Latest Generative AI Advancements & Success Story
Driving Business Innovation: Latest Generative AI Advancements & Success Story
 
Introduction of Cybersecurity with OSS at Code Europe 2024
Introduction of Cybersecurity with OSS  at Code Europe 2024Introduction of Cybersecurity with OSS  at Code Europe 2024
Introduction of Cybersecurity with OSS at Code Europe 2024
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 

Customer segmentation scbcn17

  • 1. Customer segmentation an excuse to use Machine Learning ;-)
  • 2.
  • 3. ● Julio Martinez ● Web developer since 2001 ● 2 years working at Ulabox ● Machine Learning hobbyist ● Find me: @liopic Who am I?
  • 4. 1. docker pull jupyter/scipy-notebook 2. git clone git@github.com:ulabox/datasets 3. git clone git@github.com:liopic/scbcn17-customer-segmentation 4. cp datasets/data/*.csv scbcn17-customer-segmentation/ Preparing the workshop
  • 5. My 2017 objective: M.L. ● Motivation ○ It’s the new hot thing ○ AlphaGo beat Lee Sedol, March 2016 ● Some background, but need to learn more
  • 6. 1. Choose the way ○ Coursera’s vs. books vs. workshops vs. posts 2. Find an excuse to apply it ○ @work is better than @home Learning about Machine Learning
  • 7. Customer clusters @work, aka “the excuse” ● There is a non-programmer Business Analysis Department ● Groups of customers based on periodicity + amount spent ○ Example: people that buy once per month, 100€ ticket ○ Useful for business reports ○ Not so useful for UX, CRM ● Groups by behavior? Clustering orders! Boring!
  • 8. 1. With past data -> make a ML model ○ clean data ○ choose a ML algorithm/s ○ tune the algorithm, with testing 2. With new data -> use model to predict (or give new info) ○ deploy pipeline ○ update model 101 Machine Learning: the method
  • 9. ● Supervised ○ data + labels(result) ● Unsupervised ○ just data ● Reinforcement ○ function to optimize 101 Machine Learning: type of problems
  • 10. Supervised learning TRAINING SET cat cat person TEST SET ???
  • 12. ● Try to extract features (information, shapes): similar and different ● Uses: ○ Clustering ○ Anomaly detection (it doesn’t look “normal”) ○ Dimensional reduction ○ Transfer features, projections ... Unsupervised learning
  • 13. ● Use: ○ grouping ○ quantization ● Algorithms: ○ k-means ○ DBSCAN Clustering
  • 14. ● need: how many clusters k-means
  • 15. ● need: how many samples at minimum, tune other params DBSCAN: Density-based spatial clustering of applications with noise
  • 16. So, ready to hack? But wait a moment!
  • 17. ● Data preparation ○ Keep same order of magnitude, usually [0,1] ○ Remove noise ○ Other processes ■ Binarize data, categorical features ● weekday, ex. 4 -> 0, 0, 0, 1, 0, 0, 0 ■ Process missing data Before algorithms: data!
  • 18. ● Explore the data ○ Images are richer than numbers ■ “We get more orders at 22h” vs. ● Ask domain experts ○ Understand normal & border cases ■ The step at 14h is the web cutoff time Before algorithms: data!
  • 19. ● Explore and optimize the data ○ Features that count, feature engineering ○ Avoid the “curse of dimensionality” ● Start small, understandable, useful ● Find excuses to try it, and sell it! Lessons learned
  • 21. 1. docker pull jupyter/scipy-notebook 2. git clone git@github.com:ulabox/datasets 3. git clone git@github.com:liopic/scbcn17-customer-segmentation 4. cp datasets/data/*.csv scbcn17-customer-segmentation/ 5. cd scbcn17-customer-segmentation 6. ./jupyter.sh 7. Open the link in your browser and open the Workshop.ipynb file Let’s hack