Customer segmentation scbcn17

Workshop introduction. Software Craftsmanship Conference in Barcelona, October 2017.

Published in: Technology

  1. Customer segmentation: an excuse to use Machine Learning ;-)
  2. Who am I?
     ● Julio Martinez
     ● Web developer since 2001
     ● 2 years working at Ulabox
     ● Machine Learning hobbyist
     ● Find me: @liopic
  3. Preparing the workshop
     1. docker pull jupyter/scipy-notebook
     2. git clone git@github.com:ulabox/datasets
     3. git clone git@github.com:liopic/scbcn17-customer-segmentation
     4. cp datasets/data/*.csv scbcn17-customer-segmentation/
  4. My 2017 objective: M.L.
     ● Motivation
       ○ It’s the new hot thing
       ○ AlphaGo beat Lee Sedol, March 2016
     ● Some background, but need to learn more
  5. Learning about Machine Learning
     1. Choose the way
        ○ Coursera courses vs. books vs. workshops vs. posts
     2. Find an excuse to apply it
        ○ @work is better than @home
  6. Customer clusters @work, aka “the excuse”
     ● The Business Analysis Department has no programmers
     ● Groups of customers based on periodicity + amount spent
       ○ Example: people who buy once per month, with a 100€ ticket
       ○ Useful for business reports
       ○ Not so useful for UX, CRM
     ● Groups by behavior? Clustering orders! Boring!
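The periodicity + amount spent grouping from this slide can be sketched in a few lines. This is only an illustration under stated assumptions, not the department's actual method: the `orders` tuples, the customer ids and the `customer_features` helper are hypothetical, and the real Ulabox dataset may be laid out differently.

```python
from collections import defaultdict
from datetime import date

# Hypothetical orders: (customer_id, order_date, total_eur)
orders = [
    ("c1", date(2017, 1, 5), 95.0),
    ("c1", date(2017, 2, 4), 105.0),
    ("c1", date(2017, 3, 6), 100.0),
    ("c2", date(2017, 1, 2), 20.0),
    ("c2", date(2017, 1, 9), 30.0),
]


def customer_features(orders):
    """Return {customer_id: (avg_days_between_orders, avg_ticket)}."""
    by_customer = defaultdict(list)
    for customer_id, day, total in orders:
        by_customer[customer_id].append((day, total))

    features = {}
    for customer_id, rows in by_customer.items():
        rows.sort()  # chronological order
        days = [d for d, _ in rows]
        gaps = [(b - a).days for a, b in zip(days, days[1:])]
        # A customer with a single order has no periodicity yet
        avg_gap = sum(gaps) / len(gaps) if gaps else None
        avg_ticket = sum(t for _, t in rows) / len(rows)
        features[customer_id] = (avg_gap, avg_ticket)
    return features


features = customer_features(orders)
print(features)
```

Here `c1` comes out as the "once per month, 100€ ticket" kind of customer from the example, while `c2` buys weekly with a small ticket.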
  7. Machine Learning 101: the method
     1. With past data -> build an ML model
        ○ clean the data
        ○ choose one or more ML algorithms
        ○ tune the algorithm, with testing
     2. With new data -> use the model to predict (or to give new info)
        ○ deploy the pipeline
        ○ update the model
  8. Machine Learning 101: types of problems
     ● Supervised
       ○ data + labels (results)
     ● Unsupervised
       ○ just data
     ● Reinforcement
       ○ function to optimize
  9. Supervised learning
     ● TRAINING SET: cat, cat, person
     ● TEST SET: ???
  10. Unsupervised learning
      ● TRAINING SET
      ● TEST SET: There is NO test
  11. Unsupervised learning
      ● Try to extract features (information, shapes): what is similar and what is different
      ● Uses:
        ○ Clustering
        ○ Anomaly detection (it doesn’t look “normal”)
        ○ Dimensionality reduction
        ○ Transfer features, projections ...
  12. Clustering
      ● Uses:
        ○ grouping
        ○ quantization
      ● Algorithms:
        ○ k-means
        ○ DBSCAN
  13. k-means
      ● Needs: the number of clusters (k), chosen in advance
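To illustrate why k-means needs the number of clusters up front, here is a minimal pure-Python sketch of the usual assign-then-average loop (Lloyd's algorithm). The customer points are made-up values already scaled to [0,1], the `kmeans` helper is hypothetical, and the farthest-first initialisation is just a simple deterministic choice for this example; a library implementation such as scikit-learn's `KMeans` is usually the practical choice.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm on equal-length tuples; returns (labels, centroids)."""
    # Deterministic farthest-first initialisation: start from the first point,
    # then repeatedly add the point farthest from the centroids chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )

    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:  # nothing moved: converged
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# Hypothetical customers: (scaled order frequency, scaled average ticket)
customers = [
    (0.10, 0.90), (0.15, 0.85), (0.20, 0.95),  # rare, large orders
    (0.80, 0.10), (0.90, 0.20), (0.85, 0.15),  # frequent, small orders
]
labels, centroids = kmeans(customers, k=2)
print(labels)
```

Asking for a different `k` on the same data always returns that many groups, whether or not they mean anything, which is why choosing k is the part that needs thought.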
  14. DBSCAN: Density-based spatial clustering of applications with noise
      ● Needs: the minimum number of samples, plus tuning of other parameters (such as the neighborhood radius)
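A minimal pure-Python sketch of DBSCAN shows where the "minimum number of samples" comes in. The data is made up, the `dbscan` helper is hypothetical, and the parameter names `eps` and `min_samples` are borrowed from scikit-learn's `DBSCAN`; as there, `min_samples` is assumed to count the point itself. A library implementation is the better choice for real data.

```python
import math

NOISE = -1


def dbscan(points, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself)
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = neighbours(j)
            if len(more) >= min_samples:  # j is also a core point: keep expanding
                queue.extend(more)
    return labels


# Hypothetical scaled customers: two dense groups and one isolated point
customers = [
    (0.10, 0.90), (0.15, 0.85), (0.20, 0.95),
    (0.80, 0.10), (0.90, 0.20), (0.85, 0.15),
    (0.50, 0.50),
]
labels = dbscan(customers, eps=0.2, min_samples=3)
print(labels)
```

Unlike k-means, the number of clusters is not given: it falls out of `eps` and `min_samples`, and the isolated customer is reported as noise instead of being forced into a group.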
  15. So, ready to hack? But wait a moment!
  16. Before algorithms: data!
      ● Data preparation
        ○ Keep features in the same order of magnitude, usually [0,1]
        ○ Remove noise
        ○ Other processes
          ■ Binarize categorical features
            ● weekday, e.g. 4 -> 0, 0, 0, 1, 0, 0, 0
          ■ Handle missing data
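The preparation steps on this slide can be sketched with the standard library alone. The helper names and the sample order totals are hypothetical, weekdays are assumed to be numbered 1 to 7 (which matches the 4 -> 0, 0, 0, 1, 0, 0, 0 example), and filling gaps with the mean is only one of several reasonable ways to handle missing data.

```python
def one_hot_weekday(weekday):
    """Binarize a weekday in 1..7 into seven 0/1 flags."""
    if not 1 <= weekday <= 7:
        raise ValueError("weekday must be in 1..7")
    return [1 if day == weekday else 0 for day in range(1, 8)]


def fill_missing(values):
    """Replace None with the mean of the known values (one simple option)."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def min_max_scale(values):
    """Rescale numbers to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical order totals in euros, one of them missing
totals = [20.0, None, 100.0, 60.0]
filled = fill_missing(totals)
scaled = min_max_scale(filled)

print(one_hot_weekday(4))
print(filled)
print(scaled)
```

Scaling matters for distance-based clustering: without it, a feature measured in euros would dominate one measured in orders per month.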
  17. Before algorithms: data!
      ● Explore the data
        ○ Images are richer than numbers
          ■ “We get more orders at 22h” vs. a chart of the same data
      ● Ask domain experts
        ○ Understand normal and border cases
          ■ The step at 14h is the web cutoff time
  18. Lessons learned
      ● Explore and optimize the data
        ○ Features that count, feature engineering
        ○ Avoid the “curse of dimensionality”
      ● Start small, understandable, useful
      ● Find excuses to try it, and sell it!
  19. Now, let’s hack!
  20. Let’s hack
      1. docker pull jupyter/scipy-notebook
      2. git clone git@github.com:ulabox/datasets
      3. git clone git@github.com:liopic/scbcn17-customer-segmentation
      4. cp datasets/data/*.csv scbcn17-customer-segmentation/
      5. cd scbcn17-customer-segmentation
      6. ./jupyter.sh
      7. Open the link in your browser, then open the Workshop.ipynb file
  21. Thank you!
