My 2017 objective: Machine Learning (ML)
● Motivation
○ It’s the hot new thing
○ AlphaGo beat Lee Sedol, March 2016
● I have some background, but need to learn more
1. Choose how to learn
○ Coursera courses vs. books vs. workshops vs. blog posts
2. Find an excuse to apply it
○ @work is better than @home
Learning about Machine Learning
Customer clusters @work, aka “the excuse”
● There is a Business Analysis department (non-programmers)
● They group customers by periodicity + amount spent
○ Example: people who buy once per month, spending about 100€ per order
○ Useful for business reports
○ Not so useful for UX, CRM
● Groups by behavior? Clustering orders!
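A minimal sketch of the periodicity + amount grouping described above, in Python (the language of the workshop's Jupyter notebook). The order log, customer names, and the thresholds in `business_group` are hypothetical, chosen only for illustration. It reduces each customer to two numbers, which is why it says little about how they actually order.

```python
from datetime import date
from statistics import mean

# Hypothetical order log: (customer_id, order date, amount in €).
ORDERS = [
    ("ana", date(2017, 1, 5), 95.0),
    ("ana", date(2017, 2, 4), 105.0),
    ("ana", date(2017, 3, 6), 100.0),
    ("bob", date(2017, 1, 2), 30.0),
    ("bob", date(2017, 1, 9), 25.0),
    ("bob", date(2017, 1, 16), 35.0),
]

def customer_profile(orders):
    """Return {customer: (average days between orders, average amount)}."""
    by_customer = {}
    for customer, day, amount in orders:
        by_customer.setdefault(customer, []).append((day, amount))
    profile = {}
    for customer, rows in by_customer.items():
        rows.sort()  # chronological order
        gaps = [(later[0] - earlier[0]).days for earlier, later in zip(rows, rows[1:])]
        period = mean(gaps) if gaps else None  # None: only one order so far
        profile[customer] = (period, mean(amount for _, amount in rows))
    return profile

def business_group(period_days, avg_amount):
    """Hypothetical rule-based label, in the spirit of "monthly, 100€ per order"."""
    if period_days is None:
        return "one-off"
    if period_days <= 10:
        frequency = "weekly"
    elif period_days <= 40:
        frequency = "monthly"
    else:
        frequency = "occasional"
    ticket = "high ticket" if avg_amount >= 75 else "low ticket"
    return f"{frequency}, {ticket}"

for customer, (period, amount) in customer_profile(ORDERS).items():
    print(customer, business_group(period, amount))
```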
Boring!
1. With past data -> build an ML model
○ clean the data
○ choose one or more ML algorithms
○ tune the algorithm(s), testing the results
2. With new data -> use the model to predict (or to extract new information)
○ deploy the pipeline
○ update the model
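The two phases above can be sketched with k-means, used here only as an example clustering algorithm (the slides do not say which algorithm the workshop uses). `fit_kmeans` learns centroids from past orders; `predict` assigns a new order to the nearest one. The order features (hour and amount, already scaled to [0, 1]), the data, and the deterministic farthest-point initialisation are assumptions made to keep the toy example reproducible.

```python
# Hypothetical past orders, features already scaled to [0, 1]:
# (hour of day / 24, amount / largest amount).
PAST_ORDERS = [
    (0.90, 0.10), (0.92, 0.15), (0.88, 0.12),  # late-night, small baskets
    (0.40, 0.80), (0.42, 0.85), (0.38, 0.90),  # morning, large baskets
]

def _dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def _nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda i: _dist2(point, centroids[i]))

def fit_kmeans(points, k, max_iter=100):
    """Phase 1: learn k centroids from past data (Lloyd's algorithm)."""
    # Farthest-point initialisation: start from the first point, then keep
    # adding the point farthest from every centroid chosen so far.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(_dist2(p, c) for c in centroids)))
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[_nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its points (keep it if the cluster is empty).
        new_centroids = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments no longer change
            break
        centroids = new_centroids
    return centroids

def predict(point, centroids):
    """Phase 2: assign a new order to its nearest cluster."""
    return _nearest(point, centroids)

centroids = fit_kmeans(PAST_ORDERS, k=2)
print(predict((0.91, 0.11), centroids), predict((0.41, 0.88), centroids))  # prints: 0 1
```

In a deployed pipeline, `fit_kmeans` would be rerun from time to time on fresh data (updating the model), while `predict` runs on every new order.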
Machine Learning 101: the method
● Supervised
○ data + labels (the expected results)
● Unsupervised
○ just data
● Reinforcement
○ a reward function to optimize
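To make the difference between the first two methods concrete: a supervised model learns from (data, label) pairs, while an unsupervised one only sees the data. The sketch uses a 1-nearest-neighbour classifier as the supervised example; the features and the "ordered again within 30 days" label are hypothetical. Reinforcement learning needs an environment and a reward to optimize, so it is not shown.

```python
import math

# Supervised: every example comes with a label (the expected result).
# Hypothetical features (hour / 24, amount / largest amount) and a hypothetical
# label: did the customer order again within 30 days?
LABELLED = [
    ((0.90, 0.10), "no"),
    ((0.85, 0.15), "no"),
    ((0.40, 0.85), "yes"),
    ((0.45, 0.80), "yes"),
]

def predict_1nn(x, labelled):
    """1-nearest-neighbour classifier: copy the label of the closest known example."""
    _, label = min(labelled, key=lambda pair: math.dist(x, pair[0]))
    return label

# Unsupervised: the same points without labels. An algorithm such as k-means
# can only group them by similarity; it cannot say what "yes" or "no" means.
UNLABELLED = [x for x, _ in LABELLED]

print(predict_1nn((0.42, 0.82), LABELLED))  # prints: yes
```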
Machine Learning 101: types of problems
● Data preparation
○ Scale features to the same order of magnitude, usually [0, 1]
○ Remove noise
○ Other steps
■ Binarize categorical features (one-hot encoding)
● weekday, e.g. 4 -> 0, 0, 0, 1, 0, 0, 0
■ Handle missing data
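The preparation steps above, sketched in plain Python; libraries such as scikit-learn provide equivalents (`MinMaxScaler`, `OneHotEncoder`, `SimpleImputer`). The function names are hypothetical. The weekday encoding assumes days numbered 1 to 7, which matches the example where 4 maps to 0, 0, 0, 1, 0, 0, 0, and missing values are filled with the column mean, which is only one of several possible strategies.

```python
from statistics import mean

def min_max_scale(values):
    """Rescale a numeric column to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: nothing to scale
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def one_hot_weekday(day, n_days=7):
    """Binarize a 1-based weekday into n_days columns (one-hot encoding)."""
    if not 1 <= day <= n_days:
        raise ValueError(f"weekday out of range: {day}")
    return [1 if i == day else 0 for i in range(1, n_days + 1)]

def impute_mean(values):
    """Replace missing values (None) with the mean of the known ones."""
    known = [v for v in values if v is not None]
    fill = mean(known)
    return [fill if v is None else v for v in values]

print(one_hot_weekday(4))                           # prints: [0, 0, 0, 1, 0, 0, 0]
print(min_max_scale(impute_mean([20.0, None, 100.0])))  # prints: [0.0, 0.5, 1.0]
```

Imputing before scaling keeps `None` out of `min` and `max`.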
Before algorithms: data!
● Explore the data
○ Charts are richer than numbers
■ “We get more orders at 22h” vs. a chart of orders per hour
● Ask domain experts
○ Understand normal & edge cases
■ e.g. the step at 14h is the web order cutoff time
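A quick way to turn "we get more orders at 22h" into a picture is to count orders per hour of the day. This dependency-free sketch prints text bars; in a Jupyter notebook a real chart (for example with matplotlib) would do the job better. The timestamps are hypothetical.

```python
from collections import Counter
from datetime import datetime

def orders_per_hour(timestamps):
    """Count orders in each hour of the day, 0 to 23."""
    counts = Counter(ts.hour for ts in timestamps)
    return [counts[h] for h in range(24)]  # Counter returns 0 for missing hours

def text_histogram(counts, width=40):
    """Render hourly counts as ASCII bars: a quick stand-in for a real chart."""
    top = max(counts) or 1  # avoid dividing by zero when there are no orders
    return "\n".join(
        f"{h:02d}h {'#' * round(width * c / top)} {c}" for h, c in enumerate(counts)
    )

# Hypothetical order timestamps.
orders = [
    datetime(2017, 6, 1, 13, 55),
    datetime(2017, 6, 1, 22, 5),
    datetime(2017, 6, 1, 22, 40),
]
print(text_histogram(orders_per_hour(orders)))
```

A sudden step between two adjacent hours is exactly the kind of pattern worth taking to a domain expert.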
● Explore and optimize the data
○ Feature engineering: keep the features that count
○ Avoid the “curse of dimensionality”
● Start small, understandable and useful
● Find excuses to try it, and sell it!
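Why fewer, better features help: in high dimensions, distances between random points concentrate, so the nearest and the farthest neighbours end up almost equally far away, which hurts distance-based methods such as clustering. This sketch illustrates the effect with uniformly random points; it is not part of the workshop code, and `distance_contrast` is a hypothetical helper.

```python
import math
import random

def distance_contrast(n_points, dims, seed=0):
    """Ratio of nearest to farthest distance from a random query point
    to n_points uniform random points in the unit cube [0, 1]^dims.

    Near 0: neighbours are easy to tell apart. Near 1: they are not.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    query = [rng.random() for _ in range(dims)]
    dists = [
        math.dist(query, [rng.random() for _ in range(dims)])
        for _ in range(n_points)
    ]
    return min(dists) / max(dists)

for dims in (2, 10, 100, 500):
    print(dims, round(distance_contrast(200, dims), 2))
```

The ratio typically climbs from close to 0 in 2 dimensions to well above 0.5 in 500 dimensions.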
Lessons learned
1. docker pull jupyter/scipy-notebook
2. git clone git@github.com:ulabox/datasets
3. git clone git@github.com:liopic/scbcn17-customer-segmentation
4. cp datasets/data/*.csv scbcn17-customer-segmentation/
5. cd scbcn17-customer-segmentation
6. ./jupyter.sh
7. Open the link in your browser, then open the Workshop.ipynb file
Let’s hack