Machine Learning
Bayram Annakov, Empatika Open
Types of ML
Supervised
machine learning
Supervised learning
[Scatter plot: x and o points separated by a decision boundary]
Classification
Unsupervised learning
[Scatter plot: two clusters of o points, axes X1 and X2]
Users clustering
Process
Baby first steps
• GOAL: better purchase conversion from Trial Emails
• Knowledge: internal Empatika Open
• What's next?
• Load new data & apply
First results - I’m genius!
KNN: 97% score on test set
… & first disappointment
Too many negatives: an unbalanced dataset
Balanced score: 64%
Worse in practice: email conversion was 2 times lower
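Why a 97% score can hide a useless model — a minimal sketch with hypothetical class counts (not the actual Empatika data): on a heavily unbalanced set, always predicting the majority class looks accurate while finding zero positives.

```python
import numpy as np

# Hypothetical unbalanced labels: 97 negatives, 3 positives
y_true = np.array([0] * 97 + [1] * 3)

# A "classifier" that always predicts the majority class
y_pred = np.zeros_like(y_true)

# Accuracy looks great...
accuracy = (y_pred == y_true).mean()   # 0.97

# ...but recall on the positive class exposes the failure
recall = (y_pred[y_true == 1] == 1).mean()   # 0.0
```

This is exactly the trap above: the 97% test score measured class imbalance, not predictive power.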
Start from scratch
How:
1. Small dataset (own): time, value
2. Balanced
3. Don't hurry: lots of answers
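One way to balance a dataset is to undersample the majority class; a sketch with scikit-learn's `resample` (all sizes here are illustrative, not the real dataset):

```python
import numpy as np
from sklearn.utils import resample

# Toy unbalanced dataset: 90 negatives, 10 positives
X = np.arange(100).reshape(-1, 1).astype(float)
y = np.array([0] * 90 + [1] * 10)

X_neg, X_pos = X[y == 0], X[y == 1]

# Downsample negatives to match the number of positives
X_neg_down = resample(X_neg, n_samples=len(X_pos),
                      replace=False, random_state=0)

X_bal = np.vstack([X_neg_down, X_pos])
y_bal = np.array([0] * len(X_pos) + [1] * len(X_pos))
```

Undersampling throws data away; oversampling the minority class (or class weights) are the usual alternatives when the dataset is already small.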
Process
Data → Model 1, Model 2, … Model N → Results 1, Results 2, … Results N
Tweak and repeat: reducing feature size, scaling, other data tricks, …
Best result → Train model (parameters) → Results → Test dataset → New dataset
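The "try many models, keep the best result" loop above might look like this in scikit-learn (the dataset and the two candidate models are stand-ins; the talk used internal email data):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in dataset for illustration
X, y = load_breast_cancer(return_X_y=True)

# Model 1, Model 2, ... each with its own preprocessing (scaling)
models = {
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "svm": make_pipeline(StandardScaler(), SVC()),
}

# Results 1, Results 2, ... -> pick the best
results = {name: cross_val_score(m, X, y, cv=5).mean()
           for name, m in models.items()}
best = max(results, key=results.get)
```

Cross-validation on the training data keeps the test dataset untouched until the final comparison.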
30% better
7% better
Email conversion 2 times better vs. the previous model
416 different inputs
Next level
• Features
• Volume
• Understand Model parameters
• Train model harder (24/7)
• Whole picture: not only one score, but precision, recall, F1-score, etc.
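The "whole picture" metrics are one call away in scikit-learn; a sketch on toy predictions for an unbalanced test set (numbers are illustrative):

```python
from sklearn.metrics import (classification_report,
                             precision_score, recall_score)

# Toy unbalanced test set: 8 negatives, 2 positives
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

precision = precision_score(y_true, y_pred)  # 1 of 2 predicted positives is right
recall = recall_score(y_true, y_pred)        # 1 of 2 actual positives is found

# Per-class precision/recall/F1 in one table
print(classification_report(y_true, y_pred))
```

Here accuracy would be 80%, yet precision and recall on the positive class are both only 0.5 — the single score hides that.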
Lessons & Knowledge source
• Think about the feature balance: valuable vs. many vs. few
• Models are sensitive to different data
• Model tuning is important, but long road
• Sources:
• O’REILLY: Introduction to Machine Learning with Python
• scikit-learn.org
• Github
Be patient
Process
Data collection & preparation
Modeling
Training
Evaluation
Data preparation!!!
Tasks
Image classification
Rhythmic Gymnastics
Approach
• Collect data
  – Simple iPhone app that helps draw and export
• Prepare data
  – Image = grid; each cell = 1 (black) or 0 (white)
  – Convert grid to line
  – Image = 000100011000011100011…
• Train + Analyze
  – Until satisfied with the score
Prepare data
import skimage
import numpy
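The grid-to-line step can be sketched in plain numpy — threshold each cell to 1 (ink) or 0 (white), then flatten the grid into one line; the 5×5 image below is a made-up example, not data from the app:

```python
import numpy as np

# Hypothetical 5x5 grayscale drawing (0 = white, higher = ink)
img = np.array([
    [0,   0, 200,   0, 0],
    [0, 180, 210, 190, 0],
    [0,   0, 220,   0, 0],
    [0,   0, 230,   0, 0],
    [0,   0, 240,   0, 0],
])

# Each cell -> 1 (black) or 0 (white)
grid = (img > 128).astype(int)

# Grid -> line: the flat feature vector fed to the model
line = "".join(map(str, grid.ravel()))
```

For real photos, `skimage` adds the missing pieces (e.g. `skimage.color.rgb2gray` and `skimage.transform.resize`) before this thresholding step.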
Train and Analyze
1. K-neighbors 78%
from sklearn.neighbors import KNeighborsClassifier
clf = KNeighborsClassifier(n_neighbors=1)
clf.fit(x_train, y_train)
clf.score(x_test, y_test)
Train and Analyze
2. K-neighbors + PCA 81%
from sklearn.decomposition import PCA
pca = PCA(n_components=40, whiten=True)
pca.fit(x_train)
x_train_pca = pca.transform(x_train)
x_test_pca = pca.transform(x_test)
# repeat KNN on the PCA-transformed data
clf = KNeighborsClassifier(n_neighbors=1)
clf.fit(x_train_pca, y_train)
clf.score(x_test_pca, y_test)
Maybe someone has already solved it?
MNIST
http://yann.lecun.com/exdb/mnist/
3. SVM 90%
from sklearn import svm
classifier = svm.SVC(gamma=0.001)
classifier.fit(x_train, y_train)
predicted = classifier.predict(x_test)
Neural networks?
Neuron
Perceptron
Multi-layered (deep)
Problems with images
Vectors are too big (200×200×3 = 120,000)
Pixel position matters
Convolution
Pooling (sub-sampling)
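Both operations fit in a few lines of numpy — a minimal sketch, not a real CNN layer (no padding, stride 1, and the kernel below is a made-up edge detector):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation, as CNNs actually compute it)."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            # Slide the kernel over the image, summing elementwise products
            out[i, j] = (image[i:i + kh, j:j + kw] * kernel).sum()
    return out

def max_pool(image, size=2):
    """Non-overlapping max pooling (sub-sampling)."""
    h, w = image.shape[0] // size, image.shape[1] // size
    return image[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

img = np.arange(16, dtype=float).reshape(4, 4)
edge = np.array([[1.0, -1.0]])   # toy horizontal edge detector
feat = conv2d(img, edge)         # 4x3 feature map
pooled = max_pool(feat)          # 2x1 after 2x2 pooling
```

This is why CNNs beat flat vectors on images: the kernel reuses the same few weights at every position, and pooling shrinks the map so exact pixel position matters less.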
CNN
Object recognition
ImageNet
Faces recognition
Eigenfaces
LFW
Recommendation systems
NLP
Bag-of-words
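Bag-of-words in one scikit-learn call — each document becomes a vector of word counts, with word order discarded (the two sentences are made-up examples):

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat", "the cat sat on the mat"]

vec = CountVectorizer()
X = vec.fit_transform(docs)          # sparse document-term matrix

vocab = sorted(vec.vocabulary_)      # ['cat', 'mat', 'on', 'sat', 'the']
counts = X.toarray()                 # word counts per document
```

These count vectors can then feed any of the classifiers above (KNN, SVM, …), just like the pixel vectors did.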
Data is key
Competitive advantage?
Costs
Why?
CPU vs GPU
Opportunities
Better than Google?
Attributes
Proprietary data sets
Domain-specific tasks
Domain-specific knowledge
So,
Useful links
“The Master Algorithm”
Andrew Ng “AI is new electricity”
fast.ai course
“Introduction to ML with Python”
“Python Machine Learning”
one more thing…
Please donate any sum to any fund
Plans
3 universities in Paris
Crowdfunding
Platform
Not only academics
New Tech
How can you help?
Finances
Introductions
Ideas
Expertise
Media
Tech
even frequent flyer miles :)
Thanks
Lucy Evstratova
+79165884397
Unicore.pro
AlfaBank
4154 8120 0093 9516

Sberbank
4276 3800 1234 3302
