Machine Learning
Bayram Annakov, Empatika Open
Types of ML
Supervised
machine learning
Supervised learning
[Scatter plot: x and o points separated by a decision boundary]
Classification
Unsupervised learning
[Scatter plot: two clusters of o points, axes X1 and X2]
Users clustering
Process
Baby first steps
• GOAL: better purchase conversion from Trial Emails
• Knowledge: internal Empatika Open
• What's next?
• Load new data & apply
First results - I’m genius!
KNN: 97% score on test set
… & first disappointment
Too many negatives: an unbalanced dataset
Balanced score: 64%
Worse in practice: email conversion was 2 times lower
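Why a 97% score can hide a useless model — a minimal sketch with hypothetical class counts (not the actual Empatika data): on a heavily unbalanced set, always predicting the majority class looks accurate while finding zero positives.

```python
import numpy as np

# Hypothetical unbalanced labels: 97 negatives, 3 positives
y_true = np.array([0] * 97 + [1] * 3)

# A "classifier" that always predicts the majority class
y_pred = np.zeros_like(y_true)

# Accuracy looks great...
accuracy = (y_pred == y_true).mean()   # 0.97

# ...but recall on the positive class exposes the failure
recall = (y_pred[y_true == 1] == 1).mean()   # 0.0
```

This is exactly the trap above: the 97% test score measured class imbalance, not predictive power.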
Start from scratch
How:
1. Small dataset (own): time, value
2. Balanced
3. Don't hurry: lots of answers
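One way to balance a dataset is to undersample the majority class; a sketch with scikit-learn's `resample` (all sizes here are illustrative, not the real dataset):

```python
import numpy as np
from sklearn.utils import resample

# Toy unbalanced dataset: 90 negatives, 10 positives
X = np.arange(100).reshape(-1, 1).astype(float)
y = np.array([0] * 90 + [1] * 10)

X_neg, X_pos = X[y == 0], X[y == 1]

# Downsample negatives to match the number of positives
X_neg_down = resample(X_neg, n_samples=len(X_pos),
                      replace=False, random_state=0)

X_bal = np.vstack([X_neg_down, X_pos])
y_bal = np.array([0] * len(X_pos) + [1] * len(X_pos))
```

Undersampling throws data away; oversampling the minority class (or class weights) are the usual alternatives when the dataset is already small.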
Process
Data → Model 1, Model 2, … Model N → Results 1, Results 2, … Results N
Tweak and repeat: reducing feature size, scaling, other data tricks, …
Best result → Train model (parameters) → Results → Test dataset → New dataset
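The "try many models, keep the best result" loop above might look like this in scikit-learn (the dataset and the two candidate models are stand-ins; the talk used internal email data):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in dataset for illustration
X, y = load_breast_cancer(return_X_y=True)

# Model 1, Model 2, ... each with its own preprocessing (scaling)
models = {
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "svm": make_pipeline(StandardScaler(), SVC()),
}

# Results 1, Results 2, ... -> pick the best
results = {name: cross_val_score(m, X, y, cv=5).mean()
           for name, m in models.items()}
best = max(results, key=results.get)
```

Cross-validation on the training data keeps the test dataset untouched until the final comparison.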
30% better
7% better
Email conversion 2 times better vs. the previous model
416 different inputs
Next level
• Features
• Volume
• Understand Model parameters
• Train model harder (24/7)
• Whole picture: not only one score, but precision, recall, F1-score, etc.
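The "whole picture" metrics are one call away in scikit-learn; a sketch on toy predictions for an unbalanced test set (numbers are illustrative):

```python
from sklearn.metrics import (classification_report,
                             precision_score, recall_score)

# Toy unbalanced test set: 8 negatives, 2 positives
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

precision = precision_score(y_true, y_pred)  # 1 of 2 predicted positives is right
recall = recall_score(y_true, y_pred)        # 1 of 2 actual positives is found

# Per-class precision/recall/F1 in one table
print(classification_report(y_true, y_pred))
```

Here accuracy would be 80%, yet precision and recall on the positive class are both only 0.5 — the single score hides that.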
Lessons & Knowledge source
• Think about the feature balance: valuable vs. many vs. few
• Models are sensitive to different data
• Model tuning is important, but long road
• Sources:
• O’REILLY: Introduction to Machine Learning with Python
• scikit-learn.org
• Github
Be patient
Process
Data collection & preparation
Modeling
Training
Evaluation
Data preparation!!!
Tasks
Image classification
Rhythmic Gymnastics
Approach
• Collect data
  – Simple iPhone app that helps draw and export
• Prepare data
  – Image = grid; each cell = 1 (black) or 0 (white)
  – Convert grid to line
  – Image = 000100011000011100011…
• Train + Analyze
  – Until satisfied with the score
Prepare data
import skimage
import numpy
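The grid-to-line step can be sketched in plain numpy — threshold each cell to 1 (ink) or 0 (white), then flatten the grid into one line; the 5×5 image below is a made-up example, not data from the app:

```python
import numpy as np

# Hypothetical 5x5 grayscale drawing (0 = white, higher = ink)
img = np.array([
    [0,   0, 200,   0, 0],
    [0, 180, 210, 190, 0],
    [0,   0, 220,   0, 0],
    [0,   0, 230,   0, 0],
    [0,   0, 240,   0, 0],
])

# Each cell -> 1 (black) or 0 (white)
grid = (img > 128).astype(int)

# Grid -> line: the flat feature vector fed to the model
line = "".join(map(str, grid.ravel()))
```

For real photos, `skimage` adds the missing pieces (e.g. `skimage.color.rgb2gray` and `skimage.transform.resize`) before this thresholding step.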
Train and Analyze
1. K-neighbors 78%
from sklearn.neighbors import KNeighborsClassifier
clf = KNeighborsClassifier(n_neighbors=1)
clf.fit(x_train, y_train)
clf.score(x_test, y_test)
Train and Analyze
2. K-neighbors + PCA 81%
from sklearn.decomposition import PCA
pca = PCA(n_components=40, whiten=True)
pca.fit(x_train)
x_train_pca = pca.transform(x_train)
x_test_pca = pca.transform(x_test)
# repeat KNN on the PCA-transformed data
clf = KNeighborsClassifier(n_neighbors=1)
clf.fit(x_train_pca, y_train)
clf.score(x_test_pca, y_test)
Maybe someone has already solved it?
MNIST
http://yann.lecun.com/exdb/mnist/
3. SVM 90%
from sklearn import svm
classifier = svm.SVC(gamma=0.001)
classifier.fit(x_train, y_train)
predicted = classifier.predict(x_test)
Neural networks?
Neuron
Perceptron
Multi-layered (deep)
Problems with images
Vectors are too big (200×200×3 = 120,000)
Pixel position matters
Convolution
Pooling (sub-sampling)
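Both operations fit in a few lines of numpy — a minimal sketch, not a real CNN layer (no padding, stride 1, and the kernel below is a made-up edge detector):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation, as CNNs actually compute it)."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            # Slide the kernel over the image, summing elementwise products
            out[i, j] = (image[i:i + kh, j:j + kw] * kernel).sum()
    return out

def max_pool(image, size=2):
    """Non-overlapping max pooling (sub-sampling)."""
    h, w = image.shape[0] // size, image.shape[1] // size
    return image[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

img = np.arange(16, dtype=float).reshape(4, 4)
edge = np.array([[1.0, -1.0]])   # toy horizontal edge detector
feat = conv2d(img, edge)         # 4x3 feature map
pooled = max_pool(feat)          # 2x1 after 2x2 pooling
```

This is why CNNs beat flat vectors on images: the kernel reuses the same few weights at every position, and pooling shrinks the map so exact pixel position matters less.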
CNN
Object recognition
ImageNet
Faces recognition
Eigenfaces
LFW
Recommendation systems
NLP
Bag-of-words
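Bag-of-words in one scikit-learn call — each document becomes a vector of word counts, with word order discarded (the two sentences are made-up examples):

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat", "the cat sat on the mat"]

vec = CountVectorizer()
X = vec.fit_transform(docs)          # sparse document-term matrix

vocab = sorted(vec.vocabulary_)      # ['cat', 'mat', 'on', 'sat', 'the']
counts = X.toarray()                 # word counts per document
```

These count vectors can then feed any of the classifiers above (KNN, SVM, …), just like the pixel vectors did.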
Data is key
Competitive advantage?
Costs
Why?
CPU vs GPU
Opportunities
Better than Google?
Attributes
Proprietary data sets
Domain-specific tasks
Domain-specific knowledge
So,
Useful links
“The Master Algorithm”
Andrew Ng “AI is new electricity”
fast.ai course
“Introduction to ML with Python”
“Python Machine Learning”
one more thing…
Please donate any sum to any fund
Plans
3 universities in Paris
Crowdfunding
Platform
Not only academics
New Tech
How can you help?
Finances
Introductions
Ideas
Expertise
Media
Tech
even frequent flyer miles :)
Thanks
Lucy Evstratova
+79165884397
Unicore.pro
AlfaBank
4154 8120 0093 9516

Sberbank
4276 3800 1234 3302
