[243] turning data into value

Olivier Duchenne
Co-founder | Chief Machine Learning Scientist
8 years of experience in Machine Learning, Computer Vision and Big Data
Ph.D. in Computer Science at ENS Paris/INRIA
Postdoctoral Fellow at Carnegie Mellon University
>500 citations, Best Paper Award at the 2009 CVPR Conference
NEC Labs (Bell Labs) in Cupertino (Silicon Valley)
Senior Researcher at Intel (3 pending patents)
- Developed ML algorithms for face recognition
Invited speaker at CMU, Samsung, Tokyo Univ, SNU, etc.
Co-Founder of Solidware

Guidelines for using Machine Learning on real data
Avoid Common Mistakes
Better Understand the Data
1. Big Enough Data?
2. Changing Data
Machine Learning and Data Science

From Computer Vision experience
to solving companies' issues:
Ex: car accident prediction (insurance),
default prediction (bank),
stock value prediction
Machine Learning and Data Science

Machine-Learning based Predictive Modeling
PAST DATA (Training Data Set): Internal Data (Ex: Age, Gender) + External Data (Ex: Web Crawl) + Target Value
→ ML algorithms analyze historical data to detect patterns → Prediction Function
Newly Incoming Data: Internal Data + External Data, Unknown Target Value
→ Prediction Function → Predicted Target Value

1. Prediction function. Ex: a linear function, a neural net, …
2. The prediction function is parametrized. Ex: $f_\alpha(X) = \sum_i \alpha_i X_i$
3. The goal is to find the best prediction function, i.e. the best parameters.
4. We build an objective function that represents how good a prediction function is.
5. The objective function always has a data term. Ex: $\mathrm{obj}(\alpha) = \sum_s \left(f_\alpha(X_s) - Y_s\right)^2$
6. The algorithm tries to find the parameters that optimize (here: minimize) this objective function. Ex: closed-form solution, stochastic gradient descent, …
Basic Explanation of Machine Learning
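Steps 2 to 6 can be tied together in a few lines. The sketch below assumes the linear prediction function $f_\alpha(X) = \sum_i \alpha_i X_i$ and the squared-error data term from step 5, and minimizes it with plain stochastic gradient descent; the learning rate, number of epochs and the toy target $Y = 3X_1 - 2X_2$ are illustrative choices, not values from the talk.

```python
import random

def predict(alpha, x):
    # Linear prediction function: f_alpha(x) = sum_i alpha_i * x_i
    return sum(a * xi for a, xi in zip(alpha, x))

def objective(alpha, X, Y):
    # Data term: sum over samples s of (f_alpha(X_s) - Y_s)^2
    return sum((predict(alpha, x) - y) ** 2 for x, y in zip(X, Y))

def sgd(X, Y, lr=0.01, epochs=200, seed=0):
    # Stochastic gradient descent: visit one sample at a time (in random
    # order) and step against the gradient of that sample's squared error.
    rng = random.Random(seed)
    alpha = [0.0] * len(X[0])
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for s in order:
            err = predict(alpha, X[s]) - Y[s]
            # d/d alpha_i of (f - y)^2 = 2 * err * x_i
            alpha = [a - lr * 2 * err * xi for a, xi in zip(alpha, X[s])]
    return alpha

# Hypothetical noise-free training data following Y = 3*X1 - 2*X2
data_rng = random.Random(1)
X = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(50)]
Y = [3 * x1 - 2 * x2 for x1, x2 in X]
alpha = sgd(X, Y)
print(alpha)  # approximately [3.0, -2.0]
```

For this linear model a closed-form least-squares solution also exists; SGD is shown because it is the option that carries over to models such as neural nets.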

History of Machine Learning for Computer Vision
(Timeline, from Model-Driven through Mixed to Data-Driven)
1970s: Hand-designed Model
1980s: Alignment Method
1990s: Grid Model
2000s: Deformable Model
2010s: Conv. Network

Why didn’t people use ML from the beginning?
Commonly assumed reasons:
1. “Better computers” available now
2. “Better algorithms”
3. “Amount of data”
“We create so much data that 90% of the data in the world today has been created in the last two years alone”
- Petter Bae Brandtzæg, SINTEF ICT

How much data did CV researchers use?
Image source: http://www.vision.caltech.edu/ Image source: http://doi.ieeecomputersociety.org/
2004: Caltech 101, 10K images
2005-2010: Pascal VOC, 2K → 30K objects
2010-2015: ImageNet, 10M → 15M images
http://www.image-net.org/

The answer is… “Amount of Data”
Image source - Smartdatacollective.com
• Most advanced machine learning cannot be applied if there is not enough data
• A critical mass of data is necessary to use, for example, deep learning
• As the amount of data increases, the machine learning models, and therefore the prediction models, can become more complex and better

With enough data, is ANY algorithm okay?
Support vector machines, Bayesian networks, Regression forest, Sparse dictionary learning, Artificial neural networks, K-Nearest neighbors, Deep learning, Boosting
No, it depends on the company and the problem you are trying to solve
(Figure: companies A, B and C next to Deep Learning, Neural Networks and Log. Regression)

What Changed in the Machine Learning Domain
From the Past to the Present:

Why do we need lots of data?
Overfitting
Synonym: over-generalizing
It is like visiting a new place for one day, seeing a mountain fire, and believing that there are fires there every day.
In real life, we rarely have the chance to work with clean & BIG data

An example: Overfitting due to lack of data
(Chart: Prob. to default by city: Seoul, Busan, Daejeon, Gwangju, … (many more cities); y-axis from 0 to 0.14)
As there are many categories, some categories with little data show outlier results

(Chart: the same Prob. to default by city: Seoul, Busan, Daejeon, Gwangju, … (many more cities), with error bars)
So, always use error bars
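One way to draw such error bars for a default rate is a binomial confidence interval per category. The sketch below uses the Wilson score interval, a standard choice that, unlike the plain $p \pm z\sqrt{p(1-p)/n}$ formula, stays informative when a small category has zero events; the city labels and counts are hypothetical.

```python
import math

def wilson_interval(k, n, z=1.96):
    # Wilson score interval for a proportion k/n (about 95% for z = 1.96).
    # It remains meaningful when k == 0, unlike p +/- z*sqrt(p(1-p)/n).
    if n == 0:
        return (0.0, 1.0)
    p = k / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, center - half), min(1.0, center + half))

# Hypothetical counts: (defaults, loans) per city
counts = {"City A": (600, 10000), "City B": (0, 40)}
for name, (k, n) in counts.items():
    lo, hi = wilson_interval(k, n)
    print(f"{name}: rate {k / n:.3f}, interval [{lo:.3f}, {hi:.3f}]")
```

City B's observed rate of 0 comes with an interval reaching roughly 9%, which makes it visible that the "outlier" is compatible with an ordinary default rate.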

You want to detect an event that occurs on average with probability p = 5%.
Let’s say you have many cities with ~50 samples each.
On average, 1 city in 13 will have this event 0 times, since (1 - 0.05)^50 ≈ 0.077 ≈ 1/13.
Without proper handling, these extreme cases will all be wrong: their estimated probability will be 0.
This kind of error can happen often
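The 1-in-13 figure can be checked directly: with independent samples, the chance of seeing the event zero times in n samples is $(1-p)^n$. The second function sketches one common form of "proper handling", additive smoothing toward the global rate; this is an assumed technique, not necessarily the one used in the talk, and `prior_strength` is a hypothetical tuning parameter (the number of pseudo-samples at the global rate).

```python
def prob_no_event(p, n):
    # Probability that an event of rate p never occurs in n independent samples
    return (1 - p) ** n

p0 = prob_no_event(0.05, 50)
print(round(p0, 4), round(1 / p0, 1))  # 0.0769 13.0

def smoothed_rate(k, n, prior_rate, prior_strength):
    # Shrink a small-sample rate k/n toward a global prior_rate by adding
    # prior_strength pseudo-samples that occur at prior_rate.
    return (k + prior_strength * prior_rate) / (n + prior_strength)

# A city with 0 events in 50 samples no longer gets an estimate of exactly 0
print(smoothed_rate(0, 50, 0.05, 20))  # 1/70, about 0.0143
```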

How to fight against overfitting
Data
• More samples
• Fewer variables
• Artificial data extension
Algorithm
• Simpler objective function
• Regularization
• Bagging
Modeling
• Feature engineering
• Data normalization

Data
In Computer Vision, it is possible to extend the data.
Ex: hiring annotators, Amazon Mechanical Turk, Google reCAPTCHA
Companies often have a limited number of samples and cannot extend them.
Ex: a Korean bank that gives ~100K loans per year

How to count your data?
1. Count only positives (detecting rare events requires more data)
Ex: image detection: it is easy to find an almost infinite number of negatives.
Companies often want to detect rare events (few positives)
Ex: predicting car accidents / ad clicks / defaults / online purchases
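Counting positives instead of rows can be made explicit with a trivial summary. The 2% default rate below is a hypothetical figure applied to the ~100K loans mentioned above, chosen only to show how much smaller the useful sample becomes.

```python
def data_summary(labels):
    # labels: iterable of 0/1 targets (1 = the rare event, e.g. a default)
    labels = list(labels)
    pos = sum(1 for y in labels if y == 1)
    return {"samples": len(labels), "positives": pos, "negatives": len(labels) - pos}

# Hypothetical portfolio: 100,000 loans with a 2% default rate
labels = [1] * 2000 + [0] * 98000
print(data_summary(labels))
# {'samples': 100000, 'positives': 2000, 'negatives': 98000}
```

Under this assumption the amount of data that limits learning is about 2,000 positives, not 100,000 rows.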

2. Difficulty of the task
• Learning addition ($y = 1 \cdot X_1 + 1 \cdot X_2$): requires ~100 samples
• Learning object recognition: requires ~10M samples
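The "learning addition" case is easy enough that ordinary least squares recovers both coefficients from about 100 samples. The sketch assumes inputs drawn uniformly from [0, 10] and a small Gaussian measurement noise; both are illustrative assumptions. It solves the 2x2 normal equations in closed form (Cramer's rule).

```python
import random

def fit_two_coefficients(X, Y):
    # Least squares for y = a1*x1 + a2*x2 (no intercept), solving the
    # 2x2 normal equations [[s11, s12], [s12, s22]] @ [a1, a2] = [t1, t2].
    s11 = sum(x1 * x1 for x1, _ in X)
    s12 = sum(x1 * x2 for x1, x2 in X)
    s22 = sum(x2 * x2 for _, x2 in X)
    t1 = sum(x1 * y for (x1, _), y in zip(X, Y))
    t2 = sum(x2 * y for (_, x2), y in zip(X, Y))
    det = s11 * s22 - s12 * s12
    a1 = (t1 * s22 - t2 * s12) / det
    a2 = (s11 * t2 - s12 * t1) / det
    return a1, a2

rng = random.Random(0)
# 100 hypothetical samples of y = 1*X1 + 1*X2 with small measurement noise
X = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(100)]
Y = [x1 + x2 + rng.gauss(0, 0.1) for x1, x2 in X]
a1, a2 = fit_two_coefficients(X, Y)
print(a1, a2)  # both close to 1.0
```

No comparably short program learns object recognition, which is why that task needs orders of magnitude more samples.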

3. Probabilistic event detection is harder.
What is in this image? (one correct answer) vs. Will this user click on a car advertisement?
Client #1: Male, 27 y.o., lives in Seoul, salaryman in the construction sector, previously clicked on a car advertisement → Yes
Client #2: Male, 27 y.o., lives in Seoul, salaryman in the construction sector, previously clicked on a car advertisement → No
Two clients with identical data can behave differently, so the best a model can predict is a probability, and estimating that probability reliably requires many samples.
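With identical inputs and different outcomes, the most a model can learn for a profile is a click rate. The sketch below pools records by profile and returns the empirical click probability; the tuple encoding of the client profile is a hypothetical representation chosen for illustration.

```python
from collections import defaultdict

def empirical_click_rate(records):
    # records: (profile, clicked) pairs. Identical profiles are pooled, so the
    # output is a click *probability* per profile, not a yes/no answer.
    counts = defaultdict(lambda: [0, 0])  # profile -> [clicks, views]
    for profile, clicked in records:
        counts[profile][0] += int(clicked)
        counts[profile][1] += 1
    return {p: clicks / views for p, (clicks, views) in counts.items()}

# Hypothetical encoding of Client #1 and Client #2 (same data, different outcomes)
profile = ("male", 27, "Seoul", "construction", "clicked_car_ad_before")
records = [(profile, True), (profile, False)]
rates = empirical_click_rate(records)
print(rates)  # the shared profile maps to 0.5
```

With only two records the estimate 0.5 is very uncertain; narrowing it down takes many more clients per profile.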

Algorithm
1. Many algorithms exist: GLM, Boosting, Lasso, Regression Forest, SVM, Gaussian Process, Bayesian Networks, Deep Learning, …
2. The complexity of their prediction functions differs.
3. The more complex the prediction function is, the more closely it fits the training data.
(Figure: three fits of Purchase Prob. vs. Age, ranging from Underfitting to Overfitting)
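K-nearest neighbors, one of the algorithms listed earlier, makes the complexity dial explicit through k: k = 1 gives a very flexible function that reproduces every training point, while k = number of samples gives a constant. The (age, purchase probability) pairs below are hypothetical; the sketch only illustrates that training error alone rewards the most complex (possibly overfitting) choice.

```python
def knn_predict(train, x, k):
    # train: list of (age, purchase_prob) pairs; average the k nearest ages
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def training_error(train, k):
    # Mean squared error measured on the training data itself
    return sum((knn_predict(train, a, k) - y) ** 2 for a, y in train) / len(train)

# Hypothetical (age, purchase probability) observations
train = [(20, 0.10), (25, 0.30), (30, 0.20), (35, 0.45), (40, 0.35), (45, 0.50), (50, 0.40)]
for k in (1, 3, len(train)):
    # k = 1: fits every point (overfitting risk); k = 7: a flat line (underfitting)
    print(k, round(training_error(train, k), 4))
```

A fair comparison of k values needs held-out data, since k = 1 always scores a perfect training error here.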

1. Fewer parameters → less overfitting
2. More parameters → less underfitting
3. Ex: best of both worlds: Deep Conv Nets

Avoiding the “Too Many Categories” problem
(Figure: map of the cities Busan, Seoul, Daejeon, Daegu, Pohang, Incheon, Suwon and Ulsan, before and after Grouping / Merging them into fewer categories)
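A simple mechanical version of merging is to pool every category with too few samples into one shared label. The sketch below does that with a count threshold; `min_count` and the "Other" label are hypothetical choices, and grouping by geography (e.g. a hand-made city-to-region table) is an alternative the code does not implement.

```python
from collections import Counter

def merge_rare_categories(values, min_count, other="Other"):
    # Replace categories seen fewer than min_count times with a shared label,
    # so every remaining category has enough samples for a stable estimate.
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]

# Hypothetical city column
cities = ["Seoul"] * 5 + ["Busan"] * 3 + ["Pohang", "Ulsan"]
print(merge_rare_categories(cities, min_count=3))
# ['Seoul', 'Seoul', 'Seoul', 'Seoul', 'Seoul', 'Busan', 'Busan', 'Busan', 'Other', 'Other']
```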

(Chart: Prob. to default vs. log10(population), x-axis from 1 to 6, y-axis from 0 to 0.14)

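Regularization
$\min_\theta \sum_s \mathrm{loss}\left(gt_s, f_\theta(X_s)\right) + \lambda\,\Omega(\theta)$
$\min_\theta \sum_s \mathrm{loss}\left(gt_s, f_\theta(X_s)\right)$, s.t. $\Omega(\theta) < \lambda$
$\Omega(\theta) = \|\theta\|_2$ or $\|\theta\|_1$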
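A one-parameter example shows how the penalty term pulls the solution toward zero. The sketch uses the penalized form with a squared loss and the squared L2 penalty $\lambda\theta^2$ (ridge regression), a common variant of $\Omega(\theta) = \|\theta\|_2$ chosen because it has a closed form; the data points are hypothetical.

```python
def ridge_1d(xs, ys, lam):
    # Minimizes sum_s (y_s - theta * x_s)^2 + lam * theta^2.
    # Setting the derivative to zero gives theta = sum(x*y) / (sum(x^2) + lam).
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

# Hypothetical data, roughly y = 2x
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.1, 5.9]
for lam in (0.0, 1.0, 10.0):
    # Larger lam shrinks theta toward 0 (a simpler, less data-driven model)
    print(lam, round(ridge_1d(xs, ys, lam), 4))
```

With `lam = 0.0` this is ordinary least squares; increasing it trades some fit on the training data for stability.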

Data Normalization
Removing variance that has no impact on the target value → helps the ML system focus on meaningful variance
Deep Face (Facebook 2014), DB size: 120M images
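A minimal illustration of removing irrelevant variance: two images that differ only in lighting (brightness offset and contrast scale) become identical after per-image standardization. This is an assumed example of the general idea, not the normalization used in Deep Face, and the pixel values are hypothetical.

```python
import math

def normalize_sample(pixels):
    # Remove per-image brightness (mean) and contrast (std) so that photos
    # differing only in lighting map to the same representation.
    n = len(pixels)
    mean = sum(pixels) / n
    std = math.sqrt(sum((p - mean) ** 2 for p in pixels) / n)
    if std == 0:
        return [0.0] * n
    return [(p - mean) / std for p in pixels]

dark = [10, 20, 30, 40]
bright = [2 * p + 50 for p in dark]  # same pattern, different lighting
print(normalize_sample(dark))
print(normalize_sample(bright))  # same values as the line above
```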

Bagging
1. Randomly modify the training set slightly (e.g. resample it with replacement).
2. Do the training.
3. Repeat.
4. Average all prediction functions.
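The four steps map directly onto code. This sketch bags a simple straight-line least-squares fit, using bootstrap resampling (sampling the training set with replacement) as the random modification; the base model, `n_models` and the noisy data around y = 2x + 1 are illustrative assumptions.

```python
import random

def fit_line(points):
    # Ordinary least squares for y = a*x + b; None if all x values are equal
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    if sxx == 0:
        return None
    a = sum((x - mx) * (y - my) for x, y in points) / sxx
    return a, my - a * mx

def bagged_predict(points, x, n_models=25, seed=0):
    rng = random.Random(seed)
    preds = []
    while len(preds) < n_models:
        # 1. Randomly modify the training set: bootstrap resample with replacement
        sample = [rng.choice(points) for _ in points]
        # 2. Do the training
        model = fit_line(sample)
        if model is None:  # degenerate resample, draw again
            continue
        a, b = model
        preds.append(a * x + b)
        # 3. Repeat until n_models fits are collected
    # 4. Average all prediction functions (here: their predictions at x)
    return sum(preds) / len(preds)

# Hypothetical noisy data around y = 2x + 1
data_rng = random.Random(42)
points = [(x, 2 * x + 1 + data_rng.gauss(0, 0.5)) for x in range(20)]
pred = bagged_predict(points, 10.0)
print(pred)  # near 2*10 + 1 = 21
```

Averaging helps most with high-variance base models such as deep decision trees; a straight line is used here only to keep the sketch short.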

Changing Data
• Market changes
• Law/Regulation changes
• Collected data changes
• Client filtering / Marketing changes
→ Data changes through time
• Variable names change
• Category names change
→ Representation of data changes
• Cyclic data changes
→ Seasonality
• Trends have to be handled separately
→ Interpolation – Extrapolation

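Why is time so different from other variables?
(Figure: Prob. to buy a smartphone vs. Age (Interpolation), and vs. Time with unknown future values “?” (Extrapolation))
Along age, new clients fall within the range already seen in training (interpolation); along time, new data always lies beyond the training period (extrapolation).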
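Because production data always comes from the future, a common practice (an assumption here, not something the slides prescribe) is to validate on a time-ordered split instead of a random one, so the evaluation measures extrapolation. The record layout and the cutoff year below are hypothetical.

```python
def time_split(records, cutoff):
    # records: (timestamp, features, target) tuples.
    # Train strictly on the past and evaluate on the future; a random shuffle
    # would leak future rows into training and only measure interpolation.
    train = [r for r in records if r[0] < cutoff]
    test = [r for r in records if r[0] >= cutoff]
    return train, test

# Hypothetical yearly records
records = [(year, {"age": 30 + year % 5}, year % 2) for year in range(2010, 2018)]
train_set, test_set = time_split(records, cutoff=2015)
print([r[0] for r in train_set], [r[0] for r in test_set])
# [2010, 2011, 2012, 2013, 2014] [2015, 2016, 2017]
```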

Time is correlated with hidden variables
(Chart: cost for car insurance (one type of insurance) over Time, with a New Law marked)

Change causes can be unknown, but consistent
(Chart: cost for car insurance (one type of insurance) over Time)

Seasonality
(Chart: cost for car insurance (one type of insurance) over Time)
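A basic way to handle a cyclic change is to estimate the average level per calendar month and subtract it. The sketch assumes monthly observations and no trend (a trend would leak into the monthly averages, which is why trends are handled separately); the winter surcharge in the data is hypothetical.

```python
from collections import defaultdict

def seasonal_profile(series):
    # series: list of (month, value) pairs with month in 1..12.
    # The average value per calendar month estimates the seasonal component.
    sums = defaultdict(float)
    counts = defaultdict(int)
    for month, value in series:
        sums[month] += value
        counts[month] += 1
    return {m: sums[m] / counts[m] for m in sums}

def deseasonalize(series):
    # Subtract each month's average; what remains is trend + noise
    profile = seasonal_profile(series)
    return [(m, v - profile[m]) for m, v in series]

# Hypothetical two years of monthly costs, 20 higher in Dec, Jan and Feb
series = [(m, 100.0 + (20.0 if m in (12, 1, 2) else 0.0))
          for _ in range(2) for m in range(1, 13)]
print(seasonal_profile(series)[1], seasonal_profile(series)[6])  # 120.0 100.0
residual = deseasonalize(series)
```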

Changing Data Representation
• Collected Data changes
• Category splitting and merging
• Variable names change
• Category names change
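Representation changes are often absorbed with explicit mapping tables that translate old variable and category names into the current scheme before training. Every name in the sketch (`CATEGORY_MAP`, `COLUMN_MAP`, the labels themselves) is hypothetical; note that a split category cannot be recovered from old data, so the sketch maps the split parts back to their coarser parent.

```python
# Hypothetical mapping from old/new category labels to one stable scheme.
# Old data cannot be split after the fact, so split categories are merged
# back into their parent to keep old and new periods comparable.
CATEGORY_MAP = {
    "Auto": "Car",        # renamed category
    "Car (new)": "Car",   # split categories merged back
    "Car (used)": "Car",
}

# Hypothetical mapping for a renamed variable (column)
COLUMN_MAP = {"cust_age": "age"}

def harmonize(values, mapping=CATEGORY_MAP):
    # Unknown labels are kept unchanged
    return [mapping.get(v, v) for v in values]

def rename_columns(row, mapping=COLUMN_MAP):
    return {mapping.get(k, k): v for k, v in row.items()}

print(harmonize(["Auto", "Car (new)", "Car (used)", "Bike"]))
# ['Car', 'Car', 'Car', 'Bike']
print(rename_columns({"cust_age": 30, "city": "Seoul"}))
# {'age': 30, 'city': 'Seoul'}
```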

Job Applications: contact@solidware.io
Visit our booth
Thank you
Visit our website: solidware.io
