Machine Learning
The Endless Quest for Accuracy
Photo by Caden Crawford on Flickr
Machine Learning in a Few Words
Machine Learning Algorithms
have limits: they’re good at
Interpolation, not Extrapolation …
What does that mean?
Photo by Ricardo Gomez Angel on Unsplash
Learning how to drive in a calm city doesn’t
prepare you for …
Photo by Derek Liang on Unsplash
… that’s the next level … 
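A minimal sketch of interpolation versus extrapolation, in Python. The target function (y = x^2), the training range (0 to 10) and the nearest-neighbour model are assumptions chosen for this illustration; they do not come from the slides. The model does well between points it has seen and badly far outside them.

```python
# Toy illustration: a 1-nearest-neighbour regressor trained on y = x^2 for x in 0..10.
# The function, the range and the model are assumptions made for this example.

def train(xs, ys):
    """'Training' a nearest-neighbour model just means storing the examples."""
    return list(zip(xs, ys))

def predict(model, x):
    """Return the target of the training point closest to x."""
    nearest_x, nearest_y = min(model, key=lambda pair: abs(pair[0] - x))
    return nearest_y

train_x = [i * 0.5 for i in range(21)]   # 0.0, 0.5, ..., 10.0
train_y = [x ** 2 for x in train_x]
model = train(train_x, train_y)

inside = 5.2     # inside the training range: interpolation
outside = 20.0   # far outside the training range: extrapolation

interp_error = abs(predict(model, inside) - inside ** 2)
extrap_error = abs(predict(model, outside) - outside ** 2)

print(f"error at x={inside}: {interp_error:.2f}")
print(f"error at x={outside}: {extrap_error:.2f}")
```

Inside the range the prediction is off by about 2; at x = 20 the model can only repeat what it saw at x = 10, so it is off by 300.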
Forrester Predicts Investment In Artificial Intelligence Will Grow
300% in 2017
Accenture found that 85% of executives plan to
invest extensively in AI-related technology over
the next three years.
Photo by Dan Grinwis on Unsplash
Yet the hype is BIG!
Why?
Photo by Patrick Lindenberg on Unsplash
The price of 1 GB was $500,000 in 1980
and $0.03 in 2015
There exists a Huge Technology Ecosystem
Hadoop … Spark …
Photo by Christine Donaldson on Unsplash
GPUs are 8 to 10 times faster than CPUs
for Neural Network Processing
Photo by Maxime Rossignol on Unsplash
Photo by Markus Spiske on Unsplash
We can now quickly process vast amounts of data
This allows Reinforcement and Supervised
Learning to achieve incredible accomplishments
Reinforcement Learning:
Learn by simulating your environment
Supervised Learning:
Learn from tagged (labelled) historical data
How does it work?
[Diagram: Raw Data, Data Engineering, Learning Data, Test Data, Algorithm, Model Accuracy]
Photo by Caroline Methot on Unsplash
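The stages named in the diagram can be sketched end to end in a few lines of Python. Everything here is hypothetical: the synthetic click data, the "evening traffic clicks" rule, the 80/20 split and the per-hour majority-vote algorithm are assumptions made for illustration, not the method used by the authors.

```python
import random

# Raw data: hypothetical (hour_as_text, clicked) pairs. Entirely synthetic.
random.seed(0)
raw_data = []
for _ in range(200):
    hour = random.randint(0, 23)
    clicked = 1 if 18 <= hour <= 22 else 0   # toy rule: evening traffic clicks
    raw_data.append((str(hour), clicked))

# Data engineering: turn the raw text into a numeric feature.
dataset = [(int(hour_text), label) for hour_text, label in raw_data]

# Split into learning data and test data (assumed 80/20 split).
random.shuffle(dataset)
split = int(0.8 * len(dataset))
learning_data, test_data = dataset[:split], dataset[split:]

# Algorithm: majority vote per hour, learned from the learning data only.
def fit(rows):
    counts = {}
    for hour, label in rows:
        positives, total = counts.get(hour, (0, 0))
        counts[hour] = (positives + label, total + 1)
    return {hour: int(positives * 2 >= total)
            for hour, (positives, total) in counts.items()}

def predict(model, hour):
    return model.get(hour, 0)   # unseen hour: default to "no click"

model = fit(learning_data)

# Model accuracy: measured on test data the model has never seen.
correct = sum(predict(model, hour) == label for hour, label in test_data)
accuracy = correct / len(test_data)
print(f"accuracy on test data: {accuracy:.2%}")
```

Keeping the test data out of `fit` is the point of the split: the accuracy then estimates how the model behaves on new data rather than on data it has memorised.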
To optimise this process: Improve the Algorithms!
Improve the Algorithm, e.g. MNIST
1998: Linear Classifier -> 12% error rate
2012: committee of 35 convolutional nets -> 0.23% error rate
ImageNet: 14 million+ tagged images; the challenge is to
recognize the objects
2010: 25% error rate
2015: 5% error rate
[Diagram: Raw Data, Data Engineering, Learning Data, Test Data, Algorithm, Model Accuracy]
To optimise this process: Improve the Data Engineering!
Photo by Patryk Grądys on Unsplash
Data Cleansing
Corrupted
Inconsistent
Inaccurate
Irrelevant
Dirty
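A small Python sketch of what cleansing these kinds of problems can look like. The records, the field names, the accepted age range and the choice of which field is irrelevant are all made up for this example.

```python
# Hypothetical raw records; field names and values are invented for illustration.
raw_records = [
    {"country": "FR", "age": "34", "session_id": "a1"},
    {"country": "fr ", "age": "34", "session_id": "a2"},   # inconsistent casing/spacing
    {"country": "DE", "age": "-5", "session_id": "a3"},    # inaccurate: impossible age
    {"country": "DE", "age": "n/a", "session_id": "a4"},   # corrupted: not a number
    {"country": "ES", "age": "51", "session_id": "a5"},
]

IRRELEVANT_FIELDS = {"session_id"}   # assumed to carry no predictive signal

def clean(record):
    """Return a cleaned copy of the record, or None if it cannot be repaired."""
    try:
        age = int(record["age"])          # corrupted values raise ValueError
    except ValueError:
        return None
    if not 0 <= age <= 120:               # inaccurate values are rejected
        return None
    cleaned = {k: v for k, v in record.items() if k not in IRRELEVANT_FIELDS}
    cleaned["country"] = record["country"].strip().upper()   # fix inconsistencies
    cleaned["age"] = age
    return cleaned

clean_records = [c for c in map(clean, raw_records) if c is not None]
print(clean_records)
```

Dropping unrepairable rows is only one option; depending on the problem, imputing a value or flagging the row may be preferable.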
Feature Engineering
1506335498
This is a date in Unix time
Monday 25 September 2017 10:31:38 UTC
It was a Monday Morning
In September
In 2017
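The timestamp above can be expanded into model-friendly features with Python's standard library. This is a sketch under two assumptions that are not in the slides: the timestamp is interpreted in UTC, and "morning" is defined as 06:00 to 11:59. The function name `timestamp_features` is hypothetical.

```python
from datetime import datetime, timezone

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def timestamp_features(unix_seconds):
    """Expand a Unix timestamp into features a model can use directly."""
    moment = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)  # assumes UTC
    return {
        "year": moment.year,
        "month": moment.month,
        "weekday": WEEKDAYS[moment.weekday()],   # weekday() returns 0 for Monday
        "hour": moment.hour,
        "is_morning": 6 <= moment.hour < 12,     # assumed definition of "morning"
    }

features = timestamp_features(1506335498)
print(features)
```

A raw integer such as 1506335498 tells a model very little; "Monday", "morning" and "September" are the kind of signals that can actually correlate with behaviour.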
Photo by Shane Rounce on Unsplash
Photo by Annie Spratt on Unsplash
[Diagram: Raw Data, Data Engineering, Learning Data, Test Data, Algorithm, Model Accuracy]
To optimise this process: Get More Data!
Photo by Dennis Kummer on Unsplash
“Information is the oil of the 21st century, and
analytics is the combustion engine”
Mr. Sondergaard from Gartner in 2011
More Data = More Information = Better Accuracy
Demographic
Behavioural
Geographical
Weather
Photo by Ramón Salinero on Unsplash
Open Data
Behaviors change quickly. How do we adapt?
Photo by Jeremy Perkins on Unsplash
Refresh your Models as fast as possible
Collect your Data from everywhere
Use this Data to make better decisions
Collect the results to build better models
Build a Virtuous Cycle
Machine Learning gives Great Insights
but Machine Learning is only as good as your
Business Strategy
Photo by Samson Duborg-Rankin on Unsplash
Example: If you Sell Products Online
Photo by Wojtek Witkowski on Unsplash
If you optimise your campaign on the probability of a click on your ads
Your Models will be honey pots for Bots
Bots click but don’t buy
If you choose to optimise the probability of selling a product
You will naturally build models that avoid Bots
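The effect of the chosen objective can be seen on a toy traffic log, sketched here in Python. The two traffic sources and all the counts are invented for illustration; this is a comparison of objectives, not a trained model.

```python
# Hypothetical traffic log: (source, clicked, purchased). All values are made up.
traffic = (
    [("bot_heavy_site", 1, 0)] * 90 + [("bot_heavy_site", 0, 0)] * 10 +
    [("regular_site", 1, 1)] * 5 + [("regular_site", 1, 0)] * 5 +
    [("regular_site", 0, 0)] * 90
)

def rate_by_source(rows, outcome_index):
    """Average of the chosen outcome (click or purchase) per traffic source."""
    totals = {}
    for row in rows:
        source = row[0]
        hits, count = totals.get(source, (0, 0))
        totals[source] = (hits + row[outcome_index], count + 1)
    return {source: hits / count for source, (hits, count) in totals.items()}

click_rate = rate_by_source(traffic, outcome_index=1)
purchase_rate = rate_by_source(traffic, outcome_index=2)

# The "best" source depends entirely on which objective is optimised.
best_for_clicks = max(click_rate, key=click_rate.get)
best_for_sales = max(purchase_rate, key=purchase_rate.get)

print("optimising clicks favours:", best_for_clicks)
print("optimising sales favours:", best_for_sales)
```

In this toy log, optimising for clicks sends the budget to the bot-heavy source, while optimising for purchases steers away from it.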
Photo by Johannes Plenio on Unsplash
Let’s Wrap-up
You have Defeated the Algorithm Complexity
Your Data are Cleaner than Ever and Full of Business Insights
You have Gathered all the Data there is
Your models are Refreshed as Fast as Possible
Your Business Strategy Maximises your Model’s Insights and Vice Versa
Now you can … Do it again!
Welcome to the Endless Quest for Accuracy.
Photo by Paul Trienekens on Unsplash

Editor's Notes

  • #2 https://www.flickr.com/photos/cadencrawford/8296313952 Caden Crawford
  • #3 This is the team of Fujitsu’s Center of Excellence in Artificial Intelligence. We are located inside Polytechnique’s Start-up Incubator. I started to work 17 years ago … back in the days when Machine Learning was called Data-Mining. I worked mostly in data-related industries, at KXEN and Criteo.
  • #4 Machine Learning is a concept dating back to the 1950s. It is a field of computer science that gives computers the ability to learn without being explicitly programmed: they learn from experience to accomplish a certain task.
  • #11 Those technologies run on commodity hardware; it is cheaper and cheaper to store and process vast amounts of data.
  • #15 Fraud detection data
  • #16 Let’s say that you want to predict the probability of an ad being clicked on.
  • #17 The MNIST database of handwritten digits has a training set of 60,000 examples and a test set of 10,000 examples.
  • #18 The ImageNet project is a large visual database designed for use in visual object recognition software research. In 2012, a deep convolutional neural net achieved a 16% error rate; in the next couple of years, error rates fell to a few percent. Today, many consider ImageNet solved: the error rate is incredibly low, at around 2%. But that’s for classification, or identifying which object is in an image. This doesn’t mean an algorithm knows the properties of that object, where it comes from, what it’s used for, who made it, or how it interacts with its surroundings.
  • #19 Let’s say that you want to predict the probability of an ad being clicked on.
  • #21 Feature engineering is the process of transforming raw data into features that better represent the underlying problem to the predictive models, resulting in improved model accuracy on unseen data. The Unix epoch (or Unix time or POSIX time or Unix timestamp) is the number of seconds that have elapsed since January 1, 1970 (midnight UTC/GMT).
  • #23 Let’s say that you want to predict the probability of an ad being clicked on.
  • #25 30% of businesses are weather-dependent.