An explanation of machine learning for business



Slides from the big data course by Clement Levallois at EMLYON Business School, for business students. See the online video that accompanies these slides.
Machine learning explained in simple terms to a business audience: what a training set and a test set are, and how machine learning differs from statistics.

Published in: Business


  1. MK99 – Big Data: Big data & cross-platform analytics
     MOOC lectures, Pr. Clement Levallois
  2. A short note on machine learning for business
  3. Machine learning
     • A family of techniques for making predictions based on data
     • Why is it called machine learning?
       – Machine: it is about algorithms running on computers, not equations solved with pen and paper
       – Learning: the algorithms start with uninformed parameters, so their first predictions are poor. As they are fed data, they refine their parameters and their predictions become more accurate: the algorithm “learns”.
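To make the idea of “learning” concrete, here is a minimal Python sketch, not taken from the course: a model with a single parameter `w` (a hypothetical rule where the prediction is `w * x`) starts from an uninformed value and is nudged by gradient descent each time it passes over the data. The toy data, the learning rate and the number of steps are illustrative assumptions.

```python
# A minimal sketch of "learning": one parameter refined step by step.
# The data and the model (prediction = w * x) are hypothetical, for illustration only.

def mean_squared_error(w, data):
    """Average squared gap between predictions w * x and actual values y."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def train(data, learning_rate=0.01, steps=50):
    """Adjust w by gradient descent, recording the error after each step."""
    w = 0.0  # uninformed starting value
    history = [mean_squared_error(w, data)]
    for _ in range(steps):
        # Derivative of the mean squared error with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= learning_rate * grad  # move w in the direction that lowers the error
        history.append(mean_squared_error(w, data))
    return w, history

# Toy training data: y is exactly 3 * x, so the ideal parameter is w = 3
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)]
w, history = train(data)
print(f"learned w = {w:.3f}")
print(f"error before training = {history[0]:.2f}, after = {history[-1]:.6f}")
```

The `history` list records the error after every step; on this toy data it shrinks at each pass, which is what “the algorithm refines its parameters” means in practice. Real machine learning models have many more parameters and use more elaborate update rules.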
  4. Typical setup
     1. We start with a training set: data already collected, for which we know the actual values to be predicted.
        Ex: a list of consumers, their characteristics and their associated credit scores.
     2. The algorithms are trained on this set.
        -> A series of algorithms run on the training set. Their parameters are adjusted so that they predict the actual values as accurately as possible.
     3. A test set (“fresh data”) is brought in.
        -> A list of consumer characteristics. Their credit scores are known but hidden.
     4. The trained algorithms are run on the test set.
        -> They predict the credit score of each consumer in the test set, using the algorithms trained in step 2.
     5. Accuracy is measured.
        -> Given the correct values in the test set, how accurate were the algorithms? Were the credit scores accurately predicted?
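The five steps of this setup can be sketched in plain Python. This is an illustrative example under stated assumptions, not the course's own code: the consumers are randomly generated, income is their only characteristic, and the “algorithm” is a simple least-squares line. Because a credit score is a number rather than a category, accuracy is measured here as the mean absolute error (the average gap, in points, between predicted and actual scores).

```python
import random

# Hypothetical data: each consumer is (income in thousands of euros, credit score).
# The "true" relationship below is invented for illustration.
def make_consumers(n, rng):
    consumers = []
    for _ in range(n):
        income = rng.uniform(20, 100)
        score = 300 + 5 * income + rng.uniform(-20, 20)  # noise the model cannot explain
        consumers.append((income, score))
    return consumers

def fit_line(training_set):
    """Step 2: learn an intercept and a slope by ordinary least squares."""
    xs = [x for x, _ in training_set]
    ys = [y for _, y in training_set]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in training_set) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return intercept, slope

def mean_absolute_error(predictions, actual):
    """Step 5: average distance between predicted and actual scores."""
    return sum(abs(p - a) for p, a in zip(predictions, actual)) / len(actual)

rng = random.Random(42)
consumers = make_consumers(100, rng)

# Step 1: the training set (80 consumers, scores known)
training_set = consumers[:80]
# Step 3: the test set (20 consumers); their scores are set aside, never shown to fit_line
test_features = [income for income, _ in consumers[80:]]
hidden_scores = [score for _, score in consumers[80:]]

intercept, slope = fit_line(training_set)                              # Step 2
predicted = [intercept + slope * income for income in test_features]  # Step 4
error = mean_absolute_error(predicted, hidden_scores)                  # Step 5
print(f"average error on the test set: {error:.1f} points")
```

Swapping in a different algorithm only changes `fit_line`; the split, the hidden test scores and the final error measure stay the same, which is why this setup can be used to compare algorithms fairly.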
  5. Vocabulary
     • Data scientists “train” their model and then test it.
     • They are concerned with “out-of-sample” prediction:
       – That their model accurately predicts the data points in the training set (the “sample”) is trivial.
       – It is the accuracy on the test set that matters!
       – This is called “out-of-sample” prediction.
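Why is in-sample accuracy trivial? A model can simply memorize the training set. The sketch below uses hypothetical data and a deliberately naive “memorizing” model, namely a one-nearest-neighbor rule that copies the score of the most similar training consumer. It scores perfectly on its own training set, so only the test set reveals how well it actually predicts.

```python
import random

# Hypothetical data in a credit-score setting: (income, score) pairs.
rng = random.Random(7)

def consumer():
    income = rng.uniform(20, 100)
    return income, 300 + 5 * income + rng.uniform(-20, 20)

training_set = [consumer() for _ in range(50)]
test_set = [consumer() for _ in range(50)]

def predict_by_memory(income):
    """A 'memorizing' model: copy the score of the closest training consumer."""
    _, score = min(training_set, key=lambda pair: abs(pair[0] - income))
    return score

def mean_absolute_error(dataset):
    return sum(abs(predict_by_memory(x) - y) for x, y in dataset) / len(dataset)

in_sample = mean_absolute_error(training_set)  # each point finds itself, so the error is 0
out_of_sample = mean_absolute_error(test_set)  # the number that actually matters
print(f"in-sample error: {in_sample:.1f}, out-of-sample error: {out_of_sample:.1f}")
```

The in-sample error is exactly zero because every training consumer is its own nearest neighbor; the out-of-sample error is not, which is why data scientists report the latter.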
  6. Why is machine learning (ML) so different from statistics?
     • ML does not focus on causality, only on prediction!
       – Note: for this reason, ML cannot predict the effect of an intervention, since it has no causal model.
     • ML has a special concern for out-of-sample prediction
       – It is especially careful about over-fitting: a model that fits the training set too closely, noise included, predicts fresh data poorly.
     • ML picks its algorithms from different academic disciplines
       – Text analysis, network relations, clustering, not just traditional statistics
     • Coming from computer science, ML has affinities with big data
       – Procedures optimized for speed and scale
     But the best data scientists often started as statisticians or econometricians: see Hal Varian, Chief Economist at Google.
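Over-fitting can be shown with a small, fully deterministic example. The numbers below are made up for illustration: the true rule is `y = 2x`, and the five training points carry alternating noise of +1 and -1. Two models are compared in this sketch: a straight line fitted by least squares, and a Lagrange polynomial that passes exactly through every training point.

```python
# Hypothetical toy data: the true rule is y = 2x, but training values carry noise.
train_x = [0.0, 1.0, 2.0, 3.0, 4.0]
train_y = [2 * x + e for x, e in zip(train_x, [1, -1, 1, -1, 1])]
# Fresh points, never seen during training, with their true values
test_x = [0.5, 1.5, 2.5, 3.5]
test_y = [2 * x for x in test_x]

def fit_line(xs, ys):
    """Simple model: ordinary least-squares straight line."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return lambda x: my + slope * (x - mx)

def fit_exact_curve(xs, ys):
    """Flexible model: Lagrange polynomial passing through every training point."""
    def predict(x):
        total = 0.0
        for j, (xj, yj) in enumerate(zip(xs, ys)):
            basis = 1.0  # equals 1 at xj and 0 at every other training x
            for m, xm in enumerate(xs):
                if m != j:
                    basis *= (x - xm) / (xj - xm)
            total += yj * basis
        return total
    return predict

def mae(model, xs, ys):
    """Mean absolute error of a model on a set of points."""
    return sum(abs(model(x) - y) for x, y in zip(xs, ys)) / len(xs)

for name, model in [("straight line", fit_line(train_x, train_y)),
                    ("exact curve", fit_exact_curve(train_x, train_y))]:
    print(f"{name}: in-sample error {mae(model, train_x, train_y):.2f}, "
          f"out-of-sample error {mae(model, test_x, test_y):.2f}")
```

On this toy data the exact curve has zero in-sample error but an out-of-sample error of 1.00, while the straight line has an in-sample error of 0.96 and an out-of-sample error of 0.20: by fitting the noise, the flexible model becomes worse on fresh data.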
  7. • Kaggle is a website that hosts ML competitions; anybody can join.
     • Goal: make the best prediction on a dataset, with cash prizes.
     • Topics range from predicting clicks on ads to predicting epileptic seizures.
     • The setup is always the same: a training set, a test set, and a score based on accuracy.
  8. This slide presentation is part of a course offered by EMLYON Business School. Contact Clement Levallois (levallois [at] for more information.