# A short introduction to statistical learning

Working group of the Axe Apprentissage statistique et Processus (Statistical Learning and Processes research axis)
INRA, MIA-T unit
October 16th, 2014

Published in: Science

### A short introduction to statistical learning

1. 1. A short introduction to statistical learning. Nathalie Villa-Vialaneix (nathalie.villa@toulouse.inra.fr, http://www.nathalievilla.org). Axe “Apprentissage et Processus”, October 15th, 2014, Unité MIA-T, INRA, Toulouse. Nathalie Villa-Vialaneix | Introduction to statistical learning 1/25
2. 2. Outline: (1) Introduction (background and notations; underfitting / overfitting; consistency); (2) SVM
4. 4. Background
      - Purpose: predict $Y$ from $X$.
      - What we have: $n$ observations of $(X, Y)$: $(x_1, y_1), \dots, (x_n, y_n)$.
      - What we want: estimate the unknown $Y$ for new values of $X$: $x_{n+1}, \dots, x_m$.
      - $X$ can be: numeric variables, factors, or a combination of numeric variables and factors.
      - $Y$ can be: a numeric variable ($Y \in \mathbb{R}$) $\Rightarrow$ (supervised) regression (French: régression); or a factor $\Rightarrow$ (supervised) classification (French: discrimination).
9. 9. Basics
      - From $(x_i, y_i)_i$, definition of a machine $\Phi^n$ such that $\hat{y}_{\text{new}} = \Phi^n(x_{\text{new}})$.
      - If $Y$ is numeric, $\Phi^n$ is called a regression function (French: fonction de régression); if $Y$ is a factor, $\Phi^n$ is called a classifier (French: classifieur).
      - $\Phi^n$ is said to be trained or learned from the observations $(x_i, y_i)_i$.
      - Desirable properties:
        - accuracy to the observations: predictions made on known data are close to the observed values;
        - generalization ability: predictions made on new data are also accurate.
      - These are conflicting objectives!
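As a minimal sketch of the "machine" idea on this slide, the Python snippet below learns a classifier from observations $(x_i, y_i)_i$ and applies it both to the known inputs and to a new one. The 1-nearest-neighbour rule, the helper name `train`, the variable name `phi_n`, and the toy data are all assumptions made for illustration; the slide does not prescribe any particular learning method.

```python
from typing import Callable, Sequence, Tuple

Observation = Tuple[float, str]  # (x_i, y_i): numeric input, factor output


def train(observations: Sequence[Observation]) -> Callable[[float], str]:
    """Return a predictor learned from the observations (hypothetical helper).

    Illustration only: a 1-nearest-neighbour rule, which predicts the label
    attached to the closest training input.
    """
    data = list(observations)
    if not data:
        raise ValueError("at least one observation is required")

    def predict(x_new: float) -> str:
        # Pick the training pair whose x_i is closest to x_new.
        _, y_nearest = min(data, key=lambda pair: abs(pair[0] - x_new))
        return y_nearest

    return predict


# Toy observations, made up for this example.
observations = [(0.1, "a"), (0.4, "a"), (0.9, "b"), (1.3, "b")]
phi_n = train(observations)

# Accuracy to the observations: predictions on the known inputs.
fitted = [phi_n(x) for x, _ in observations]

# Generalization: prediction for an input that was not observed.
y_hat_new = phi_n(0.7)
```

With distinct training inputs, this particular rule reproduces every observed label exactly, which is perfect accuracy to the observations but says nothing yet about how well it generalizes.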
15. 15. Underfitting/Overfitting (French: sous/sur-apprentissage). Successive figures:
      - [Figure] Function $x \to y$ to be estimated.
      - [Figure] Observations we might have.
      - [Figure] Observations we do have.
      - [Figure] First estimation from the observations: underfitting.
      - [Figure] Second estimation from the observations: accurate estimation.
      - [Figure] Third estimation from the observations: overfitting.
      - [Figure] Summary.
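The three estimations in this sequence can be imitated numerically. The sketch below is an illustration under our own assumptions, not the method used to draw the slides: it fits a k-nearest-neighbour regressor (a stand-in technique chosen because it needs only the standard library) to noisy samples of a sine curve, with a very large, a moderate, and a very small number of neighbours. The target function, noise level, sample size, and all names are hypothetical.

```python
import math
import random


def knn_regressor(xs, ys, k):
    """Return a predictor averaging the y values of the k nearest x values."""
    pairs = list(zip(xs, ys))

    def predict(x_new):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x_new))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


def mse(predict, xs, ys):
    """Mean square error of a predictor on the pairs (xs, ys)."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def true_function(x):
    # Assumed function x -> y to be estimated (not taken from the slides).
    return math.sin(2 * math.pi * x)


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 30
x_train = [i / n for i in range(n)]  # distinct training inputs
y_train = [true_function(x) + rng.gauss(0, 0.3) for x in x_train]
x_test = [(i + 0.5) / n for i in range(n)]  # new inputs, between the old ones
y_test = [true_function(x) + rng.gauss(0, 0.3) for x in x_test]

results = {}
for k in (n, 5, 1):  # k = n: constant fit; k = 1: interpolates the noise
    model = knn_regressor(x_train, y_train, k)
    results[k] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    print(f"k={k:2d}  training MSE={results[k][0]:.3f}  test MSE={results[k][1]:.3f}")
```

With `k = n` every prediction is the global mean of the training outputs, which corresponds to underfitting. With `k = 1` the training error is exactly zero because each training point is its own nearest neighbour, yet the error on new inputs is typically worse than for a moderate `k`, which is the overfitting situation.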
22. 22. Errors
      - Training error (measures the accuracy to the observations):
        - if $y$ is a factor: misclassification rate $\frac{\#\{\hat{y}_i \neq y_i,\ i = 1, \dots, n\}}{n}$
        - if $y$ is numeric: mean square error (MSE) $\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2$
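The two training errors named on this slide can be computed in a few lines. The sketch below assumes the standard definitions (proportion of mismatched labels, and average squared difference); the function names and the example values are ours.

```python
def misclassification_rate(y_hat, y):
    """Proportion of indices i for which the prediction differs from y_i."""
    if len(y_hat) != len(y) or not y:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(1 for p, t in zip(y_hat, y) if p != t) / len(y)


def mean_square_error(y_hat, y):
    """Average of the squared differences between predictions and y_i."""
    if len(y_hat) != len(y) or not y:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(y_hat, y)) / len(y)


# Factor output: one of the four predictions is wrong.
rate = misclassification_rate(["a", "b", "b", "a"], ["a", "b", "a", "a"])

# Numeric output: squared differences are 0, 1 and 4.
error = mean_square_error([1.0, 2.0, 4.0], [1.0, 3.0, 2.0])
```

Here `rate` is 1/4 and `error` is 5/3. Both are computed on the data used for training, so they measure accuracy to the observations only.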