# Introduction to Machine Learning (case studies)

Introduction for beginners.

Published in: Education, Technology

1. Machine Learning and Data Mining: case studies. Dmitry Efimov, April 2, 2013, 14:00. http://mech.math.msu.su/~efimov/
2. Outline
   1. Machine Learning problems
   2. Methods: regression, distance, probability
   3. Case studies
   4. How to solve problems?
3. Essay grading: how can we teach a computer to grade students' essays?
4. Heavy machine sales: how can we predict prices for the next year?
5. Molecule response: how can we predict the response of molecules used in medicines?
6. People relationships: how can we recover missing connections, and how can we assign weights to connections?
7. What is Kaggle?
8. Definitions
9. Regression
10. What about this case? Or what if there are many features? A powerful method: neural networks. But…
11. Distance approach: SVM (Vapnik, 1995)
12. SVM, non-linear case (Vapnik, 1995)
13. Probability approach: decision trees
14. Ensembling: Random Forests
   - Bagging = average of many simple algorithms, each trained on a bootstrap sample of the training set
   - Simple algorithm = one decision tree
   - Bagging + decision trees (with a random subset of features at each split) = Random Forests (Breiman, 2001)
15. Case 1: Social tie strength
16. Description of the problem
   - Organized by Panjia (www.panjiaco.com)
   - Problem: predict the strength of social ties
   - Prize pool: \$75,000
   - Training set size: 50,000
   - Test set size: 40,000
17. Feature engineering
   - Number of features: more than 500!
   - Feature examples:
     1. Number of friends (node feature)
     2. Number of common friends (edge feature)
     3. Number of common albums divided by the number of all albums (combined feature)
18. Stochastic gradient boosting with decision trees (GBM) (Ridgeway, 2007)
19. Obtained accuracy
20. Case 2: Biological response prediction
21. Functional ensembling (Efimov & Nikulin, 2012)
22. Functional ensembling: example (Efimov & Nikulin, 2012)
23. Functional ensembling: algorithm
24. Final ensembling (diagram: predictions combined with min, mean, and max; values shown: 0.55, 0.1, 0.9, 0.75)
25. Obtained accuracy (bar chart)
   - Winner's result: 0.37356
   - Our result: 0.37363
   - Our best result: 0.37093
26. How to solve problems?
27. Overfitting
   - The algorithm works perfectly on the training set.
   - But it does not work on the test set!
28. Cross-validation
   - The target is unknown for the test set.
   - Split the training set into two parts:
     - 1st part: new training set
     - 2nd part: new test set (with known target)
29. What's next? If you are interested in this topic:
   - Read papers and books about machine learning
   - Communicate with people (Kaggle, LinkedIn)
   - Participate in competitions
   - Study mathematics
30. Thank you! Any questions? Dmitry Efimov, defimov@aus.edu
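Slide 9 names regression, but the transcript does not show how a model is fitted. Below is a minimal sketch that assumes ordinary least squares with a single input feature. `fit_line` and `predict` are hypothetical helper names and do not come from the talk.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y ≈ a * x + b with one feature (hypothetical helper).

    Returns (a, b) minimising the sum of squared residuals.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = cov(x, y) / var(x); the common 1/n factor cancels
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    a = sxy / sxx
    b = mean_y - a * mean_x  # the fitted line passes through (mean_x, mean_y)
    return a, b


def predict(a: float, b: float, x: float) -> float:
    return a * x + b
```

For points that lie exactly on y = 2x + 1, such as `fit_line([1, 2, 3, 4], [3, 5, 7, 9])`, the sketch recovers slope 2 and intercept 1.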
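Slide 10 recommends neural networks for cases that a straight line cannot fit. The sketch below is a tiny one-hidden-layer sigmoid network trained by full-batch gradient descent on XOR, a standard non-linear toy problem. It is not the network used in the talk. The layer size, learning rate, and number of epochs are arbitrary assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_xor(hidden=3, lr=1.0, epochs=5000, seed=0):
    """Tiny 2-input, one-hidden-layer sigmoid network trained on XOR.

    All sizes and hyper-parameters are illustrative assumptions.
    Returns (initial_mse, final_mse, forward_fn).
    """
    rng = random.Random(seed)
    data = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
    w1 = [[rng.uniform(-1.0, 1.0) for _ in range(2)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [rng.uniform(-1.0, 1.0) for _ in range(hidden)]
    b2 = [0.0]  # kept in a list so the nested functions see in-place updates

    def forward(x):
        h = [sigmoid(w1[j][0] * x[0] + w1[j][1] * x[1] + b1[j]) for j in range(hidden)]
        out = sigmoid(sum(w2[j] * h[j] for j in range(hidden)) + b2[0])
        return h, out

    def mse():
        return sum((forward(x)[1] - y) ** 2 for x, y in data) / len(data)

    initial = mse()
    n = len(data)
    for _ in range(epochs):
        gw1 = [[0.0, 0.0] for _ in range(hidden)]
        gb1 = [0.0] * hidden
        gw2 = [0.0] * hidden
        gb2 = 0.0
        for x, y in data:
            h, out = forward(x)
            # backpropagation: derivative of the MSE w.r.t. the output pre-activation
            d_out = 2.0 * (out - y) * out * (1.0 - out) / n
            gb2 += d_out
            for j in range(hidden):
                gw2[j] += d_out * h[j]
                d_h = d_out * w2[j] * h[j] * (1.0 - h[j])
                gw1[j][0] += d_h * x[0]
                gw1[j][1] += d_h * x[1]
                gb1[j] += d_h
        # one gradient-descent step on every parameter
        for j in range(hidden):
            w2[j] -= lr * gw2[j]
            w1[j][0] -= lr * gw1[j][0]
            w1[j][1] -= lr * gw1[j][1]
            b1[j] -= lr * gb1[j]
        b2[0] -= lr * gb2
    return initial, mse(), forward
```

The function returns the mean squared error before and after training. Whether a given run fully solves XOR depends on the random initialisation.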
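Slide 11 presents the SVM as a distance (margin) approach. The sketch below trains a linear SVM by minimising the regularised hinge loss with Pegasos-style stochastic sub-gradient steps. This is a swapped-in solver, not the quadratic program of Vapnik's formulation. Labels are assumed to be ±1. The bias is folded in as a constant feature, so it is regularised too, which is a common simplification. `lam` and `iters` are illustrative values.

```python
import random


def train_linear_svm(X, y, lam=0.01, iters=2000, seed=0):
    """Pegasos-style sub-gradient training of a linear SVM (illustrative sketch).

    X: list of feature tuples, y: labels in {-1, +1}. A constant 1.0 feature
    is appended so the last weight acts as a (regularised) bias.
    """
    rng = random.Random(seed)
    w = [0.0] * (len(X[0]) + 1)
    for t in range(1, iters + 1):
        i = rng.randrange(len(X))
        xi = list(X[i]) + [1.0]
        eta = 1.0 / (lam * t)  # decreasing step size
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, xi))
        # shrink w: gradient step on the regulariser (lam / 2) * ||w||^2
        w = [(1.0 - eta * lam) * wj for wj in w]
        if margin < 1.0:
            # margin violated: step along the hinge-loss sub-gradient
            w = [wj + eta * y[i] * xj for wj, xj in zip(w, xi)]
    return w


def svm_predict(w, x):
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```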
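Slide 12 covers the non-linear SVM. One way to sketch it is the kernelised variant of the Pegasos update, here with a Gaussian (RBF) kernel; the slide does not say which kernel it uses. `alpha[j]` counts how often example j violated the margin, and a prediction is the sign of a kernel-weighted vote. `gamma`, `lam`, and the iteration count are illustrative assumptions, and there is no separate bias term.

```python
import math
import random


def rbf(a, b, gamma):
    """Gaussian kernel exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def train_kernel_svm(X, y, gamma=5.0, lam=0.01, iters=2000, seed=0):
    """Kernelised Pegasos-style training (sketch); y holds labels in {-1, +1}."""
    rng = random.Random(seed)
    n = len(X)
    alpha = [0] * n  # alpha[j] = number of margin violations of example j
    for t in range(1, iters + 1):
        i = rng.randrange(n)
        s = sum(alpha[j] * y[j] * rbf(X[j], X[i], gamma) for j in range(n) if alpha[j])
        if y[i] * s / (lam * t) < 1.0:
            alpha[i] += 1
    return alpha


def kernel_predict(X, y, alpha, x, gamma=5.0):
    # the positive scale 1 / (lam * T) is dropped: it does not change the sign
    s = sum(a * yj * rbf(xj, x, gamma) for a, yj, xj in zip(alpha, y, X) if a)
    return 1 if s >= 0 else -1
```

On the four XOR corners, which no straight line separates, this sketch classifies all four training points correctly.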
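Slide 13 calls decision trees a probability approach: each leaf can store the fraction of its training examples that belong to each class, which serves as a probability estimate. Below is a minimal CART-style sketch. It assumes numeric features, 0/1 labels, and Gini impurity as the split criterion (the slide does not name one). `build_tree` and `predict_proba` are hypothetical names.

```python
def gini(labels):
    """Gini impurity for 0/1 labels: 1 - p^2 - (1 - p)^2 = 2p(1 - p)."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def build_tree(X, y, depth=0, max_depth=3, min_size=2):
    """Return a leaf (P(class 1)) or a node tuple (feature, threshold, left, right)."""
    p = sum(y) / len(y)
    if depth >= max_depth or len(y) < min_size or p in (0.0, 1.0):
        return p  # leaf: estimated probability of class 1
    best = None  # (weighted impurity, feature, threshold)
    for f in range(len(X[0])):
        for thr in sorted(set(row[f] for row in X)):
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, thr)
    if best is None:
        return p  # no split separates the data
    _, f, thr = best
    li = [i for i, row in enumerate(X) if row[f] <= thr]
    ri = [i for i, row in enumerate(X) if row[f] > thr]
    return (
        f,
        thr,
        build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth, min_size),
        build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth, min_size),
    )


def predict_proba(node, x):
    # walk down the tree until a leaf (a float probability) is reached
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node
```

For `X = [(1,), (2,), (3,), (10,), (11,), (12,)]` with labels `[0, 0, 0, 1, 1, 1]`, the root splits at threshold 3 and both leaves are pure.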
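The Random Forests on slide 14 (Breiman, 2001) average many decision trees. Each tree is grown on a bootstrap sample (drawn with replacement) of the training set and considers only a random subset of features at each split. The sketch below assumes 0/1 labels and averages the trees' leaf probabilities. The tree depth, tree count, and `max_features` are illustrative defaults, not values from the talk.

```python
import random


def gini(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)


def build_tree(X, y, rng, max_features, depth=0, max_depth=4):
    p = sum(y) / len(y)
    if depth >= max_depth or p in (0.0, 1.0):
        return p  # leaf: fraction of class-1 examples
    best = None  # (weighted impurity, feature, threshold)
    # only a random subset of the features is examined at this split
    for f in rng.sample(range(len(X[0])), max_features):
        for thr in sorted(set(row[f] for row in X)):
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            if not left or not right:
                continue
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, f, thr)
    if best is None:
        return p
    _, f, thr = best
    go_left = [row[f] <= thr for row in X]
    left_x = [row for row, g in zip(X, go_left) if g]
    left_y = [lab for lab, g in zip(y, go_left) if g]
    right_x = [row for row, g in zip(X, go_left) if not g]
    right_y = [lab for lab, g in zip(y, go_left) if not g]
    return (
        f,
        thr,
        build_tree(left_x, left_y, rng, max_features, depth + 1, max_depth),
        build_tree(right_x, right_y, rng, max_features, depth + 1, max_depth),
    )


def tree_proba(node, x):
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node


def random_forest(X, y, n_trees=25, max_features=1, seed=0):
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample, with replacement
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], rng, max_features))
    return forest


def forest_proba(forest, x):
    # bagging: average the individual trees' probability estimates
    return sum(tree_proba(t, x) for t in forest) / len(forest)
```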
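Slide 17 lists a node feature, an edge feature, and a combined feature for a pair of users. The sketch below rests on three assumptions:

- The social graph is a hypothetical dict that maps each user to a set of friends.
- Albums are a hypothetical dict that maps each user to a set of album IDs.
- "All albums" means the union of both users' albums. The slide does not say which set is meant.

```python
from typing import Dict, Hashable, Set


def tie_features(
    u: Hashable,
    v: Hashable,
    friends: Dict[Hashable, Set[Hashable]],
    albums: Dict[Hashable, Set[Hashable]],
) -> Dict[str, float]:
    """Hypothetical feature extractor for the tie between users u and v."""
    fu, fv = friends.get(u, set()), friends.get(v, set())
    au, av = albums.get(u, set()), albums.get(v, set())
    common_albums = len(au & av)
    all_albums = len(au | av)  # assumption: "all albums" = union of both users' albums
    return {
        "friends_u": len(fu),  # node feature
        "friends_v": len(fv),  # node feature
        "common_friends": len(fu & fv),  # edge feature
        # combined feature; defined as 0.0 when neither user has albums (assumption)
        "common_album_ratio": common_albums / all_albums if all_albums else 0.0,
    }
```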
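Slide 18 refers to GBM (Ridgeway, 2007), which fits decision trees by stochastic gradient boosting. Each round fits a small tree to the negative gradient of the loss on a random subsample of the training set, then adds a shrunken copy of that tree to the model. The minimal sketch below assumes squared-error loss (whose negative gradient is simply the residual), depth-1 trees (stumps), and arbitrary values for `n_rounds`, `learning_rate`, and `subsample`. It does not reproduce the API of the `gbm` R package.

```python
import random

# Sketch of stochastic gradient boosting with regression stumps.
# Squared-error loss and all hyper-parameters are illustrative assumptions.


def fit_stump(X, r):
    """Depth-1 regression tree minimising the squared error on targets r."""
    best = None  # (sse, feature, threshold, left_mean, right_mean)
    for f in range(len(X[0])):
        for thr in sorted(set(row[f] for row in X)):
            left = [v for row, v in zip(X, r) if row[f] <= thr]
            right = [v for row, v in zip(X, r) if row[f] > thr]
            if not left or not right:
                continue
            lm = sum(left) / len(left)
            rm = sum(right) / len(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, f, thr, lm, rm)
    if best is None:  # no usable split: predict the mean everywhere
        m = sum(r) / len(r)
        return (0, float("inf"), m, m)
    return best[1:]


def stump_predict(stump, x):
    f, thr, lm, rm = stump
    return lm if x[f] <= thr else rm


def gbm_fit(X, y, n_rounds=100, learning_rate=0.1, subsample=0.5, seed=0):
    rng = random.Random(seed)
    f0 = sum(y) / len(y)  # initial model: the mean target
    pred = [f0] * len(y)
    k = max(1, int(subsample * len(y)))
    stumps = []
    for _ in range(n_rounds):
        idx = rng.sample(range(len(y)), k)  # random subsample, without replacement
        # the negative gradient of 0.5 * (y - F)^2 with respect to F is the residual y - F
        residuals = [y[i] - pred[i] for i in idx]
        stump = fit_stump([X[i] for i in idx], residuals)
        stumps.append(stump)
        # shrink each new tree by the learning rate before adding it
        pred = [p + learning_rate * stump_predict(stump, x) for p, x in zip(pred, X)]
    return f0, learning_rate, stumps


def gbm_predict(model, x):
    f0, learning_rate, stumps = model
    return f0 + learning_rate * sum(stump_predict(s, x) for s in stumps)
```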
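The overfitting on slide 27 can be reproduced with a 1-nearest-neighbour classifier, which memorises the training set. Every training point is its own nearest neighbour, so training accuracy is perfect, while accuracy on fresh data is limited by label noise. The data generator and its 20% label-flip rate are assumptions made for illustration.

```python
import random


def nn_predict(train, x):
    # 1-nearest neighbour: copy the label of the closest training point
    return min(train, key=lambda p: abs(p[0] - x))[1]


def make_data(n, noise, rng):
    """Hypothetical 1-D data: label 1 if x > 0.5, with a fraction `noise` of labels flipped."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = 1 if x > 0.5 else 0
        if rng.random() < noise:
            label = 1 - label  # label noise that no model can predict
        data.append((x, label))
    return data


def accuracy(train, data):
    return sum(nn_predict(train, x) == y for x, y in data) / len(data)


rng = random.Random(42)
train = make_data(200, 0.2, rng)
test = make_data(200, 0.2, rng)
train_acc = accuracy(train, train)  # memorised: every point matches itself
test_acc = accuracy(train, test)    # what actually matters
```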
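Slide 28 splits the labelled training set into a new training part and a new test part, so that accuracy can be measured where the target is known. The sketch below shows that hold-out split. It also shows k-fold cross-validation, a standard extension in which every example serves in the test part exactly once. `fit` and `score` are hypothetical callables supplied by the user.

```python
import random


def holdout_split(n, test_fraction=0.25, seed=0):
    """Shuffle indices 0..n-1 and split them into (new training part, new test part)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(round(n * (1 - test_fraction)))
    return idx[:cut], idx[cut:]


def k_fold_indices(n, k=5, seed=0):
    """Yield (train_indices, test_indices) for each of k folds."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def cross_val_score(fit, score, X, y, k=5, seed=0):
    """Average of score(model, X_test, y_test) over k folds (fit/score are user-supplied)."""
    scores = []
    for tr, te in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in tr], [y[i] for i in tr])
        scores.append(score(model, [X[i] for i in te], [y[i] for i in te]))
    return sum(scores) / len(scores)
```

As a usage example, a baseline that always predicts the mean target can be scored with `cross_val_score(lambda X, y: sum(y) / len(y), lambda m, X, y: sum((m - v) ** 2 for v in y) / len(y), X, y)`.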
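The scores on slide 25 are all close to 0.37, and the slide does not name the metric. Assuming the metric is binary log loss, where lower is better (this is an assumption, not something the slides state), a score would be computed as below. Predicted probabilities are clipped so that `log(0)` never occurs.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy (assumed metric); y_true in {0, 1}, p_pred = P(y = 1)."""
    if len(y_true) != len(p_pred) or not y_true:
        raise ValueError("need equal-length, non-empty inputs")
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip away exact 0 and 1
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)
```

Under this assumption, a model that always predicts 0.5 scores ln 2 ≈ 0.693, and more confident correct predictions push the score lower.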