Introduction to Machine Learning (case studies)

introduction for beginners

Published in: Education, Technology


  1. Machine Learning and Data Mining: case studies. April 2, 2013, 14:00. Dmitry Efimov, http://mech.math.msu.su/~efimov/
  2. Outline: 1. Machine Learning problems 2. Methods: Regression, Distance, Probability 3. Case studies 4. How to solve problems?
  3. Essay grading: How to teach a computer to grade students' essays?
  4. Heavy machines sales: How to predict prices for the next year?
  5. Molecule response: How to predict molecule response for medicines?
  6. People relationships: How to restore missing connections? How to assign weights to connections?
  7. What is Kaggle?
  8. Definitions
  9. Regression
  10. But… • What about this case? • Or if there are many features? • Powerful method: Neural Networks
  11. Distance approach: SVM (Vapnik, 1995)
  12. SVM (non-linear case) (Vapnik, 1995)
  13. Probability approach: decision trees
  14. Ensembling: Random Forests • Bagging = average of many simple algorithms • Simple algorithm = one decision tree • Bagging + decision trees = Random Forests (Breiman, 2001)
  15. Case 1. Social ties strength
  16. Description of problem • Organized by Panjia (www.panjiaco.com) • Problem: predict the strength of social ties • Prize pool: $75,000 • Training set size: 50,000 • Test set size: 40,000
  17. Features engineering • Number of features: more than 500! • Feature examples: 1) Number of friends (node feature) 2) Number of common friends (edge feature) 3) Number of common albums / Number of all albums (combined feature)
  18. Stochastic gradient descent in decision trees (GBM) (Ridgeway, 2007)
  19. Obtained accuracy
  20. Case 2. Biological Response prediction
  21. Functional Ensembling (Efimov & Nikulin, 2012)
  22. Functional Ensembling: Example (Efimov & Nikulin, 2012)
  23. Functional Ensembling: Algorithm
  24. Final ensembling [diagram combining min, mean, and max; values shown: 0.55, 0.1, 0.9, 0.75]
  25. Obtained accuracy • Winner result: 0.37356 • Our result: 0.37363 • Our best result: 0.37093
  26. How to solve problems?
  27. Overfitting • The algorithm works perfectly on the Training set • But! The algorithm does not work on the Test set!
  28. Cross-validation • The target is unknown for the Test set • Split the Training set into two parts: • 1st part: New Training set • 2nd part: New Test set (with known target)
  29. What's next? If you are interested in this topic… • Read papers and books about Machine Learning • Communicate with people (Kaggle, LinkedIn) • Participate in competitions • Study Mathematics
  30. Thank you! Any questions? Dmitry Efimov, defimov@aus.edu
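
The overfitting and cross-validation slides can be illustrated with a small sketch. It is not taken from the talk: the synthetic data, the 1-nearest-neighbour "memorizing" model, and the names `make_data`, `holdout_split`, `predict_1nn`, and `accuracy` are all assumptions chosen for illustration. The labelled data is split into a new training part and a new test part with known targets, and the model's accuracy is compared on both.

```python
import random


def make_data(n=200, noise=0.2, seed=1):
    """Synthetic (x, y) rows: y = 1 if x > 0.5, with some labels flipped (hypothetical data)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < noise:
            y = 1 - y  # label noise that a memorizing model will also memorize
        rows.append((x, y))
    return rows


def holdout_split(rows, holdout_fraction=0.3, seed=0):
    """Shuffle the labelled rows and split them into (new_train, new_test)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)
    cut = len(shuffled) - n_holdout
    return shuffled[:cut], shuffled[cut:]


def predict_1nn(train, x):
    """Return the label of the closest training point (memorizes the training set)."""
    nearest = min(train, key=lambda row: abs(row[0] - x))
    return nearest[1]


def accuracy(train, rows):
    """Fraction of rows whose label is predicted correctly from `train`."""
    hits = sum(predict_1nn(train, x) == y for x, y in rows)
    return hits / len(rows)


rows = make_data()
new_train, new_test = holdout_split(rows)

train_acc = accuracy(new_train, new_train)  # each point is its own nearest neighbour
test_acc = accuracy(new_train, new_test)    # rows the model has never seen

print(f"accuracy on new training set: {train_acc:.2f}")
print(f"accuracy on new test set:     {test_acc:.2f}")
```

The gap between the two printed numbers is the overfitting signal: the held-out part plays the role of the real Test set, whose targets are unknown. Repeating the split with different parts held out gives k-fold cross-validation.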
