Introduction to Machine Learning (case studies)
introduction for beginners

Presentation Transcript

  • Machine Learning and Data Mining: case studies. April 2, 2013, 14:00. Dmitry Efimov, http://mech.math.msu.su/~efimov/
  • Outline: 1. Machine Learning problems; 2. Methods: Regression, Distance, Probability; 3. Case studies; 4. How to solve problems?
  • Essay grading: How to teach a computer to grade students' essays?
  • Heavy machine sales: How to predict prices for the next year?
  • Molecule response: How to predict molecule response to medicines?
  • People relationships: How to restore missing connections? How to assign weights to connections?
  • What is Kaggle?
  • Definitions
  • Regression
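The regression slide survives only as a title, so the following is a minimal sketch of the simplest case, a least-squares line through one feature. The function name `fit_line` and the toy data are assumptions made for illustration and do not come from the talk.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y ~ a * x + b with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Hypothetical toy data lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

With many features the same idea becomes a linear system in all coefficients, which is where the "many features" question on the next slide starts.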
  • But… What about this case? What if there are many features? Powerful method: Neural Networks
  • Distance approach: SVM (Vapnik, 1995)
  • SVM (non-linear case) (Vapnik, 1995)
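The two SVM slides were figures and left no formulas in the transcript. For reference, the standard textbook formulation (not taken from the slides) is given below. The symbols $w$, $b$, $\alpha_i$ and the kernel $K$ are the usual notation, assumed here: the linear case maximizes the margin $2/\lVert w \rVert$ between the classes, and the non-linear case replaces inner products with a kernel.

```latex
% Linear (hard-margin) SVM: labels y_i \in \{-1, +1\}
\min_{w,\,b} \ \tfrac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \left( w^{\top} x_i + b \right) \ge 1, \qquad i = 1, \dots, n

% Non-linear case: decision function with a kernel K
f(x) = \operatorname{sign}\!\left( \sum_{i=1}^{n} \alpha_i \, y_i \, K(x_i, x) + b \right)
```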
  • Probability approach: decision trees
  • Ensembling: Random Forests (Breiman, 2001). Bagging (bootstrap aggregating) = average of many simple algorithms; simple algorithm = one decision tree; bagging + decision trees = Random Forests
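The "average of many simple algorithms" idea can be sketched in a few lines. Averaging models trained on bootstrap resamples is usually called bagging. The version below uses one-split trees (stumps) and majority voting on a hypothetical toy dataset. It is a simplified illustration, not Breiman's full algorithm, which also grows deeper trees and samples a random subset of features at each split. All function names are assumptions.

```python
import random


def fit_stump(X, y):
    """Find the (feature, threshold, sign) split with the fewest training errors."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for sign in (1, -1):
                preds = [sign if row[j] > t else -sign for row in X]
                errors = sum(p != target for p, target in zip(preds, y))
                if best is None or errors < best[0]:
                    best = (errors, j, t, sign)
    return best[1], best[2], best[3]


def stump_predict(stump, row):
    j, t, sign = stump
    return sign if row[j] > t else -sign


def fit_bagged_stumps(X, y, n_trees=25, seed=0):
    """Train each stump on a bootstrap resample (sampling rows with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return forest


def forest_predict(forest, row):
    """Majority vote over all stumps; labels are -1 and +1."""
    votes = sum(stump_predict(stump, row) for stump in forest)
    return 1 if votes > 0 else -1


# Hypothetical toy data: one feature, class changes between 2 and 3.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [-1, -1, -1, 1, 1, 1]
forest = fit_bagged_stumps(X, y)
print(forest_predict(forest, [0.0]), forest_predict(forest, [5.0]))
```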
  • Case 1. Social ties strength
  • Description of problem: Organized by Panjia (www.panjiaco.com). Problem: predict the strength of social ties. Prize pool: $75,000. Training set size: 50,000. Test set size: 40,000
  • Feature engineering: Number of features: more than 500! Feature examples: 1) Number of friends (node feature); 2) Number of common friends (edge feature); 3) Number of common albums / Number of all albums (combined feature)
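The three feature types listed on this slide can be illustrated on a tiny graph. This is a sketch under assumptions: the `friends` and `albums` dictionaries are hypothetical data, and "number of all albums" is interpreted as the size of the union of both users' albums, which the slide does not specify.

```python
# Hypothetical undirected friendship graph: user -> set of friends.
friends = {
    "ann": {"bob", "cat", "dan"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob", "dan"},
    "dan": {"ann", "cat"},
}

# Hypothetical album IDs in which each user appears.
albums = {
    "ann": {1, 2, 3},
    "bob": {2, 3, 4, 5},
    "cat": {5},
    "dan": set(),
}


def node_feature(user):
    """Node feature: number of friends of a single user."""
    return len(friends[user])


def edge_feature(u, v):
    """Edge feature: number of friends that u and v have in common."""
    return len(friends[u] & friends[v])


def combined_feature(u, v):
    """Combined feature: common albums divided by all albums of the pair."""
    all_albums = albums[u] | albums[v]  # assumption: "all albums" = union
    if not all_albums:
        return 0.0
    return len(albums[u] & albums[v]) / len(all_albums)


print(node_feature("ann"), edge_feature("ann", "bob"), combined_feature("ann", "bob"))
```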
  • Stochastic gradient descent in decision trees (GBM) (Ridgeway, 2007)
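The core loop of gradient boosting with trees can be sketched as follows, assuming squared-error loss, one-split regression trees, and a single input feature. This is not the `gbm` package's implementation: the stochastic part (fitting each tree on a random subsample of rows) is omitted to keep the example deterministic, and all names and data are hypothetical.

```python
def fit_regression_stump(xs, residuals):
    """Best single-threshold split under squared error (needs >= 2 distinct xs)."""
    best = None
    for t in sorted(set(xs))[:-1]:  # skip the largest value: right side would be empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    return best[1], best[2], best[3]


def fit_gbm(xs, ys, n_rounds=50, learning_rate=0.1):
    """Each round fits a stump to the residuals and takes a small step."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # For squared error the negative gradient is the residual y - prediction.
        residuals = [y - p for y, p in zip(ys, preds)]
        t, left_mean, right_mean = fit_regression_stump(xs, residuals)
        stumps.append((t, left_mean, right_mean))
        preds = [p + learning_rate * (left_mean if x <= t else right_mean)
                 for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def gbm_predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(lm if x <= t else rm for t, lm, rm in stumps)


def mse(model, xs, ys):
    return sum((gbm_predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical toy data: a noisy step from about 1 to about 3.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
mean_y = sum(ys) / len(ys)
baseline_mse = sum((y - mean_y) ** 2 for y in ys) / len(ys)
model = fit_gbm(xs, ys)
print(baseline_mse, mse(model, xs, ys))
```

The small learning rate is the design choice that matters: each tree corrects only a fraction of the remaining error, so many weak trees are combined instead of a few strong ones.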
  • Obtained accuracy
  • Case 2. Biological Response prediction
  • Functional Ensembling (Efimov & Nikulin, 2012)
  • Functional Ensembling: Example (Efimov & Nikulin, 2012)
  • Functional Ensembling: Algorithm
  • Final ensembling (diagram combining min, mean, and max; values shown: 0.55, 0.1, 0.9, 0.75)
  • Obtained accuracy: winner result 0.37356; our result 0.37363; our best result 0.37093 (bar chart; axis from 0.3705 to 0.374)
  • How to solve problems?
  • Overfitting: The algorithm works perfectly on the training set. But! The algorithm does not work on the test set!
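Overfitting is easy to reproduce with a model that simply memorizes its training data. The sketch below assumes a hypothetical one-dimensional dataset with noisy labels and uses a 1-nearest-neighbour rule as the memorizing algorithm. Neither the data nor the model comes from the talk.

```python
import random


def make_data(n, rng, noise=0.3):
    """Points on [0, 1]; the true label is x > 0.5, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = x > 0.5
        if rng.random() < noise:
            label = not label
        data.append((x, label))
    return data


def predict_1nn(train, x):
    """Return the label of the closest training point (pure memorization)."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]


def accuracy(train, data):
    return sum(predict_1nn(train, x) == label for x, label in data) / len(data)


rng = random.Random(0)
train = make_data(100, rng)
test = make_data(200, rng)

train_acc = accuracy(train, train)  # every point is its own nearest neighbour
test_acc = accuracy(train, test)    # the memorized noise does not transfer
print(train_acc, test_acc)
```

The training accuracy is perfect because the model has stored the noise along with the signal. The drop on fresh data is the symptom the slide describes.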
  • Cross-validation: The target is unknown for the test set. Split the training set into two parts: 1st part: new training set; 2nd part: new test set (with known target)
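The split described on this slide can be sketched as below. `holdout_split` follows the slide directly. `k_fold_indices` is an added illustration of the usual k-fold extension, in which every row serves as "new test" data exactly once. Function names, the 25% fraction, and the toy rows are assumptions.

```python
import random


def holdout_split(rows, validation_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and cut it into (new_train, new_test)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_valid = int(len(shuffled) * validation_fraction)
    return shuffled[n_valid:], shuffled[:n_valid]


def k_fold_indices(n, k):
    """Yield (train_idx, valid_idx) pairs; each index is validated exactly once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, folds[i]


# Hypothetical labelled training rows, represented here by their IDs.
rows = list(range(20))
new_train, new_test = holdout_split(rows)
print(len(new_train), len(new_test))

for train_idx, valid_idx in k_fold_indices(len(rows), 4):
    print(len(train_idx), len(valid_idx))
```

Because the target is known on the held-out part, the error measured there estimates how the model would behave on the real test set.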
  • What's next? If you are interested in this topic… Read papers and books about Machine Learning; communicate with people (Kaggle, LinkedIn); participate in competitions; study Mathematics
  • Thank you! Any questions? Dmitry Efimov, defimov@aus.edu