Machine Learning with R
Transcript

  • 1. Machine Learning with R. Joshua Reich, josh@i2pi.com. April 2, 2009.
  • 2. Outline
      • Why R?
      • How to find out about stuff?
      • What is Machine Learning?
      • Show me the money
      • Learn More
      • Questions
  • 3. ML Alternatives
      • Matlab
      • Weka
      • Python
      • Stand-alone tools (e.g. Vowpal Wabbit)
  • 4. "The best thing about R is that it was developed by statisticians. The worst thing about R is that it was developed by statisticians." (Bo Cowgill, Google, at SF R Meetup)
  • 5. Why R?
      • Working with the CLI: iterative discovery
      • Integrated graphics
      • Community-supported packages (CRAN)
      • ODBC integration
      • You already use it
  • 6. How to find out about stuff?
      • ?function
      • help.search("search string")
      • RSiteSearch("search string")
      • http://rseek.org/
      • names(object) or attributes(object)
      • Type a function's name at the prompt to print its definition:
        > kmeans
        function (x, centers, iter.max = 10, nstart = 1, algorithm = c("Hartigan-Wong", "Lloyd", "Forgy", "MacQueen")) ...
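The discovery tools on this slide can be tried on a fitted object. The following is a minimal sketch, assuming the built-in `iris` data set as a stand-in (the slides do not say which data were used); it fits `kmeans()` only so that there is something to inspect.

```r
# Fit a small k-means model so there is an object to inspect.
# iris is a stand-in data set chosen for this sketch, not the talk's data.
set.seed(1)
fit <- kmeans(iris[, 1:4], centers = 3)

print(names(fit))      # components stored in the fitted object
print(class(fit))      # the object's class
print(args(kmeans))    # argument list without the function body
str(fit$centers)       # structure of a single component

# Interactive equivalents from the slide (these open the help system):
# ?kmeans
# help.search("clustering")
```

`names()` and `str()` are usually the quickest way to learn what a model object returns before reading the full help page.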
  • 7. What is Machine Learning? (Statistics term / Machine Learning term)
      • Probability Model / Learning Model
      • Observations / Observations
      • Estimation / Training
      • MLE / Optimization
  • 8. Semantics v. Pragmatics
      • For most statistical models there are either closed-form solutions or quick numerical approximations for finding model properties, e.g. confidence intervals. Assuming you believe that your data-generating process is accurately captured by your model, you can make direct statements about unseen events.
      • Machine learning is a close cousin to non-parametric techniques and relies on training/testing/validation cycles, bootstrapping and cross-validation to determine measures of reliability.
      • "But invariably, simple models and a lot of data trump more elaborate models based on less data." (Halevy, Norvig & Pereira)
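The contrast between a closed-form interval and a resampling-based one can be sketched in a few lines. This is an illustration only: the simulated sample, the 2000 resamples and the percentile method are assumptions made for the example, not details from the talk.

```r
set.seed(42)
x <- rnorm(200, mean = 5, sd = 2)   # simulated sample (hypothetical data)

# Closed form: 95% t-interval for the mean, valid if the normal model holds.
closed_form <- t.test(x)$conf.int

# Bootstrap: resample with replacement, recompute the mean each time,
# and take the 2.5% and 97.5% percentiles of the resampled means.
boot_means <- replicate(2000, mean(sample(x, replace = TRUE)))
bootstrap <- quantile(boot_means, c(0.025, 0.975))

print(closed_form)
print(bootstrap)
```

On well-behaved data like this the two intervals should be close; the bootstrap earns its keep when no closed form is available for the statistic of interest.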
  • 9. Inductive Bias: In the days when Sussman was a novice, Minsky once came to him as he sat hacking at the PDP-6. "What are you doing?", asked Minsky. "I am training a randomly wired neural net to play Tic-tac-toe", Sussman replied. "Why is the net wired randomly?", asked Minsky. "I do not want it to have any preconceptions of how to play", Sussman said. Minsky then shut his eyes.
  • 10. Inductive Bias: "Why do you close your eyes?" Sussman asked his teacher. "So that the room will be empty."
  • 11. What is Machine Learning?
      • Regression vs. Classification: Y ∈ R^p vs. Y ∈ {Y_1, Y_2, ..., Y_N}
      • Supervised vs. Unsupervised Learning: Y = f(X) vs. X_1, X_2, ..., X_N
  • 12. General Supervised Learning Framework (Ch 7 of ElemStatLearn)
      • Training / Validation / Test
      • Variance-Bias Decomposition: Overfitting
      • Feature Selection / Regularization
      • Bootstrapping / Cross-Validation
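The link between overfitting and cross-validation can be sketched with base R alone. Everything specific here is an assumption made for illustration: the simulated sine-plus-noise data, the choice of 5 folds, and polynomial degree as the complexity knob.

```r
set.seed(7)
n <- 200
x <- runif(n, -3, 3)
y <- sin(x) + rnorm(n, sd = 0.3)           # hypothetical data-generating process
dat <- data.frame(x = x, y = y)

k <- 5
fold <- sample(rep(1:k, length.out = n))   # random fold assignment

# Mean squared error on held-out folds for a polynomial of the given degree.
cv_mse <- function(degree) {
  errs <- sapply(1:k, function(i) {
    train <- dat[fold != i, ]
    test  <- dat[fold == i, ]
    fit   <- lm(y ~ poly(x, degree), data = train)
    mean((test$y - predict(fit, newdata = test))^2)
  })
  mean(errs)
}

degrees <- 1:10
cv_err <- sapply(degrees, cv_mse)

# Training error on the full data, for comparison.
train_err <- sapply(degrees, function(d) {
  mean(residuals(lm(y ~ poly(x, d), data = dat))^2)
})

print(data.frame(degree = degrees, train = train_err, cv = cv_err))
best_degree <- degrees[which.min(cv_err)]
print(best_degree)
```

Training error can only go down as the degree grows, while the cross-validated error eventually stops improving; that gap is the practical symptom of overfitting.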
  • 13. What we will walk through
      • K-Means clustering: kmeans()
      • K-Nearest Neighbours: knn()
      • Regression Trees: rpart()
      • Improving trees with PCA: princomp()
      • Linear Discriminant Analysis: lda()
      • Support Vector Machines: svm()
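The transcript does not include the code for this walkthrough, so the following is only a sketch of how the listed functions are typically called. It assumes the built-in `iris` data and a 45-row hold-out set, neither of which comes from the talk. `knn()`, `rpart()` and `lda()` live in the `class`, `rpart` and `MASS` packages; `svm()` is in `e1071`, which is not bundled with R and is left out here.

```r
library(class)   # knn()
library(rpart)   # rpart()
library(MASS)    # lda()

set.seed(123)
# Hold out 45 of the 150 iris rows as a test set (an arbitrary choice).
test_idx <- sample(nrow(iris), size = 45)
train <- iris[-test_idx, ]
test  <- iris[test_idx, ]

# Unsupervised: k-means never sees the species labels.
km <- kmeans(iris[, 1:4], centers = 3, nstart = 10)
print(table(cluster = km$cluster, species = iris$Species))

# K-nearest neighbours: predicts the test rows directly from the training rows.
knn_pred <- knn(train[, 1:4], test[, 1:4], cl = train$Species, k = 5)

# rpart tree (a classification tree here, because Species is a factor).
tree <- rpart(Species ~ ., data = train)
tree_pred <- predict(tree, test, type = "class")

# PCA first, then a tree on the first two component scores.
pca <- princomp(train[, 1:4])
train_pc <- data.frame(predict(pca, train[, 1:4])[, 1:2], Species = train$Species)
test_pc  <- data.frame(predict(pca, test[, 1:4])[, 1:2])
tree_pc  <- rpart(Species ~ ., data = train_pc)
tree_pc_pred <- predict(tree_pc, test_pc, type = "class")

# Linear discriminant analysis.
lda_fit  <- lda(Species ~ ., data = train)
lda_pred <- predict(lda_fit, test)$class

# Test-set accuracy of each supervised method.
accuracy <- sapply(
  list(knn = knn_pred, tree = tree_pred, pca_tree = tree_pc_pred, lda = lda_pred),
  function(p) mean(p == test$Species)
)
print(accuracy)
```

With `e1071` installed, `svm(Species ~ ., data = train)` followed by `predict()` slots into the same comparison.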
  • 14. The Problem [Figure: scatter plot of points drawn as the letters A, B and C (three classes); x axis from about −5 to 10, y axis from about −10 to 15.]
  • 15. Machine Learning More
      • MachineLearning CRAN view: http://cran.r-project.org/web/views/MachineLearning.html
      • The caret package is a good one.
      • Elements of Statistical Learning: http://www-stat.stanford.edu/~tibs/ElemStatLearn/
      • Machine Learning (Mitchell): http://www.cs.cmu.edu/~tom/mlbook.html
      • Video Lectures: http://videolectures.net/
  • 16. Questions?
