
Simple presentation explaining what machine learning is.


- 1. Machine Learning (LW/OB presentation)
- 2. Machine learning (ML) is a field concerned with studying and developing algorithms that perform better at a task as they gain experience (but mostly I wanted to use this cool picture)
- 3. WARNING: This presentation is seriously lacking slides, preparation and cool running examples. That being said, I know what I’m talking about ;)
- 4-8. What ML is really about… • ML is about data, and modeling its distribution • ML is about a tradeoff between model accuracy and predictive power • ML is about finding simple yet expressive classes of distributions • ML is about using approximate numerical methods to perform a Bayesian update on the training data
- 9. ML = intersection of… (figure)
- 10-12. Data sizes vary… from a couple of kilobytes to petabytes
- 13-16. Types of problems solved • Supervised: classification, regression • Unsupervised: clustering, discovering causal links • Reinforcement learning: learn to perform a task from the final result only • (Transduction: not discussed; improving supervised learning with unlabeled samples)
- 17. Typical applications• Image, speech, pattern recognition• Collaborative filtering• Time series forecasting• Game playing• Denoising• Any task where experience is valuable
- 18-25. Common ML techniques • Linear regression • Factor models • Decision trees • Neural networks (perceptron, multilayer perceptron with backpropagation, Hebbian autoassociative memory, Boltzmann machine, spiking neurons…) • SVMs • Bayesian networks, white-box models…
- 26-32. Meta-methods • Ensemble forecasting • Bootstrapping, bagging, model averaging • Boosting • Inductive bias through out-of-sample testing and minimum description length
- 33-41. Neural networks demystified • Perceptron (1957): this is… linear algebra! • Linear separability: with 8 binary inputs, only about 1/2^212 of all possible classifications are linearly separable • Multilayer perceptron + backpropagation (1969 ~ 1986) • Smooth interpolation
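To make the "linear algebra" point concrete: a perceptron is just a dot product, a threshold, and an error-driven weight update. A minimal sketch in NumPy; the toy data, learning rate, and epoch count are invented for illustration:

```python
import numpy as np

def train_perceptron(X, y, lr=0.1, epochs=20):
    """Perceptron rule: predict sign(w.x + b), nudge weights on each mistake."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):          # labels y are in {-1, +1}
            if yi * (xi @ w + b) <= 0:    # misclassified point
                w += lr * yi * xi         # move the hyperplane toward xi
                b += lr * yi
    return w, b

# Toy linearly separable data: AND of two binary inputs
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])
w, b = train_perceptron(X, y)
print(np.sign(X @ w + b))  # should reproduce y
```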
- 42. Many more types…
- 43-45. SVM in a nutshell • Maximize the margin • Embed the data in a high-dimensional space
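Both bullets can be seen at once with scikit-learn (not named in the slides; dataset and parameters are arbitrary): the C parameter trades margin width against violations, and an RBF kernel performs the high-dimensional embedding implicitly:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: not linearly separable in the original 2-D space
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# The RBF kernel implicitly embeds the data in a very high-dimensional space;
# C controls the margin-maximization / violation tradeoff
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
print("support vectors:", len(clf.support_), "train accuracy:", clf.score(X, y))
```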
- 46-48. Ensemble learning • Combine predictions through voting (with classifiers) or regression to improve prediction • Train on random subsets of the data, sampled with replacement (bootstrapping/bagging) • Or weight the data according to the quality of prediction, and train new weak classifiers accordingly (boosting)
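A sketch of bagging exactly as described: fit one weak learner per bootstrap resample and combine by majority vote. scikit-learn's BaggingClassifier does this out of the box (versions before 1.2 spell the first argument base_estimator; the dataset here is arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# 50 shallow trees, each trained on a resample drawn with replacement,
# combined by voting
bag = BaggingClassifier(
    estimator=DecisionTreeClassifier(max_depth=3),
    n_estimators=50,
    bootstrap=True,   # sample with replacement, as on the slide
    random_state=0,
).fit(X, y)
print("train accuracy:", bag.score(X, y))
```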
- 49-52. Numerical tricks • Optimization of fit with standard operational search techniques • EM algorithm • MCMC methods (Gibbs sampling, the Metropolis algorithm…)
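As an illustration of the MCMC bullet, a minimal random-walk Metropolis sampler for an unnormalized target density; the target, step size, and chain length are made up for the example:

```python
import numpy as np

def metropolis(log_p, x0, n_steps=10_000, step=0.5, seed=0):
    """Random-walk Metropolis: propose x' = x + noise,
    accept with probability min(1, p(x')/p(x))."""
    rng = np.random.default_rng(seed)
    x, samples = x0, []
    for _ in range(n_steps):
        x_new = x + step * rng.standard_normal()
        if np.log(rng.random()) < log_p(x_new) - log_p(x):  # accept/reject
            x = x_new
        samples.append(x)
    return np.array(samples)

# Example target: unnormalized standard normal, log p(x) = -x^2 / 2
samples = metropolis(lambda x: -0.5 * x**2, x0=0.0)
print(samples.mean(), samples.std())  # should be close to 0 and 1
```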
- 53-55. A fundamental Bayesian model, the Hidden Markov Model • Hidden states produce observed states • Billions of applications: finance, speech recognition, Swype, Kinect, open-heart surgery, airplane navigation
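The workhorse HMM computation is the forward algorithm, which sums over all hidden-state paths to get the likelihood of an observed sequence. A small NumPy sketch; the transition and emission matrices are invented:

```python
import numpy as np

def forward(pi, A, B, obs):
    """Forward algorithm: P(observations), summed over all hidden-state paths.
    pi: initial state probs (K,), A: transitions (K, K), B: emissions (K, M)."""
    alpha = pi * B[:, obs[0]]           # joint prob of each state and first obs
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # propagate, then weight by emission
    return alpha.sum()

# Two hidden states, two observable symbols (all numbers invented)
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],
              [0.2, 0.8]])
print(forward(pi, A, B, obs=[0, 1, 0]))  # likelihood of the sequence
```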
- 56. Questions I was asked • How does boosting work? • What is the No Free Lunch theorem? • Writing style recognition • Signature recognition • Rule extraction • Moving odds in response to informed gamblers • BellKor's Pragmatic Chaos and the Netflix prize
- 57-61. Writing style recognition • Naïve Bayes (similar to spam filtering; bag-of-words approach) • Clustering of HMM model parameters • Simple statistics on the text corpus (sentence length distribution, word length distribution, density of punctuation) • Combine with a logistic regression
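A sketch of the last two bullets combined: compute simple stylometric statistics per text and feed them to a logistic regression. The features and the mini-corpus are invented for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def style_features(text):
    """Simple stylometric statistics, as listed on the slide."""
    sentences = [s for s in text.split(".") if s.strip()]
    words = text.split()
    return [
        np.mean([len(s.split()) for s in sentences]),         # avg sentence length
        np.mean([len(w) for w in words]),                     # avg word length
        sum(c in ",;:!?" for c in text) / max(len(text), 1),  # punctuation density
    ]

# Invented mini-corpus: label 0 = author A, 1 = author B
texts = ["Short. Curt. Done.",
         "A long, meandering sentence; full of clauses, asides, and punctuation."]
labels = [0, 1]
clf = LogisticRegression().fit([style_features(t) for t in texts], labels)
print(clf.predict([style_features("Brief. Plain. Blunt.")]))
```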
- 62-67. Signature recognition • Depends on whether the input is raster or vector • The post office uses neural networks, but its corpus is gigantic • Dimensionality reduction is key • Wavelets on the raster image for feature extraction • Path following, then learning on path features (total variation, average curvature, etc.)
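For the vector (pen-path) case, the path features named on the slide are cheap to compute. A hedged NumPy sketch of total variation and average turning angle for a sampled 2-D path; the path itself is invented:

```python
import numpy as np

def path_features(points):
    """Total variation (path length) and average turning angle of a 2-D path."""
    d = np.diff(points, axis=0)                   # segment vectors
    seg_len = np.linalg.norm(d, axis=1)
    total_variation = seg_len.sum()               # total distance travelled
    angles = np.arctan2(d[:, 1], d[:, 0])         # heading of each segment
    turning = np.abs(np.diff(np.unwrap(angles)))  # change of heading per step
    avg_curvature = turning.mean() if len(turning) else 0.0
    return total_variation, avg_curvature

# Invented pen path, sampled at a few points
path = np.array([[0, 0], [1, 0.2], [2, 0.1], [2.5, 1.0], [3, 1.1]])
print(path_features(path))
```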
- 68. Rule extraction • Hard: the hypothesis space is not smooth • Decision tree regression • Genetic programming (Koza)
- 69-70. Netflix prize • The baseline (Cinematch) = latent semantic model • The defining characteristic of the winners: ensemble prediction, with neural networks combining the predictors • The best teams were mergers of good teams
- 71. Latent semantic model • There is a set of K “features”: each movie has a score on each feature, and each user has a weight for each feature • The features are latent; we only assume the value of K • Equivalent to representing the rating matrix as the product of a score matrix and a preference matrix; SVD minimizes RMSE
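A minimal sketch of that latent-factor idea: approximate the rating matrix R as a product of a user-weight matrix U and a movie-score matrix V, fit by stochastic gradient descent on the squared error over observed entries only. K, the learning rate, and the toy ratings are all invented:

```python
import numpy as np

rng = np.random.default_rng(0)
R = np.array([[5, 3, 0, 1],        # toy rating matrix, 0 = unrated
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
K = 2                                            # number of latent features (assumed)
U = 0.1 * rng.standard_normal((R.shape[0], K))   # user weights
V = 0.1 * rng.standard_normal((R.shape[1], K))   # movie scores

lr = 0.01
for _ in range(5000):
    for i, j in zip(*R.nonzero()):   # iterate over observed ratings only
        err = R[i, j] - U[i] @ V[j]
        Ui = U[i].copy()             # keep the pre-update value for V's step
        U[i] += lr * err * V[j]      # gradient step on the squared error
        V[j] += lr * err * Ui
print(np.round(U @ V.T, 1))          # reconstructed/predicted ratings
```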
- 72-73. Poker is hard… • Gigantic, yet not continuous, state space • Dimensionality reduction isn’t easy • High variance • Possible to build parametric strategies and optimize them with ML • Inputs such as pot odds are trivial to compute
- 74. Uhuh, slides end here
- 75. Sort of… Questions?
