Machine Learning Lecture

A lecture I gave for CSE/EE599 on the basics of machine learning and different toolkits.

  1. Machine Learning
     Roughly speaking, for a given learning task, with a given finite amount of training data, the best generalization performance will be achieved if the right balance is struck between the accuracy attained on that particular training set and the "capacity" of the machine, that is, the ability of the machine to learn any training set without error. A machine with too much capacity is like a botanist with a photographic memory who, when presented with a new tree, concludes that it is not a tree because it has a different number of leaves from anything she has seen before; a machine with too little capacity is like the botanist's lazy brother, who declares that if it's green, it's a tree. Neither can generalize well. The exploration and formalization of these concepts has resulted in one of the shining peaks of the theory of statistical learning. (Vapnik, 1979)
  2. What is machine learning?
     Data → Model → Output: a model is trained on example data, and its output can be predictions, classifications, clusters, or ordinals.
     Why: face recognition?
  3. Categories of problems
     By output: clustering, regression, prediction, classification, ordinal regression.
     By input: vector X, or time series x(t).
  4. One size never fits all…
     • Improving an algorithm:
       – First option: better features
         • Visualize classes, trends, histograms (WEKA or GGobi)
       – Next: make the algorithm smarter (more complicated)
         • Interaction of features
         • Better objective and training criteria
  5. Categories of ML algorithms
     By training: supervised (labeled) vs. unsupervised (unlabeled).
     By model: non-parametric (raw data only), kernel methods, parametric (model parameters only).
     [Three output-vs-input plots illustrating these model types on the example curve y = 1 + 0.5t + 4t^2 - t^3]
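
Not part of the original slides — a minimal sketch of the parametric case, assuming NumPy; the noise level and sample count are made up for illustration. A parametric model summarizes the data with a handful of coefficients, here recovered by a least-squares polynomial fit of the curve shown on the slide.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(-4, 6, 100)
y = 1 + 0.5 * t + 4 * t**2 - t**3              # the example curve from the slide
y_noisy = y + rng.normal(scale=2.0, size=t.shape)

# Parametric: four numbers (the polynomial coefficients) stand in for all the data.
coeffs = np.polyfit(t, y_noisy, deg=3)          # returned highest power first (t^3 .. const)
print("fitted coefficients:", np.round(coeffs, 2))
```
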
  6. [Figure-only slide: output-vs-input plots]
  7. Training a ML algorithm
     • Choose data
     • Optimize model parameters according to an objective function:
       – Regression: mean square error
       – Classification: max margin
     [Example regression and classification plots]
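
Not from the slides — a minimal sketch, assuming NumPy, of what "optimize model parameters against an objective" looks like for the regression case: a linear fit chosen to minimize mean square error. The data here is synthetic and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-4, 6, size=50)
y = 2.0 * x - 3.0 + rng.normal(scale=1.0, size=x.shape)

X = np.column_stack([x, np.ones_like(x)])       # design matrix [x, 1]
w, *_ = np.linalg.lstsq(X, y, rcond=None)       # closed-form minimizer of mean square error
mse = np.mean((X @ w - y) ** 2)
print("slope, intercept:", np.round(w, 2), " MSE:", round(mse, 3))
```
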
  8. Pitfalls of ML algorithms
     • Clean your features:
       – Training volume: more is better
       – Outliers: remove them!
       – Dynamic range: normalize it!
     • Generalization: overfitting vs. underfitting
     • Speed: parametric vs. non-parametric
     • What are you learning? …features, features, features…
  9. Outliers
     Keep a "good" percentile range (e.g. 5-95 or 1-99); it depends on your data.
     [Output-vs-input plots showing the effect of outliers]
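
Not from the slides — a minimal sketch of the percentile-range idea, assuming NumPy; the helper name and the 5-95 default are illustrative choices, not anything prescribed by the lecture.

```python
import numpy as np

def clip_to_percentiles(X, lo=5, hi=95):
    """Clip each column of X to its [lo, hi] percentile range."""
    low = np.percentile(X, lo, axis=0)
    high = np.percentile(X, hi, axis=0)
    return np.clip(X, low, high)

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
X[0] = [50.0, -40.0]                       # an obvious outlier
print(clip_to_percentiles(X).max(axis=0))  # the extreme values are pulled back into range
```
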
  10. Dynamic range
      [Scatter plots of features f1 vs. f2 for two classes, shown at very different feature scales]
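
Not from the slides — a minimal sketch of normalizing dynamic range, assuming NumPy; z-scoring is one common choice, the feature ranges below are invented for illustration.

```python
import numpy as np

def standardize(X):
    """Zero-mean, unit-variance scaling per feature (column)."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

rng = np.random.default_rng(3)
f1 = rng.uniform(0, 1000, size=100)        # large-range feature
f2 = rng.uniform(0, 1, size=100)           # small-range feature
X = np.column_stack([f1, f2])
print(standardize(X).std(axis=0))           # both features now have unit standard deviation
```
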
  11. Overfitting and comparing algorithms
      • Early stopping
      • Regularization
      • Validation sets
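
Not from the slides — a minimal sketch combining two of the bullets above (regularization and validation sets), assuming scikit-learn's Ridge and train_test_split; the data, alphas, and split are illustrative.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 20))
y = X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=200)

# Hold out a validation set and pick the regularization strength that generalizes best.
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)
best = min((np.mean((Ridge(alpha=a).fit(X_tr, y_tr).predict(X_val) - y_val) ** 2), a)
           for a in [0.01, 0.1, 1.0, 10.0, 100.0])
print("best validation MSE %.3f at alpha=%s" % best)
```
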
  12. Underfitting: the curse of dimensionality
  13. Underfitting: the curse of dimensionality (continued)
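
Not from the slides — a minimal sketch, assuming NumPy, of one face of the curse of dimensionality: as the dimension grows, nearest and farthest neighbors become almost equally far away, so distance-based methods lose discrimination.

```python
import numpy as np

rng = np.random.default_rng(9)
for d in (2, 10, 100, 1000):
    X = rng.uniform(size=(500, d))
    dists = np.linalg.norm(X - X[0], axis=1)[1:]   # distances from one reference point
    print(f"d={d:4d}  min/max distance ratio = {dists.min() / dists.max():.3f}")
```
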
  14. K-means clustering
      • Planar decision boundaries, depending on the space you are in…
      • Highly efficient
      • Not always great (but usually pretty good)
      • Needs good starting criteria
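
Not from the slides — a minimal sketch assuming scikit-learn's KMeans; the two synthetic blobs are invented. The init and n_init arguments are where the "good starting criteria" bullet shows up in practice.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(loc=(0, 0), size=(100, 2)),
               rng.normal(loc=(5, 5), size=(100, 2))])

# k-means++ seeding plus several restarts guards against a bad initialization.
km = KMeans(n_clusters=2, init="k-means++", n_init=10, random_state=0).fit(X)
print("centers:\n", np.round(km.cluster_centers_, 2))
```
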
  15. K-nearest neighbor
      • Arbitrary decision boundaries
      • Not so efficient…
      • With enough data in each class… optimal
      • Easy to train; known as a lazy classifier
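
Not from the slides — a minimal sketch assuming scikit-learn's KNeighborsClassifier on made-up data. "Lazy" here means fit() just stores the training set; all the work (and the inefficiency) happens at prediction time.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)    # training = storing the data
print(knn.predict([[0.0, 0.0], [3.0, 3.0]]))           # expect [0 1]
```
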
  16. Mixture of Gaussians
      • Arbitrary decision boundaries, given enough Gaussians
      • Efficient, depending on the number of models and Gaussians
      • Can represent more than just Gaussian distributions
      • Generative; sometimes tough to train
      • Spurious singularities
      • Can give a distribution for a specific class and feature(s)… yielding a Bayesian classifier
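
Not from the slides — a minimal sketch of the last bullet, assuming scikit-learn's GaussianMixture: fit one mixture per class, then combine class likelihoods with priors to classify. The data, component counts, and the classify helper are all illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(7)
X0 = rng.normal(0, 1, (150, 2))                         # class 0 samples
X1 = rng.normal(4, 1, (150, 2))                         # class 1 samples

gmm0 = GaussianMixture(n_components=2, random_state=0).fit(X0)
gmm1 = GaussianMixture(n_components=2, random_state=0).fit(X1)

def classify(x, prior0=0.5):
    """Pick the class with the larger log posterior (log-likelihood + log prior)."""
    ll0 = gmm0.score_samples(x) + np.log(prior0)
    ll1 = gmm1.score_samples(x) + np.log(1 - prior0)
    return (ll1 > ll0).astype(int)

print(classify(np.array([[0.0, 0.0], [4.0, 4.0]])))     # expect [0 1]
```
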
  17. Components analysis (principal or independent)
      • Reduces dimensionality
      • All other classifiers then work in a rotated space
      • Remember eigenvalues and eigenvectors?
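
Not from the slides — a minimal PCA sketch assuming scikit-learn; the redundant fifth feature is contrived so the dimensionality reduction has something to find.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(8)
X = rng.normal(size=(200, 5))
X[:, 4] = X[:, 0] + 0.01 * rng.normal(size=200)     # a nearly redundant feature

pca = PCA(n_components=2).fit(X)
X_rot = pca.transform(X)                            # downstream classifiers work in this rotated space
print("explained variance ratio:", np.round(pca.explained_variance_ratio_, 3))
```
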
  18. Tree classifiers
      • Arbitrary decision boundaries
      • Can be quite efficient (or not!)
      • Needs good criteria for splitting
      • Easy to visualize
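
Not from the slides — a minimal sketch assuming scikit-learn's DecisionTreeClassifier on the standard iris dataset; the Gini criterion and depth limit are example choices, and export_text shows how easily the result can be visualized.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(iris.data, iris.target)

# The learned splits print as a readable set of if/else rules.
print(export_text(tree, feature_names=list(iris.feature_names)))
```
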
  19. Multi-layer perceptron
      • Arbitrary (but linear) decision boundaries
      • Can be quite efficient (or not!)
      • What did it learn?
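
Not from the slides — a minimal sketch assuming scikit-learn's MLPClassifier on a toy two-moons dataset; the hidden layer size is arbitrary. Unlike a tree, "what did it learn?" is hard to answer by inspecting the weights.

```python
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
mlp.fit(X, y)
print("training accuracy:", round(mlp.score(X, y), 3))
```
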
  20. Support vector machines
      • Arbitrary decision boundaries
      • Efficiency depends on the number of support vectors and the feature dimension
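
Not from the slides — a minimal sketch assuming scikit-learn's SVC with an RBF kernel; the dataset and C value are illustrative. Prediction cost scales with the support-vector count reported at the end.

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
svm = SVC(kernel="rbf", C=1.0).fit(X, y)
print("support vectors per class:", svm.n_support_)
```
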
  21. Hidden Markov models
      • Arbitrary decision boundaries
      • Efficiency depends on the state space and the number of models
      • Generalizes to incorporate features that change over time
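
Not from the slides — a minimal toy sketch in NumPy of the HMM forward recursion, which scores a sequence of observations that change over time; the transition, emission, and initial probabilities below are made-up numbers.

```python
import numpy as np

A = np.array([[0.7, 0.3],        # state transition probabilities
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],        # emission probabilities, rows = states
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])        # initial state distribution

def sequence_likelihood(obs):
    """P(observations | model) via the forward recursion."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return alpha.sum()

print(sequence_likelihood([0, 0, 1, 1]))   # likelihood of one observation sequence
```
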
  22. More sophisticated approaches
      • Graphical models (like an HMM)
        – Bayesian networks
        – Markov random fields
      • Boosting
        – AdaBoost
      • Voting
      • Cascading
      • Stacking…
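
Not from the slides — a minimal sketch of the AdaBoost bullet, assuming scikit-learn's AdaBoostClassifier with its default shallow-tree base learner; the dataset and estimator count are illustrative.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
ada = AdaBoostClassifier(n_estimators=100, random_state=0)   # boosts shallow decision trees
ada.fit(X, y)
print("training accuracy:", round(ada.score(X, y), 3))
```
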
