Lecture 17: Supervised Learning Recap Machine Learning April 6, 2010
Last Time: Support Vector Machines; Kernel Methods.
Today: Short recap of Kernel Methods. Review of Supervised Learning. Unsupervised Learning: (Soft) K-means clustering, Expectation Maximization, Spectral Clustering, Principal Components Analysis, Latent Semantic Analysis.
Kernel Methods: Feature extraction to higher-dimensional spaces. Kernels describe the relationship between vectors (points) rather than the new feature space directly.
When can we use kernels? Any time training and evaluation are both based on the dot product between two points. Examples: SVMs, perceptrons, k-nearest neighbors, k-means, etc.
Kernels in SVMs: Optimize the α_i's and the bias with respect to the kernel, then classify with the kernelized decision function.
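The slide's equations did not survive in this transcript. For reference, the kernelized (dual) soft-margin SVM is commonly written as below. The notation is the usual textbook one and is an assumption about what the slide showed: t_i are labels in {-1, +1}, C is the slack penalty, and b is the bias.

```latex
\begin{align}
\max_{\alpha}\;& \sum_{i=1}^{N} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j t_i t_j K(x_i, x_j)
  \quad \text{s.t.} \quad 0 \le \alpha_i \le C, \;\; \sum_{i=1}^{N} \alpha_i t_i = 0 \\
f(x) &= \operatorname{sign}\Big( \sum_{i=1}^{N} \alpha_i t_i K(x_i, x) + b \Big)
\end{align}
```

Only the points with nonzero α_i (the support vectors) contribute to the decision function.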
Kernels in Perceptrons: Both training and the decision function can be written in terms of the kernel.
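The training rule and decision function for this slide are missing from the transcript. A minimal sketch of a kernel perceptron, assuming labels in {-1, +1}, no bias term, and an RBF kernel with gamma = 1 (all illustrative choices, not taken from the slides), could look like this:

```python
import math

def rbf_kernel(x, z, gamma=1.0):
    """Radial basis function kernel exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def train_kernel_perceptron(X, y, kernel, max_epochs=100):
    """Return per-point mistake counts alpha_i; labels must be -1 or +1."""
    alpha = [0] * len(X)
    for _ in range(max_epochs):
        mistakes = 0
        for i, (x_i, y_i) in enumerate(zip(X, y)):
            score = sum(a * y_j * kernel(x_j, x_i)
                        for a, x_j, y_j in zip(alpha, X, y))
            if y_i * score <= 0:   # misclassified, or exactly on the boundary
                alpha[i] += 1      # kernel form of w <- w + y_i * phi(x_i)
                mistakes += 1
        if mistakes == 0:          # a full pass with no errors: stop
            break
    return alpha

def predict(x, X, y, alpha, kernel):
    """Decision function: sign of the kernel-weighted vote of past mistakes."""
    score = sum(a * y_j * kernel(x_j, x) for a, x_j, y_j in zip(alpha, X, y))
    return 1 if score > 0 else -1

# XOR is not linearly separable in the input space, but it is with an RBF kernel.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [-1, 1, 1, -1]
alpha = train_kernel_perceptron(X, y, rbf_kernel)
predictions = [predict(x, X, y, alpha, rbf_kernel) for x in X]
```

The weight vector never appears explicitly; the model is stored as the mistake counts `alpha`, which is what lets the dot product be swapped for a kernel.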
Good and Valid Kernels. Good: computing K(x_i, x_j) is cheaper than computing ϕ(x_i). Valid: symmetric, K(x_i, x_j) = K(x_j, x_i); decomposable into ϕ(x_i)^T ϕ(x_j); positive semidefinite Gram matrix. Popular kernels: linear, polynomial, radial basis function, string (technically infinite dimensions), graph.
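The symmetry and decomposability conditions can be checked numerically on a small example. The sketch below assumes a homogeneous degree-2 polynomial kernel on 2-D inputs, for which an explicit feature map is known; the function names are illustrative.

```python
import math

def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel K(x, z) = (x . z)^2."""
    return sum(a * b for a, b in zip(x, z)) ** 2

def phi(x):
    """Explicit feature map for 2-D inputs with phi(x) . phi(z) = (x . z)^2."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def gram_matrix(points, kernel):
    """Matrix of kernel values between every pair of points."""
    return [[kernel(p, q) for q in points] for p in points]

points = [(1.0, 2.0), (3.0, -1.0), (0.5, 0.5)]
K = gram_matrix(points, poly2_kernel)
n = len(points)

# Symmetry: K[i][j] equals K[j][i].
is_symmetric = all(K[i][j] == K[j][i] for i in range(n) for j in range(n))

# Decomposability: the kernel matches the dot product in the feature space.
explicit = [[sum(a * b for a, b in zip(phi(p), phi(q))) for q in points]
            for p in points]
max_gap = max(abs(K[i][j] - explicit[i][j]) for i in range(n) for j in range(n))
```

The kernel needs one dot product and one squaring, while the explicit route builds a three-dimensional feature vector first; that gap widens quickly with the input dimension and the polynomial degree.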
Supervised Learning: Linear Regression, Logistic Regression, Graphical Models, Hidden Markov Models, Neural Networks, Support Vector Machines, Kernel Methods.
Major concepts: Gaussian, Multinomial, and Bernoulli distributions. Joint vs. conditional distributions. Marginalization. Maximum Likelihood. Risk Minimization. Gradient Descent. Feature Extraction, Kernel Methods.
Some favorite distributions: Bernoulli, Multinomial, Gaussian.
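The formulas for these distributions are not in the transcript. Their standard forms are given below; the parameter names (μ, σ², N, n_k) follow common textbook usage and are an assumption about the slide's notation.

```latex
\begin{align}
\text{Bernoulli:}\quad & p(x \mid \mu) = \mu^{x} (1 - \mu)^{1 - x}, \qquad x \in \{0, 1\} \\
\text{Multinomial:}\quad & p(n_1, \ldots, n_K \mid \mu, N)
  = \frac{N!}{\prod_{k=1}^{K} n_k!} \prod_{k=1}^{K} \mu_k^{n_k},
  \qquad \sum_{k=1}^{K} \mu_k = 1, \;\; \sum_{k=1}^{K} n_k = N \\
\text{Gaussian:}\quad & \mathcal{N}(x \mid \mu, \sigma^2)
  = \frac{1}{\sqrt{2 \pi \sigma^2}} \exp\!\Big( -\frac{(x - \mu)^2}{2 \sigma^2} \Big)
\end{align}
```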
Maximum Likelihood: Identify the parameter values that yield the maximum likelihood of generating the observed data. Take the partial derivative of the likelihood function, set it to zero, and solve. NB: maximum likelihood parameters are the same as maximum log likelihood parameters.
Maximum Log Likelihood: Why do we like the log function? It turns products (difficult to differentiate) into sums (easy to differentiate). log(xy) = log(x) + log(y); log(x^c) = c log(x).
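As a worked instance of the recipe (differentiate, set to zero, solve), consider N independent Bernoulli observations x_1, ..., x_N with k = Σ x_i ones. The example is an illustration chosen here, not one taken from the slides.

```latex
\begin{align}
L(\mu) &= \prod_{i=1}^{N} \mu^{x_i} (1 - \mu)^{1 - x_i} = \mu^{k} (1 - \mu)^{N - k} \\
\log L(\mu) &= k \log \mu + (N - k) \log (1 - \mu) \\
\frac{\partial \log L}{\partial \mu} &= \frac{k}{\mu} - \frac{N - k}{1 - \mu} = 0
  \quad \Longrightarrow \quad \hat{\mu} = \frac{k}{N}
\end{align}
```

Both log rules are used in the second line: the product becomes a sum and the exponents become coefficients.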
Risk Minimization: Pick a loss function: squared loss, linear loss, or perceptron (classification) loss. Identify the parameters that minimize the loss function. Take the partial derivative of the loss function, set it to zero, and solve.
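For the squared loss with a linear model the three steps have a closed-form answer. The derivation below assumes a design matrix X whose rows are the inputs x_i, a target vector t, and an invertible X^T X; this notation is introduced here for illustration.

```latex
\begin{align}
R(w) &= \frac{1}{2} \sum_{i=1}^{N} \big( t_i - w^{T} x_i \big)^2 \\
\nabla_w R(w) &= -\sum_{i=1}^{N} \big( t_i - w^{T} x_i \big) x_i = -X^{T} (t - X w) = 0 \\
\hat{w} &= (X^{T} X)^{-1} X^{T} t
\end{align}
```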
Frequentists vs. Bayesians: Point estimates vs. posteriors. Risk minimization vs. maximum likelihood. L2-regularization: frequentists add a constraint on the size of the weight vector; Bayesians introduce a zero-mean prior on the weight vector. The result is the same!
L2-Regularization. Frequentists: introduce a cost on the size of the weights. Bayesians: introduce a prior on the weights.
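The equivalence can be seen by writing out the negative log posterior. The sketch assumes Gaussian noise with variance σ² on the targets and an isotropic Gaussian prior with variance τ² on the weights; both symbols are introduced here for illustration.

```latex
\begin{align}
t_i \mid x_i, w &\sim \mathcal{N}(w^{T} x_i, \sigma^2), \qquad w \sim \mathcal{N}(0, \tau^2 I) \\
-\log p(w \mid \text{data}) &= \frac{1}{2 \sigma^2} \sum_{i=1}^{N} \big( t_i - w^{T} x_i \big)^2
  + \frac{1}{2 \tau^2} \lVert w \rVert^2 + \text{const} \\
\hat{w}_{\text{MAP}} &= \arg\min_{w} \; \frac{1}{2} \sum_{i=1}^{N} \big( t_i - w^{T} x_i \big)^2
  + \frac{\lambda}{2} \lVert w \rVert^2, \qquad \lambda = \frac{\sigma^2}{\tau^2}
\end{align}
```

The last line is exactly the squared loss with an L2 penalty: a tighter prior (smaller τ²) corresponds to a larger regularization weight λ.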
Types of Classifiers. Generative models: highest resource requirements; need to approximate the joint probability. Discriminative models: moderate resource requirements; typically fewer parameters to approximate than generative models. Discriminant functions: can be trained probabilistically, but the output does not include confidence information.
Linear Regression: Fit a line to a set of points.
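A minimal sketch of fitting a line by least squares in one dimension, using the closed-form slope and intercept. The function name and the sample data are illustrative.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Points that lie exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
slope, intercept = fit_line(xs, ys)
```

The code assumes the x values are not all identical; otherwise `sxx` is zero and the slope is undefined.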
Linear Regression: Extension to higher dimensions. Polynomial fitting. Arbitrary function fitting: wavelets, radial basis functions. Classifier output.
Logistic Regression: Fit Gaussians to the data for each class; the decision boundary is where the PDFs cross. Setting the gradient to zero has no closed-form solution, so the parameters are found by gradient descent.
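Since there is no closed-form solution, the parameters are updated iteratively. A minimal sketch of batch gradient descent on the negative log likelihood, assuming a single input feature, labels in {0, 1}, and a fixed learning rate and step count chosen for illustration:

```python
import math

def sigmoid(z):
    """Logistic squashing function."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(xs, ys, lr=0.1, steps=500):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    for _ in range(steps):
        # Gradient of the negative log likelihood with respect to w and b.
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys))
        grad_b = sum(sigmoid(w * x + b) - y for x, y in zip(xs, ys))
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]
w, b = train_logistic(xs, ys)
predicted = [1 if sigmoid(w * x + b) >= 0.5 else 0 for x in xs]
```

On perfectly separable data like this toy set, the unregularized weight keeps growing with more steps, which is one practical motivation for the L2 penalty discussed earlier.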
Graphical Models: A general way to describe the dependence relationships between variables. The Junction Tree Algorithm allows us to efficiently calculate marginals over any variable.
Junction Tree Algorithm. Moralization: "marry the parents" and make the graph undirected. Triangulation: add chords so that no chordless cycle of four or more nodes remains. Junction tree construction: identify separators such that the running intersection property holds. Introduction of evidence: pass slices around the junction tree to generate marginals.
Hidden Markov Models: Sequential modeling. Generative model. Relationship between observations and state (class) sequences.
Perceptron: Step function used for squashing. Classifier as neuron metaphor.
Perceptron Loss: Classification error vs. sigmoid error. Loss is only calculated on mistakes. Perceptrons use strictly classification error.
Neural Networks: Interconnected layers of perceptrons or logistic regression “neurons”.
Neural Networks: There are many possible configurations of neural networks. Vary the number of layers and the size of the layers.
Support Vector Machines: Maximum margin classification. Small margin vs. large margin.
Support Vector Machines: Optimization function and decision function.
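The optimization and decision functions on this slide were images and are not in the transcript. In the usual textbook form, assuming linearly separable data and labels t_i in {-1, +1} (an assumption about the slide's notation), the hard-margin primal problem is:

```latex
\begin{align}
\min_{w, b}\;& \frac{1}{2} \lVert w \rVert^2
  \quad \text{s.t.} \quad t_i \big( w^{T} x_i + b \big) \ge 1, \qquad i = 1, \ldots, N \\
f(x) &= \operatorname{sign}\big( w^{T} x + b \big)
\end{align}
```

Minimizing the norm of w under these constraints maximizes the margin, which equals 1 / ||w|| on each side of the boundary.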
Visualization of Support Vectors
Questions? Now would be a good time to ask questions about Supervised Techniques.
Clustering: Identify discrete groups of similar data points. Data points are unlabeled.
Recall the K-Means Algorithm: (1) Select K, the desired number of clusters. (2) Initialize K cluster centroids. (3) For each point in the data set, assign it to the cluster with the closest centroid. (4) Update each centroid based on the points assigned to its cluster. (5) If any data point has changed clusters, repeat from step 3.
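The steps can be sketched in plain Python as follows. Using the first K points as the initial centroids is an assumption made here to keep the example deterministic; the slides do not specify an initialization.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iter=100):
    """Hard k-means; returns (centroids, assignments)."""
    centroids = [points[i] for i in range(k)]   # assumption: first k points
    assignments = [None] * len(points)
    for _ in range(max_iter):
        # Assign each point to the cluster with the closest centroid.
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignments == assignments:      # no point changed cluster
            break
        assignments = new_assignments
        # Move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:                         # leave empty clusters in place
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignments

points = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
centroids, assignments = kmeans(points, 2)
```

The stopping test mirrors the last step of the algorithm: the loop ends as soon as a full pass leaves every assignment unchanged.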
k-means output
Soft K-means: In k-means, we force every data point to exist in exactly one cluster. This constraint can be relaxed. Minimizes the entropy of cluster assignment.
Soft k-means example
Soft k-means: We still define a cluster by a centroid, but we calculate the centroid as the weighted mean of all the data points. Convergence is based on a stopping threshold rather than changed assignments.
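A minimal sketch of this variant follows. The slides do not give the weighting formula, so the exponential form of the responsibilities, the stiffness parameter `beta`, and the tolerance `tol` are assumptions based on one common formulation.

```python
import math

def soft_kmeans(points, centroids, beta=1.0, tol=1e-6, max_iter=100):
    """Soft k-means; returns (centroids, responsibilities)."""
    centroids = [tuple(c) for c in centroids]
    dims = len(points[0])
    resp = []
    for _ in range(max_iter):
        # Responsibilities: softmax over negative scaled squared distances.
        resp = []
        for p in points:
            d2 = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            d_min = min(d2)                     # subtract for numerical stability
            w = [math.exp(-beta * (d - d_min)) for d in d2]
            total = sum(w)
            resp.append([v / total for v in w])
        # Each centroid is the responsibility-weighted mean of ALL points.
        new_centroids = []
        for j in range(len(centroids)):
            weight = sum(r[j] for r in resp)
            new_centroids.append(tuple(
                sum(r[j] * p[d] for r, p in zip(resp, points)) / weight
                for d in range(dims)))
        # Stop when no centroid moved more than the threshold.
        shift = max(math.dist(c, n) for c, n in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:
            break
    return centroids, resp

points = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
centroids, resp = soft_kmeans(points, [(0, 0), (10, 10)])
```

With well-separated groups the responsibilities are nearly 0 or 1 and the result matches hard k-means; a smaller `beta` makes the assignments softer.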
Gaussian Mixture Models: Rather than identifying clusters by “nearest” centroids, fit a set of k Gaussians to the data.
GMM example
Gaussian Mixture Models: Formally, a mixture model is the weighted sum of a number of pdfs, where the weights are determined by a distribution.
Graphical Models with unobserved variables: What if you have variables in a graphical model that are never observed? These are latent variables. Training latent variable models is an unsupervised learning application. (Figure labels: uncomfortable, amused, laughing, sweating.)
Latent Variable HMMs: We can cluster sequences using an HMM with unobserved state variables. We will train the latent variable models using Expectation Maximization.
Expectation Maximization: Both the training of GMMs and of graphical models with latent variables are accomplished using Expectation Maximization. Step 1, Expectation (E-step): evaluate the “responsibilities” of each cluster with the current parameters. Step 2, Maximization (M-step): re-estimate the parameters using the existing “responsibilities”. Related to k-means.
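The two steps can be sketched for a one-dimensional Gaussian mixture. The initial variances of 1, the uniform initial weights, the fixed iteration count, and the variance floor `min_var` are assumptions made here to keep the example short and stable.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(data, means, iterations=50, min_var=1e-6):
    """Fit a 1-D Gaussian mixture by EM; returns (weights, means, variances)."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iterations):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            joint = [weights[j] * gaussian_pdf(x, means[j], variances[j])
                     for j in range(k)]
            total = sum(joint)
            resp.append([v / total for v in joint])
        # M-step: re-estimate parameters from the responsibilities.
        for j in range(k):
            n_j = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / n_j
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / n_j
            variances[j] = max(min_var, var)   # floor avoids a collapsed component
            weights[j] = n_j / len(data)
    return weights, means, variances

data = [0.0, 0.2, -0.2, 0.1, -0.1, 5.0, 5.2, 4.8, 5.1, 4.9]
weights, means, variances = em_gmm_1d(data, means=[min(data), max(data)])
```

The relation to k-means shows in the structure: the E-step is a soft assignment and the M-step is a weighted centroid update, extended with variances and mixing weights.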
Questions: One more time for questions on supervised learning…
Next Time: Gaussian Mixture Models (GMMs); Expectation Maximization.


Editor's Notes

  1. $p(x) = \pi_0 f_0(x) + \pi_1 f_1(x) + \pi_2 f_2(x) + \ldots + \pi_k f_k(x)$