INDIAN INSTITUTE OF TECHNOLOGY ROORKEE
bala@cs.iitr.ac.in
https://faculty.iitr.ac.in/cs/bala/
CSN-382 (Lecture 10)
Dr. R. Balasubramanian
Professor
Department of Computer Science and Engineering
Mehta Family School of Data Science and Artificial Intelligence
Indian Institute of Technology Roorkee
Roorkee 247 667
Machine Learning
● Signal – all valid values for a variable (lying between the max and min values on the x-axis and y-axis). Represents valid data.
● Noise – the spread of data points around the best-fit line. For a given value of x there are multiple values of y (some on the line and some around it). This spread is due to random factors.
● Signal-to-Noise Ratio (SNR) – variance of the signal / variance of the noise.
● The greater the SNR, the better the model will be (a computational sketch follows this slide).
[Figure: scatter plot of data points (+) between X min and X max and Y min and Y max, spread around a best-fit line labelled "Signal".]
PCA (Signal to noise ratio)
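The SNR definition above can be made concrete for a simple linear fit. The following is a minimal sketch (not from the lecture), using synthetic data: the variance of the fitted line values is taken as the signal, and the variance of the residuals as the noise.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 100)                    # valid range of x (X min .. X max)
y = 2.5 * x + 1.0 + rng.normal(0.0, 2.0, x.size)   # line plus random noise

# Least-squares best-fit line.
slope, intercept = np.polyfit(x, y, 1)
y_hat = slope * x + intercept

signal_var = np.var(y_hat)       # variance of the signal (fitted values)
noise_var = np.var(y - y_hat)    # variance of the noise (spread around the line)
snr = signal_var / noise_var
print(f"SNR = {snr:.2f}")        # a higher SNR indicates a cleaner relationship
```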
● PCA can also be used to reduce dimensions.
● Arrange all eigenvectors, along with their corresponding eigenvalues, in descending order of eigenvalue.
● Plot the cumulative eigenvalue graph.
● Eigenvectors with an insignificant contribution to the total eigenvalue sum can be removed from the analysis (see the sketch below).
PCA for dimensionality reduction
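A minimal numpy sketch of these steps (an assumed implementation, not the lecture's code): eigendecompose the covariance matrix, sort eigenvectors by descending eigenvalue, and keep just enough components to reach a chosen fraction of the cumulative eigenvalue sum.

```python
import numpy as np

def pca_reduce(X, frac_to_keep=0.95):
    """Project X (n_samples x n_features) onto the top principal components."""
    Xc = X - X.mean(axis=0)                    # center the data
    cov = np.cov(Xc, rowvar=False)             # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh: for symmetric matrices

    order = np.argsort(eigvals)[::-1]          # descending order of eigenvalues
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Cumulative eigenvalue curve; keep components until frac_to_keep is reached,
    # dropping eigenvectors with insignificant contribution.
    cum = np.cumsum(eigvals) / eigvals.sum()
    k = int(np.searchsorted(cum, frac_to_keep)) + 1

    return Xc @ eigvecs[:, :k], cum

X = np.random.default_rng(1).normal(size=(200, 10))
Z, cum = pca_reduce(X, frac_to_keep=0.95)
print(Z.shape, cum.round(3))                   # reduced data and cumulative curve
```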
Advantages
● Helps in reducing dimensions
● Correlated features are removed
● Improves the performance of an algorithm
● Low noise sensitivity

Disadvantages
● Assumes that the feature set is correlated
● Sensitive to outliers
● The high-variance axis is treated as a PC, while low-variance axes are treated as noise
● The covariance matrix is difficult to evaluate accurately
Advantages and disadvantages
● Dimensionality reduction
● Improving the signal-to-noise ratio
● Removing correlation between variables
● Speeding up the convergence of neural networks
● Computer vision (face recognition)
Applications of PCA
Feature Selection
► Instance-based learning (kNN, last class)
 Not useful if the number of features is large.
► Feature Reduction
 Features contain information about the target.
► More features might seem to mean more information and better discriminative or classification power.
 But this is not always true.
Curse of Dimensionality
► Irrelevant features
 In algorithms such as k-nearest neighbor, irrelevant features introduce noise and fool the learning algorithm.
► Redundant features
 With a fixed number of training examples, redundant features that contribute no additional information can degrade the performance of the learning algorithm.
► Irrelevant and redundant features can confuse the learner, especially when training examples and computational resources are limited.
► A large number of features with limited training examples
 Overfitting
To overcome the Curse of Dimensionality
► Feature Selection
► Feature Extraction
Feature Selection
► Given a set of initial features 𝐹 = {𝑥1, 𝑥2, 𝑥3, … , 𝑥𝑛},
► we want to find a subset 𝐹′ = {𝑥1′, 𝑥2′, 𝑥3′, … , 𝑥𝑚′} ⊂ 𝐹 that optimizes certain criteria.
► Note how feature selection differs from feature extraction: selection keeps a subset of the original features, whereas extraction constructs new features from them.
► Feature selection matters in problems like hyperspectral imaging, where the number of raw features (bands) is very large.
► From a set of 𝑛 features, there are 2^𝑛 possible feature subsets, so search strategies are needed (a brute-force sketch follows this slide):
 Optimized algorithms in polynomial time
 Heuristics
 Greedy algorithms
 Randomized algorithms
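To make the 2^𝑛 count concrete, here is a hedged sketch of a brute-force wrapper-style search; `evaluate` and `toy_eval` are hypothetical stand-ins for training a model on a subset and scoring it on validation data. This is only feasible for small 𝑛.

```python
from itertools import combinations

def exhaustive_search(features, evaluate):
    """Try all 2^n - 1 non-empty subsets; return the best-scoring one."""
    best_subset, best_score = None, float("-inf")
    for r in range(1, len(features) + 1):
        for subset in combinations(features, r):
            score = evaluate(subset)           # e.g., validation accuracy
            if score > best_score:
                best_subset, best_score = subset, score
    return best_subset, best_score

# Hypothetical scorer: rewards subsets containing x1 and x3, penalizes size.
toy_eval = lambda s: ("x1" in s) + ("x3" in s) - 0.1 * len(s)
print(exhaustive_search(["x1", "x2", "x3", "x4"], toy_eval))  # best: ('x1', 'x3')
```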
Feature Subset Evaluation
► Unsupervised (Filter method)
► Supervised (Wrapper method)
Feature Selection Steps
► Feature selection is an optimization problem:
► Step 1: Search the space of possible feature subsets.
► Step 2: Pick the subset that is optimal or near-optimal w.r.t. some objective function.
Feature Selection Steps
► Search Strategies
 Optimum
 Heuristic
 Randomized
► Evaluation Methods
 Filter methods
 Wrapper methods
Evaluating Feature Subset
► Supervised (Wrapper method)
 Train using the selected subset
 Estimate the error on a validation dataset
► Unsupervised (Filter method)
 Look at the input only
 Select the subset that retains the most information
Evaluation Strategies
Two different frameworks of feature selection
► Find uncorrelated features in the reduced feature set
► Heuristic algorithms
 Forward Selection Algorithm
 Backward Selection Algorithm
► Forward Selection Algorithm
 Start with an empty feature set and add features one by one (see the sketch after this slide)
► Backward Selection Algorithm
 In backward search, start with the full feature set, then try removing features from the set you have
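A hedged sketch of the greedy forward-selection loop; as above, `evaluate` is a hypothetical stand-in for the chosen subset criterion (wrapper or filter). Backward selection is the mirror image: start with the full set and repeatedly drop the feature whose removal helps most.

```python
def forward_selection(features, evaluate):
    """Greedily add the single feature that most improves the score."""
    selected, remaining = [], list(features)
    best_score = float("-inf")
    while remaining:
        # Score every one-feature extension of the current subset.
        score, best_f = max((evaluate(selected + [f]), f) for f in remaining)
        if score <= best_score:                # no candidate improves: stop
            break
        selected.append(best_f)
        remaining.remove(best_f)
        best_score = score
    return selected

toy_eval = lambda s: ("x1" in s) + ("x3" in s) - 0.1 * len(s)
print(forward_selection(["x1", "x2", "x3", "x4"], toy_eval))  # selects x1 and x3
```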
Feature Selection
► Univariate (looks at each feature independently of the others)
 Pearson correlation coefficient
 F-Score
 Chi-Square
 Signal-to-noise ratio
► Rank features by importance
► The ranking cut-off is determined by the user
► Univariate methods measure some type of correlation between two random variables: the label 𝑦𝑖 and a fixed feature 𝑥𝑖𝑗, for fixed 𝑗 (a ranking sketch follows this slide).
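A minimal sketch of univariate ranking using the Pearson correlation between each feature column and the label; the data shapes and names here are assumptions for illustration.

```python
import numpy as np

def rank_by_pearson(X, y):
    """Rank features by |corr(x_j, y)|, strongest first."""
    scores = np.array([abs(np.corrcoef(X[:, j], y)[0, 1])
                       for j in range(X.shape[1])])
    return np.argsort(scores)[::-1], scores

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
y = 3 * X[:, 0] - 2 * X[:, 3] + rng.normal(size=100)  # only features 0 and 3 matter
order, scores = rank_by_pearson(X, y)
print(order, scores.round(2))   # the user then picks a ranking cut-off
```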
Pearson correlation coefficient
► Please refer to the lecture 4 slides
● Signal – all valid values for a variable (lying between the max and min values on the x-axis and y-axis). Represents valid data.
● Noise – the spread of data points around the best-fit line. For a given value of x there are multiple values of y (some on the line and some around it). This spread is due to random factors.
● Signal-to-Noise Ratio (SNR) – variance of the signal / variance of the noise.
● The greater the SNR, the better the model will be.
[Figure: scatter plot of data points (+) between X min and X max and Y min and Y max, spread around a best-fit line labelled "Signal".]
Signal to noise ratio
Multivariate Feature Selection
► Multivariate methods consider all features simultaneously.
► Consider the weight vector 𝑤 of any linear classifier.
► Classification of a point 𝑥 is given by the sign of 𝑤ᵀ𝑥 + 𝑤0.
► Small entries of 𝑤 have little effect on the dot product, hence those features are less relevant.
► For example, if 𝑤 = (10, 0.01, −9), then features 0 and 2 contribute more to the dot product than feature 1.
 The ranking of features given by this 𝑤 is 0, 2, then 1.
► The 𝑤 can be obtained from any linear classifier.
Multivariate Feature Selection
► A variant of this approach is called recursive feature elimination (sketched below):
 1. Compute 𝑤 on all features
 2. Remove the feature with the smallest |𝑤𝑖|
 3. Recompute 𝑤 on the reduced data
 4. Go to step 2 if the stopping criterion is not met
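A hedged numpy sketch of recursive feature elimination; the linear "classifier" here is a plain ridge least-squares fit on ±1 labels, standing in for whichever linear classifier supplies 𝑤 (features should be on comparable scales for |𝑤𝑖| to be a fair relevance score).

```python
import numpy as np

def ridge_w(X, y, lam=1e-3):
    """Linear fit w = (X^T X + lam*I)^{-1} X^T y (stand-in for any linear classifier)."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def rfe(X, y, n_keep):
    """Repeatedly drop the feature with the smallest |w_i| until n_keep remain."""
    active = list(range(X.shape[1]))
    while len(active) > n_keep:                # stopping criterion
        w = ridge_w(X[:, active], y)           # recompute w on the reduced data
        active.pop(int(np.argmin(np.abs(w))))  # remove the smallest-|w_i| feature
    return active

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 6))
y = np.sign(X[:, 1] - 2 * X[:, 4] + 0.1 * rng.normal(size=200))  # labels in {-1, +1}
print(rfe(X, y, n_keep=2))   # expected to keep the informative features 1 and 4
```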
Linear Discriminant Analysis
● Linear Discriminant Analysis (LDA) is a supervised learning algorithm for classification.
● Like PCA, it can be used for dimensionality reduction, by projecting the input data onto a linear subspace consisting of the directions that maximize the separation between classes.
● It is a linear transformation technique.
● It can be used as a pre-processing stage for pattern classification.
● The purpose of LDA is to reduce the dimensionality while preserving good separability between the classes.
● It assumes that the features are normally distributed.
Linear Discriminant Analysis
● Fisher’s LDA aims to maximise equation (1): maximize the distance between the class means while minimizing the variance within each class,
 J(W) = (m1 − m2)² / (s1² + s2²) (1)
 where m𝑘 and s𝑘² are the mean and scatter of class 𝑘 after projection onto W.
● Equation (1) can be rewritten with two new terms,
 J(W) = (WᵀSBW) / (WᵀSWW) (2)
 ○ Between class matrix (SB)
 ○ Within class matrix (SW)
Here, W is a unit vector onto which the data points are to be projected.
Objective of LDA
● Upon differentiating equation (2) w.r.t. W and setting the derivative to 0, we get a generalized eigenvalue–eigenvector problem:
 ○ SBW = vSWW
 ○ SW⁻¹SBW = vW
  ■ where v = eigenvalue
  ■ W = eigenvector
Objective of LDA
LDA Matrix
Between Class Matrix (SB)
● SB represents how the data is scattered across the classes
● The goal is to maximize SB, i.e. the distance between the two classes should be as high as possible
Within Class Matrix (SW)
● SW captures how the data is scattered within each class
● The goal is to minimize SW, i.e. the distance between the elements of a class should be as small as possible
Linear Discriminant Analysis - Procedure
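The procedure can be summarized in code. A hedged two-class sketch (an assumed implementation following the SW⁻¹SB eigenproblem above, with synthetic data): compute the class means, build SW and SB, solve the eigenproblem, and project onto the top eigenvector.

```python
import numpy as np

def lda_direction(X, y):
    """Two-class Fisher LDA: unit vector W maximizing (W^T S_B W)/(W^T S_W W)."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)

    # Within-class scatter S_W: scatter of each class about its own mean.
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    # Between-class scatter S_B: outer product of the mean difference.
    d = (m1 - m0).reshape(-1, 1)
    Sb = d @ d.T

    # Solve S_W^{-1} S_B W = v W; take the eigenvector of the largest eigenvalue.
    eigvals, eigvecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
    W = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
    return W / np.linalg.norm(W)               # unit vector, as on the slides

rng = np.random.default_rng(4)
X = np.vstack([rng.normal([0, 0], 1.0, size=(100, 2)),   # class 0
               rng.normal([3, 3], 1.0, size=(100, 2))])  # class 1
y = np.array([0] * 100 + [1] * 100)
W = lda_direction(X, y)
print(W, (X @ W)[:3])   # projection of the data onto the 1-D LDA direction
```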
Thank You!