Machine Learning
Unit III
Introduction to Statistical Learning Theory
By: Dr. Shweta Saraswat
Introduction to Statistical Learning Theory
• Feature Extraction
1.Principal component analysis
2.Singular value decomposition.
• Feature selection , Feature Ranking and subset selection
1.Filter,
2.Wrapper
3.Embedded methods,
• Evaluating Machine Learning algorithms and Model Selection.
Introduction to Statistical Learning Theory
• Statistical learning theory is a framework for machine
learning drawing from the fields of statistics and functional
analysis.
• Statistical learning theory deals with the statistical
inference problem of finding a predictive function based
on data.
• Statistical learning theory has led to successful
applications in fields such as computer vision, speech
recognition, and bioinformatics.
Feature Extraction
• Dimensionality Reduction
• Feature extraction :Definition
• Need of Feature Extraction
• Applications of Feature Extraction
• Basic Terminologies used in Feature
Extraction
• Characteristics of Feature Extraction
Dimensionality Reduction
Reducing the dimension of the feature space is called “dimensionality reduction.” There are many ways to
achieve dimensionality reduction, but most of these techniques fall into one of two classes:
 Feature Selection
 Feature Extraction
Feature extraction
Feature extraction is a process of dimensionality reduction by which an initial
set of raw data is reduced to more manageable groups for processing. A
characteristic of these large data sets is a large number of variables that require
a lot of computing resources to process.
Why is this Useful?
• The process of feature extraction is useful when you need
to reduce the number of resources needed for processing
without losing important or relevant information. Feature
extraction can also reduce the amount of redundant data
for a given analysis. Also, the reduction of the data and the
machine’s efforts in building variable combinations
(features) facilitate the speed of learning and
generalization steps in the machine learning process.
Practical Uses of Feature Extraction
• Autoencoders
• – The purpose of autoencoders is unsupervised learning of efficient data
coding. Feature extraction is used here to identify key features in the data
for coding by learning from the coding of the original data set to derive new
ones.
• Bag-of-Words
• – A technique for natural language processing that extracts the words
(features) used in a sentence, document, website, etc. and classifies them by
frequency of use. This technique can also be applied to image processing. (A minimal
sketch follows this list.)
• Image Processing – Algorithms are used to detect features such as shapes,
edges, or motion in a digital image or video.
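A minimal bag-of-words sketch, assuming scikit-learn is available; the two example sentences are made up purely for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)          # sparse document-term matrix

print(vectorizer.get_feature_names_out())   # the extracted word features
print(X.toarray())                          # frequency of each word per document
```

Each row of the resulting matrix is a document and each column a word feature, which is the frequency-of-use representation described above.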
Basic Terminologies used in Feature Extraction
• Features are variables that can be defined and observed. For example, in a healthcare context, features may include gender,
height, weight, resting heart rate, or blood sugar levels.
• Feature engineering is the process of reworking a data set to improve the training of a machine learning model. By adding,
deleting, combining, or mutating the data within the data set, data scientists expertly tailor the training data to ensure the
resulting machine learning model will fulfill its business use case.
• Data scientists use feature engineering to prepare an input data set that’s best suited to support the intended business
purpose of the machine learning algorithm. For example, one method involves handling outliers. Since outliers fall so far out
of the expected range, they can negatively impact the accuracy of predictions. One common way of dealing with outliers is
trimming.
• Trimming simply removes the outlier values, ensuring they don’t contaminate the training data.
• Feature extraction is a subset of feature engineering. Data scientists turn to feature extraction when the data in its raw form
is unusable. Feature extraction transforms raw data into numerical features compatible with machine learning algorithms.
One common application is raw data in the form of image files—by extracting the shape of an object or the redness value in
images, data scientists can create new features suitable for machine learning applications.
• Feature selection is closely related. Where feature extraction and feature engineering involve creating new features, feature
selection is the process of choosing which features are most likely to enhance the quality of your prediction variable or
output. By only selecting the most relevant features, feature selection creates simpler, more easily understood machine
learning models.
Four ways feature extraction enables machine
learning models to better serve their intended
purpose
• Feature extraction improves the efficiency and accuracy of machine learning. Here are four ways it does so:
1.Reduces redundant data
• Feature extraction cuts through the noise, removing redundant and unnecessary data. This frees
machine learning programs to focus on the most relevant data.
2.Improves model accuracy
• The most accurate machine learning models are those developed using only the data required to
train the model to its intended business use. Including peripheral data negatively impacts the
model’s accuracy.
3.Boosts speed of learning
• Including training data that doesn’t directly contribute to solving the business problem bogs down
the learning process. Models trained on highly relevant data learn more quickly and make more
accurate predictions.
4.More-efficient use of compute resources
• Pruning out peripheral data boosts speed and efficiency. With less data to sift through, compute
resources aren’t dedicated to processing tasks that aren’t generating additional value.
Principal component analysis
• Definition
• Explanation
• Some common terms used in PCA algorithm
• Principal Components in PCA
• Steps for PCA algorithm
• Applications of Principal Component Analysis
Definition
• “Principal Component Analysis is an unsupervised learning algorithm
that is used for the dimensionality reduction in machine learning. It
is a statistical process that converts the observations of correlated
features into a set of linearly uncorrelated features with the help of
orthogonal transformation. These new transformed features are
called the Principal Components”.
• It is one of the popular tools used for exploratory data
analysis and predictive modeling. It is a technique to draw out strong
patterns from the given dataset by reducing the dimensionality while retaining most of the variance.
Explanation
• PCA generally tries to find the lower-dimensional surface to project the high-
dimensional data.
• PCA works by considering the variance of each attribute, because attributes with high
variance carry the most information; this is what allows PCA to reduce the dimensionality.
• Some real-world applications of PCA are image processing, movie recommendation
systems, and optimizing the power allocation in various communication channels.
• It is a feature extraction technique, so it retains the important variables and drops
the least important ones.
The PCA algorithm is based on some
mathematical concepts such as:
• Variance and Covariance
• Eigenvalues and Eigenvectors
Some common terms used in PCA algorithm:
• Dimensionality: It is the number of features or variables present in the given dataset.
More easily, it is the number of columns present in the dataset.
• Correlation: It signifies how strongly two variables are related to each other, i.e., if
one changes, the other also changes. The correlation value ranges
from -1 to +1. Here, -1 indicates that the variables are inversely related, and
+1 indicates that the variables are directly related.
• Orthogonal: It defines that variables are not correlated to each other, and hence the
correlation between the pair of variables is zero.
• Eigenvectors: Given a square matrix M and a non-zero vector v, v is an
eigenvector of M if Mv is a scalar multiple of v.
• Covariance Matrix: A matrix containing the covariance between the pair of variables is
called the Covariance Matrix.
Principal Components in PCA
• As described above, the transformed new features, i.e., the output of PCA, are
the Principal Components. The number of these PCs is either equal to or
less than the number of original features present in the dataset. Some properties of
these principal components are given below:
• Each principal component must be a linear combination of the original
features.
• These components are orthogonal, i.e., the correlation between a pair of
components is zero.
• The importance of each component decreases when going from 1 to n: the 1st PC
has the most importance, and the nth PC has the least
importance.
Steps for PCA algorithm
• Step 1: Getting the dataset
Firstly, we need to take the input dataset and divide it into two subparts X and Y, where X is the training
set, and Y is the validation set.
• Step 2: Representing data into a structure
Now we will represent our dataset as a structure, such as a two-dimensional matrix
of the independent variable X. Here each row corresponds to a data item, and each column corresponds to
a feature. The number of columns gives the dimensionality of the dataset.
• Step 3: Standardizing the data
In this step, we will standardize our dataset. In a particular column, the features with high
variance would otherwise be treated as more important than the features with lower variance.
If the importance of features should be independent of their variance, we divide each data
item in a column by the standard deviation of that column. We will name the resulting matrix Z.
• Step 4: Calculating the Covariance of Z
To calculate the covariance of Z, we will take the matrix Z, and will transpose it. After transpose, we will
multiply it by Z. The output matrix will be the Covariance matrix of Z.
Steps for PCA algorithm
• Step 5: Calculating the Eigen Values and Eigen Vectors
Now we need to calculate the eigenvalues and eigenvectors for the resultant covariance
matrix of Z. The eigenvectors of the covariance matrix are the directions of the axes with high
information, and the corresponding eigenvalues measure how much variance lies along each of those directions.
• Step 6: Sorting the Eigen Vectors
In this step, we will take all the eigenvalues and sort them in decreasing order, which
means from largest to smallest, and simultaneously sort the corresponding eigenvectors
as the columns of a matrix P. The resultant sorted matrix is named P*.
• Step 7: Calculating the new features Or Principal Components
Here we will calculate the new features. To do this, we will multiply the standardized matrix Z by P*. In
the resultant matrix Z*, each observation is a linear combination of the original features, and each
column of Z* is uncorrelated with the others.
• Step 8: Remove less important features from the new dataset.
The new feature set is now available, so we decide here what to keep and what to remove:
we keep only the relevant or important features in the new dataset, and
the unimportant features are removed.
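A minimal NumPy sketch of these steps, assuming NumPy is available; the toy 5x2 matrix X is made up purely for illustration.

```python
import numpy as np

X = np.array([[2.5, 2.4],
              [0.5, 0.7],
              [2.2, 2.9],
              [1.9, 2.2],
              [3.1, 3.0]])

# Step 3: standardize each column
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 4: covariance matrix of Z
cov = np.cov(Z, rowvar=False)

# Steps 5-6: eigenvalues/eigenvectors, sorted in decreasing order of eigenvalue
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
P_star = eigvecs[:, order]

# Step 7: project the standardized data onto the principal components
Z_star = Z @ P_star

# Step 8: keep only the first principal component
Z_reduced = Z_star[:, :1]
print(Z_reduced)
```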
Applications of Principal Component Analysis
• PCA is mainly used as the dimensionality reduction
technique in various AI applications such as computer
vision, image compression, etc.
• It can also be used for finding hidden patterns if data
has high dimensions. Some fields where PCA is used
are Finance, data mining, Psychology, etc.
Singular value decomposition.
• Introduction
• Definition
• Explanation
• Eigenvalues and Eigenvectors
• Need of SVD
• Singular Value Decomposition process
• Conclusion
Introduction
• Linear algebra forms the core foundation of machine learning
algorithms, ranging from simple linear regression to deep neural
networks. The major reason is that a data set can be
represented as a 2-D matrix where the columns represent the
features and the rows represent the different sample data points.
That said, matrix computations using all of the values in a
matrix are sometimes redundant or computationally
expensive. We need to represent the matrix in a form from which the
most important part of the matrix, the part needed for further
computation, can be extracted easily. That’s where Singular
Value Decomposition (SVD) comes into play.
Definition
• SVD is basically a matrix factorization technique,
which decomposes any matrix into 3 generic and
familiar matrices.
Explanation
• It has some cool applications in Machine Learning and
Image Processing. To understand the concept of Singular
Value Decomposition the knowledge
on eigenvalues and eigenvectors is essential. If you have a
pretty good understanding on eigenvalues and
eigenvectors, scroll down a bit to experience the Singular
Value Decomposition.
Eigenvalues and Eigenvectors
• Multiplying a matrix by a vector produces another
vector; this is the transformation applied to
that vector by the matrix in the given
vector space.
• However, for a given matrix there exist some vectors
whose direction does not change even after the
transformation is applied. Such vectors are called
the eigenvectors of the given matrix, while the factor by
which a vector is scaled by the transformation is defined
as the eigenvalue corresponding to that eigenvector.
Eigenvalues and Eigenvectors
• The original slide illustrates this with a diagram of a vector that keeps its direction under the transformation; a small numerical sketch is given below.
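A small NumPy sketch, with a toy diagonal matrix chosen purely for illustration: M @ v equals v scaled by the eigenvalue, so the direction of v is unchanged.

```python
import numpy as np

M = np.array([[2.0, 0.0],
              [0.0, 3.0]])

eigvals, eigvecs = np.linalg.eig(M)
v = eigvecs[:, 0]          # first eigenvector
lam = eigvals[0]           # corresponding eigenvalue

print(np.allclose(M @ v, lam * v))   # True: only scaled, direction unchanged
```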
Need of SVD
The actual motivation for Singular Value Decomposition
arises from the drawbacks of eigendecomposition:
• The concept of eigenvalues is only applicable to
square matrices.
• Even for square matrices, the presence of complex
eigenvalues means there may not be a full set of real eigenvectors (real eigenspaces).
Singular Value Decomposition
• To overcome the challenges of the
eigendecomposition of a matrix, the need for a more
generic representation of arbitrary matrices arises, and
that’s where Singular Value Decomposition comes
into play. Let A be any rectangular matrix of shape (m
x n). We can show that both AᵀA and AAᵀ are
symmetric square matrices of shapes (n x n) and (m x
m) respectively.
Conclusion
• The main intuition behind Singular Value
Decomposition is that the matrix A transforms a set of
orthogonal vectors (v) to another set of orthogonal
vectors (u) with a scaling factor of σ. So σ is called the
singular value corresponding to the respective
singular vectors u and v.
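A minimal NumPy sketch of this idea, with a toy rectangular matrix chosen purely for illustration: np.linalg.svd returns the factors, and A maps each right singular vector v to the corresponding u scaled by σ.

```python
import numpy as np

A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])        # shape (2, 3), not square

U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Rebuild A from the factors to check the decomposition
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)
print(np.allclose(A, U @ Sigma @ Vt))            # True

# A maps v_i to sigma_i * u_i, matching the intuition above
print(np.allclose(A @ Vt[0], s[0] * U[:, 0]))    # True
```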
Feature selection
• Definition
• Explanation
• What is Feature Selection?
• Need for Feature Selection
• Feature Selection Techniques
• Feature selection Methods(Feature Subset Selection):
1. Wrapper Methods
2. Filter Methods
3. Embedded Methods
• Feature Ranking
Definition:
• Feature selection is a way of selecting the subset of
the most relevant features from the original features
set by removing the redundant, irrelevant, or noisy
features.
Explanation
• While developing the machine learning model, only a few variables in the dataset are
useful for building the model, and the rest of the features are either redundant or irrelevant.
If we input the dataset with all these redundant and irrelevant features, it may
negatively impact and reduce the overall performance and accuracy of the model.
Hence it is very important to identify and select the most appropriate features from
the data and remove the irrelevant or less important features, which is done with the
help of feature selection in machine learning.
• Feature selection is one of the important concepts of machine learning, which highly
impacts the performance of the model. As machine learning works on the concept of
"Garbage In Garbage Out", so we always need to input the most appropriate and
relevant dataset to the model in order to get a better result.
What is Feature Selection?
• A feature is an attribute that has an impact on a problem or is useful for the problem,
and choosing the important features for the model is known as feature selection. Each
machine learning process depends on feature engineering, which mainly contains two
processes: Feature Selection and Feature Extraction. Although feature
selection and extraction processes may have the same objective, both are completely
different from each other.
• The main difference between them is that feature selection is about selecting the
subset of the original feature set, whereas feature extraction creates new features.
Feature selection is a way of reducing the input variable for the model by using only
relevant data in order to reduce overfitting in the model.
• So, we can define feature Selection as, "It is a process of automatically or manually
selecting the subset of most appropriate and relevant features to be used in model
building." Feature selection is performed by either including the important features or
excluding the irrelevant features in the dataset without changing them.
Need for Feature Selection
• Before implementing any technique, it is really important to understand the need for that
technique, and the same holds for feature selection. As we know, in machine learning, it is
necessary to provide a pre-processed and good input dataset in order to get better
outcomes. We collect a huge amount of data to train our model and help it to learn
better. Generally, the dataset consists of noisy data, irrelevant data, and some part of
useful data. Moreover, the huge amount of data also slows down the training process
of the model, and with noise and irrelevant data, the model may not predict and
perform well. So, it is very necessary to remove such noise and less-important data
from the dataset, and to do this, feature selection techniques are used.
• Selecting the best features helps the model to perform well. For example, suppose we
want to create a model that automatically decides which car should be crushed for
spare parts, and to do this, we have a dataset. This dataset contains the model of the car, the
year, the owner's name, and the miles driven. In this dataset, the name of the owner does not
contribute to the model performance, as it does not help decide whether the car should be
crushed or not, so we can remove this column and select the rest of the
features (columns) for model building.
Need for Feature Selection
• Below are some benefits of using feature selection in
machine learning:
• It helps in avoiding the curse of dimensionality.
• It helps in the simplification of the model so that it can be
easily interpreted by the researchers.
• It reduces the training time.
• It reduces overfitting and hence enhances generalization.
Feature Selection Techniques
• There are mainly two types of Feature Selection
techniques, which are:
• 1.Supervised Feature Selection technique
Supervised Feature selection techniques consider the
target variable and can be used for the labelled dataset.
• 2.Unsupervised Feature Selection technique
Unsupervised Feature selection techniques ignore the
target variable and can be used for the unlabelled dataset.
Feature Selection Methods (Feature Subset Selection)
• There are mainly three techniques under supervised feature selection:
Wrapper Methods
• In the wrapper methodology, the selection of features is done
by treating it as a search problem, in which
different combinations are made, evaluated, and
compared with other combinations. The algorithm is trained
using a subset of features iteratively.
On the basis of the output of the model,
features are added or removed, and with this new
feature set, the model is trained again.
Some techniques of wrapper methods are:
• Forward selection - Forward selection is an iterative process, which begins with an empty set of
features. After each iteration, it keeps adding on a feature and evaluates the performance to check
whether it is improving the performance or not. The process continues until the addition of a new
variable/feature does not improve the performance of the model.
• Backward elimination - Backward elimination is also an iterative approach, but it is the opposite of
forward selection. This technique begins the process by considering all the features and removes
the least significant feature. This elimination process continues until removing the features does
not improve the performance of the model.
• Exhaustive Feature Selection- Exhaustive feature selection is one of the most thorough feature selection
methods: it evaluates every candidate feature set by brute force, trying each
possible combination of features and returning the best-performing feature set.
• Recursive Feature Elimination-
Recursive feature elimination is a recursive greedy optimization approach, where features are
selected by recursively taking a smaller and smaller subset of features. An estimator is trained
on each set of features, and the importance of each feature is determined using the coef_ attribute or
the feature_importances_ attribute (a minimal sketch follows below).
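A minimal wrapper-method sketch using scikit-learn's RFE around a logistic regression estimator; the dataset is synthetic, generated only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic data: 10 features, only 4 of which are informative
X, y = make_classification(n_samples=200, n_features=10, n_informative=4,
                           random_state=0)

# Recursively drop the weakest feature until 4 remain
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=4)
selector.fit(X, y)

print(selector.support_)    # boolean mask of selected features
print(selector.ranking_)    # rank 1 means the feature was selected
```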
2. Filter Methods
• In the filter method, features are selected on the basis of
statistical measures. This method does not depend on the
learning algorithm and chooses the features as a pre-
processing step.
• The filter method filters out the irrelevant features and
redundant columns from the model by using different
metrics through ranking.
• The advantage of filter methods is that they need little
computational time and do not overfit the data.
2. Filter Methods
Some common techniques of Filter methods are
as follows:
• Information Gain
• Chi-square Test
• Fisher's Score
• Missing Value Ratio
Some common techniques of Filter methods are
as follows:
• 1.Information Gain: Information gain determines the reduction in entropy while
transforming the dataset. It can be used as a feature selection technique by calculating
the information gain of each variable with respect to the target variable.
• 2.Chi-square Test: Chi-square test is a technique to determine the relationship
between the categorical variables. The chi-square value is calculated between each
feature and the target variable, and the desired number of features with the best chi-
square value is selected.
• 3.Fisher's Score:
• Fisher's score is one of the popular supervised techniques of feature selection. It
ranks the variables according to Fisher's criterion in descending order, and then we
can select the variables with a large Fisher's score.
4.Missing Value Ratio:
• The missing value ratio can be used for
evaluating a feature set against a threshold value.
The missing value ratio is obtained as
the number of missing values in a column divided
by the total number of observations. A variable
whose ratio is above the threshold value can be
dropped (see the sketch below).
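A minimal filter-method sketch: SelectKBest with the chi-square test on the Iris data, plus a missing value ratio computed with pandas on a made-up frame; all the data here is only for illustration.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)

# Chi-square test: keep the 2 features most related to the target
selector = SelectKBest(score_func=chi2, k=2)
X_new = selector.fit_transform(X, y)
print(selector.scores_)            # chi-square score per feature

# Missing value ratio: missing values per column / total observations
df = pd.DataFrame({"a": [1.0, None, 3.0, None], "b": [1.0, 2.0, 3.0, 4.0]})
ratio = df.isna().sum() / len(df)
print(ratio[ratio > 0.4].index.tolist())   # columns above a 0.4 threshold
```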
3. Embedded Methods
• Embedded methods combine the advantages of
both filter and wrapper methods by considering the
interaction of features along with low computational
cost. They are fast processing methods similar to the
filter method but more accurate than the filter
method.
These methods are also iterative: they
evaluate each training iteration and pick out the
most important features, those that contribute the
most to training in that iteration.
Some techniques of embedded methods are:
• Regularization- Regularization adds a penalty term to the parameters of the machine learning
model to avoid overfitting. Because this penalty term is applied to the coefficients, it
shrinks some coefficients to zero, and features with zero coefficients can be removed from the
dataset. Common regularization techniques of this kind are L1 regularization (Lasso) and
Elastic Net (a combination of L1 and L2 regularization).
• Random Forest Importance - Tree-based methods provide feature importance scores that
give a way of selecting features. Here, feature importance specifies which feature
has more importance in model building or has a greater impact on the target variable. Random Forest
is such a tree-based method: it is a type of bagging algorithm that aggregates a
number of decision trees. It automatically ranks the nodes by their performance or decrease in
impurity (Gini impurity) over all the trees. Nodes are arranged according to their impurity values, which
allows the tree to be pruned below a specific node. The remaining nodes correspond to a subset of the most
important features (a minimal sketch follows below).
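A minimal embedded-method sketch: Lasso (L1) shrinks some coefficients to zero and SelectFromModel keeps the rest, with a random forest importance ranking shown for comparison; the dataset is synthetic, generated only for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

# Synthetic data: 10 features, only 3 of which are informative
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)
print(lasso.coef_)                     # several coefficients shrink to zero

selector = SelectFromModel(Lasso(alpha=1.0)).fit(X, y)
print(selector.get_support())          # features kept by the L1 penalty

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
print(forest.feature_importances_)     # impurity-based importance per feature
```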
Feature Ranking
• Feature Ranking can be defined as selecting “n”
number of significant features for a problem by
ranking features according to their importance in the
model.
Evaluating Machine Learning
algorithms and Model Selection.
• Introduction
• How to find the best solution
• Basic terminologies used to evaluate machine
learning algorithms
• Train, validate and test algorithms
• Algorithm exploration
• Hyperparameter optimization
• Ensemble learning
Introduction
• Different machine learning algorithms search for
different trends and patterns. One algorithm isn’t the
best across all data sets or for all use cases. To find
the best solution, you need to conduct many
experiments, evaluate machine learning algorithms,
and tune their hyperparameters.
How to find the best solution
• First, you choose, justify, and apply a model performance indicator to assess
your model and justify the choice of an algorithm. Examples of model
performance indicators include the F1 score, true positive rate, and within
cluster sum of squared error.
• Implement your solution with at least one deep-learning and at least one
non-deep-learning algorithm. Then, compare and document model
performance (a minimal example of such a comparison follows this list).
• Apply at least one more iteration in the process model that involves at least
the feature creation task, such as data normalization or principal component
analysis (PCA), and record the impact on model performance. Depending on the
algorithm class and data set size, you might choose specific technologies or
frameworks to solve your problem.
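A minimal sketch of applying one performance indicator (the F1 score) to compare two classifiers on held-out data; the dataset and the two models are illustrative choices, not a prescription.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Train each candidate model and report the same indicator for both
for model in (LogisticRegression(max_iter=5000), DecisionTreeClassifier()):
    model.fit(X_tr, y_tr)
    print(type(model).__name__, f1_score(y_te, model.predict(X_te)))
```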
As you evaluate algorithms, it's useful to know
the related terminology (the original slides present these terms, overfitting among them, in tables).
Train, validate, and test algorithms
• Machine learning algorithms learn from examples. If you have
good data, the more examples you provide, the better the
model is at finding patterns in the data. However, be cautious of
overfitting. As described in the first table, overfitting is where a
model can accurately make predictions for data that it was
trained on but can't generalize to other data.
• Overfitting is why you split your data into training data,
validation data, and test data. Validation data and test data are
often referred to interchangeably, but they have distinct
purposes.
Train, validate, and test algorithms
Get familiar with the terminology around
training, testing, and validating data (training set, validation set, and test set); the original slides present these terms in tables.
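A minimal sketch of such a split, using two calls to scikit-learn's train_test_split; the 60/20/20 ratio and the Iris dataset are just illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# First split off the test set, then split the remainder into train/validation
X_tmp, X_test, y_tmp, y_test = train_test_split(X, y, test_size=0.2,
                                                random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25,
                                                  random_state=0)

print(len(X_train), len(X_val), len(X_test))   # roughly 60% / 20% / 20%
```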
Algorithm exploration
• The algorithms that you explore must be driven by
your use case. By first identifying what you're trying
to achieve, you can narrow the scope of searching for
solutions. Possible methods include, but aren't
limited to, algorithms for regression, classification,
clustering, recommendations, and anomaly
detection. You can also use the Algorithm Explorer to
guide algorithm selection.
Regression
• Regression algorithms are machine learning
techniques for predicting continuous numerical
values. They're supervised learning tasks, so they
require labeled training examples.
Classification
• Classification algorithms are machine learning
techniques for predicting which category the input
data belongs to. They're supervised learning tasks, so
they require labeled training examples.
Clustering
• Clustering algorithms are machine learning
techniques to divide data into a number of groups
where points in the groups have similar traits. They're
unsupervised learning tasks and don't require labeled
training examples.
Recommendation engines
• Recommendation engines are created to predict a
preference or rating that indicates a user’s interest in
an item or product. The algorithms that are used to
create this system find similarities between the users,
the items, or both.
Anomaly detection
• Anomaly detection is a technique that is used to
identify unusual events or patterns that don't
conform to expected behavior. The items that are
identified are often referred to
as anomalies or outliers.
Hyperparameter optimization
• Although the terms parameters and hyperparameters are occasionally used
interchangeably, distinctions exist between them. Parameters are properties that the
algorithm is learning during training. For linear regression, those parameters are the
weights and biases. For random forests, they're the variables and thresholds at each
node.
• Hyperparameters are properties that must be set before training. For k-means
clustering, you must define the value of k. For neural networks, an example is the
learning rate. Hyperparameter optimization is the process of finding the best possible
values for the hyperparameters to optimize your performance metric, such as highest
accuracy or lowest RMSE (root-mean-square error). You train a model for different
combinations of values and evaluate which models find the best solution. Methods to
search for the best combinations include grid search, random search, coarse-to-fine,
and Bayesian optimization.
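A minimal grid-search sketch: every combination of the listed hyperparameter values is trained and cross-validated; the parameter grid and dataset are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

param_grid = {"n_neighbors": [3, 5, 7, 9], "weights": ["uniform", "distance"]}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5,
                      scoring="accuracy")
search.fit(X, y)

print(search.best_params_)   # best hyperparameter combination found
print(search.best_score_)    # its mean cross-validated accuracy
```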
Ensemble learning
• Ensembles combine several machine learning models,
each finding different patterns within the data to
provide a more accurate solution. These techniques
can improve performance, as they capture more
trends. They can also reduce overfitting, as the final
prediction is a consensus from many models.
Ensemble learning
Bagging
• Bagging, or bootstrap aggregating, is the method of building models in parallel and averaging their
predictions as the final prediction. These models can be built with the same algorithm. For example,
the random forest algorithm builds many decision trees. You can also combine different types of
models, such as a linear regression model and a support vector machine (SVM) model.
Boosting
• Boosting builds models sequentially by evaluating the success of earlier models. The next model
prioritizes learning trends for predicting examples that the current models perform poorly on.
Three common techniques are AdaBoost, Gradient Boosting, and XGBoost.
Stacking
• Stacking involves building models and using their outputs as features for a final model. For
example, if your goal is to create a classifier, you might build a KNN model and a Naïve Bayes model.
Rather than choose between the two, you can pass their two predictions into a final logistic
regression model. This final model might produce better results than the two intermediate models (a minimal sketch follows below).
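A minimal stacking sketch matching the example above: a KNN model and a Naive Bayes model feed their predictions into a final logistic regression; the dataset is an illustrative choice.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)

# Two intermediate models whose predictions feed a final logistic regression
stack = StackingClassifier(
    estimators=[("knn", KNeighborsClassifier()), ("nb", GaussianNB())],
    final_estimator=LogisticRegression(max_iter=5000),
)
print(cross_val_score(stack, X, y, cv=5).mean())   # mean accuracy of the stack
```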
Book Reference
Reference Books
• T1: Pattern Recognition and Machine Learning, Christopher M. Bishop, Springer
• T2: Introduction to Machine Learning (Adaptive Computation and Machine Learning), Ethem Alpaydın, MIT Press
Text Books
• R1: Machine Learning, Mitchell. T, McGraw Hill
More Related Content

Similar to introduction to Statistical Theory.pptx

Data Mining Module 2 Business Analytics.
Data Mining Module 2 Business Analytics.Data Mining Module 2 Business Analytics.
Data Mining Module 2 Business Analytics.Jayanti Pande
 
Presentation1.pptx
Presentation1.pptxPresentation1.pptx
Presentation1.pptxnarmeen11
 
30thSep2014
30thSep201430thSep2014
30thSep2014Mia liu
 
Feature Engineering in Machine Learning
Feature Engineering in Machine LearningFeature Engineering in Machine Learning
Feature Engineering in Machine LearningKnoldus Inc.
 
Lecture 8 - Feature Engineering and Optimization, a lecture in subject module...
Lecture 8 - Feature Engineering and Optimization, a lecture in subject module...Lecture 8 - Feature Engineering and Optimization, a lecture in subject module...
Lecture 8 - Feature Engineering and Optimization, a lecture in subject module...Maninda Edirisooriya
 
AIML_UNIT 2 _PPT_HAND NOTES_MPS.pdf
AIML_UNIT 2 _PPT_HAND NOTES_MPS.pdfAIML_UNIT 2 _PPT_HAND NOTES_MPS.pdf
AIML_UNIT 2 _PPT_HAND NOTES_MPS.pdfMargiShah29
 
Deep Learning Vocabulary.docx
Deep Learning Vocabulary.docxDeep Learning Vocabulary.docx
Deep Learning Vocabulary.docxjaffarbikat
 
Identifying and classifying unknown Network Disruption
Identifying and classifying unknown Network DisruptionIdentifying and classifying unknown Network Disruption
Identifying and classifying unknown Network Disruptionjagan477830
 
IRJET - An User Friendly Interface for Data Preprocessing and Visualizati...
IRJET -  	  An User Friendly Interface for Data Preprocessing and Visualizati...IRJET -  	  An User Friendly Interface for Data Preprocessing and Visualizati...
IRJET - An User Friendly Interface for Data Preprocessing and Visualizati...IRJET Journal
 
Unit 3 – AIML.pptx
Unit 3 – AIML.pptxUnit 3 – AIML.pptx
Unit 3 – AIML.pptxhiblooms
 
Feature Engineering in Machine Learning
Feature Engineering in Machine LearningFeature Engineering in Machine Learning
Feature Engineering in Machine LearningPyingkodi Maran
 
Introduction of data science
Introduction of data scienceIntroduction of data science
Introduction of data scienceTanujaSomvanshi1
 
Customer Churn Analytics using Microsoft R Open
Customer Churn Analytics using Microsoft R OpenCustomer Churn Analytics using Microsoft R Open
Customer Churn Analytics using Microsoft R OpenPoo Kuan Hoong
 
Working with the data for Machine Learning
Working with the data for Machine LearningWorking with the data for Machine Learning
Working with the data for Machine LearningMehwish690898
 
Feature selection using PCA.pptx
Feature selection using PCA.pptxFeature selection using PCA.pptx
Feature selection using PCA.pptxbeherasushree212
 

Similar to introduction to Statistical Theory.pptx (20)

Data Mining Module 2 Business Analytics.
Data Mining Module 2 Business Analytics.Data Mining Module 2 Business Analytics.
Data Mining Module 2 Business Analytics.
 
seminar.pptx
seminar.pptxseminar.pptx
seminar.pptx
 
Presentation1.pptx
Presentation1.pptxPresentation1.pptx
Presentation1.pptx
 
30thSep2014
30thSep201430thSep2014
30thSep2014
 
Feature Engineering in Machine Learning
Feature Engineering in Machine LearningFeature Engineering in Machine Learning
Feature Engineering in Machine Learning
 
Lecture 8 - Feature Engineering and Optimization, a lecture in subject module...
Lecture 8 - Feature Engineering and Optimization, a lecture in subject module...Lecture 8 - Feature Engineering and Optimization, a lecture in subject module...
Lecture 8 - Feature Engineering and Optimization, a lecture in subject module...
 
AIML_UNIT 2 _PPT_HAND NOTES_MPS.pdf
AIML_UNIT 2 _PPT_HAND NOTES_MPS.pdfAIML_UNIT 2 _PPT_HAND NOTES_MPS.pdf
AIML_UNIT 2 _PPT_HAND NOTES_MPS.pdf
 
Deep Learning Vocabulary.docx
Deep Learning Vocabulary.docxDeep Learning Vocabulary.docx
Deep Learning Vocabulary.docx
 
Random Forest Decision Tree.pptx
Random Forest Decision Tree.pptxRandom Forest Decision Tree.pptx
Random Forest Decision Tree.pptx
 
Identifying and classifying unknown Network Disruption
Identifying and classifying unknown Network DisruptionIdentifying and classifying unknown Network Disruption
Identifying and classifying unknown Network Disruption
 
IRJET - An User Friendly Interface for Data Preprocessing and Visualizati...
IRJET -  	  An User Friendly Interface for Data Preprocessing and Visualizati...IRJET -  	  An User Friendly Interface for Data Preprocessing and Visualizati...
IRJET - An User Friendly Interface for Data Preprocessing and Visualizati...
 
Unit 3 – AIML.pptx
Unit 3 – AIML.pptxUnit 3 – AIML.pptx
Unit 3 – AIML.pptx
 
M5.pptx
M5.pptxM5.pptx
M5.pptx
 
Feature Engineering in Machine Learning
Feature Engineering in Machine LearningFeature Engineering in Machine Learning
Feature Engineering in Machine Learning
 
PCA.pptx
PCA.pptxPCA.pptx
PCA.pptx
 
Introduction of data science
Introduction of data scienceIntroduction of data science
Introduction of data science
 
Rapid Miner
Rapid MinerRapid Miner
Rapid Miner
 
Customer Churn Analytics using Microsoft R Open
Customer Churn Analytics using Microsoft R OpenCustomer Churn Analytics using Microsoft R Open
Customer Churn Analytics using Microsoft R Open
 
Working with the data for Machine Learning
Working with the data for Machine LearningWorking with the data for Machine Learning
Working with the data for Machine Learning
 
Feature selection using PCA.pptx
Feature selection using PCA.pptxFeature selection using PCA.pptx
Feature selection using PCA.pptx
 

More from Dr.Shweta

research ethics , plagiarism checking and removal.pptx
research ethics , plagiarism checking and removal.pptxresearch ethics , plagiarism checking and removal.pptx
research ethics , plagiarism checking and removal.pptxDr.Shweta
 
effective modular design.pptx
effective modular design.pptxeffective modular design.pptx
effective modular design.pptxDr.Shweta
 
software design: design fundamentals.pptx
software design: design fundamentals.pptxsoftware design: design fundamentals.pptx
software design: design fundamentals.pptxDr.Shweta
 
Search Algorithms in AI.pptx
Search Algorithms in AI.pptxSearch Algorithms in AI.pptx
Search Algorithms in AI.pptxDr.Shweta
 
Informed search algorithms.pptx
Informed search algorithms.pptxInformed search algorithms.pptx
Informed search algorithms.pptxDr.Shweta
 
constraint satisfaction problems.pptx
constraint satisfaction problems.pptxconstraint satisfaction problems.pptx
constraint satisfaction problems.pptxDr.Shweta
 
review paper publication.pptx
review paper publication.pptxreview paper publication.pptx
review paper publication.pptxDr.Shweta
 
SORTING techniques.pptx
SORTING techniques.pptxSORTING techniques.pptx
SORTING techniques.pptxDr.Shweta
 
Recommended System.pptx
 Recommended System.pptx Recommended System.pptx
Recommended System.pptxDr.Shweta
 
semi supervised Learning and Reinforcement learning (1).pptx
 semi supervised Learning and Reinforcement learning (1).pptx semi supervised Learning and Reinforcement learning (1).pptx
semi supervised Learning and Reinforcement learning (1).pptxDr.Shweta
 
Unit 2 unsupervised learning.pptx
Unit 2 unsupervised learning.pptxUnit 2 unsupervised learning.pptx
Unit 2 unsupervised learning.pptxDr.Shweta
 
unit 1.2 supervised learning.pptx
unit 1.2 supervised learning.pptxunit 1.2 supervised learning.pptx
unit 1.2 supervised learning.pptxDr.Shweta
 
Introduction of machine learning.pptx
Introduction of machine learning.pptxIntroduction of machine learning.pptx
Introduction of machine learning.pptxDr.Shweta
 
searching techniques.pptx
searching techniques.pptxsearching techniques.pptx
searching techniques.pptxDr.Shweta
 
LINKED LIST.pptx
LINKED LIST.pptxLINKED LIST.pptx
LINKED LIST.pptxDr.Shweta
 
complexity.pptx
complexity.pptxcomplexity.pptx
complexity.pptxDr.Shweta
 
Introduction to Data Science.pptx
Introduction to Data Science.pptxIntroduction to Data Science.pptx
Introduction to Data Science.pptxDr.Shweta
 

More from Dr.Shweta (20)

research ethics , plagiarism checking and removal.pptx
research ethics , plagiarism checking and removal.pptxresearch ethics , plagiarism checking and removal.pptx
research ethics , plagiarism checking and removal.pptx
 
effective modular design.pptx
effective modular design.pptxeffective modular design.pptx
effective modular design.pptx
 
software design: design fundamentals.pptx
software design: design fundamentals.pptxsoftware design: design fundamentals.pptx
software design: design fundamentals.pptx
 
Search Algorithms in AI.pptx
Search Algorithms in AI.pptxSearch Algorithms in AI.pptx
Search Algorithms in AI.pptx
 
Informed search algorithms.pptx
Informed search algorithms.pptxInformed search algorithms.pptx
Informed search algorithms.pptx
 
constraint satisfaction problems.pptx
constraint satisfaction problems.pptxconstraint satisfaction problems.pptx
constraint satisfaction problems.pptx
 
review paper publication.pptx
review paper publication.pptxreview paper publication.pptx
review paper publication.pptx
 
SORTING techniques.pptx
SORTING techniques.pptxSORTING techniques.pptx
SORTING techniques.pptx
 
Recommended System.pptx
 Recommended System.pptx Recommended System.pptx
Recommended System.pptx
 
semi supervised Learning and Reinforcement learning (1).pptx
 semi supervised Learning and Reinforcement learning (1).pptx semi supervised Learning and Reinforcement learning (1).pptx
semi supervised Learning and Reinforcement learning (1).pptx
 
Unit 2 unsupervised learning.pptx
Unit 2 unsupervised learning.pptxUnit 2 unsupervised learning.pptx
Unit 2 unsupervised learning.pptx
 
unit 1.2 supervised learning.pptx
unit 1.2 supervised learning.pptxunit 1.2 supervised learning.pptx
unit 1.2 supervised learning.pptx
 
Introduction of machine learning.pptx
Introduction of machine learning.pptxIntroduction of machine learning.pptx
Introduction of machine learning.pptx
 
searching techniques.pptx
searching techniques.pptxsearching techniques.pptx
searching techniques.pptx
 
LINKED LIST.pptx
LINKED LIST.pptxLINKED LIST.pptx
LINKED LIST.pptx
 
complexity.pptx
complexity.pptxcomplexity.pptx
complexity.pptx
 
queue.pptx
queue.pptxqueue.pptx
queue.pptx
 
STACK.pptx
STACK.pptxSTACK.pptx
STACK.pptx
 
dsa.pptx
dsa.pptxdsa.pptx
dsa.pptx
 
Introduction to Data Science.pptx
Introduction to Data Science.pptxIntroduction to Data Science.pptx
Introduction to Data Science.pptx
 

Recently uploaded

ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZTE
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineeringmalavadedarshan25
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
Analog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAnalog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAbhinavSharma374939
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerAnamika Sarkar
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...Call Girls in Nagpur High Profile
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLDeelipZope
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024Mark Billinghurst
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).pptssuser5c9d4b1
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 

Recently uploaded (20)

ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineering
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
★ CALL US 9953330565 ( HOT Young Call Girls In Badarpur delhi NCR
 
Analog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog ConverterAnalog to Digital and Digital to Analog Converter
Analog to Digital and Digital to Analog Converter
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
High Profile Call Girls Nashik Megha 7001305949 Independent Escort Service Na...
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCL
 
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINEDJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
DJARUM4D - SLOT GACOR ONLINE | SLOT DEMO ONLINE
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024IVE Industry Focused Event - Defence Sector 2024
IVE Industry Focused Event - Defence Sector 2024
 
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
247267395-1-Symmetric-and-distributed-shared-memory-architectures-ppt (1).ppt
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 

introduction to Statistical Theory.pptx

  • 1. Machine Learning Unit III Introduction to Statistical Learning Theory By: Dr. Shweta Saraswat
  • 2. Introduction to Statistical Learning Theory • Feature Extraction 1.Principal component analysis 2.Singular value decomposition. • Feature selection , Feature Ranking and subset selection 1.Filter, 2.Wrapper 3.Embedded methods, • Evaluating Machine Learning algorithms and Model Selection.
  • 3.
  • 4. Introduction to Statistical Learning Theory • Statistical learning theory is a framework for machine learning drawing from the fields of statistics and functional analysis. • Statistical learning theory deals with the statistical inference problem of finding a predictive function based on data. • Statistical learning theory has led to successful applications in fields such as computer vision, speech recognition, and bioinformatics.
  • 5. Feature Extraction • Dimensionality Reduction • Feature extraction :Definition • Need of Feature Extraction • Applications of Feature Extraction • Basic Terminologies used in Feature Extraction • Characteristics of Feature Extraction
  • 6. Dimensionality Reduction 6 Reducing the dimension of the feature space is called “dimensionality reduction.” There are many ways to achieve dimensionality reduction, but most of these techniques fall into one of two classes:  Feature Selection  Feature Extraction
  • 7. Feature extraction 7 Feature extraction is a process of dimensionality reduction by which an initial set of raw data is reduced to more manageable groups for processing. A characteristic of these large data sets is a large number of variables that require a lot of computing resources to process.
  • 8. Why is this Useful? • The process of feature extraction is useful when you need to reduce the number of resources needed for processing without losing important or relevant information. Feature extraction can also reduce the amount of redundant data for a given analysis. Also, the reduction of the data and the machine’s efforts in building variable combinations (features) facilitate the speed of learning and generalization steps in the machine learning process.
  • 9. Practical Uses of Feature Extraction • Autoencoders • – The purpose of autoencoders is unsupervised learning of efficient data coding. Feature extraction is used here to identify key features in the data for coding by learning from the coding of the original data set to derive new ones. • Bag-of-Words • – A technique for natural language processing that extracts the words (features) used in a sentence, document, website, etc. and classifies them by frequency of use. This technique can also be applied to image processing. • Image Processing – Algorithms are used to detect features such as shaped, edges, or motion in a digital image or video.
  • 10. Basic Terminologies used in Feature Extraction • Features are variables that can be defined and observed. For example, in a healthcare context, features may include gender, height, weight, resting heart rate, or blood sugar levels. • Feature engineering is the process of reworking a data set to improve the training of a machine learning model. By adding, deleting, combining, or mutating the data within the data set, data scientists expertly tailor the training data to ensure the resulting machine learning model will fulfill its business use case. • Data scientists use feature engineering to prepare an input data set that’s best suited to support the intended business purpose of the machine learning algorithm. For example, one method involves handling outliers. Since outliers fall so far out of the expected range, they can negatively impact the accuracy of predictions. One common way of dealing with outliers is trimming. • Trimming simply removes the outlier values, ensuring they don’t contaminate the training data. • Feature extraction is a subset of feature engineering. Data scientists turn to feature extraction when the data in its raw form is unusable. Feature extraction transforms raw data into numerical features compatible with machine learning algorithms. One common application is raw data in the form of image files—by extracting the shape of an object or the redness value in images, data scientists can create new features suitable for machine learning applications. • Feature selection is closely related. Where feature extraction and feature engineering involve creating new features, feature selection is the process of choosing which features are most likely to enhance the quality of your prediction variable or output. By only selecting the most relevant features, feature selection creates simpler, more easily understood machine learning models.
  • 11. Four ways feature extraction enables machine learning models to better serve their intended purpose • Feature extraction improves the efficiency and accuracy of machine learning. Here : 1.Reduces redundant data • Feature extraction cuts through the noise, removing redundant and unnecessary data. This frees machine learning programs to focus on the most relevant data. 2.Improves model accuracy • The most accurate machine learning models are those developed using only the data required to train the model to its intended business use. Including peripheral data negatively impacts the Including training data model’s accuracy. 3.Boosts speed of learning • hat doesn’t directly contribute to solving the business problem bogs down the learning process. Models trained on highly relevant data learn more quickly and make more accurate predictions. 4.More-efficient use of compute resources • Pruning out peripheral data boosts speed and efficiency. With less data to sift through, compute resources aren’t dedicated to processing tasks that aren’t generating additional value.
  • 12. Principal component analysis • Definition • Explanation • Some common terms used in PCA algorithm • Principal Components in PCA • Steps for PCA algorithm • Applications of Principal Component Analysis
  • 13. Definition • “Principal Component Analysis is an unsupervised learning algorithm that is used for the dimensionality reduction in machine learning. It is a statistical process that converts the observations of correlated features into a set of linearly uncorrelated features with the help of orthogonal transformation. These new transformed features are called the Principal Components”. • It is one of the popular tools that is used for exploratory data analysis and predictive modeling. It is a technique to draw strong patterns from the given dataset by reducing the variances.
  • 14. Explanation • PCA generally tries to find the lower-dimensional surface onto which to project the high-dimensional data. • PCA works by considering the variance of each attribute, because attributes with high variance carry the most information for separating the classes; projecting onto the directions of highest variance is what reduces the dimensionality. • Some real-world applications of PCA are image processing, movie recommendation systems, and optimizing the power allocation in various communication channels. • It is a feature extraction technique, so it keeps the important variables and drops the least important ones.
  • 15. The PCA algorithm is based on some mathematical concepts such as: • Variance and Covariance • Eigenvalues and Eigenvectors
  • 16. Some common terms used in PCA algorithm: • Dimensionality: It is the number of features or variables present in the given dataset. More simply, it is the number of columns present in the dataset. • Correlation: It signifies how strongly two variables are related to each other, such that if one changes, the other variable also changes. The correlation value ranges from -1 to +1. Here, -1 occurs if the variables are inversely proportional to each other, and +1 indicates that the variables are directly proportional to each other. • Orthogonal: It means that the variables are not correlated to each other, and hence the correlation between the pair of variables is zero. • Eigenvectors: If there is a square matrix M and a non-zero vector v, then v is an eigenvector of M if Mv is a scalar multiple of v. • Covariance Matrix: A matrix containing the covariances between pairs of variables is called the Covariance Matrix.
  • 17. Principal Components in PCA • As described above, the transformed new features or the output of PCA are the Principal Components. The number of these PCs is either equal to or less than the number of original features present in the dataset. Some properties of these principal components are given below: • Each principal component must be a linear combination of the original features. • These components are orthogonal, i.e., the correlation between a pair of components is zero. • The importance of each component decreases when going from 1 to n; it means the 1st PC has the most importance, and the nth PC has the least importance.
  • 18. Steps for PCA algorithm • Step 1: Getting the dataset Firstly, we need to take the input dataset and divide it into two subparts X and Y, where X is the training set, and Y is the validation set. • Step 2: Representing data into a structure Now we will represent our dataset in a structure: a two-dimensional matrix of the independent variables X. Here each row corresponds to a data item, and each column corresponds to a feature. The number of columns is the dimensionality of the dataset. • Step 3: Standardizing the data In this step, we will standardize our dataset. Within a particular column, features with higher variance would otherwise dominate features with lower variance, so if the importance of features is to be independent of their variance, we divide each data item in a column by the standard deviation of that column. We will name the resulting matrix Z. • Step 4: Calculating the Covariance of Z To calculate the covariance of Z, we will take the matrix Z and transpose it. After the transpose, we will multiply it by Z. The output matrix will be the Covariance matrix of Z.
  • 19. Steps for PCA algorithm • Step 5: Calculating the Eigen Values and Eigen Vectors Now we need to calculate the eigenvalues and eigenvectors for the resulting covariance matrix of Z. Eigenvectors of the covariance matrix are the directions of the axes with the most information (variance), and the corresponding eigenvalues give the amount of variance along those directions. • Step 6: Sorting the Eigen Vectors In this step, we will take all the eigenvalues and sort them in decreasing order, which means from largest to smallest, and simultaneously sort the eigenvectors accordingly into a matrix of eigenvectors. The resulting matrix will be named P*. • Step 7: Calculating the new features or Principal Components Here we will calculate the new features. To do this, we will multiply the Z matrix by the P* matrix. In the resulting matrix Z*, each observation is a linear combination of the original features, and each column of the Z* matrix is independent of the others. • Step 8: Remove less important features from the new dataset The new feature set has been obtained, so we decide here what to keep and what to remove. It means we keep only the relevant or important features in the new dataset, and the unimportant features are removed.
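For illustration, the steps above can be sketched with NumPy; this is a minimal sketch, not part of the original slides, and the small data matrix X is a made-up example.

```python
# Minimal NumPy sketch of the PCA steps above; X is a hypothetical
# 5-sample, 3-feature data matrix used only for illustration.
import numpy as np

X = np.array([[2.5, 2.4, 0.5],
              [0.5, 0.7, 1.2],
              [2.2, 2.9, 0.3],
              [1.9, 2.2, 0.8],
              [3.1, 3.0, 0.1]])

# Step 3: standardize each column (subtract the mean, divide by the standard deviation)
Z = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 4: covariance matrix of Z (Z transposed multiplied by Z, scaled by n - 1)
cov = (Z.T @ Z) / (Z.shape[0] - 1)

# Step 5: eigenvalues and eigenvectors of the covariance matrix
eig_vals, eig_vecs = np.linalg.eigh(cov)

# Step 6: sort eigenvectors by eigenvalue, largest first, giving P*
order = np.argsort(eig_vals)[::-1]
P_star = eig_vecs[:, order]

# Step 7: project the standardized data onto the principal components, Z* = Z P*
Z_star = Z @ P_star

# Step 8: keep only the first k components (here k = 2) as the new features
X_reduced = Z_star[:, :2]
print(X_reduced.shape)  # (5, 2)
```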
  • 20. Applications of Principal Component Analysis • PCA is mainly used as the dimensionality reduction technique in various AI applications such as computer vision, image compression, etc. • It can also be used for finding hidden patterns if data has high dimensions. Some fields where PCA is used are Finance, data mining, Psychology, etc. •
  • 21. Singular value decomposition. • Introduction • Definition • Explanation • Eigenvalues and Eigenvectors • Need of SVD • Singular Value Decomposition process • Conclusion
  • 22. Introduction • Linear Algebra forms the core foundation for Machine Learning algorithms, ranging from simple linear regressions to Deep Neural Networks. The major reason for that is that a set of data can be represented using a 2-D matrix where the columns represent the features and the rows represent the different sample data points. Having said that, matrix computations using all of the values in a matrix are sometimes redundant or rather computationally expensive. We need to represent the matrix in a form such that the most important part of the matrix, which is needed for further computations, can be extracted easily. That’s where the Singular Value Decomposition (SVD) comes into play.
  • 23. Definition • SVD is basically a matrix factorization technique, which decomposes any matrix into 3 generic and familiar matrices.
  • 24. Explanation • It has some cool applications in Machine Learning and Image Processing. To understand the concept of Singular Value Decomposition, knowledge of eigenvalues and eigenvectors is essential. If you have a pretty good understanding of eigenvalues and eigenvectors, scroll down a bit to experience the Singular Value Decomposition.
  • 25. Eigenvalues and Eigenvectors • The multiplication of a matrix and a vector produces another vector, which is defined as the transformation applied to that vector with respect to the specific matrix in the given vector space. • However, for a given matrix there exist some vectors whose direction does not change even after the transformation is applied. Such vectors are called the eigenvectors of the given matrix, while the factor by which the vector is scaled after the transformation is defined as the eigenvalue corresponding to that eigenvector.
  • 26. Eigenvalues and Eigenvectors • This can be illustrated as follows:
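Since the original illustration is not reproduced here, a small NumPy example conveys the same idea; the 2 x 2 matrix M is a made-up example.

```python
# Multiplying M by one of its eigenvectors v only rescales v by the
# eigenvalue; the direction of v does not change.
import numpy as np

M = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eig_vals, eig_vecs = np.linalg.eig(M)
v = eig_vecs[:, 0]   # first eigenvector of M
lam = eig_vals[0]    # its corresponding eigenvalue

print(M @ v)         # equals lam * v (up to floating-point error)
print(lam * v)
```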
  • 27. Need of SVD The actual motivation for Singular Value Decomposition arises from the drawbacks of eigendecomposition: • The concept of eigenvalues is only applicable to square matrices. • Even for square matrices, the eigenvalues may be complex, which restricts the number of real eigenspaces.
  • 28. Singular Value Decomposition • To overcome the challenges from eigendecomposition of a matrix, the need of more generic representation of any matrices arises and that’s where Singular Value Decomposition comes into play. Let A be any rectangular matrix of shape (m x n) . We can show that both AᵀA and AAᵀ are symmetric square matrices of shapes (n x n) and (m x m) respectively.
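As a minimal sketch (not part of the original slides), NumPy can compute this factorization directly; the 4 x 3 matrix A below is a made-up example.

```python
# SVD factors a rectangular matrix A (m x n) into U (m x m), Sigma (m x n),
# and V transpose (n x n), where Sigma holds the non-negative singular values.
import numpy as np

A = np.array([[3.0, 1.0, 1.0],
              [1.0, 3.0, 1.0],
              [1.0, 1.0, 3.0],
              [2.0, 2.0, 2.0]])

U, s, Vt = np.linalg.svd(A)      # s contains the singular values
Sigma = np.zeros_like(A)
np.fill_diagonal(Sigma, s)       # place the singular values on the diagonal

A_reconstructed = U @ Sigma @ Vt
print(np.allclose(A, A_reconstructed))  # True: the factorization recovers A
```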
  • 33. Conclusion • The main intuition behind Singular Value Decomposition is that the matrix A transforms a set of orthogonal vectors (v) to another set of orthogonal vectors (u) with a scaling factor of σ. So σ is called the Singular Value corresponding to the respective singular vectors u and v.
  • 34. Feature selection • Definition • Explanation • What is Feature Selection? • Need for Feature Selection • Feature Selection Techniques • Feature selection Methods(Feature Subset Selection): 1. Wrapper Methods 2. Filter Methods 3. Embedded Methods • Feature Ranking
  • 35. Definition: • Feature selection is a way of selecting the subset of the most relevant features from the original features set by removing the redundant, irrelevant, or noisy features.
  • 36. Explanation • While developing the machine learning model, only a few variables in the dataset are useful for building the model, and the rest of the features are either redundant or irrelevant. If we input the dataset with all these redundant and irrelevant features, it may negatively impact and reduce the overall performance and accuracy of the model. Hence it is very important to identify and select the most appropriate features from the data and remove the irrelevant or less important features, which is done with the help of feature selection in machine learning. • Feature selection is one of the important concepts of machine learning, which highly impacts the performance of the model. As machine learning works on the concept of "Garbage In, Garbage Out", we always need to input the most appropriate and relevant dataset to the model in order to get a better result.
  • 37. What is Feature Selection? • A feature is an attribute that has an impact on a problem or is useful for the problem, and choosing the important features for the model is known as feature selection. Each machine learning process depends on feature engineering, which mainly contains two processes: Feature Selection and Feature Extraction. Although feature selection and extraction processes may have the same objective, they are completely different from each other. • The main difference between them is that feature selection is about selecting a subset of the original feature set, whereas feature extraction creates new features. Feature selection is a way of reducing the input variables for the model by using only relevant data in order to reduce overfitting in the model. • So, we can define feature selection as, "It is a process of automatically or manually selecting the subset of most appropriate and relevant features to be used in model building." Feature selection is performed by either including the important features or excluding the irrelevant features in the dataset without changing them.
  • 38. Need for Feature Selection • Before implementing any technique, it is really important to understand the need for it, and the same holds for feature selection. As we know, in machine learning, it is necessary to provide a pre-processed and good input dataset in order to get better outcomes. We collect a huge amount of data to train our model and help it to learn better. Generally, the dataset consists of noisy data, irrelevant data, and some part of useful data. Moreover, the huge amount of data also slows down the training process of the model, and with noise and irrelevant data, the model may not predict and perform well. So, it is very necessary to remove such noise and less-important data from the dataset, and to do this, feature selection techniques are used. • Selecting the best features helps the model to perform well. For example, suppose we want to create a model that automatically decides which car should be crushed for spare parts, and to do this, we have a dataset. This dataset contains the Model of the car, Year, Owner's name, and Miles. In this dataset, the name of the owner does not contribute to the model performance, as it does not decide if the car should be crushed or not, so we can remove this column and select the rest of the features (columns) for the model building.
  • 39. Need for Feature Selection • Below are some benefits of using feature selection in machine learning: • It helps in avoiding the curse of dimensionality. • It helps in the simplification of the model so that it can be easily interpreted by the researchers. • It reduces the training time. • It reduces overfitting and hence enhances generalization.
  • 40. Feature Selection Techniques • There are mainly two types of Feature Selection techniques, which are: • 1.Supervised Feature Selection technique Supervised Feature selection techniques consider the target variable and can be used for the labelled dataset. • 2.Unsupervised Feature Selection technique Unsupervised Feature selection techniques ignore the target variable and can be used for the unlabelled dataset.
  • 42. Feature Selection Methods • There are mainly three techniques under supervised feature selection:
  • 43. Wrapper Methods • In wrapper methodology, selection of features is done by considering it as a search problem, in which different combinations are made, evaluated, and compared with other combinations. It trains the algorithm by using the subset of features iteratively.
  • 44. On the basis of the output of the model, features are added or subtracted, and with this feature set, the model is trained again.
  • 45. Some techniques of wrapper methods are: • Forward selection - Forward selection is an iterative process, which begins with an empty set of features. After each iteration, it keeps adding a feature and evaluates the performance to check whether it is improving the performance or not. The process continues until the addition of a new variable/feature no longer improves the performance of the model. • Backward elimination - Backward elimination is also an iterative approach, but it is the opposite of forward selection. This technique begins the process by considering all the features and removes the least significant feature. This elimination process continues until removing a feature no longer improves the performance of the model. • Exhaustive Feature Selection - Exhaustive feature selection is one of the best feature selection methods, which evaluates each feature set by brute force. It means this method tries every possible combination of features and returns the best-performing feature set. • Recursive Feature Elimination - Recursive feature elimination is a recursive greedy optimization approach, where features are selected by recursively taking a smaller and smaller subset of features. An estimator is trained with each set of features, and the importance of each feature is determined using the coef_ attribute or the feature_importances_ attribute.
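As a hedged illustration of the wrapper idea (not part of the original slides), recursive feature elimination is available in scikit-learn; the synthetic dataset and the choice of 4 features are illustrative assumptions.

```python
# Wrapper-style selection: RFE repeatedly fits the estimator and removes the
# weakest features until only the requested number remains.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10,
                           n_informative=4, random_state=0)

selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=4)
selector.fit(X, y)

print(selector.support_)   # boolean mask of the selected features
print(selector.ranking_)   # rank 1 = selected; larger values were eliminated earlier
```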
  • 46. 2. Filter Methods • In Filter Methods, features are selected on the basis of statistical measures. This method does not depend on the learning algorithm and chooses the features as a pre-processing step. • The filter method filters out the irrelevant features and redundant columns from the model by ranking them with different metrics. • The advantage of using filter methods is that they need low computational time and do not overfit the data.
  • 48. Some common techniques of Filter methods are as follows: • Information Gain • Chi-square Test • Fisher's Score • Missing Value Ratio
  • 49. Some common techniques of Filter methods are as follows: • 1.Information Gain: Information gain determines the reduction in entropy while transforming the dataset. It can be used as a feature selection technique by calculating the information gain of each variable with respect to the target variable. • 2.Chi-square Test: The chi-square test is a technique to determine the relationship between categorical variables. The chi-square value is calculated between each feature and the target variable, and the desired number of features with the best chi-square values is selected. • 3.Fisher's Score: • Fisher's score is one of the popular supervised techniques of feature selection. It returns the rank of each variable according to Fisher's criterion in descending order. We can then select the variables with a large Fisher's score.
  • 50. 4.Missing Value Ratio: • The missing value ratio can be used for evaluating the feature set against a threshold value. The ratio is obtained by dividing the number of missing values in each column by the total number of observations. A variable whose ratio is higher than the threshold value can be dropped.
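A minimal sketch of two of these filters (assuming scikit-learn and pandas; the dataset, the value of k, and the threshold are illustrative choices, not part of the slides).

```python
# Filter-style selection: a chi-square test via SelectKBest, and a
# missing-value-ratio check computed column by column.
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, chi2

X, y = load_iris(return_X_y=True)

# Chi-square filter: keep the 2 features most strongly related to the target
X_new = SelectKBest(score_func=chi2, k=2).fit_transform(X, y)
print(X_new.shape)              # (150, 2)

# Missing value ratio: drop columns whose share of missing values exceeds 0.5
df = pd.DataFrame({"a": [1, None, 3], "b": [None, None, 6]})
ratio = df.isnull().sum() / len(df)
print(df.loc[:, ratio <= 0.5])  # keeps only column "a"
```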
  • 51. 3. Embedded Methods • Embedded methods combine the advantages of both filter and wrapper methods by considering the interaction of features along with low computational cost. They are fast processing methods similar to the filter method, but more accurate than the filter method.
  • 52. These methods are also iterative; they evaluate each iteration and optimally find the most important features that contribute the most to training in that particular iteration.
  • 53. Some techniques of embedded methods are: • Regularization - Regularization adds a penalty term to the different parameters of the machine learning model to avoid overfitting. This penalty term is applied to the coefficients; hence it shrinks some coefficients to zero. Those features with zero coefficients can be removed from the dataset. The types of regularization techniques are L1 regularization (Lasso regularization) and Elastic Nets (L1 and L2 regularization). • Random Forest Importance - Different tree-based methods of feature selection provide feature importances as a way of selecting features. Here, feature importance specifies which feature has more importance in model building or has a greater impact on the target variable. Random Forest is such a tree-based method; it is a type of bagging algorithm that aggregates a number of decision trees. It automatically ranks the nodes by their performance, or decrease in impurity (Gini impurity), over all the trees. Nodes are arranged as per the impurity values, and thus the tree can be pruned below a specific node. The remaining nodes create a subset of the most important features.
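A hedged sketch of both embedded approaches (not part of the original slides; the synthetic regression data and parameter values are illustrative assumptions).

```python
# Embedded selection: L1 (Lasso) regularization shrinks some coefficients to
# zero, and random-forest importances can drive SelectFromModel.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso

X, y = make_regression(n_samples=200, n_features=8,
                       n_informative=3, random_state=0)

# Lasso: features whose coefficients are shrunk to zero are candidates to drop
lasso = Lasso(alpha=1.0).fit(X, y)
print(lasso.coef_)

# Random forest importance: keep features above the mean importance (default threshold)
selector = SelectFromModel(RandomForestRegressor(n_estimators=100, random_state=0))
selector.fit(X, y)
print(selector.get_support())   # boolean mask of the retained features
```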
  • 54. Feature Ranking • Feature Ranking can be defined as selecting “n” number of significant features for a problem by ranking features according to their importance in the model.
  • 55. Evaluating Machine Learning algorithms and Model Selection. • Introduction • How to find the best solution • Basic terminologies used to evaluate machine learning algorithms • Train, validate and test algorithms • Algorithm exploration • Hyperparameter optimization • Ensemble learning
  • 56. Introduction • Different machine learning algorithms search for different trends and patterns. One algorithm isn’t the best across all data sets or for all use cases. To find the best solution, you need to conduct many experiments, evaluate machine learning algorithms, and tune their hyperparameters.
  • 57. How to find the best solution • First, you choose, justify, and apply a model performance indicator to assess your model and justify the choice of an algorithm. Examples of model performance indicators include the F1 score, the true positive rate, and the within-cluster sum of squared errors. • Implement your solution with at least one deep learning algorithm and at least one non-deep-learning algorithm. Then, compare and document model performance. • Apply at least one more iteration in the process model that involves at least the feature creation task, such as data normalization or principal component analysis (PCA), and record the impact on model performance. Depending on the algorithm class and data set size, you might choose specific technologies or frameworks to solve your problem.
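For example, a chosen indicator such as the F1 score can be applied to two candidate models' predictions on the same held-out labels (a minimal sketch assuming scikit-learn; the label vectors are made up).

```python
# Compare two hypothetical classifiers with one performance indicator (F1 score).
from sklearn.metrics import f1_score

y_true    = [1, 0, 1, 1, 0, 1, 0, 0]
y_model_a = [1, 0, 1, 0, 0, 1, 0, 1]
y_model_b = [1, 1, 1, 1, 0, 1, 0, 0]

print(f1_score(y_true, y_model_a))  # indicator for model A
print(f1_score(y_true, y_model_b))  # indicator for model B; higher is better
```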
  • 58. As you evaluate algorithms, it's useful to know related terminology:
  • 61. Train, validate, and test algorithms • Machine learning algorithms learn from examples. If you have good data, the more examples you provide, the better the model is at finding patterns in the data. However, be cautious of overfitting. As described in the first table, overfitting is where a model can accurately make predictions for data that it was trained on but can't generalize to other data. • Overfitting is why you split your data into training data, validation data, and test data. Validation data and test data are often referred to interchangeably, but they have distinct purposes.
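A minimal sketch of such a split (assuming scikit-learn; the 60/20/20 proportions and the iris dataset are illustrative choices, not a rule from the slides).

```python
# Split data into training, validation, and test sets with two calls to
# train_test_split: first carve off the test set, then split the remainder.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # roughly 60% / 20% / 20% of the samples
```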
  • 62. Train, validate, and test algorithms
  • 63. Get familiar with the terminology around training, testing, and validating data:
  • 65. Algorithm exploration • The algorithms that you explore must be driven by your use case. By first identifying what you're trying to achieve, you can narrow the scope of searching for solutions. Possible methods include, but aren't limited to, algorithms for regression, classification, clustering, recommendations, and anomaly detection. You can also use the Algorithm Explorer to guide algorithm selection.
  • 66. Regression • Regression algorithms are machine learning techniques for predicting continuous numerical values. They're supervised learning tasks, so they require labeled training examples.
  • 67. Classification • Classification algorithms are machine learning techniques for predicting which category the input data belongs to. They're supervised learning tasks, so they require labeled training examples.
  • 68. Clustering • Clustering algorithms are machine learning techniques to divide data into a number of groups where points in the groups have similar traits. They're unsupervised learning tasks and don't require labeled training examples.
  • 69. Recommendation engines • Recommendation engines are created to predict a preference or rating that indicates a user’s interest in an item or product. The algorithms that are used to create this system find similarities between the users, the items, or both.
  • 70. Anomaly detection • Anomaly detection is a technique that is used to identify unusual events or patterns that don't conform to expected behavior. The items that are identified are often referred to as anomalies or outliers.
  • 71. Hyperparameter optimization • Although the terms parameters and hyperparameters are occasionally used interchangeably, distinctions exist between them. Parameters are properties that the algorithm learns during training. For linear regression, those parameters are the weights and biases. For random forests, they're the variables and thresholds at each node. • Hyperparameters are properties that must be set before training. For k-means clustering, you must define the value of k. For neural networks, an example is the learning rate. Hyperparameter optimization is the process of finding the best possible values for the hyperparameters to optimize your performance metric, such as the highest accuracy or the lowest RMSE (root-mean-square error). You train a model for different combinations of values and evaluate which models find the best solution. Methods to search for the best combinations include grid search, random search, coarse-to-fine, and Bayesian optimization.
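A hedged sketch of a grid search (assuming scikit-learn; the SVC model and the parameter grid are illustrative assumptions, not values from the slides).

```python
# Grid search: train a model for each combination of hyperparameter values
# and keep the combination with the best cross-validated score.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print(search.best_params_)  # combination with the highest cross-validated accuracy
print(search.best_score_)
```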
  • 72. Ensemble learning • Ensembles combine several machine learning models, each finding different patterns within the data to provide a more accurate solution. These techniques can improve performance, as they capture more trends. They can also reduce overfitting, as the final prediction is a consensus from many models.
  • 73. Ensemble learning Bagging • Bagging, or bootstrap aggregation, is the method of building models in parallel and averaging their predictions as the final prediction. These models can be built with the same algorithm. For example, the random forest algorithm builds many decision trees. You can also combine different types of models, such as a linear regression model and a support vector machine (SVM) model. Boosting • Boosting builds models sequentially by evaluating the success of earlier models. The next model prioritizes learning trends for predicting examples that the current models perform poorly on. Three common techniques are AdaBoost, Gradient Boosting, and XGBoost. Stacking • Stacking involves building models and using their outputs as features for a final model. For example, if your goal is to create a classifier, you might build a KNN model and a Naïve Bayes model. Rather than choose between the two, you can pass their two predictions into a final logistic regression model. This final model might give better results than the two intermediate models.
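The stacking example above can be sketched with scikit-learn's StackingClassifier (a minimal sketch; the iris dataset is an illustrative stand-in for the classification task).

```python
# Stacking: a KNN model and a Naive Bayes model feed their predictions into a
# final logistic regression model, which produces the ensemble's output.
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

stack = StackingClassifier(
    estimators=[("knn", KNeighborsClassifier()), ("nb", GaussianNB())],
    final_estimator=LogisticRegression(max_iter=1000),
)
stack.fit(X, y)
print(stack.score(X, y))  # training accuracy of the stacked ensemble
```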
  • 74. Book Reference • Reference Books: T1 – Pattern Recognition and Machine Learning, Christopher M. Bishop, Springer; T2 – Introduction to Machine Learning (Adaptive Computation and Machine Learning), Ethem Alpaydın, MIT Press • Text Books: R1 – Machine Learning, Mitchell, T., McGraw Hill