Linear Discriminant Analysis (LDA)
• Linear Discriminant Analysis (LDA) is one of the most
commonly used dimensionality reduction techniques in
machine learning, applied to classification problems with two
or more classes.
• It is one of the most popular dimensionality reduction
techniques for supervised classification problems.
• The LDA model is used to efficiently separate two or more
classes that have multiple features.
• LDA can also be used in data pre-processing to reduce the
number of features, just as PCA does, which significantly
reduces the computing cost.
• LDA is also used in face detection algorithms, where it
extracts useful features from different faces. Coupled with
eigenfaces, it produces effective results.
• LDA is used to reduce the number of features to a
manageable size before the classification process.
• Face recognition is a popular application of computer
vision, where each face is represented as a combination of
a large number of pixel values.
• LDA also has a great application in classifying a patient's
disease on the basis of various health and treatment
parameters.
• It can classify a disease as mild, moderate, or severe.
• This classification helps the doctors speed up or slow down
the treatment.
• LDA can also be used for making predictions and hence in
decision making.
• For example, "will you buy this product?" gives a
predicted result of one of two possible classes: buying or
not buying.
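The "will you buy this product?" example above can be sketched with scikit-learn's LDA classifier. The features (age, income) and all data values here are invented purely for illustration:

```python
# Hypothetical purchase-prediction sketch: LDA as a two-class classifier.
# The feature columns [age, income_in_thousands] and labels are made up.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X = np.array([[25, 30], [40, 80], [35, 60], [22, 25], [50, 90], [30, 40]])
y = np.array([0, 1, 1, 0, 1, 0])  # 1 = buy, 0 = not buy

lda = LinearDiscriminantAnalysis()
lda.fit(X, y)

# Predict one of the two possible classes for a new customer
pred = lda.predict([[38, 70]])
print("buy" if pred[0] == 1 else "not buy")
```

On this toy data the two classes are linearly separable, so the fitted model classifies the training set perfectly.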
Differences between LDA and PCA
LDA
• Supervised (uses class labels)
• Well-suited for classification tasks where you want to maximize
class separability
• Effective at capturing the most important information in the data
when class labels are available
PCA
• Unsupervised (ignores class labels)
• Faster to train and more computationally efficient than LDA
• Works well when the relationships between features are linear
• Used for exploratory data analysis
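The supervised/unsupervised distinction in the comparison above shows up directly in scikit-learn: PCA is fit on the features X alone, while LDA also requires the labels y. This sketch uses the Iris dataset purely as an illustration:

```python
# PCA vs LDA on the same data: PCA ignores labels, LDA uses them.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features, 3 classes

# PCA: unsupervised, fit on X only
X_pca = PCA(n_components=2).fit_transform(X)

# LDA: supervised, needs y; with 3 classes it yields at most 2 components
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca.shape, X_lda.shape)  # (150, 2) (150, 2)
```

Note that LDA's output dimensionality is capped at (number of classes − 1), while PCA's is capped only by the number of features.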
Unit-5
Clustering analysis
Contents
• Introduction to clustering
• Similarity and dissimilarity measures
• Clustering techniques- partitioning algorithm,
• Hierarchical algorithm,
• Density based algorithm
Clustering in Machine Learning
• The word cluster is derived from the Old English word
'clyster', meaning a bunch.
• A cluster is a group of similar things or people positioned or
occurring closely together.
• All points in a cluster depict similar characteristics
• Machine learning could be used to identify traits and
segregate these clusters.
• Clustering or cluster analysis is a machine learning technique
that groups an unlabeled dataset
• A way of grouping the data points into different clusters, each
consisting of similar data points
• Objects with possible similarities remain in a group that has
little or no similarity with the other groups
• It is an unsupervised learning method: no supervision is
provided to the algorithm, and it deals with unlabeled
data.
Partitioning Clustering
• It is a type of clustering that divides the data into non-
hierarchical groups.
• It is also known as the centroid-based method.
• The most common example of partitioning clustering is the K-
Means Clustering algorithm.
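A minimal K-Means sketch using scikit-learn; the blob data is synthetically generated here just to have something to partition:

```python
# Partitioning (centroid-based) clustering with K-Means on synthetic blobs.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

km = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = km.fit_predict(X)

# One centroid per cluster, in the 2-D feature space
print(km.cluster_centers_.shape)  # (3, 2)
```

Note that the number of clusters (here 3) must be chosen in advance, which is the main contrast with hierarchical methods discussed below.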
Density-Based Clustering
• The density-based clustering method connects highly
dense areas into clusters; arbitrarily shaped clusters can
be formed as long as the dense regions can be
connected.
• It identifies different clusters in the dataset by connecting
areas of high density into clusters.
• The dense areas in data space are separated from each other
by sparser areas.
• These algorithms can face difficulty in clustering the data
points if the dataset has varying densities and high
dimensions.
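A density-based sketch using scikit-learn's DBSCAN on synthetic two-moons data, a shape that centroid-based methods handle poorly; points falling in sparse regions are labeled -1 (noise). The `eps` and `min_samples` values here are illustrative choices:

```python
# Density-based clustering with DBSCAN on arbitrarily shaped data.
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=200, noise=0.05, random_state=0)

db = DBSCAN(eps=0.3, min_samples=5)
labels = db.fit_predict(X)

# Label -1 marks noise; the rest are cluster ids
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(n_clusters)  # typically recovers the two moon shapes
```

Unlike K-Means, DBSCAN does not need the number of clusters in advance, but its results are sensitive to `eps` when densities vary, which is the difficulty noted above.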
Hierarchical Clustering
• Hierarchical clustering can be used as an alternative to
partitioning clustering, as there is no requirement to pre-
specify the number of clusters to be created.
• The dataset is divided into clusters to create a tree-like
structure, which is also called a dendrogram.
• Any number of clusters, down to the individual observations,
can be selected by cutting the tree at the appropriate level.
• The most common example of this method is
the Agglomerative Hierarchical algorithm.
• Hierarchical Clustering is categorized into divisive and
agglomerative clustering.
• Divisive Clustering, or the top-down approach, starts with all
the data points in a single cluster and recursively splits it
into smaller clusters.
• Agglomerative Clustering, or the bottom-up approach, starts
with each data point as its own cluster and repeatedly merges
the most similar clusters.
• Divisive Clustering can be more accurate, since each split
takes the global distribution of the data into account.
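A short agglomerative (bottom-up) sketch with scikit-learn; setting `n_clusters=3` corresponds to cutting the dendrogram at the level that leaves three clusters. The blob data is synthetic, for illustration only:

```python
# Agglomerative (bottom-up) hierarchical clustering with Ward linkage.
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=150, centers=3, random_state=1)

# Merging stops once three clusters remain, i.e. the tree is
# "cut" at the three-cluster level of the dendrogram.
agg = AgglomerativeClustering(n_clusters=3, linkage="ward")
labels = agg.fit_predict(X)
print(sorted(set(labels)))  # [0, 1, 2]
```

To inspect the full dendrogram rather than a single cut, `scipy.cluster.hierarchy.linkage` and `dendrogram` can be used instead.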
Applications of Clustering
• In Identification of Cancer Cells: The clustering algorithms are
widely used for the identification of cancerous cells.
• It separates cancerous and non-cancerous samples into
different groups.
• Customer Segmentation: It is used in market research to
segment the customers based on their choice and
preferences.
• In Land Use: The clustering technique is used to identify
areas of similar land use in a GIS database. This is very
useful for determining the purpose for which a particular
piece of land is most suitable.
• Hashtags on social media also use clustering techniques to
classify all posts with the same hashtag under one stream
• Classifying different species of plants and animals with the
help of image recognition techniques
• It helps in deriving plant and animal taxonomies and classifies
genes with similar functionalities to gain insight into
structures inherent to populations.
• It is applicable in city planning to identify groups of houses
and other facilities according to their type, value, and
geographic coordinates.

computational statistics machine learning unit 5.pptx
