4. Pattern Recognition
• Pattern recognition is the automated recognition of patterns and
regularities in data.
• It has applications in statistical data analysis, signal processing, image
analysis, information retrieval, bioinformatics, data
compression, computer graphics and machine learning.
• Pattern recognition has its origins in statistics and engineering; some
modern approaches to pattern recognition include the use of machine
learning, due to the increased availability of big data and a new
abundance of processing power.
• Pattern recognition and machine learning can be viewed as two facets of
the same field, and together they have undergone substantial development
over the past few decades.
• The field of pattern recognition is concerned with the automatic discovery
of regularities in data through the use of computer algorithms and with
the use of these regularities to take actions such as classifying the data
into different categories.[1]
5. Design principles of pattern recognition system
• Pattern Recognition System
Patterns are everywhere in this digital world.
• A pattern can either be seen physically or be
observed mathematically by applying
algorithms.
• In pattern recognition, a pattern comprises
the following two fundamental things:
• A collection of observations
• The concept behind the observations
7. Cont…
• Design Principles of Pattern Recognition
In a pattern recognition system, two basic approaches are used to recognize
a pattern or structure, and each can be implemented with different
techniques. These are:
– Statistical Approach and
– Structural Approach
• Statistical Approach:
Statistical methods are mathematical formulas, models, and techniques
that are used in the statistical analysis of raw research data.
• The application of statistical methods extracts information from research
data and provides different ways to assess the robustness of research
outputs.
• Two main statistical methods are used:
– Descriptive Statistics: It summarizes data from a sample using indexes
such as the mean or standard deviation.
– Inferential Statistics: It draws conclusions from data that are subject to
random variation.
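The two kinds of statistics named above can be illustrated with a minimal sketch. The sample values are hypothetical, and the standard error is used here only as one simple example of an inferential quantity; it is an assumption of this sketch, not something the slide specifies.

```python
import math
import statistics

# Hypothetical sample of measurements (illustrative values only).
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]


def describe(values):
    """Return the mean, sample standard deviation and standard error."""
    mean = statistics.mean(values)          # descriptive: central tendency
    std = statistics.stdev(values)          # descriptive: spread (n - 1 divisor)
    # Inferential: how much the sample mean itself is expected to vary
    # from one random sample to another.
    standard_error = std / math.sqrt(len(values))
    return mean, std, standard_error


mean, std, standard_error = describe(sample)
print(f"mean={mean:.3f} std={std:.3f} standard error={standard_error:.3f}")
```

The mean and standard deviation summarize the sample itself, while the standard error says something about the population the sample was drawn from.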
8. Statistical Pattern recognition
• Structural Approach:
The structural (syntactic) approach describes a pattern by the simpler
parts it is built from and by the way those parts are arranged, much as a
language learner masters the patterns of sentences.
Structures are the different arrangements of
words in one accepted style or another.
• Types of structures:
– Sentence Patterns
– Phrase Patterns
– Formulas
– Idioms
10. Parameter estimation methods
• The term parameter estimation refers to the process of
using sample data (in reliability engineering, usually
times-to-failure or success data) to estimate the
parameters of the selected distribution.
• Several parameter estimation methods are available.
• More specifically, we start with the relatively simple
method of Probability Plotting and continue with the
more sophisticated methods of Rank Regression (or
Least Squares), Maximum Likelihood Estimation and
Bayesian Estimation Methods.
11. Probability Plotting
• The least mathematically intensive method for parameter estimation is
the method of probability plotting.
• As the term implies, probability plotting involves a physical plot of the
data on specially constructed probability plotting paper. This method is
easily implemented by hand, given that one can obtain the appropriate
probability plotting paper.
• The method of probability plotting takes the cdf of the distribution and
attempts to linearize it by employing a specially constructed paper. Using
the 2-parameter Weibull distribution as an example, the steps in this
method are:
– Linearize the unreliability function
– Construct the probability plotting paper
– Determine the X and Y positions of the plot points
– Use the plot to read any particular time or
reliability/unreliability value of interest.
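The steps above can be sketched in code for the 2-parameter Weibull distribution, whose unreliability function F(t) = 1 - exp(-(t/eta)^beta) linearizes to ln(-ln(1 - F)) = beta ln(t) - beta ln(eta). This sketch makes two assumptions that are not in the slides: the Y positions use Benard's median-rank approximation, and a least-squares line replaces the line drawn by eye on the paper. The failure times are hypothetical.

```python
import math


def median_ranks(n):
    """Benard's approximation to the median rank of failure i out of n."""
    return [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]


def weibull_plot_points(times):
    """Map failure times to the linearized (x, y) plotting positions."""
    ordered = sorted(times)
    ranks = median_ranks(len(ordered))
    xs = [math.log(t) for t in ordered]                 # x = ln(t)
    ys = [math.log(-math.log(1.0 - f)) for f in ranks]  # y = ln(-ln(1 - F))
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_weibull(times):
    """Read beta (slope) and eta (from the intercept) off the fitted line."""
    slope, intercept = fit_line(*weibull_plot_points(times))
    return slope, math.exp(-intercept / slope)


# Hypothetical times-to-failure, in hours.
beta, eta = estimate_weibull([16.0, 34.0, 53.0, 75.0, 93.0, 120.0])
print(f"beta={beta:.3f} eta={eta:.1f}")
```

The slope of the line gives the shape parameter and the intercept gives the scale parameter, which is the same reading one would take from the plotting paper.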
12. Methods of Parameter Estimation
• The techniques used for parameter estimation are called estimators.
• Some estimators are:
• Probability Plotting: A method of finding parameter values where the data
is plotted on special plotting paper and parameters are derived from the
visual plot.
• Rank Regression (Least Squares): A method of finding parameter values
that minimizes the sum of the squares of the residuals.
• Maximum Likelihood Estimation: A method of finding parameter values
that, given a set of observations, will maximize the likelihood function.
• Bayesian Estimation Methods: A family of estimation methods that tries
to minimize the posterior expectation of a loss function (equivalently, to
maximize the posterior expectation of a utility function).
In practice, what this means is that existing knowledge about a situation is
formulated as a prior, data is gathered, and the data is then used to
update our beliefs, giving the posterior knowledge.
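To make the difference between the last two estimators concrete, here is a minimal sketch for success data. It assumes a binomial model with a Beta prior, which is a conjugate pair chosen for simplicity and not something the slides prescribe. The counts and prior parameters are hypothetical.

```python
def mle_success_probability(successes, trials):
    """Maximum likelihood estimate of a binomial success probability."""
    return successes / trials


def beta_posterior(prior_a, prior_b, successes, failures):
    """Update a Beta(prior_a, prior_b) prior with observed successes/failures."""
    return prior_a + successes, prior_b + failures


def beta_mean(a, b):
    """Posterior mean, the Bayes estimate under squared-error loss."""
    return a / (a + b)


# Hypothetical test campaign: 18 successes in 20 trials.
successes, trials = 18, 20
p_mle = mle_success_probability(successes, trials)

# Hypothetical prior belief, Beta(2, 2): weakly centred on 0.5.
post_a, post_b = beta_posterior(2.0, 2.0, successes, trials - successes)
p_bayes = beta_mean(post_a, post_b)

print(f"MLE={p_mle:.3f} Bayesian posterior mean={p_bayes:.3f}")
```

The Bayesian estimate is pulled slightly toward the prior, and the pull shrinks as more data is gathered.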
13. Principal Component Analysis (PCA)
• Principal component analysis (PCA) is a technique used to
emphasize variation and bring out strong patterns in a dataset.
• It's often used to make data easy to explore and visualize.
• 2D example
• First, consider a dataset in only two dimensions, like (height,
weight).
• This dataset can be plotted as points in a plane.
• But if we want to tease out variation, PCA finds a new coordinate
system in which every point has a new (x, y) value.
• The axes don't actually mean anything physical; they're
combinations of height and weight called "principal components"
that are chosen to give one axis lots of variation.
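The 2D case is small enough to sketch without a linear algebra library, because the eigenvectors of a symmetric 2x2 covariance matrix have a closed form. The function names and the (height, weight) values below are hypothetical, and this is an illustration of the idea and not a general-purpose PCA.

```python
import math


def pca_2d(points):
    """Return (mean, (pc1, pc2), (var1, var2)) for a list of (x, y) points."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    # Rotation angle of the leading eigenvector of a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    pc1 = (math.cos(theta), math.sin(theta))
    pc2 = (-math.sin(theta), math.cos(theta))
    # Eigenvalues: the variance of the data along PC1 and PC2.
    half_trace = (sxx + syy) / 2.0
    spread = math.hypot((sxx - syy) / 2.0, sxy)
    return (mean_x, mean_y), (pc1, pc2), (half_trace + spread, half_trace - spread)


def project(points, mean, components):
    """Express each point in the new (PC1, PC2) coordinate system."""
    return [
        tuple((p[0] - mean[0]) * c[0] + (p[1] - mean[1]) * c[1] for c in components)
        for p in points
    ]


# Hypothetical (height in cm, weight in kg) observations.
data = [(150.0, 50.0), (160.0, 58.0), (170.0, 65.0), (180.0, 77.0), (190.0, 84.0)]
mean, components, variances = pca_2d(data)
new_coordinates = project(data, mean, components)
print(components[0], variances)
```

Each projected point is the "new (x, y) value" mentioned above, with the first coordinate carrying most of the variation.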
14. Cont…
• 3D example
• With three dimensions, PCA is more useful, because
it's hard to see through a cloud of data.
• Data plotted in 3D can be projected into 2D through a
transformation no different from finding a camera
angle: rotate the axes to find the best angle.
• The PCA transformation ensures that the horizontal
axis PC1 has the most variation, the vertical axis PC2
the second-most, and a third axis PC3 the least.
15. Linear Discriminant Analysis (LDA)
• Linear discriminant analysis (LDA), normal discriminant analysis (NDA),
or discriminant function analysis is a generalization of Fisher's linear discriminant,
a method used in statistics and other fields to find a linear combination of
features that characterizes or separates two or more classes of objects or events.
• The resulting combination may be used as a linear classifier, or, more commonly,
for dimensionality reduction before later classification.
• LDA is closely related to analysis of variance (ANOVA) and regression analysis,
which also attempt to express one dependent variable as a linear combination of
other features or measurements.
• However, ANOVA uses categorical independent variables and
a continuous dependent variable, whereas discriminant analysis has
continuous independent variables and a categorical dependent variable (i.e. the
class label).
• Logistic regression and probit regression are more similar to LDA than ANOVA is, as
they also explain a categorical variable by the values of continuous independent
variables. These other methods are preferable in applications where it is not
reasonable to assume that the independent variables are normally distributed,
which is a fundamental assumption of the LDA method.
16. Cont…
• LDA is also closely related to principal component analysis (PCA)
and factor analysis in that they both look for linear combinations of
variables which best explain the data.
• LDA explicitly attempts to model the difference between the classes of
data.
• PCA, in contrast, does not take into account any difference in class, and
factor analysis builds the feature combinations based on differences
rather than similarities.
• Discriminant analysis is also different from factor analysis in that it is not
an interdependence technique: a distinction between independent
variables and dependent variables (also called criterion variables) must be
made.
• LDA works when the measurements made on independent variables for
each observation are continuous quantities.
• When dealing with categorical independent variables, the equivalent
technique is discriminant correspondence analysis.
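The "linear combination of features that separates the classes" can be sketched for the simplest case: Fisher's linear discriminant for two classes with two continuous features. The direction is w = Sw^-1 (m1 - m0), where Sw is the within-class scatter matrix. The data points are hypothetical, and placing the decision threshold midway between the projected class means is an assumption of this sketch (it corresponds to equal class priors).

```python
def mean_vector(rows):
    """Component-wise mean of a list of 2D points."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def scatter_matrix(rows, mean):
    """2x2 scatter matrix: sum of outer products of the centered points."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]


def fisher_discriminant(class0, class1):
    """Return the projection direction w and a midpoint decision threshold."""
    m0, m1 = mean_vector(class0), mean_vector(class1)
    s0, s1 = scatter_matrix(class0, m0), scatter_matrix(class1, m1)
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    w = [dot(inv[0], diff), dot(inv[1], diff)]
    threshold = 0.5 * (dot(w, m0) + dot(w, m1))
    return w, threshold


def classify(x, w, threshold):
    """Project x onto w and compare against the threshold."""
    return 1 if dot(w, x) > threshold else 0


# Hypothetical two-class data with two continuous features.
class0 = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (2.0, 1.0)]
class1 = [(6.0, 5.0), (7.0, 8.0), (8.0, 7.0), (7.0, 6.0)]
w, threshold = fisher_discriminant(class0, class1)
print(w, threshold, classify((2.5, 2.0), w, threshold))
```

Projecting onto w reduces each observation to a single number, which is the dimensionality reduction use mentioned earlier; the threshold turns it into a linear classifier.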
17. Classification Techniques
• Various types of classification algorithms:
• Logistic Regression
• Naive Bayes Classifier
• K-Nearest Neighbors
• Decision Tree
– Random Forest
• Support Vector Machines
18. Cont…
• Logistic Regression
• Logistic regression is a calculation used to predict
a binary outcome: either something happens or it
does not. This can be expressed as Yes/No,
Pass/Fail, Alive/Dead, etc.
• Naive Bayes Classifier
• Naive Bayes calculates the probability that
a data point belongs to a certain category.
In text analysis, it can be used to
categorize words or phrases as belonging to a
preset "tag" (classification) or not.
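The binary prediction made by logistic regression, described at the top of this slide, can be sketched as a weighted sum passed through the logistic (sigmoid) function. The weights, bias and the "hours studied" feature are hypothetical stand-ins for values that would normally be learned from training data.

```python
import math


def sigmoid(z):
    """Squash any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(features, weights, bias):
    """Probability of the positive outcome for one example."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def predict(features, weights, bias, threshold=0.5):
    """Turn the probability into a Pass/Fail decision."""
    probability = predict_probability(features, weights, bias)
    return "Pass" if probability >= threshold else "Fail"


# Hypothetical model: one feature (hours studied), already-fitted parameters.
weights, bias = [1.5], -6.0
for hours in (2.0, 5.0):
    p = predict_probability([hours], weights, bias)
    print(hours, round(p, 3), predict([hours], weights, bias))
```

The 0.5 threshold is a common default; a different cut-off can be used when one kind of error is more costly than the other.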
19. Cont…
• K-nearest Neighbors
• K-nearest neighbors (k-NN) is a pattern
recognition algorithm that uses a training dataset
to find the k closest neighbors of each new example.
• When k-NN is used in classification, a new data point
is placed in the category most common among its k
nearest neighbors, that is, by a plurality vote of those
neighbors. If k = 1, it is simply placed in the class of
its single nearest neighbor.
20. Cont…
• Decision Tree
• A decision tree is a supervised learning algorithm
that is well suited to classification problems, as it's
able to order classes on a precise level.
• It works like a flow chart, separating data points
into two similar categories at a time from the
"tree trunk" to "branches" to "leaves", where the
categories become increasingly similar.
• This creates categories within categories,
allowing for organic classification with limited
human supervision.
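A single "separate into two categories" step of a decision tree can be sketched as a search for the best threshold on one numeric feature. The slides do not say how a split is scored, so the Gini impurity used here is an assumption (it is one common choice); the feature values and labels are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 when every label in the group is the same."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted impurity) of the best two-way split."""
    pairs = sorted(zip(values, labels))
    best_threshold, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold fits between two equal values
        threshold = (pairs[i][0] + pairs[i - 1][0]) / 2.0
        left = [label for value, label in pairs if value <= threshold]
        right = [label for value, label in pairs if value > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold, best_score


# Hypothetical single feature with two classes.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["a", "a", "a", "b", "b", "b"]
threshold, impurity = best_split(values, labels)
print(threshold, impurity)
```

A full tree repeats this step on each resulting group, which is how the "categories within categories" arise.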
22. Random Forest
• The random forest algorithm is an extension of the
decision tree: you first construct a number of
decision trees with training data, then classify
new data by combining the predictions of all the
trees, for example by majority vote, as a
"random forest."
• Support Vector Machines
• A support vector machine (SVM) uses algorithms
to train and classify data within degrees of
polarity, taking it to a degree
beyond X/Y prediction.
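The random forest idea on this slide, many trees trained on resampled data and combined by vote, can be sketched as follows. To stay short, the sketch makes several simplifying assumptions: each "tree" is a one-split stump on a single numeric feature, each stump is trained on a bootstrap sample, and the random feature selection used by full random forests is omitted. The data and function names are hypothetical.

```python
import random
from collections import Counter


def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]


def train_stump(samples):
    """Fit a one-split tree on (value, label) pairs by minimizing errors."""
    samples = sorted(samples)
    labels = [label for _, label in samples]
    top = majority(labels)
    best = {"threshold": None, "left": top, "right": top,
            "errors": sum(label != top for label in labels)}
    for i in range(1, len(samples)):
        if samples[i][0] == samples[i - 1][0]:
            continue  # cannot split between equal values
        left, right = labels[:i], labels[i:]
        left_label, right_label = majority(left), majority(right)
        errors = (sum(label != left_label for label in left)
                  + sum(label != right_label for label in right))
        if errors < best["errors"]:
            best = {"threshold": (samples[i][0] + samples[i - 1][0]) / 2.0,
                    "left": left_label, "right": right_label, "errors": errors}
    return best


def stump_predict(stump, value):
    if stump["threshold"] is None or value <= stump["threshold"]:
        return stump["left"]
    return stump["right"]


def train_forest(samples, n_trees=25, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    return [train_stump([rng.choice(samples) for _ in samples])
            for _ in range(n_trees)]


def forest_predict(forest, value):
    """Combine the trees by majority vote."""
    return majority([stump_predict(stump, value) for stump in forest])


# Hypothetical one-feature training data.
data = ([(float(v), "a") for v in range(1, 7)]
        + [(float(v), "b") for v in range(10, 16)])
forest = train_forest(data)
print(forest_predict(forest, 2.0), forest_predict(forest, 14.0))
```

Because every tree sees a slightly different sample, their individual mistakes tend to differ, and the vote smooths them out.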
23. Nearest Neighbor (NN) Rule
• K-Nearest Neighbors is one of the most basic yet
essential classification algorithms in Machine Learning.
• It belongs to the supervised learning domain and finds
extensive application in pattern recognition, data mining
and intrusion detection.
• It is widely applicable in real-life scenarios since it is
non-parametric, meaning it does not make any
underlying assumptions about the distribution of data
(as opposed to other algorithms such as GMM, which
assume a Gaussian distribution of the given data).
24. Cont…
• In statistics, the k-nearest neighbors algorithm (k-NN) is a
non-parametric machine learning method first developed by Evelyn
Fix and Joseph Hodges in 1951, and later expanded by Thomas
Cover.
• It is used for classification and regression.
• In both cases, the input consists of the k closest training examples
in feature space. The output depends on whether k-NN is used for
classification or regression:
• In k-NN classification, the output is a class membership. An object is
classified by a plurality vote of its neighbors, with the object being
assigned to the class most common among its k nearest neighbors
(k is a positive integer, typically small). If k = 1, then the object is
simply assigned to the class of that single nearest neighbor.
• In k-NN regression, the output is the property value for the object.
This value is the average of the values of k nearest neighbors.
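Both uses of k-NN described above can be sketched in a few lines. Euclidean distance is assumed as the measure of closeness, and the training examples are hypothetical. When two classes tie in the vote, this sketch simply returns whichever of the tied classes was encountered first among the neighbors.

```python
import math
from collections import Counter


def k_nearest(train, query, k):
    """Return the k training examples closest to the query point."""
    return sorted(train, key=lambda example: math.dist(example[0], query))[:k]


def knn_classify(train, query, k):
    """Plurality vote among the labels of the k nearest neighbors."""
    votes = Counter(label for _, label in k_nearest(train, query, k))
    return votes.most_common(1)[0][0]


def knn_regress(train, query, k):
    """Average of the target values of the k nearest neighbors."""
    neighbors = k_nearest(train, query, k)
    return sum(value for _, value in neighbors) / len(neighbors)


# Hypothetical labeled points for classification: (features, class).
train_classes = [((1.0, 1.0), "red"), ((1.0, 2.0), "red"), ((2.0, 1.0), "red"),
                 ((8.0, 8.0), "blue"), ((8.0, 9.0), "blue"), ((9.0, 8.0), "blue")]
# Hypothetical points for regression: (features, numeric value).
train_values = [((1.0,), 10.0), ((2.0,), 20.0), ((3.0,), 30.0), ((10.0,), 100.0)]

print(knn_classify(train_classes, (2.0, 2.0), k=3))
print(knn_regress(train_values, (2.0,), k=3))
```

There is no training step: the "model" is the stored training set, and all the work happens at query time.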
25. Bayes Classifier
• A Naive Bayes classifier is a probabilistic machine
learning model that's used for classification tasks.
• The crux of the classifier is Bayes' theorem.
• Bayes' Theorem:
P(A|B) = P(B|A) P(A) / P(B)
• Using Bayes' theorem, we can find the probability
of A happening, given that B has occurred. Here, B is
the evidence and A is the hypothesis. The assumption
made here is that the predictors/features are
independent given the class. That is, the presence of one
particular feature does not affect the others. Hence it is called naive.
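A small worked calculation may help make the roles of hypothesis and evidence concrete. The sketch below expands the evidence term P(B) with the law of total probability; the spam-filter numbers are hypothetical.

```python
def bayes_posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Return P(A|B) from P(A), P(B|A) and P(B|not A)."""
    # Total probability of the evidence B over both cases of A.
    evidence = p_b_given_a * prior_a + p_b_given_not_a * (1.0 - prior_a)
    return p_b_given_a * prior_a / evidence


# Hypothetical numbers: A = "message is spam", B = "message contains 'offer'".
p_spam = 0.2                 # P(A)
p_word_given_spam = 0.6      # P(B|A)
p_word_given_ham = 0.05      # P(B|not A)

posterior = bayes_posterior(p_spam, p_word_given_spam, p_word_given_ham)
print(f"P(spam | 'offer') = {posterior:.2f}")
```

Seeing the evidence raises the probability of the hypothesis from the prior of 0.2 to a much higher posterior, because the word is far more likely in spam than in other messages.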
26. Types of Naive Bayes Classifier:
• Multinomial Naive Bayes:
• This is mostly used for document classification problems, i.e. whether
a document belongs to the category of sports, politics, technology
etc. The features/predictors used by the classifier are the frequencies
of the words present in the document.
• Bernoulli Naive Bayes:
• This is similar to multinomial naive Bayes, but the predictors are
boolean variables. The parameters that we use to predict the class
variable take only the values yes or no, for example whether a word
occurs in the text or not.
• Gaussian Naive Bayes:
• When the predictors take continuous values and are not
discrete, we assume that these values are sampled from a Gaussian
distribution.
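The multinomial variant for document classification can be sketched with word counts alone. The add-one (Laplace) smoothing and the choice to skip words never seen in training are assumptions of this sketch, and the tiny training corpus is hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(documents):
    """documents: list of (list_of_words, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for words, label in documents:
        class_counts[label] += 1
        word_counts[label].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary


def predict_nb(model, words):
    """Return the label with the highest log posterior score."""
    class_counts, word_counts, vocabulary = model
    total_documents = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_documents in class_counts.items():
        score = math.log(n_documents / total_documents)  # log prior
        total_words = sum(word_counts[label].values())
        for word in words:
            if word not in vocabulary:
                continue  # assumption: ignore words unseen in training
            # Add-one smoothing keeps unseen (word, class) pairs above zero.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training corpus.
corpus = [
    ("goal match team".split(), "sports"),
    ("team win match".split(), "sports"),
    ("vote election party".split(), "politics"),
    ("party vote law".split(), "politics"),
]
model = train_multinomial_nb(corpus)
print(predict_nb(model, "match team".split()))
print(predict_nb(model, "election vote".split()))
```

Working with sums of logarithms avoids multiplying many small probabilities together, which would underflow on longer documents.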
27. Support Vector Machine (SVM)
• In machine learning, support-vector machines (SVMs, also
support-vector networks) are supervised learning models with associated
learning algorithms that analyze data
for classification and regression analysis.
• Developed at AT&T Bell Laboratories by Vapnik with colleagues
(Boser et al., 1992, Guyon et al., 1993, Vapnik et al., 1997), SVMs
are one of the most robust prediction methods, being based on
statistical learning frameworks or VC theory proposed by Vapnik
and Chervonenkis (1974) and Vapnik (1982, 1995).
• Given a set of training examples, each marked as belonging to one
of two categories, an SVM training algorithm builds a model that
assigns new examples to one category or the other, making it a
non-probabilistic binary linear classifier (although methods such
as Platt scaling exist to use SVM in a probabilistic classification
setting).
28. Cont…
• An SVM maps training examples to points in space so as to
maximise the width of the gap between the two categories.
• Support vector machines (SVMs) are powerful yet flexible
supervised machine learning algorithms which are used
both for classification and regression.
• Generally, however, they are used in classification problems.
• SVMs were first introduced in the 1960s and were later
refined in the 1990s.
• SVMs have their own way of implementation as
compared to other machine learning algorithms.
• They have become extremely popular because of their ability
to handle multiple continuous and categorical variables.
29. Working of SVM
• An SVM model is basically a representation of
different classes separated by a hyperplane in
multidimensional space.
• The hyperplane is generated in an
iterative manner by the SVM so that the error is
minimized.
• The goal of SVM is to divide the dataset into
classes by finding a maximum marginal
hyperplane (MMH).
30. The following are important concepts in SVM
• Support Vectors: The data points that are closest to the
hyperplane are called support vectors. The separating line
is defined with the help of these data points.
• Hyperplane: As we can see in the above diagram, it is
a decision plane or space that divides a
set of objects having different classes.
• Margin: It may be defined as the gap between the two
lines on the closest data points of different classes. It
can be calculated as the perpendicular distance from
the line to the support vectors. A large margin is
considered a good margin and a small margin is
considered a bad margin.
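The three concepts above can be checked numerically for a given separating line. The sketch below does not train an SVM. It takes a hypothetical, already-chosen hyperplane w.x + b = 0 and a handful of hypothetical points, then finds the support vectors and the margin by perpendicular distance. The full gap between the two classes is taken as twice that distance, which assumes the hyperplane sits midway between them.

```python
import math


def signed_distance(point, w, b):
    """Perpendicular signed distance from a point to the line w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, point)) + b) / norm


def margin_and_support_vectors(points, w, b, tolerance=1e-9):
    """Return (distance to closest points, gap width, support vectors)."""
    distances = [abs(signed_distance(p, w, b)) for p in points]
    closest = min(distances)
    support = [p for p, d in zip(points, distances) if d - closest <= tolerance]
    return closest, 2.0 * closest, support


# Hypothetical data: two points per class, and the line x + y - 4 = 0.
points = [(1.0, 1.0), (0.0, 1.0), (3.0, 3.0), (4.0, 4.0)]
w, b = (1.0, 1.0), -4.0

margin, gap, support_vectors = margin_and_support_vectors(points, w, b)
print(margin, gap, support_vectors)
```

The sign of `signed_distance` tells which side of the hyperplane a point falls on, which is how a trained SVM assigns a class to a new example.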
32. K-means clustering
• We are given a data set of items, with certain
features, and values for these features (like a
vector).
• The task is to categorize those items into
groups.
• To achieve this, we will use the k-means
algorithm, an unsupervised learning
algorithm.
33. Cont…
• It will help if you think of items as points in an n-dimensional
space. The algorithm will categorize the
items into k groups of similarity. To calculate that
similarity, we will use the Euclidean distance as the
measurement.
• The algorithm works as follows:
• First we initialize k points, called means, randomly.
• We categorize each item to its closest mean and we
update the mean's coordinates, which are the averages
of the items categorized in that mean so far.
• We repeat the process for a given number of iterations
and at the end, we have our clusters.
34. The above algorithm in pseudocode:
Initialize k means with random values
For a given number of iterations:
    Iterate through items:
        Find the mean closest to the item
        Assign item to mean
        Update mean
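The pseudocode above can be turned into a short runnable sketch. It differs from the pseudocode in two labeled ways: each mean is recomputed once per pass from all the items assigned to it (the batch form usually called Lloyd's algorithm) instead of after every single assignment, and the random initial means are drawn from the items themselves. The `initial_means` and `seed` parameters and the sample points are hypothetical additions for reproducibility.

```python
import math
import random


def closest_mean(item, means):
    """Index of the mean nearest to the item (Euclidean distance)."""
    return min(range(len(means)), key=lambda j: math.dist(item, means[j]))


def k_means(items, k, iterations=10, initial_means=None, seed=0):
    """Return (means, assignments) after a fixed number of passes."""
    rng = random.Random(seed)
    # Assumption: start from k randomly chosen items unless means are given.
    means = [tuple(m) for m in (initial_means or rng.sample(items, k))]
    assignments = [0] * len(items)
    for _ in range(iterations):
        # Assign every item to its closest mean.
        assignments = [closest_mean(item, means) for item in items]
        # Move each mean to the average of the items assigned to it.
        for j in range(k):
            members = [item for item, a in zip(items, assignments) if a == j]
            if members:  # an empty cluster keeps its previous mean
                means[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return means, assignments


# Hypothetical 2D items forming two obvious groups.
items = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
         (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
means, assignments = k_means(items, k=2, initial_means=[(0.0, 0.0), (10.0, 10.0)])
print(means, assignments)
```

The result depends on the starting means, so in practice the algorithm is often run several times from different random starts and the best clustering is kept.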