Paul Bharath Bhushan Petlu
Computer Applications Department,
Chadalawada Ramanamma Engineering College,
Tirupati, Andhra Pradesh, India.
A duck seems to be pleasant on the surface of the pond;
But, there is a restless pedaling under the water.
 Human beings dreamt of creating machines with
human-like traits
 Robots in manufacturing, mining, agriculture,
space, ocean exploration, and health sciences etc
 These machines are enslaved by commands
 The goal is to create intelligent machines that emulate
human intelligence
 Human Intelligence possesses robust attributes
with complex sensory, control, affective
(emotional), and cognitive (thinking processes)
 Central Nervous System: over one hundred billion
biological neurons
 CNS – acquires information from natural sensory
organs
 Cognitive Mathematics?
 Neural networks: a low-level cognitive machine – a
thinking machine
 Fuzzy logic: mathematical power for the emulation
of the higher-order cognitive functions – the
thought and perception process
 Neural networks + Fuzzy logic
 Needs, Motivations, and Rationale:
 Information is power and a must for success
 The collected information may be categorized on
the basis of nature of experience:
1. Experimental data
2. Structured human knowledge expressed in linguistic
form
 Banks, hospitals, automobiles, observatories
around the world, etc
 Soft Computing / Machine Learning:
 Conventional approach: human intelligence applied to
solving decision problems
 The present scene is much different from
yesterday. We now have an ocean of data to be
processed, and humans are unable to extract useful
information from it. Computers of today can
store this data and analyze it. However, to support
meaningful analysis, a new mathematical theory
has emerged, built on the foundation of the
human faculties of learning, memorizing, adapting
and generalizing
The basic premises of machine learning are:
 The real world is pervasively imprecise and
uncertain
 The precision and certainty carry a cost
The guiding principle of machine learning, which
follows from these premises, is as follows:
Exploit tolerance for imprecision and uncertainty to
achieve tractability, robustness, and low-cost solutions
 A well-defined learning problem has three
identified features:
1) The learning task
2) The measure of performance
3) The task experience
 Important aspects of ‘learning from experience’
behavior of humans and other animals embedded
in machine learning are:
 1) Remembering and Adapting
 2) Generalizing
[Figure: block diagram of a learning machine. Recoverable labels: Machine, Computer Program, Learning Machine, Learning Algorithms, Computer Program Design, Learned Knowledge]
 Google is by far the most popular and extensively
used of all search engines.
 The moment we start browsing for items on
Amazon, we see recommendations for products,
books, movies, music, etc., that we are probably
interested in.
 Amazon uses a recommender system built with
machine learning, based on the data generated
from social networking sites.
 Some application domains are:
 Medical Diagnostics
 Finance Domain
 Stock Market Forecasting
 Machine Vision
 Speech Recognition
 Text Mining
 Robotics and Automation
 Etc..
 Medical Diagnostics: The major success of deep learning
in machine vision applications has made it
possible to accurately analyze medical images such as
X-rays, MRI, CT scans, and ultrasound images, as well as
signals such as ECG. Machine learning augmented with
deep learning promises to revolutionize healthcare
diagnostics.
 Finance Domain: Applications of machine
learning in the finance domain help banks offer
personalized services to customers at lower cost
and with better compliance, which helps banks
generate higher revenue. Machine learning can
scan through large amounts of transactional data in
seconds to detect and predict fraudulent
behavior.
 Stock Market Forecasting: Readers familiar with the stock
market know that data on the continuous buying and selling
of company stocks takes the form of a time
series. It is sequential data wherein the value at time t
depends on the past history at t-1, t-2, … A stock
index is an average value calculated by
combining various stocks, and its prediction
represents the market’s movements over time.
 Machine Vision: A machine vision system captures
images through a camera and analyzes them to
describe their content. The level of visual
understanding and recognition that humans
exhibit cannot be matched by machine vision
algorithms. However, certain problems, such as
biometric recognition (fingerprint identification,
face recognition, etc.), are being handled with success
 Speech Recognition: Using signal processing techniques
we can represent the speech signal by a set of real
values. The resulting data is sequential in nature,
and with deep learning we get higher levels of
performance for speech recognition systems.
Virtual personal assistants (Amazon Echo, Google
Home) assist in finding information when asked
by voice.
 For example: “what are the flights from Delhi to
Chennai?”
 Text Mining: Text mining is an area that is
concerned with the identification of patterns in text
data. The procedure involves analysis of text for
extraction of useful information for specific
purposes – email spam detection etc.
 Robotics and Automation: Computers are controlling and
monitoring manufacturing processes with high degree
of automation, facilitated by machine learning
techniques and robots – for industrial automation,
medical robots, military robots, robots employed in
disaster areas, and so on. Machine Vision is an integral
part of many robotic applications, for example, images
have to be analyzed online and a machine learning
system has to categorize the objects into ‘defect’ and
‘non-defect’ category and then the robot can put the
objects in the right place.
 More on Application Domains: Machine Learning /
Data Mining is omnipresent and is an empirical
technology that has applications in all knowledge
domains: engineering, business management,
natural science, social science.
 Data Representation: Structured / Unstructured Data
 Experience in the form of raw data is a source of
learning in many applications and human knowledge
in linguistic form is an additional learning source.
 Data warehousing provides integrated, consistent and
cleaned data to the machine learning algorithms. Data
may also be available in a flat file, which is a
simple data table.
 Logical structure of a database is established by data
modelling. A data model determines how data is
stored, organized, and then manipulated in the
database.
 Structured Data: It is the data that adheres to a
predefined data model. It can be stored in a
relational database. It conforms to a tabular format
with relationship between different rows and
columns. Data can easily be aggregated from
various locations in the database. This data model
is the simplest way to manage information and the
techniques to analyze structured data
 Unstructured Data: It is the information that either
doesn’t have a data model or is not organized in a
predefined manner. It often includes text and
multimedia data, for example, social media data
generated from YouTube, Facebook, Twitter,
Instagram, LinkedIn, etc, text internal to the
company such as documents, logs, survey results,
emails, images, videos, and audio. Even where
this kind of data has internal structure, the data
is still considered ‘unstructured’ because it doesn’t fit
neatly in a relational database.
 Semi structured Data: It is the information that
doesn’t conform to a formal structure of data
models associated with relational databases, but
that does have some organizational properties that
make it easier to analyze. With some processing,
we can transform them into a format that machines
accept for various prediction tasks.
 Unlocking the information power of Unstructured data:
About 80% of the total data being collected and
stored today is unstructured. Therefore, unlocking
the information power of such data is very
important. Examples of non-relational data stores
and processing frameworks include Apache Cassandra,
MongoDB, Hadoop/MapReduce, and Spark, among others. A
number of software solutions are being designed to
search unstructured data and extract important
information.
 Forms of Learning:
 Any method that incorporates information from
experience in the design of a machine employs
learning. A learning method depends on the type of
experience from which the machine will learn or be
trained. The type of available learning experience
can have a significant impact on the success or
failure of the learning machine.
 The field of machine learning usually
distinguishes four forms of learning:
 1) Supervised Learning
 2) Unsupervised Learning
 3) Reinforcement Learning
 4) Learning based on natural processes – evolution,
swarming, and immune systems
 1) Predictive / Directed / Supervised Learning:
 In general, xi is a D-dimensional vector of numbers,
say, the height and weight of a person. These are called
features, attributes or covariates.
 The input xi could also be a complex structured object, such
as an image, a sentence, an email message, a time
series, a molecular shape, a graph, etc.
 Similarly, the form of the output or response variable
can in principle be anything, but most methods
assume that yi is a categorical or nominal variable
from some finite set, yi ϵ {1, 2, …, C}
 Binary Classification: C = 2
 (a) Some labeled training examples of colored shapes, along with 3
unlabeled test cases. (b) Representing the training data as an N x D
design matrix. Row i represents the feature vector xi. The last column is
the label, yi ϵ {0, 1}. Based on a figure by Leslie Kaelbling
 Classification of flowers:
 Three types of Iris flowers: Setosa, Versicolor and Virginica. Source:
http://www.statlab.uni-heidelberg.de/data/iris/.
Image Classification:
 We might want to classify the image as a whole, e.g., is it an
indoors or outdoors scene? Is it a horizontal or vertical
photo? Does it contain a dog or not? This is called image
classification.
Handwriting recognition:
 In the special case that the images consist of isolated
handwritten letters and digits, for example, in a postal or
ZIP code on a letter, we can use classification to perform
handwriting recognition.
 Face detection and Recognition:
 Example of face detection. (a) Input image (b) Output of classifier,
which detected 5 faces at different poses.
 Regression:
 (a) Linear Regression on some 1d data. (b) Same data with polynomial
regression (degree 2). Figure generated by linregpolyVsDegree
Some of the examples of real-world regression problems are:
•Predict tomorrow’s stock market price given current market
conditions and other possible side information.
•Predict the age of a viewer watching a given video on
YouTube.
•Predict the location in 3d space of a robot arm end effector,
given control signals (torques) sent to its various motors.
•Predict the amount of prostate specific antigen (PSA) in the
body as a function of a number of different clinical
measurements.
•Predict the temperature at any location inside a building
using weather data, time, door sensors, etc.
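As a minimal sketch of the simplest regression case (a straight line fitted to 1d data by least squares), the following uses only the standard library. The function name `fit_line` and the toy data are assumptions made for illustration, not part of the original slides.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of y = slope * x + intercept to 1d data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and co-variation of x with y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy data lying exactly on the line y = 2x + 1 (made up for this sketch)
slope, intercept = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
prediction = slope * 4.0 + intercept  # predicted response for a new input
```

The fitted line can then be used to predict a real-valued response for inputs that were not in the training data, which is what distinguishes regression from classification.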
 2) Descriptive / Undirected / Unsupervised Learning:
 Here we are only given inputs, D = {xi}, i = 1 to N,
and the goal is to find “interesting patterns” in the
data. This is a much less well-defined problem,
since we are not told what kinds of patterns to look
for, and there is no obvious error metric to use.
 3) Reinforcement Learning:
 This is somewhat less commonly used. This is
useful for learning how to act or behave when
given occasional reward or punishment signals.
(For example, consider how a baby learns to walk)
 4) Learning based on Natural Processes: Evolution,
Swarming, and Immune Systems
 Some learning approaches take inspiration from
nature for the development of novel problem-
solving techniques. These are applied successfully
to a variety of optimization problems.
 Evolutionary Computation
 Evolutionary biology essentially states that a
population of individuals possessing the ability to
reproduce and exposed to genetic variation
followed by selection gives rise to new
populations which are fitter to their environment.
The primary streams are: genetic algorithms,
evolution strategies, evolutionary programming
and genetic programming
 Swarm Intelligence
 It is a feature of systems of unintelligent agents
with inadequate individual abilities, displaying
collectively intelligent behaviour. It includes
algorithms derived from the collective behaviour of
social insects (Ant Colony Optimization) and other
animal / human societies (Particle Swarm
Optimization)
 Artificial Immune Systems
 An Artificial Immune System (AIS) replicates
certain aspects of the natural immune system,
which is primarily applied to solve pattern-
recognition problems and cluster data. The natural
immune system has an extraordinary ability to
match patterns, employed to differentiate between
foreign cells making an entry into the body
(antigen) and the cells that are part of the body.
 Machine Learning and Data Mining
 Machine Learning:
 Early AI research focused on hard-coding the
rules that mimic human intelligence. Machine
Learning, a subfield of AI, still involves classical
programming: human intelligence is required to
convert raw data to representations used by the machine
for learning. Deep learning, a subfield of Machine
Learning, is a form of ‘representation learning’
inspired by the biological nervous system. Machine
Learning is the computational process wherein a
machine ‘learns’ and adjusts its behaviour based on
feedback from data.
 Data Mining:
 DM is concerned with real-world applications:
data analysis that concentrates on commercial
applications and business-related problems tends to
drift in the direction of data mining.
 Both ML and DM are related to each other sharing
methods and algorithms pertaining to the analysis
of the data to seek informative patterns.
 Data Science
 Data Science is a new name given to an action plan
for expanding the technical areas of the field of
statistics.
 Data Science is the extraction of knowledge from
data.
 The task ‘knowledge extraction’ does not have any
boundaries.
 Relationship among key technologies
UNIT – II
 Learning from Observations
 We can visualize each pattern with n numerical
features as a point in the n-dimensional state space $\mathbb{R}^n$:
 $\mathbf{x} = [x_1 \; x_2 \; \ldots \; x_n]^T \in \mathbb{R}^n$
 The training experience is in the form of data D that
describes how the system behaves over its entire range
of operation.
 $D : \{\mathbf{x}^{(i)}, y^{(i)}\}; \; i = 1, 2, \ldots, N$  (2.1)
 The data D is independently drawn and identically
distributed (iid), represented by the probability density
function p(x, y)
 Assume a machine defined by a function f: X → Y
 When f(.) is selected, the machine is called a trained
machine that gives the estimated output value $\hat{y} = f(\mathbf{x})$
for a given pattern x.
 We can define the set of learning machines by a
function f(x, w), where w contains adjustable
parameters.
 The loss function is L(y, f(x, w))
 Empirical Risk Minimization
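 Our problem is to find a decision function f(x, w)
that minimizes the risk function R(w), the expected
loss under p(x, y):
 $R(\mathbf{w}) = \int L(y, f(\mathbf{x}, \mathbf{w})) \, p(\mathbf{x}, y) \, d\mathbf{x} \, dy$
 With the dataset D : {x(i), y(i)}; i = 1, 2, …, N, being
the only source of information, the empirical risk
function is given by:
 $R_{emp}(\mathbf{w}) = \frac{1}{N} \sum_{i=1}^{N} L(y^{(i)}, f(\mathbf{x}^{(i)}, \mathbf{w}))$
 This empirical risk function replaces the average over
p(x, y) by an average over the training sample.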
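A small sketch of the empirical risk as an average of per-sample losses may help. It assumes the squared-error loss and a hypothetical one-parameter machine f(x, w) = w·x; the function names and the toy dataset are illustrative only.

```python
from typing import Callable, Sequence, Tuple


def squared_loss(y: float, y_hat: float) -> float:
    """Squared-error loss: 1/2 * (y - y_hat)^2."""
    return 0.5 * (y - y_hat) ** 2


def linear_machine(x: float, w: float) -> float:
    """Hypothetical learning machine f(x, w) = w * x (assumed for this sketch)."""
    return w * x


def empirical_risk(
    data: Sequence[Tuple[float, float]],
    f: Callable[[float, float], float],
    w: float,
    loss: Callable[[float, float], float] = squared_loss,
) -> float:
    """Average loss of f(., w) over the N training pairs (x, y)."""
    return sum(loss(y, f(x, w)) for x, y in data) / len(data)


# Toy dataset generated by y = 2x (made up for this sketch)
D = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
risk_good = empirical_risk(D, linear_machine, w=2.0)  # parameter matches the data
risk_bad = empirical_risk(D, linear_machine, w=1.0)   # parameter does not match
```

Empirical risk minimization then amounts to searching for the w that makes this average as small as possible.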
 Inductive Learning
 Given a collection of examples (x(i), f(x(i))); i = 1,
2, …, N, of a function f(x), return a function h(x)
that approximates f(x).
 The assumption in inductive learning is that the
ideal hypothesis related to unseen patterns is the
one induced by the observed training data.
 Bias and Variance
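 Consider the following experiment. We first collect a
random sample D of N independently drawn patterns
from the distribution p(x, y), and then measure the
sample error / training error / approximation error of
the hypothesis h(x) = f(x, w) from:
 $error_D[h] = \frac{1}{N} \sum_{i=1}^{N} L(y^{(i)}, f(\mathbf{x}^{(i)}, \mathbf{w}))$
 using the 0-1 loss function for classification problems:
 $L(y, f(\mathbf{x}, \mathbf{w})) = 0$ if $y = f(\mathbf{x}, \mathbf{w})$, and $1$ otherwise
 using the squared-error loss function for regression problems:
 $L(y, f(\mathbf{x}, \mathbf{w})) = \frac{1}{2} (y - f(\mathbf{x}, \mathbf{w}))^2$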
[Figure: bias and variance in linear curve fitting]
 If we run K such experiments, measuring the random variable
$error_{D_j}[h]$; j = 1, 2, …, K, then the average over
the K experiments is:
 $error_D[h] = E_D\{error_{D_j}[h]\}$
 where $E_D\{\cdot\}$ denotes the expectation or ensemble
average.
 A non-zero error can arise for two reasons:
 1) It may be that the hypothesis function h(.) is, on
an average, different from the regression function
f(x). This is called bias.
 2) It may be that the hypothesis function is very
sensitive to the particular dataset Dj, so that for a
given x, approximation error is larger for some
datasets, and smaller for other datasets. This is
called variance.
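The two sources of error can be made concrete with a small simulation. This is only a sketch: the true function f(x) = x², the noise level, and the two hypothesis types (a constant predictor and a nearest-neighbour predictor) are assumptions chosen for illustration. Each of K experiments draws a fresh dataset, trains a hypothesis, and records its prediction at one point x0.

```python
import random
import statistics


def f(x: float) -> float:
    """True regression function, assumed for this illustration."""
    return x * x


def draw_dataset(n: int, rng: random.Random, noise: float = 0.3):
    """Draw n noisy (x, y) pairs with x uniform on [0, 1]."""
    xs = [rng.random() for _ in range(n)]
    return [(x, f(x) + rng.gauss(0.0, noise)) for x in xs]


def fit_constant(data):
    """Very simple hypothesis: always predict the mean of the training outputs."""
    mean_y = statistics.fmean([y for _, y in data])
    return lambda x: mean_y


def fit_nearest(data):
    """Very flexible hypothesis: predict the output of the nearest training input."""
    return lambda x: min(data, key=lambda p: abs(p[0] - x))[1]


def bias_variance(fit, x0: float, k: int = 500, n: int = 20, seed: int = 0):
    """Estimate squared bias and variance of the prediction at x0 over k datasets."""
    rng = random.Random(seed)
    preds = [fit(draw_dataset(n, rng))(x0) for _ in range(k)]
    bias_sq = (statistics.fmean(preds) - f(x0)) ** 2
    variance = statistics.pvariance(preds)
    return bias_sq, variance


bias_const, var_const = bias_variance(fit_constant, x0=0.9)
bias_near, var_near = bias_variance(fit_nearest, x0=0.9)
```

In this setup the constant predictor is, on average, far from f(x0) but changes little between datasets, while the nearest-neighbour predictor is close on average but swings with the noise in each dataset.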
 Occam’s razor principle
 The Franciscan monk William of Occam was born
around 1287. His principle:
 ‘The simpler explanations are more reasonable, and
any unnecessary complexity should be shaved off’.
 ‘Simpler’ may imply needing fewer parameters,
less training time, fewer attributes for data
representation, and lower computational
complexity.
 Overfitting avoidance
 Occam’s razor principle suggests hypothesis
functions that avoid overfitting of the training data.
We stop looking for a design when the solution is
‘good enough’, not necessarily the optimal one.
 In machine learning jargon, a learning machine
is said to overfit the training examples if some
other learning machine that fits the training
examples less well actually performs better over
the total distribution of patterns.
 Heuristic Search in Inductive Learning
 Trial-and-error is the approach of searching for a ‘good
enough’ solution.
 Applied Machine Learning organizes the search as per
the following two-step procedure:
 1) The search is first focused on a class of possible
hypotheses, chosen for the learning task in hand. Prior
knowledge and experience are helpful in this selection.
Different hypothesis functions are appropriate for
different kinds of learning tasks and available data.
 2) For each of the members of the class, the
corresponding learning algorithm organizes the search
through all possible structures of the learning machine.
 Principal techniques used in heuristic search
 Regularization:
 Early Stopping:
 Pruning:
 Regularization:
 Regularization promotes smoother functions by
creating a new criterion function that relies not only on the
training error, but also on the complexity of the hypothesis.
 $\bar{E} = E + \lambda \Omega$  (2.1)
 = error on data + λ × hypothesis complexity, where λ gives the weight
of the penalty.
 When λ = 0, there is no regularization, which results in a model that
tends to have high variance. That means the model won’t
generalize well to a dataset different from its training data
(overfitting). As the value of λ rises, up to a point, it reduces the
variance without substantially increasing the bias. But beyond
a certain value of λ, the bias in the model starts to increase,
leading to underfitting. λ is optimized using
cross-validation.
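The effect of λ can be sketched with a one-parameter model h(x) = w·x, taking the squared weight w² as the complexity term Ω. This choice of Ω (ridge-style regularization) and the toy data are assumptions for illustration; the slides do not fix a particular penalty.

```python
from typing import Sequence, Tuple

Data = Sequence[Tuple[float, float]]


def penalized_error(data: Data, w: float, lam: float) -> float:
    """Criterion E + lam * Omega with E = sum of squared errors and Omega = w^2."""
    error = sum((y - w * x) ** 2 for x, y in data)
    return error + lam * w * w


def ridge_weight(data: Data, lam: float) -> float:
    """Closed-form minimizer of penalized_error for the model h(x) = w * x."""
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, _ in data)
    return sxy / (sxx + lam)


# Toy data roughly following y = 2x (made up for this sketch)
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]
weights = {lam: ridge_weight(data, lam) for lam in (0.0, 1.0, 10.0, 100.0)}
```

As λ grows, the fitted weight is pulled towards zero: a small λ leaves the fit almost unchanged, while a very large λ flattens the model regardless of the data, which is the underfitting regime described above.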
 Early Stopping:
 Stopping the training before attaining a minimum
training error is a technique for restricting the
effective hypothesis complexity.
 Pruning:
 An alternative that is sometimes more
successful than stopping early to limit the complexity of the
hypothesis is pruning a full-grown hypothesis that is
likely to be overfitting the training data.
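Early stopping can be sketched as gradient descent that monitors the error on a separate validation set and keeps the best parameter seen so far. The one-parameter model h(x) = w·x, the `patience` rule, the learning rate, and the toy data are all assumptions for illustration; pruning is not covered by this sketch.

```python
from typing import Sequence, Tuple

Data = Sequence[Tuple[float, float]]


def mse(data: Data, w: float) -> float:
    """Mean squared error of the model h(x) = w * x on the given data."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def train_with_early_stopping(
    train: Data, val: Data, lr: float = 0.01, patience: int = 5, max_steps: int = 1000
) -> Tuple[float, int]:
    """Gradient descent on the training error, stopped when the validation
    error has not improved for `patience` consecutive steps."""
    w = 0.0
    best_w, best_val = w, mse(val, w)
    since_best = 0
    step = 0
    for step in range(1, max_steps + 1):
        # Gradient of the training MSE with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in train) / len(train)
        w -= lr * grad
        val_err = mse(val, w)
        if val_err < best_val:
            best_w, best_val, since_best = w, val_err, 0
        else:
            since_best += 1
            if since_best >= patience:
                break
    return best_w, step


# Toy data: the training set is best fit by w = 2.2, the validation set by w = 2.0
train = [(1.0, 2.2), (2.0, 4.4), (3.0, 6.6)]
val = [(1.0, 2.0), (2.0, 4.0)]
best_w, steps = train_with_early_stopping(train, val)
```

Training is halted well before the training error reaches its minimum, and the returned parameter is the one that did best on the validation data rather than on the training data.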
 Evaluation of Learning System
 Before using the Machine Learning System, it
should be evaluated in many aspects, which are:
 Accuracy:
 Robustness:
 Computation Complexity and Speed:
 Interpretability:
 Online Learning:
 Scalability:
 Accuracy: The learning system extracts knowledge
from the training data. The learned knowledge should
be general enough to deal with unknown data. The
generalization capability of a learning system is an
index of accuracy of the learning machine.
 Robustness: The machine should perform
adequately under all circumstances, including cases
where information is corrupted by noise, is incomplete,
or is contaminated with irrelevant data. All these
conditions are part of the real world and must
be considered while evaluating a learning system.
 Computation Complexity and Speed: Computational
complexity of a learning algorithm and learning speed
determine the efficiency of a learning system: how fast
the systems can arrive at a correct answer and how
much computer memory is required. We know how
important speed is in real-time situations.
 Interpretability: This is the level of understanding and
insight offered by a learning algorithm. Interpretability
is subjective and, hence, tougher to evaluate.
Interpretability is easy in decision trees, but still their
interpretability may decrease with an increase in their
complexity.
 Online Learning: The spectrum of applications is
increasing with the growth of technology. Some
sources generate streaming data, which
has to be analyzed in real time. An online learning
system continues to receive inputs from a real-time
environment and analyzes them in real time.
 Scalability: Today huge amounts of data are being
generated in real-world applications. A high level of
scalability is a desirable feature of a
learning machine. Typically, scalability is assessed
with a series of datasets of
increasing size and complexity.
 Estimating Generalization Errors
 The success of learning depends on the hypothesis
space complexity and sample complexity, which
are interdependent. The goal is to find a function
simplest in terms of complexity and best in terms
of empirical error on the data. Such a choice is expected to
give good generalization performance.
 If we partition available data into training /
validation / testing datasets, the validation set is
used to optimize the parameters of the model
obtained using training data.
 Holdout Method and Random Subsampling
 In the holdout technique, some amount of data is
earmarked for the purpose of testing (one-third),
while the remainder is employed for training. If the
data is collected over time (time series data), then we
can make use of the earlier part to train and the
latter part of the data for the purpose of testing.
 The samples used to train and test have to
represent the underlying distribution for the
problem area. The proportion of each class in the
training, testing, and full datasets should be more or
less the same. To make sure this happens, random
sampling should be performed in a manner that
guarantees that each class is properly
represented in the training as well as the test set. This
process is known as stratification.
 Even though stratified holdout is generally well
worth doing, it offers merely a basic safeguard
against irregular representation in training and test
sets. A more general way to alleviate any bias
resulting from the specific sample selected for
holdout is random subsampling, wherein the holdout
technique is repeated K times with different random
samples. The overall accuracy estimate is
taken as the average of the accuracies obtained
from each repetition.
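A stratified holdout split and its repetition by random subsampling might be sketched as follows. The function names, the `evaluate` callback, and the majority-class "classifier" used in the usage example are hypothetical, introduced only to make the sketch runnable.

```python
import random
from collections import defaultdict
from typing import Callable, List, Optional, Sequence, Tuple


def stratified_holdout(
    labels: Sequence[int], test_fraction: float = 1 / 3,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], List[int]]:
    """Split sample indices so each class keeps about the same share in both sets."""
    rng = rng or random.Random()
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return train, test


def random_subsampling(
    labels: Sequence[int],
    evaluate: Callable[[List[int], List[int]], float],
    k: int = 10, seed: int = 0,
) -> float:
    """Repeat the stratified holdout k times and average the accuracies."""
    rng = random.Random(seed)
    scores = [evaluate(*stratified_holdout(labels, rng=rng)) for _ in range(k)]
    return sum(scores) / k


# Usage with a made-up dataset of 60 samples of class 0 and 30 of class 1
labels = [0] * 60 + [1] * 30


def majority_accuracy(train: List[int], test: List[int]) -> float:
    """Hypothetical evaluator: predict the majority training class for every test sample."""
    train_labels = [labels[i] for i in train]
    majority = max(set(train_labels), key=train_labels.count)
    return sum(labels[i] == majority for i in test) / len(test)


average_accuracy = random_subsampling(labels, majority_accuracy, k=10)
```

Because the split is made class by class, the one-third test set here always contains the two classes in the same 2:1 ratio as the full dataset.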
 Cross-Validation
 A commonly used technique for forecasting the
success rate of a learning method, taking into
account a fixed data sample, is the K-fold cross-
validation.
 Another estimate prevalent is the leave-one-out
cross-validation.
 K-Fold cross-validation
 In K-fold cross-validation, the given data D is randomly
divided into K mutually exclusive subsets or folds, Dk,
where k = 1, 2, …, K, each of about equal size. Training
and testing are done K times. In iteration k, partition Dk
is set aside for testing, and the remaining
partitions are collectively used to train the model.
That is, in the first iteration, the set D2 ∪ D3 ∪ … ∪ DK serves
as the training set to obtain the first model, which is
tested on D1; the second iteration trains on D1 ∪ D3 ∪
… ∪ DK and tests on D2, and so on.
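The K-fold procedure can be sketched with a small index splitter plus a loop that trains and tests K times. The `fit` and `error` callables are hypothetical placeholders for whatever learning method is being assessed.

```python
import random
from typing import Callable, Iterator, List, Sequence, Tuple


def k_fold_indices(n: int, k: int, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists for each of the k folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # k mutually exclusive folds of about equal size
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validate(
    xs: Sequence, ys: Sequence,
    fit: Callable, error: Callable, k: int = 10,
) -> float:
    """Average the k test errors obtained by training on k-1 folds each time."""
    errors = []
    for train, test in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors.append(error(model, [xs[i] for i in test], [ys[i] for i in test]))
    return sum(errors) / k


splits = list(k_fold_indices(n=10, k=5))
```

Calling the splitter with k equal to the number of samples leaves exactly one sample in each test fold, which corresponds to leave-one-out cross-validation.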
 Stratified K-Fold cross-validation
 If stratification is also used, it is known as stratified
K-fold cross-validation for classification. Ultimately,
the K error estimates obtained from the K iterations are
averaged to give an overall error estimate.
K = 10 folds is the standard
number used to estimate the error rate of a
learning method. Out of the resulting 10 machines, the one
with the lowest error may be deployed.
 Leave-one-out Cross-validation
 Only a single sample is left out for testing in each
iteration. The learning machine is trained on the
remainder of the samples and judged by its
accuracy on the left-out sample. The average of the
N judgements over the N iterations is
taken as the error estimate.
 The computational expense of this process is quite
high, as the whole learning process has to be
iterated N times, and this is generally not feasible
for big datasets. Nevertheless, leave-one-out
offers an opportunity to squeeze the maximum
out of a small dataset and obtain an estimate that is
as precise as possible. This process does not allow
stratification.
 Bootstrapping
 The bootstrap technique is based on
sampling with replacement. In the earlier techniques,
an instance, once chosen, could
not be chosen again. However, most learning
techniques can use an instance several times,
and the learning outcome is affected if an instance appears
in the training set more than once. Bootstrapping
samples the dataset with
replacement to form a training set and a test
set.
 There are many bootstrap techniques. The most
popular one is the 0.632 bootstrap, which works as
follows:
 A dataset of N instances is sampled N times with
replacement to give another dataset of N
instances, the bootstrap sample, which serves as a training set of
N samples. As certain elements in the bootstrap sample
will be repeated, there will be instances in the
original dataset D that have not been selected; these
are used as test instances. If we repeat this many
times, on average 63.2% of the original data
instances end up in the bootstrap sample and the
remaining 36.8% form the test set (hence
the name, 0.632 bootstrap).
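The 63.2% figure follows from the probability that a given instance is never picked in N draws, (1 - 1/N)^N, which approaches 1/e ≈ 0.368 for large N. A minimal simulation of one bootstrap split, with hypothetical function and variable names, is:

```python
import random
from typing import List, Tuple


def bootstrap_split(n: int, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Sample n indices with replacement for training; unused indices form the test set."""
    train = [rng.randrange(n) for _ in range(n)]
    chosen = set(train)
    test = [i for i in range(n) if i not in chosen]
    return train, test


rng = random.Random(0)
n = 10_000
train, test = bootstrap_split(n, rng)

# Share of distinct original instances that made it into the bootstrap sample
unique_fraction = len(set(train)) / n
```

For a dataset of this size, `unique_fraction` should land close to 0.632 and the test set should hold roughly the remaining 36.8% of the instances.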
 Metrics for Assessing Regression (Numeric
Prediction) Accuracy
 The task is to find a model h(x) that explains the
underlying data, that is, for all samples (x, y).
Equivalently, the task is to approximate function
f(x) with unknown properties by h(x).
 Holdout and random subsampling, cross-validation, and
bootstrap methods are common techniques for
estimating the prediction error and so
assessing the accuracy of the predictor. Several
alternative metrics can be used to assess the
accuracy of numeric prediction.
 Mean Square Error (MSE): The mean is obtained
from the training data as the arithmetic average,
 $MSE = \frac{1}{N} \sum_{i=1}^{N} (y^{(i)} - h(\mathbf{x}^{(i)}))^2$
 Root Mean Square Error (RMSE):
 Taking the square root yields,
 $RMSE = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y^{(i)} - h(\mathbf{x}^{(i)}))^2}$
 Sum-of-error-squares:
 Sometimes the total error, and not the average, is used
for mathematical manipulation by some statistical
/ machine learning techniques:
 $SSE = \sum_{i=1}^{N} (y^{(i)} - h(\mathbf{x}^{(i)}))^2$
 Mean Absolute Error (MAE): Measures the average
deviation of the predicted value from the true
value
 $MAE = \frac{1}{N} \sum_{i=1}^{N} |y^{(i)} - h(\mathbf{x}^{(i)})|$
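These four metrics are easy to compute directly. The sketch below takes lists of true and predicted values; the function names and the three-sample example are assumptions for illustration.

```python
import math
from typing import Sequence


def sse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Sum of squared errors (total, not averaged)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred))


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean square error: arithmetic average of the squared errors."""
    return sse(y_true, y_pred) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error, in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error: average absolute deviation from the true value."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up example: errors of 1, 0 and -2 on three samples
y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]
metrics = {name: fn(y_true, y_pred) for name, fn in
           (("SSE", sse), ("MSE", mse), ("RMSE", rmse), ("MAE", mae))}
```

Squaring makes MSE and RMSE weigh the single large error more heavily than MAE does, which is worth keeping in mind when outliers are present.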
 Metrics for Assessing Classification (Pattern
Recognition) Accuracy
 The basic principles – use of an independent test
dataset instead of the training set to evaluate
performance, the holdout technique and cross-
validation – are equally applicable to classification.
The errors in numeric prediction arise in various
sizes whereas in classification, errors simply exist
or are absent.
 Several different measures can be used to assess
the accuracy of a classifier. They are:
 Misclassification Error:
 The metric for assessing the accuracy of
classification algorithms is the number of samples
misclassified by the model h(w, x). For example, for
binary classification problems,
 y(i) ϵ {0, 1}, and h(w, x(i)) = ŷ(i) ϵ {0, 1};
 i = 1, 2, …, N.
 For 0% error, (y(i) - ŷ(i)) = 0 for all data points.
 This accuracy measure works well in
situations where the classes are more or less evenly
distributed. However, when the classes are
imbalanced, decisions based
on misclassification error lead to poor
performance.
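The weakness under class imbalance can be seen in a small sketch. The dataset (5 positives among 100 samples) and the trivial "always predict 0" model are made up for illustration, and the error is expressed here as a fraction of samples rather than a raw count.

```python
from typing import Sequence


def misclassification_error(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label differs from the true label."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def positives_found(y_true: Sequence[int], y_pred: Sequence[int]) -> int:
    """Number of true class-1 samples that the model also labels as class 1."""
    return sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))


# Imbalanced toy data: 5 samples of class 1, 95 samples of class 0
y_true = [1] * 5 + [0] * 95
always_zero = [0] * 100  # a model that ignores its input entirely

error = misclassification_error(y_true, always_zero)
found = positives_found(y_true, always_zero)
```

The trivial model reaches only 5% misclassification error while detecting none of the class-1 samples, so a low error on its own says little when the classes are imbalanced.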
 Log Loss: A loss function is a method of evaluating
how well our algorithm models our dataset. Log
Loss takes into account the uncertainty of
prediction based on how much it varies from the
actual label.
 Log Loss: Log loss is a straightforward modification
of the log-likelihood function. With maximization
transformed to minimization, the log loss for one
sample in binary classification, with label $y \in \{0, 1\}$ and
predicted probability $\hat{p}$ of class 1, is given by:
 $L(y, \hat{p}) = -[\, y \log \hat{p} + (1 - y) \log (1 - \hat{p}) \,]$
 The cost function is taken as the average
of the loss over the entire dataset. Therefore, the log loss
metric for classification tasks is expressed as:
 $\text{Log Loss} = -\frac{1}{N} \sum_{i=1}^{N} [\, y^{(i)} \log \hat{p}^{(i)} + (1 - y^{(i)}) \log (1 - \hat{p}^{(i)}) \,]$
 Note that the log loss of a sample is low when the
predicted probability of its actual class is high, indicating that the
prediction matches the actual value. The log loss
increases as that predicted probability falls;
probabilities close to 0 are bad and result in a
high loss value.
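A minimal sketch of log loss for binary labels follows, assuming `p` denotes the predicted probability of class 1. The clipping constant `eps` is an implementation detail added here to avoid taking log(0); it is not part of the definition.

```python
import math
from typing import Sequence


def log_loss_sample(y: int, p: float, eps: float = 1e-15) -> float:
    """Log loss of one sample: -(y*log(p) + (1-y)*log(1-p)), with p clipped away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def log_loss(y_true: Sequence[int], p_pred: Sequence[float]) -> float:
    """Average log loss over the dataset."""
    return sum(log_loss_sample(y, p) for y, p in zip(y_true, p_pred)) / len(y_true)


confident_right = log_loss_sample(1, 0.9)  # high probability on the actual class
confident_wrong = log_loss_sample(1, 0.1)  # low probability on the actual class
coin_flip = log_loss([1, 0], [0.5, 0.5])   # uninformative predictions
```

Unlike misclassification error, this measure distinguishes a confident correct prediction from a hesitant one, and it penalizes confident wrong predictions heavily.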
 Cross Entropy:
 Cross entropy is a measure from the field of
information theory. Although the two measures –
log loss and cross entropy – are derived from
different fields, when used as cost functions for
classification models, both the measures calculate
the same quantity and can be used
interchangeably.
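Why the two quantities coincide can be sketched for the binary case. The notation is an assumption made here: p is the true label distribution, which puts all its mass on the actual class y ϵ {0, 1}, and q is the predicted distribution, with $\hat{p}$ the predicted probability of class 1.

```latex
\begin{aligned}
H(p, q) &= -\sum_{c \in \{0, 1\}} p(c) \log q(c) \\
        &= -\left[\, y \log \hat{p} + (1 - y) \log (1 - \hat{p}) \,\right]
\end{aligned}
```

The second line uses p(1) = y, p(0) = 1 - y, q(1) = $\hat{p}$ and q(0) = 1 - $\hat{p}$, and is exactly the binary log loss of a single sample, so averaging either quantity over the dataset gives the same cost.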
Queries?

Unit I and II Machine Learning MCA CREC.pptx

  • 1. Paul Bharath Bhushan Petlu Computer Applications Department, Chadalawada Ramanamma Engineering College, Tirupati, Andhra Pradesh, India.
  • 2. A duck seems to be pleasant on the surface of the pond; But, there is a restless pedaling under the water.
  • 3.  Human beings dreamt of creating machines with human-like traits  Robots in manufacturing, mining, agriculture, space, ocean exploration, and health sciences etc  These machines are enslaved by commands create intelligent machines that emulate human intelligence
  • 4.  Human Intelligence possesses robust attributes with complex sensory, control, affective (emotional), and cognitive (thinking processes)  Central Nervous System: over one hundred billion biological neurons  CNS – acquires information from natural sensory organs
  • 5.  Cognitive Mathematics?  Neural networks: a low-level cognitive machine – a thinking machine  Fuzzy logic: mathematical power for the emulation of the higher-order cognitive functions – the thought and perception process  Neural networks + Fuzzy logic
  • 6.  Needs, Motivations, and Rationale:  Information is power and a must for success  The collected information may be categorized on the basis of nature of experience: 1. Experimental data 2. Structured human knowledge expressed in linguistic form  Banks, hospitals, automobiles, observatories around the world, etc
  • 7.  Soft Computing / Machine Learning:  Conventional approach: human intelligence applied to solving decision problems  The present scene is much different from yesterday's. We now have an ocean of data to be processed. Humans are unable to extract useful information from it. Computers of today can store this data and analyze it. However, to lead to meaningful analysis, a new mathematical theory has emerged which is built on the foundation of the human faculties of learning, memorizing, adapting and generalizing
  • 8. Soft Computing / Machine Learning: The basic premises of machine learning are:  The real world is pervasively imprecise and uncertain  The precision and certainty carry a cost The guiding principle of machine learning, which follows from these premises, is as follows: Exploit tolerance for imprecision and uncertainty to achieve tractability, robustness, and low-cost solutions
  • 9.  A well-defined learning problem has 3 identified features: 1) The learning task 2) The measure of performance 3) The task experience  Important aspects of the ‘learning from experience’ behavior of humans and other animals embedded in machine learning are:  1) Remembering and Adapting  2) Generalizing
  • 10.  Machine Computer Program  Learning Machine  Learning Algorithms Computer Program Design  Learned Knowledge
  • 11. A block diagrammatic representation of a Learning Machine:
  • 12.  Google is by far the most popular and extensively used of all search engines.  The moment we start browsing for items on Amazon, we see recommendations for products, books, movies, music, etc., that we are probably interested in.  Amazon uses a recommender system designed by machine learning, based on the data generated from social networking sites.
  • 13.  Some application domains are:  Medical Diagnostics  Finance Domain  Stock Market Forecasting  Machine Vision  Speech Recognition  Text Mining  Robotics and Automation  Etc..
  • 14.  Medical Diagnostics: Major successes of deep learning in machine vision applications have made it possible to accurately analyze medical images – X-rays, MRI, CT scans, ultrasound images, ECG, etc. Machine Learning augmented with deep learning diagnoses promises to revolutionize healthcare.
  • 15.  Finance Domain: The application of Machine Learning in the finance domain helps banks offer personalized services to customers at lower cost and with better compliance. This helps banks generate higher revenue. Machine Learning can scan through large amounts of transactional data in seconds, identify any fraudulent behavior, and predict it.
  • 16.  Stock Market Forecasting: Readers familiar with the stock market know that data on the continual buying and selling of company stocks takes the form of a time series. It is sequential data wherein data at a time t depends on the past history at t-1, t-2, … A stock index is an average value calculated by combining various stocks, and its prediction represents the market’s movements over time.
  • 17.  Machine Vision: A machine vision system captures images through a camera and analyzes them to describe the images. The level of visual understanding and recognition that humans exhibit cannot be matched by machine vision algorithms. However, certain problems, such as biometric recognition – fingerprint identification, face recognition, etc. – are being handled with success.
  • 18.  Speech Recognition: Using signal processing techniques we can represent the speech signal by a set of real values. The resulting data is sequential in nature, and with deep learning we get higher levels of performance for speech recognition systems. Virtual Personal Assistants (Amazon Echo, Google Home) assist in finding information when asked over voice.  For example: “what are the flights from Delhi to Chennai?”
  • 19.  Text Mining: Text mining is an area that is concerned with the identification of patterns in text data. The procedure involves analysis of text for extraction of useful information for specific purposes – email spam detection etc.
  • 20.  Robotics and Automation: Computers are controlling and monitoring manufacturing processes with high degree of automation, facilitated by machine learning techniques and robots – for industrial automation, medical robots, military robots, robots employed in disaster areas, and so on. Machine Vision is an integral part of many robotic applications, for example, images have to be analyzed online and a machine learning system has to categorize the objects into ‘defect’ and ‘non-defect’ category and then the robot can put the objects in the right place.
  • 21.  More on Application Domains: Machine Learning / Data Mining is omnipresent and is an empirical technology that has applications in all knowledge domains: engineering, business management, natural science, social science.
  • 22.  Data Representation: Structured / Unstructured Data  Experience in the form of raw data is a source of learning in many applications, and human knowledge in linguistic form is an additional learning source.  Machine learning algorithms benefit from data warehousing, which provides integrated, consistent and cleaned data, and also from the availability of data in a flat file, which is a simple data table.  The logical structure of a database is established by data modelling. A data model determines how data is stored, organized, and then manipulated in the database.
  • 23.  Structured Data: It is the data that adheres to a predefined data model. It can be stored in a relational database. It conforms to a tabular format with relationship between different rows and columns. Data can easily be aggregated from various locations in the database. This data model is the simplest way to manage information and the techniques to analyze structured data
  • 24.  Unstructured Data: It is the information that either doesn’t have a data model or is not organized in a predefined manner. It often includes text and multimedia data, for example, social media data generated from YouTube, Facebook, Twitter, Instagram, LinkedIn, etc., and text internal to the company such as documents, logs, survey results, emails, images, videos, and audio. Even where this kind of data has internal structure, it is still considered ‘unstructured’ because it doesn’t fit neatly in a relational database.
  • 25.  Semi structured Data: It is the information that doesn’t conform to a formal structure of data models associated with relational databases, but that does have some organizational properties that make it easier to analyze. With some processing, we can transform them into a format that machines accept for various prediction tasks.
  • 26.  Unlocking the information power of Unstructured data: About 80% of the total data being collected and stored today is unstructured. Therefore, unlocking the information power of such data is very important. Examples of non-relational technologies used for such data include NoSQL databases such as Apache Cassandra and MongoDB, and distributed processing frameworks such as Hadoop/MapReduce and Spark, among others. A number of software solutions are being designed to search unstructured data and extract important information.
  • 27.  Forms of Learning:  Any method that incorporates information from experience in the design of a machine employs learning. A learning method depends on the type of experience from which the machine will learn or be trained. The type of available learning experience can have a significant impact on the success or failure of the learning machine.
  • 28.  Forms of Learning:  The field of machine learning usually distinguishes four forms of learning:  1) Supervised Learning  2) Unsupervised Learning  3) Reinforcement Learning  4) Learning based on natural processes – evolution, swarming, and immune systems
  • 29.  1) Predictive / Directed / Supervised Learning:  In general xi is a D-dimensional vector of numbers, say, the height and weight of a person. These are called features, attributes or covariates.  Input xi could also be a complex structured object, such as an image, a sentence, an email message, a time series, a molecular shape, a graph, etc.  Similarly, the form of the output or response variable can in principle be anything, but most methods assume that yi is a categorical or nominal variable from some finite set, yi ϵ {1, 2, …, C}
  • 30.  Binary Classification: C = 2  (a) Some labeled training examples of colored shapes, along with 3 unlabeled test cases. (b) Representing the training data as an N x D design matrix. Row i represents the feature vector xi. The last column is the label, yi ϵ {0, 1}. Based on a figure by Leslie Kaelbling
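To make the N x D design matrix concrete, here is a minimal sketch. The figure itself is not reproduced in this transcript, so the feature encoding (`is_blue`, `is_square`, `size_cm`) and all values are hypothetical stand-ins for the colored shapes it shows.

```python
# Hypothetical encoding of colored shapes; columns: [is_blue, is_square, size_cm, label]
rows = [
    [1, 1, 10.0, 1],
    [0, 1, 2.5, 0],
    [1, 0, 7.0, 1],
    [0, 0, 4.0, 0],
]

X = [r[:-1] for r in rows]  # N x D design matrix; row i is the feature vector x_i
y = [r[-1] for r in rows]   # last column: the labels y_i in {0, 1}
N, D = len(X), len(X[0])    # number of training examples, number of features
```

With C = 2 classes, every entry of `y` is either 0 or 1, and a classifier is any function mapping a row of `X` to one of those two values.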
  • 31.  Classification of flowers:  Three types of Iris flowers: Setosa, Versicolor and Virginica. Source: http://www.statlab.uni-heidelberg.de/data/iris/.
  • 32. Image Classification:  We might want to classify the image as a whole, e.g., is it an indoors or outdoors scene? Is it a horizontal or vertical photo? Does it contain a dog or not? This is called image classification. Handwriting recognition:  In the special case that the images consist of isolated handwritten letters and digits, for example, in a postal or ZIP code on a letter, we can use classification to perform handwriting recognition.
  • 33.  Face detection and Recognition:  Example of face detection. (a) Input image (b) Output of classifier, which detected 5 faces at different poses.
  • 34.  Regression:  (a) Linear Regression on some 1d data. (b) Same data with polynomial regression (degree 2). Figure generated by linregpolyVsDegree
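The contrast between a straight-line fit and a degree-2 polynomial fit can be sketched with ordinary least squares. This is not the `linregpolyVsDegree` script named in the caption; it is a standard-library illustration on made-up 1d data, and the helper names `solve`, `polyfit`, `predict` and `mse` are assumptions of this sketch.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def polyfit(xs, ys, degree):
    """Least-squares coefficients [w0, w1, ...] from the normal equations."""
    m = degree + 1
    A = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    return solve(A, b)

def predict(w, x):
    return sum(wk * x ** k for k, wk in enumerate(w))

def mse(w, xs, ys):
    return sum((y - predict(w, x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Made-up data following a noiseless quadratic trend: y = 1 + 2x + 3x^2
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [1 + 2 * x + 3 * x * x for x in xs]

w_lin = polyfit(xs, ys, 1)   # linear regression
w_quad = polyfit(xs, ys, 2)  # polynomial regression, degree 2
```

On this data the degree-2 fit recovers the generating coefficients and its mean squared error is essentially zero, while the linear fit leaves a large residual.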
  • 35. Some examples of real-world regression problems are: • Predict tomorrow’s stock market price given current market conditions and other possible side information. • Predict the age of a viewer watching a given video on YouTube. • Predict the location in 3d space of a robot arm end effector, given control signals (torques) sent to its various motors. • Predict the amount of prostate specific antigen (PSA) in the body as a function of a number of different clinical measurements. • Predict the temperature at any location inside a building using weather data, time, door sensors, etc.
  • 36.  2) Descriptive / Undirected / Unsupervised Learning:  Here we are only given inputs, $D = \{x_i\}_{i=1}^{N}$, and the goal is to find “interesting patterns” in the data. This is a much less well-defined problem, since we are not told what kinds of patterns to look for, and there is no obvious error metric to use.
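One common way of looking for "interesting patterns" in unlabeled inputs is clustering. The slide does not name a method; the sketch below uses k-means on made-up 1d data purely as an illustration, and the function name `kmeans_1d` and the starting centers are assumptions.

```python
def kmeans_1d(points, centers, iters=10):
    """Alternate between assigning points to the nearest center and re-averaging."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda k: abs(p - centers[k]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty)
        centers = [sum(c) / len(c) if c else centers[k] for k, c in enumerate(clusters)]
    return centers, clusters

# Unlabeled inputs only: there is no y column
points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
centers, clusters = kmeans_1d(points, centers=[0.0, 6.0])
```

The algorithm is given no labels, yet it separates the values near 1 from the values near 5. Note that the number of clusters still had to be chosen by hand, which reflects the point that the problem is less well defined than supervised learning.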
  • 37.  3) Reinforcement Learning:  This is somewhat less commonly used. This is useful for learning how to act or behave when given occasional reward or punishment signals. (For example, consider how a baby learns to walk)
  • 38.  4) Learning based on Natural Processes: Evolution, Swarming, and Immune Systems  Some learning approaches take inspiration from nature for the development of novel problem- solving techniques. These are applied successfully to a variety of optimization problems.
  • 39.  Evolutionary Computation  Evolutionary biology essentially states that a population of individuals possessing the ability to reproduce and exposed to genetic variation followed by selection gives rise to new populations which are fitter to their environment. The primary streams are: genetic algorithms, evolution strategies, evolutionary programming and genetic programming
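The reproduce, vary, select cycle can be sketched as a very small genetic algorithm. The toy fitness function (count of 1 bits), tournament selection, one-point crossover, elitism, and every parameter value below are illustrative assumptions rather than anything specified on the slide.

```python
import random

def fitness(bits):
    return sum(bits)  # toy fitness: the number of 1s in the bit string

def evolve(n_bits=20, pop_size=30, generations=40, p_mut=0.02, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    initial_best = max(map(fitness, pop))

    def select(population):
        # Tournament selection: the fitter of two randomly chosen individuals
        a, b = rng.choice(population), rng.choice(population)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        elite = max(pop, key=fitness)       # elitism: the best individual survives unchanged
        children = [elite[:]]
        while len(children) < pop_size:
            mum, dad = select(pop), select(pop)
            cut = rng.randrange(1, n_bits)  # reproduction by one-point crossover
            child = mum[:cut] + dad[cut:]
            # Genetic variation: flip each bit with probability p_mut
            child = [1 - b if rng.random() < p_mut else b for b in child]
            children.append(child)
        pop = children                      # the new, on average fitter, population
    return initial_best, max(map(fitness, pop))

initial_best, final_best = evolve()
```

Because the best individual is always carried over, `final_best` can never be lower than `initial_best`; selection and variation are what push it upward over the generations.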
  • 40.  Swarm Intelligence  It is a feature of systems of unintelligent agents with inadequate individual abilities, displaying collectively intelligent behaviour. It includes algorithms derived from the collective behaviour of social insects (Ant Colony Optimization) and other animal / human societies (Particle Swarm Optimization)
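A minimal sketch of Particle Swarm Optimization follows, assuming the common velocity update with an inertia weight `w` and attraction coefficients `c1` (towards each particle's own best position) and `c2` (towards the swarm's best position). The objective function and all parameter values are illustrative.

```python
import random

def sphere(x):
    return sum(v * v for v in x)  # toy objective with its minimum, 0, at the origin

def pso(objective, dim=2, n_particles=15, iters=60, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]              # best position seen by each particle
    gbest = min(pbest, key=objective)[:]     # best position seen by the whole swarm
    start = objective(gbest)
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            if objective(pos[i]) < objective(pbest[i]):
                pbest[i] = pos[i][:]
                if objective(pbest[i]) < objective(gbest):
                    gbest = pbest[i][:]
    return start, objective(gbest), gbest

start_value, best_value, best_position = pso(sphere)
```

Each particle follows only a simple rule, yet the shared `gbest` lets the swarm as a whole home in on good regions of the search space.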
  • 41.  Artificial Immune Systems  An Artificial Immune System (AIS) replicates certain aspects of the natural immune system, which is primarily applied to solve pattern- recognition problems and cluster data. The natural immune system has an extraordinary ability to match patterns, employed to differentiate between foreign cells making an entry into the body (antigen) and the cells that are part of the body.
  • 42.  Machine Learning and Data Mining  Machine Learning:  Early AI research was focused on hard-coding the rules that mimic human intelligence. Machine Learning, a subfield of AI, still involves classical programming; human intelligence is required to convert raw data to the representations used by the machine for learning. Deep learning, a subfield of Machine Learning, is a form of ‘representation learning’, inspired by the biological nervous system. Machine Learning is the computation process wherein a machine ‘learns’ and adjusts its behaviour based on feedback from data.
  • 43.  Machine Learning and Data Mining  Data Mining:  DM focuses on real-world applications; data analysis that concentrates on commercial applications and business-related problems tends to drift in the direction of data mining.  ML and DM are related to each other, sharing methods and algorithms for analyzing data to seek informative patterns.
  • 44.  Data Science  Data Science is a new name given to an action plan for expanding the technical areas of the field of statistics.  Data Science is the extraction of knowledge from data.  The task ‘knowledge extraction’ does not have any boundaries.
  • 45.  Relationship among key technologies
  • 47.  Learning from Observations  We can visualize each pattern with n numerical features as a point in the n-dimensional state space $\mathbb{R}^n$:  $x = [x_1\ x_2\ \dots\ x_n]^T \in \mathbb{R}^n$  The training experience is in the form of data D that describes how the system behaves over its entire range of operation.  $D : \{x^{(i)}, y^{(i)}\};\ i = 1, 2, \dots, N$ (2.1)  The data D are independently drawn and identically distributed (iid), represented by the probability density function p(x, y)
  • 48.  Learning from Observations  Assume a machine defined by a function f: X → Y  When f(.) is selected, the machine is called a trained machine that gives an estimated output value for a given pattern x.  We can define the set of learning machines by a function f(x, w), where w contains adjustable parameters.  Loss function is L(y, f(x, w))
  • 49.  Empirical Risk Minimization  Our problem is to find a decision function f(x, w) against p(x, y) that minimizes the risk function R(w).  With the dataset $D : \{x^{(i)}, y^{(i)}\};\ i = 1, 2, \dots, N$ being the only source of information, the empirical risk function is given by:  $R_{emp}(w) = \frac{1}{N} \sum_{i=1}^{N} L(y^{(i)}, f(x^{(i)}, w))$  This empirical risk function replaces the average over p(x, y) by an average over the training sample.
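As a sketch of the idea, the empirical risk is simply the loss averaged over the N training pairs. The example assumes the squared loss ½ (y – f(x, w))² for regression and a one-input linear machine f(x, w) = w0 + w1·x; the function names and the data are illustrative.

```python
def squared_loss(y, y_hat):
    return 0.5 * (y - y_hat) ** 2

def empirical_risk(f, w, data, loss=squared_loss):
    """Average loss of the machine f(., w) over the training sample."""
    return sum(loss(y, f(x, w)) for x, y in data) / len(data)

def linear_machine(x, w):
    return w[0] + w[1] * x  # f(x, w) with adjustable parameters w = (w0, w1)

# Made-up training sample generated by y = 1 + 2x
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]

risk_good = empirical_risk(linear_machine, (1.0, 2.0), data)  # parameters that fit the data
risk_bad = empirical_risk(linear_machine, (0.0, 0.0), data)   # parameters that do not
```

Minimizing the empirical risk means searching over w for the smallest such average; here the parameters (1, 2) reach zero risk on the sample, while (0, 0) give 35/6.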
  • 50.  Inductive Learning  Given a collection of examples $(x^{(i)}, f(x^{(i)}));\ i = 1, 2, \dots, N$, of a function f(x), return a function h(x) that approximates f(x).  The assumption in inductive learning is that the ideal hypothesis related to unseen patterns is the one induced by the observed training data.
  • 51.  Bias and Variance  Consider the following experiment. We first collect a random sample D of N independently drawn patterns from the distribution p(x, y), and then measure the sample error / training error / approximation error from:  using loss function for classification problems:   using loss function for regression problems:  $L(y, f(x, w)) = \tfrac{1}{2}\,(y - f(x, w))^2$
  • 52. Bias and Variance: Linear Curve Fitting  If we run K such experiments, measuring the random variable $\mathrm{error}_{D_j}[h];\ j = 1, 2, \dots, K$, then the average over the K experiments is:  $\mathrm{error}_D[h] = E_D\{\mathrm{error}_{D_j}[h]\}$  where $E_D\{\cdot\}$ denotes the expectation or ensemble average.
  • 53.  Bias and Variance  A non-zero error can arise for two reasons:  1) It may be that the hypothesis function h(.) is, on an average, different from the regression function f(x). This is called bias.  2) It may be that the hypothesis function is very sensitive to the particular dataset Dj, so that for a given x, approximation error is larger for some datasets, and smaller for other datasets. This is called variance.
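The two sources of error can be estimated by simulating the K-experiment procedure. The sketch below assumes a known regression function f(x) = x², Gaussian noise, and two deliberately extreme hypotheses: a rigid one that always predicts the mean training output, and a flexible one that copies the nearest training pattern. All names and parameter values are illustrative.

```python
import random

def f(x):
    return x * x  # assumed "true" regression function

def sample_dataset(n, noise, rng):
    """One dataset D_j: n patterns with inputs in [0, 1] and noisy outputs."""
    xs = [rng.random() for _ in range(n)]
    return [(x, f(x) + rng.gauss(0.0, noise)) for x in xs]

def h_constant(data, x):
    """Rigid hypothesis: ignores x and predicts the mean training output."""
    return sum(y for _, y in data) / len(data)

def h_nearest(data, x):
    """Flexible hypothesis: copies the output of the nearest training pattern."""
    return min(data, key=lambda pair: abs(pair[0] - x))[1]

def bias2_and_variance(h, x0, K=500, n=20, noise=0.3, seed=0):
    rng = random.Random(seed)
    preds = [h(sample_dataset(n, noise, rng), x0) for _ in range(K)]
    mean_pred = sum(preds) / K
    bias2 = (mean_pred - f(x0)) ** 2                         # average h differs from f
    variance = sum((p - mean_pred) ** 2 for p in preds) / K  # sensitivity to D_j
    return bias2, variance

b_const, v_const = bias2_and_variance(h_constant, 0.9)
b_near, v_near = bias2_and_variance(h_nearest, 0.9)
```

Under these assumptions the rigid hypothesis shows the larger squared bias and the flexible one shows the larger variance, which is the trade-off the slide describes.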
  • 54.  Occam’s razor principle  The Franciscan Monk, William of Occam, was born in 1280. His principle:  ‘The simpler explanations are more reasonable, and any unnecessary complexity should be shaved off’.  ‘Simpler’ may imply needing lesser parameters, lesser training time, fewer attributes for data representation, and lesser computational complexity.
  • 55.  Overfitting avoidance  Occam’s razor principle suggests hypothesis functions that avoid overfitting of the training data. We stop looking for a design when the solution is ‘good enough’, not necessarily the optimal one.  In the machine learning jargon, a learning machine is said to overfit the training examples if certain other learning machine that fits the training examples less well, actually performs better over the total distribution of patterns.
  • 57.  Heuristic Search in Inductive Learning  Trial-and-error is the approach of searching for a ‘good enough’ solution.  Applied Machine Learning organizes the search as per the following two-step procedure:  1) The search is first focused on a class of possible hypotheses, chosen for the learning task at hand. Prior knowledge and experience are helpful in this selection. Different hypothesis functions are appropriate for different kinds of learning tasks and available data.  2) For each member of the class, the corresponding learning algorithm organizes the search through all possible structures of the learning machine.
  • 58.  Principal techniques used in heuristic search  Regularization:  Early Stopping:  Pruning:
  • 59.  Principal techniques used in heuristic search  Regularization:  The regularization model promotes smoother functions by creating a new criterion function that depends not only on the training error, but also on the complexity of the hypothesis.  E̅ = E + λ Ω  (2.1)  = error on data + λ × hypothesis complexity, where λ is the weight of the penalty.  When λ = 0, there is no regularization, and the resulting model tends to have high variance; that is, it will not generalize well to data different from its training data (overfitting). As λ increases, up to a point, the variance falls without a substantial increase in bias. Beyond that point, further increases in λ raise the bias of the model and lead to underfitting. λ is optimized using cross-validation.
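As a rough illustration of the criterion E̅ = E + λ Ω, the sketch below assumes a one-parameter model h(x) = w·x, takes E as the sum of squared errors and Ω = w² (an L2 penalty). These choices, the toy data, and the function names are assumptions made for the example, not taken from the slides. Setting the derivative of E̅ to zero gives the closed form w = Σxy / (Σx² + λ).

```python
def ridge_weight(xs, ys, lam):
    # Minimiser of sum((y - w*x)^2) + lam * w^2 for the model h(x) = w*x.
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

def training_error(w, xs, ys):
    # E: sum of squared errors on the training data.
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys))

# Hypothetical toy data, roughly y = 2x with small deviations.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

lambdas = [0.0, 1.0, 10.0, 100.0]
weights = [ridge_weight(xs, ys, lam) for lam in lambdas]
errors = [training_error(w, xs, ys) for w in weights]

for lam, w, e in zip(lambdas, weights, errors):
    print(f"lambda={lam:6.1f}  w={w:.4f}  training error={e:.4f}")
```

In this toy setting a larger λ shrinks the weight towards zero and raises the training error, which is the bias side of the trade-off. Choosing λ would still require a validation procedure such as cross-validation.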
  • 60.  Principal techniques used in heuristic search  Early Stopping:  Stopping the training before attaining a minimum training error is a technique for restricting the effective hypothesis complexity.  Pruning:  An alternative that is sometimes more successful than stopping the growth of the hypothesis complexity early is to prune the full-grown hypothesis, which is likely to overfit the training data.
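Early stopping can be sketched as a training loop that monitors the error on a separate validation set. The example below is a minimal sketch under assumptions that are not in the slides: a one-parameter model h(x) = w·x trained by gradient descent, hypothetical toy data, and a `patience` parameter that stops training once the validation error has failed to improve for a few consecutive epochs.

```python
def mse(w, xs, ys):
    # Mean squared error of h(x) = w*x on the given samples.
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

def train_with_early_stopping(train, valid, lr=0.01, patience=3, max_epochs=200):
    (xt, yt), (xv, yv) = train, valid
    w = 0.0
    best_w, best_val, waited = w, mse(w, xv, yv), 0
    epoch = 0
    for epoch in range(1, max_epochs + 1):
        # Gradient of the training MSE with respect to w.
        grad = -2.0 * sum(x * (y - w * x) for x, y in zip(xt, yt)) / len(xt)
        w -= lr * grad
        val = mse(w, xv, yv)
        if val < best_val:
            best_w, best_val, waited = w, val, 0
        else:
            waited += 1
            if waited >= patience:
                break  # validation error stopped improving
    return best_w, epoch

# Hypothetical data: the training set favours a larger slope than the validation set.
train = ([1.0, 2.0, 3.0], [2.5, 4.9, 7.4])
valid = ([1.5, 2.5], [3.0, 5.0])

best_w, stopped_epoch = train_with_early_stopping(train, valid)
print("weight kept:", best_w, "stopped at epoch:", stopped_epoch)
```

The loop returns the weight with the lowest validation error seen so far rather than the weight that minimises the training error.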
  • 61.  Evaluation of Learning System  Before using the Machine Learning System, it should be evaluated in many aspects, which are:  Accuracy:  Robustness:  Computation Complexity and Speed:  Interpretability:  Online Learning:  Scalability:
  • 62.  Evaluation of Learning System  Accuracy: The learning system extracts knowledge from the training data. The learned knowledge should be general enough to deal with unknown data. The generalization capability of a learning system is an index of accuracy of the learning machine.  Robustness: It means that the machine can perform adequately under all circumstances, including cases where the information is corrupted by noise, is incomplete, or is contaminated with irrelevant data. All these conditions are part of the real world, and must be considered while evaluating a learning system.
  • 63.  Evaluation of Learning System  Computational Complexity and Speed: The computational complexity of a learning algorithm and its learning speed determine the efficiency of a learning system: how fast the system can arrive at a correct answer and how much computer memory is required. Speed is especially important in real-time situations.  Interpretability: This is the level of understanding and insight offered by a learning algorithm. Interpretability is subjective and, hence, harder to evaluate. Decision trees are easy to interpret, but their interpretability may decrease as their complexity increases.
  • 64.  Evaluation of Learning System  Online Learning: The spectrum of applications is increasing with the growth of technology. Some sources generate streaming data, which has to be analyzed in real time. An online learning system continues to receive inputs from a real-time environment and analyzes them in real time.  Scalability: Today huge amounts of data are being generated in real-world applications. The capability to scale to such volumes is a desirable feature of a learning machine. Typically, scalability can be assessed with a series of datasets of increasing size and complexity.
  • 65.  Estimating Generalization Errors  The success of learning depends on the hypothesis space complexity and sample complexity, which are interdependent. The goal is to find a function that is as simple as possible and has the lowest empirical error on the data. Such a choice is expected to give good generalization performance.  If we partition available data into training / validation / testing datasets, the validation set is used to optimize the parameters of the model obtained using training data, and the test set is held back to estimate the generalization error of the final model.
  • 66.  Holdout Method and Random Subsampling  In the holdout technique, some amount of data (typically one-third) is earmarked for testing, while the remainder is employed for training. If the data is collected over time (time series data), then we can use the earlier part of the data for training and the later part for testing.
  • 67.  Holdout Method and Random Subsampling  The samples used to train and test have to represent the underlying distribution for the problem area. The proportion of each class in the training, testing, and full datasets should be more or less the same. To make sure this happens, random sampling should be performed in a manner that guarantees that each class is properly represented in both the training and test sets. This process is known as stratification.
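A stratified holdout split can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `stratified_holdout`, the one-third test fraction, and the toy labels are assumptions made for the example.

```python
import random
from collections import Counter, defaultdict

def stratified_holdout(labels, test_fraction=1 / 3, seed=0):
    # Split sample indices so that each class keeps the same share in both sets.
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_fraction)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Hypothetical imbalanced labels: 30 positive and 60 negative samples.
labels = ["pos"] * 30 + ["neg"] * 60
train_idx, test_idx = stratified_holdout(labels)

print("train:", Counter(labels[i] for i in train_idx))
print("test: ", Counter(labels[i] for i in test_idx))
```

Both subsets keep the 1:2 class ratio of the full dataset, which a plain random split would only achieve approximately.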
  • 68.  Holdout Method and Random Subsampling  Even though stratified holdout is generally well worth doing, it offers only a basic safeguard against uneven representation in training and test sets. A more general way to alleviate any bias resulting from the specific sample selected for holdout is random subsampling, wherein the holdout technique is repeated K times with different random samples. The overall accuracy estimate is the average of the accuracies obtained from each repetition.
  • 69.  Cross-Validation  A commonly used technique for estimating the success rate of a learning method from a fixed data sample is K-fold cross-validation.  Another widely used estimate is leave-one-out cross-validation.
  • 70.  Cross-Validation  K-Fold cross-validation  In K-fold cross-validation, the given data D is randomly divided into K mutually exclusive subsets or folds, Dk, where k = 1, 2, …, K, each of about equal size. Training and testing are done K times. In iteration k, partition Dk is set aside for testing, and the remaining partitions are collectively employed to train the model. That is, in the first iteration, the set D2 ∪ D3 ∪ … ∪ DK serves as the training set to obtain the first model, which is tested on D1; the second iteration trains on D1 ∪ D3 ∪ … ∪ DK and tests on D2, and so on.
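The fold rotation can be sketched in a few lines. The example below is an illustration only: the helper names are hypothetical, and the "learner" is a toy model that always predicts the mean of its training targets, chosen so the sketch stays self-contained.

```python
import random
import statistics

def k_fold_splits(n_samples, k, seed=0):
    # Yield (train_indices, test_indices) for each of the K folds.
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # K disjoint folds of near-equal size
    for i, test_fold in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test_fold

def cross_validated_mse(ys, k=5):
    # Average test error of the toy mean predictor over the K iterations.
    fold_errors = []
    for train_idx, test_idx in k_fold_splits(len(ys), k):
        prediction = statistics.fmean([ys[j] for j in train_idx])
        fold_errors.append(
            statistics.fmean([(ys[j] - prediction) ** 2 for j in test_idx])
        )
    return statistics.fmean(fold_errors)

ys = [float(v) for v in range(20)]  # hypothetical targets
print("5-fold estimate of MSE:", cross_validated_mse(ys, k=5))
```

Each sample is used for testing exactly once and for training K - 1 times.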
  • 71.  Cross-Validation  Stratified K-Fold cross-validation  If stratification is also used, the technique is known as stratified K-fold cross-validation for classification. Ultimately, the K error estimates obtained from the K iterations are averaged to give an overall error estimate. K = 10 folds is the standard number employed to estimate the error rate of a learning method. Out of the K (here, 10) machines, the one with the lowest error may be deployed.
  • 72.  Cross-Validation  Leave-one-out Cross-validation  Only a single sample is left out for testing in each iteration. The learning machine is trained on the remainder of the samples and is judged by its accuracy on the left-out sample. The outcomes of the N judgements from the N iterations are averaged, and this average serves as the error estimate.
  • 73.  Cross-Validation  Leave-one-out Cross-validation  The computational expense of this process is quite high, as the whole learning process has to be repeated N times, and this is generally not feasible for big datasets. Nevertheless, leave-one-out offers a way to squeeze the maximum out of a small dataset and obtain an estimate that is as precise as possible. This process does not allow stratification, since each test set contains only a single sample.
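Leave-one-out cross-validation can be sketched with a tiny classifier. The example assumes a 1-nearest-neighbour rule on one-dimensional inputs and a hypothetical seven-sample dataset; neither comes from the slides.

```python
def nearest_neighbour_label(train, x):
    # Predict the label of the training sample closest to x.
    _, label = min(train, key=lambda pair: abs(pair[0] - x))
    return label

def leave_one_out_error(data):
    # Train on N - 1 samples, test on the one left out, repeat N times.
    mistakes = 0
    for i, (x, label) in enumerate(data):
        train = data[:i] + data[i + 1:]
        if nearest_neighbour_label(train, x) != label:
            mistakes += 1
    return mistakes / len(data)

# Hypothetical (input, class) pairs; the sample at 2.2 lies close to class "b".
data = [(0.8, "a"), (1.0, "a"), (1.2, "a"), (2.2, "a"),
        (2.9, "b"), (3.0, "b"), (3.1, "b")]

loo_error = leave_one_out_error(data)
print("leave-one-out error estimate:", loo_error)
```

The learner is retrained N times, which is the source of the computational cost mentioned above.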
  • 74.  Bootstrapping  The bootstrap technique is based on the process of sampling with replacement. In the earlier techniques, an instance that had been chosen once could not be chosen again. However, most learning techniques can employ an instance several times, and it affects the learning outcome if the instance appears in the training set more than once. The concept of bootstrapping aims to sample the dataset with replacement, so as to form a training set and a test set.
  • 75.  Bootstrapping  There are many bootstrap techniques. The most popular one is the 0.632 bootstrap, which works as follows:  A dataset of N instances is sampled N times with replacement to give another dataset of N instances, which is a bootstrap sample – a training set of N samples. As certain elements in the bootstrap sample will be repeated, there will be certain instances in the original dataset D that have not been selected – these are used as test instances. The chance that a given instance is never picked in N draws is (1 – 1/N)^N ≈ e^(–1) ≈ 0.368. So if we attempt this many times, on average 63.2% of the original data instances will appear in the bootstrap sample and the remaining 36.8% will form the test set (hence the name, 0.632 bootstrap).
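The 63.2% / 36.8% proportions can be checked empirically. The sketch below is an illustration with assumed values (N = 1000 instances, 200 repetitions, a fixed random seed) and a hypothetical helper `bootstrap_split`.

```python
import random
import statistics

def bootstrap_split(n, rng):
    # Draw n indices with replacement; instances never drawn form the test set.
    train = [rng.randrange(n) for _ in range(n)]
    chosen = set(train)
    test = [i for i in range(n) if i not in chosen]
    return train, test

rng = random.Random(0)
n, repeats = 1000, 200

unique_fractions = []
for _ in range(repeats):
    train, test = bootstrap_split(n, rng)
    unique_fractions.append(len(set(train)) / n)

average_in_sample = statistics.fmean(unique_fractions)
theoretical = 1 - (1 - 1 / n) ** n

print("average fraction in bootstrap sample:", average_in_sample)
print("theoretical value 1 - (1 - 1/N)^N:   ", theoretical)
```

The training set always has N entries, but only about 63% of them are distinct instances.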
  • 76.  Metrics for Assessing Regression (Numeric Prediction) Accuracy  The task is to find a model h(x) that explains the underlying data, that is, h(x) ≈ y for all samples (x, y). Equivalently, the task is to approximate a function f(x) with unknown properties by h(x).  Holdout and random subsampling, cross-validation, and bootstrap methods are common techniques for estimating the prediction error and thus assessing the accuracy of the predictor. Several alternative metrics can be used to assess the accuracy of numeric prediction.
  • 77.  Metrics for Assessing Regression (Numeric Prediction) Accuracy  Mean Square Error (MSE): The mean is obtained from the data as the arithmetic average of the squared errors,  MSE = (1/N) Σ (y(i) – h(x(i)))², summed over i = 1, 2, …, N.  Root Mean Square Error (RMSE):  Taking the square root yields,  RMSE = √MSE.
  • 78.  Metrics for Assessing Regression (Numeric Prediction) Accuracy  Sum-of-error-squares:  Sometimes total error, and not the average, is taken for mathematical manipulation by some statistical / machine learning techniques:
  • 79.  Metrics for Assessing Regression (Numeric Prediction) Accuracy  Mean Absolute Error: Measures the average deviation of the predicted value from the true value
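The regression metrics named on these slides can be computed as in the sketch below. The formulas follow the standard definitions; the plain (unscaled) sum for the sum-of-error-squares and the toy values are assumptions made for the example.

```python
import math

def regression_metrics(y_true, y_pred):
    # Return the common error measures for numeric prediction.
    errors = [y - p for y, p in zip(y_true, y_pred)]
    n = len(errors)
    sse = sum(e * e for e in errors)        # sum-of-error-squares
    mse = sse / n                           # mean square error
    rmse = math.sqrt(mse)                   # root mean square error
    mae = sum(abs(e) for e in errors) / n   # mean absolute error
    return {"SSE": sse, "MSE": mse, "RMSE": rmse, "MAE": mae}

# Hypothetical true values y(i) and predictions h(x(i)).
y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]

metrics = regression_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

RMSE and MAE are expressed in the same units as the target, which often makes them easier to interpret than MSE.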
  • 80.  Metrics for Assessing Classification (Pattern Recognition) Accuracy  The basic principles – use of an independent test dataset instead of the training set to evaluate performance, the holdout technique and cross-validation – are equally applicable to classification. The errors in numeric prediction arise in various sizes whereas in classification, errors simply exist or are absent.  Several different measures can be used to assess the accuracy of a classifier. They are:
  • 81.  Metrics for Assessing Classification (Pattern Recognition) Accuracy  Misclassification Error:  The metric for assessing the accuracy of classification algorithms is the number of samples misclassified by the model h(w, x). For example, for binary classification problems,  y(i) ∈ {0, 1}, and h(w, x(i)) = ŷ(i) ∈ {0, 1};  i = 1, 2, …, N.  For 0% error, (y(i) – ŷ(i)) = 0 for all data points.
  • 82.  Metrics for Assessing Classification (Pattern Recognition) Accuracy  Misclassification Error:  This accuracy measure works well for the situations where class tuples are more or less evenly distributed. However, when the classes are imbalanced, decisions made on classifications based on misclassification error lead to poor performance.
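The weakness of misclassification error on imbalanced classes can be seen with a hypothetical dataset of 10 positive and 990 negative samples and a trivial classifier that always predicts the negative class. The data and the classifier are assumptions made for this illustration.

```python
# Hypothetical imbalanced labels: 1 = positive (rare), 0 = negative (common).
labels = [1] * 10 + [0] * 990

# A trivial classifier that ignores its input and always predicts negative.
predictions = [0] * len(labels)

misclassified = sum(p != y for p, y in zip(predictions, labels))
error_rate = misclassified / len(labels)

# Fraction of the positive samples that the classifier actually detects.
true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
positive_recall = true_positives / sum(y == 1 for y in labels)

print("misclassification error:", error_rate)
print("fraction of positives detected:", positive_recall)
```

The error rate is only 1%, yet the classifier never identifies a single positive sample.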
  • 83.  Metrics for Assessing Classification (Pattern Recognition) Accuracy  Log Loss: A loss function is a method of evaluating how well our algorithm models our dataset. Log Loss takes into account the uncertainty of prediction based on how much it varies from the actual label.
  • 84.  Metrics for Assessing Classification (Pattern Recognition) Accuracy  Log Loss: Log loss is a straightforward modification of log-likelihood function. With maximization transformed to minimization, the log loss for one sample is given by:
  • 85.  Metrics for Assessing Classification (Pattern Recognition) Accuracy  Log Loss: The cost function is taken as the average of loss over the entire dataset. Therefore, log loss metric for classification tasks is expressed as:
  • 86.  Metrics for Assessing Classification (Pattern Recognition) Accuracy  Log Loss:  Note that the log loss of a sample is low when the predicted probability of its actual class is high, indicating that the prediction matches the actual value. The log loss increases as this predicted probability falls; probabilities close to 0 are bad and result in a high loss value.
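The behaviour described above can be sketched for the binary case. The code assumes the common binary form of log loss, –[y log p + (1 – y) log(1 – p)], where p is the predicted probability of class 1; this specific form, the clipping constant `eps`, and the toy values are assumptions, since the slide formulas are not reproduced in this transcript.

```python
import math

def log_loss_sample(y, p, eps=1e-15):
    # Binary log loss for one sample; p is the predicted probability of class 1.
    p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def log_loss(ys, ps):
    # Cost function: average loss over the whole dataset.
    return sum(log_loss_sample(y, p) for y, p in zip(ys, ps)) / len(ys)

# Loss for a positive sample (y = 1) at different predicted probabilities.
for p in (0.9, 0.5, 0.1):
    print(f"p = {p}: loss = {log_loss_sample(1, p):.4f}")

# Hypothetical labels and predicted probabilities.
ys = [1, 0, 1, 0]
ps = [0.9, 0.1, 0.8, 0.3]
print("average log loss:", log_loss(ys, ps))
```

A confident correct prediction gives a loss near zero, while a confident wrong prediction is penalised heavily.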
  • 87.  Metrics for Assessing Classification (Pattern Recognition) Accuracy  Cross Entropy:  Cross entropy is a measure from the field of information theory. Although the two measures – log loss and cross entropy – are derived from different fields, when used as cost functions for classification models, both the measures calculate the same quantity and can be used interchangeably.