4. What is Learning?
“Learning denotes changes in a system that ... enable a system to do the
same task … more efficiently the next time.” - Herbert Simon
• Learning is the process of acquiring new understanding, knowledge,
behaviors, skills, values, attitudes and preferences.
• The ability to learn is possessed by humans, animals, and some machines.
“Learning is making useful changes in our minds.” - Marvin Minsky
• Some learning is immediate, induced by a single event (e.g. being burned
by a hot stove), but much skill and knowledge accumulates from repeated
experiences.
October 14, 2023 4
5. Types of Learning
1. Visual (Spatial): Visual learners grasp information best when it is represented with images and spatial layouts. Suited to fields such as architecture, engineering, project management, or design.
2. Aural (Auditory-Musical): If you need someone to tell you something out loud to understand it, you are an auditory learner. Suited to careers such as musician, recording engineer, speech pathologist, or language teacher.
3. Verbal (Linguistic): People who find it easier to express themselves by writing or speaking can be regarded as verbal learners.
4. Physical (Kinesthetic): In this style, learning happens when the learner carries out a physical activity, rather than listening to a lecture or watching a demonstration.
6. Types of Learning (Cont…)
5. Logical (Mathematical): If you like using your brain for logical and mathematical reasoning, you’re a logical learner. You easily recognise patterns and can connect seemingly unrelated concepts. Suited to fields such as scientific research, accountancy, bookkeeping, or computer programming.
6. Social (Interpersonal): If you are at your best when socializing and communicating with people, both verbally and non-verbally, you are a social learner. People often come to you to listen and ask for advice. Suited to careers such as counseling, teaching, training and coaching, sales, politics, and human resources.
7. Related Fields
Machine learning is primarily concerned with the accuracy and effectiveness of the computer system.
[Figure: machine learning at the center, drawing on related fields: psychological models, data mining, cognitive science, decision theory, information theory, databases, neuroscience, statistics, evolutionary models, and control theory.]
8. Well-Posed Learning Problems
Learning can be defined through a computer program that improves its
performance at some task through experience.
Definition of Learning: A computer program is said to learn from
experience E with respect to some class of tasks T and performance
measure P, if its performance at tasks in T, as measured by P, improves
with experience E.
Let us look at some examples of well-posed learning problems:
Learn to Play Checkers
Learn to recognize spoken words (SPHINX System)
Learning to drive an autonomous vehicle (ALVINN System)
Learning to classify new astronomical structures
Predict recovery rates of pneumonia patients
Detect fraudulent use of credit cards
9. Well-Posed Learning Problems
Three features: the class of tasks, the measure of performance to be
improved, and the source of experience.
A checkers learning problem:
Task T: playing checkers
Performance measure P: percent of games won against opponents
Training experience E: playing practice games against itself
We can specify many learning problems in this fashion, such as learning
to recognize handwritten words, or learning to drive a robotic automobile
autonomously.
A handwriting recognition learning problem:
Task T: recognizing and classifying handwritten words within images
Performance measure P: percent of words correctly classified
Training experience E: a database of handwritten words with given
classifications
10. Well-Posed Learning Problems
A robot driving learning problem:
Task T: driving on public four-lane highways using vision
sensors
Performance measure P: average distance traveled before
an error (as judged by human overseer)
Training experience E: a sequence of images and steering
commands recorded while observing a human driver
11. DESIGNING A LEARNING SYSTEM
1. Choosing the Training Experience
2. Choosing the Target Function
3. Choosing a Representation for the Target
Function
4. Choosing a Function Approximation Algorithm
5. The Final Design
12. Designing a Learning System
While designing a learning system, various design issues and approaches must be considered.
1. Choosing the Training Experience: The first design choice we face is to
choose the type of training experience from which our system will
learn. The type of training experience available can have a significant
impact on success or failure of the learner.
One key attribute is whether the training experience provides direct or
indirect feedback regarding the choices made by the performance system.
A second important attribute of the training experience is the degree to
which the learner controls the sequence of training examples.
A third important attribute of the training experience is how well it
represents the distribution of examples over which the final system
performance P must be measured.
13. Designing a Learning System
A checkers learning problem:
Task T: Playing checkers (draughts)
Performance Measures P: percent of games won in world tournament
Training Experience E: games played against itself
What experience?
What exactly should be learned?
How shall it be represented?
What specific algorithm to learn it?
14. Direct versus Indirect Learning
1. Individual checkers board states and correct
move for each
2. Move sequences and final outcomes of various
games played
Credit assignment problem - the degree to which
each move in the sequence deserves credit or
blame for the final outcome - game can be lost
even when early moves are optimal, if these are
followed later by poor moves or vice versa
15. Teacher or not?
Degree to which learner controls the sequence of training examples
1. Teacher selects informative board states & provides the correct
moves
2. For each board state the learner finds particularly confusing, it asks the teacher for the correct move
3. Learner may have complete control as it does when it learns by
playing itself with no teacher - learner may choose between
experimenting with novel board states or honing its skill by
playing minor variations of promising lines of play
16. 1. Choose Training Experience
How well training experience represents the distribution of examples over
which the final system performance P must be measured
P is percent of games in the world tournament, obvious danger when E
consists of only games played against itself (probably can’t get world
champion to teach computer!)
Most current theories of machine learning assume that the distribution of
training examples is identical to the distribution of test examples
It is IMPORTANT to keep in mind that this assumption is often violated in practice.
E: play games against itself (advantage of getting a lot of data this way)
17. 2. Choose a Target Function
The next design choice is to determine exactly what type of knowledge
will be learned and how this will be used by the performance
program.
ChooseMove: B -> M where B is any legal board state and M is a legal
move (hopefully the “best” legal move)
Alternatively, a function V: B -> ℝ, which maps from B to some real value, where higher scores are assigned to better board states
Now use the legal moves to generate every subsequent board state
and use V to choose the best one and therefore the best legal move
18. Choose a Target Function II
Let us define the target value V(b) for an
arbitrary board state b in B, as follows
V(b) = 100, if b is a final board state that is won
V(b) = -100, if b is a final board state that is lost
V(b) = 0, if b is a final board state that is a draw
V(b) = V(b′), if b is not a final state, where b′ is the best final board state that can be reached starting from b, assuming both players play optimally
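The recursive definition above can be sketched in code. This is a toy illustration, not a checkers engine: FINAL and TREE are made-up stand-ins for the terminal outcomes and the move generator.

```python
# A minimal sketch of the recursive target value V(b) on a toy game tree.
# FINAL maps terminal states to outcome values; TREE maps non-terminal
# states to their successors. Both are hypothetical stand-ins.

FINAL = {"win": 100, "loss": -100, "draw": 0}
TREE = {
    "start": ["a", "b"],     # learner to move
    "a": ["win", "draw"],    # opponent to move from here
    "b": ["loss", "draw"],
}

def V(b, learner_to_move=True):
    """V(b): value of the best reachable final state under optimal play."""
    if b in FINAL:
        return FINAL[b]
    values = [V(s, not learner_to_move) for s in TREE[b]]
    # minimax: the learner maximizes, the (optimal) opponent minimizes
    return max(values) if learner_to_move else min(values)

print(V("start"))  # 0: optimal play from "start" ends in a draw
```

With both players optimal, the opponent steers branch "a" to the draw and branch "b" to the loss, so the learner's best achievable outcome from "start" is the draw (value 0).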
19. 3. Choosing a Representation for the Target
Function
Given the ideal target function V, we will choose a representation that the
learning system will use to describe V' that it will learn.
The function V' will be calculated as a linear combination of the following
board features:
xl: the number of black pieces on the board
x2: the number of red pieces on the board
x3: the number of black kings on the board
x4: the number of red kings on the board
x5: the number of black pieces threatened by red (which can be captured
on red's next turn)
x6: the number of red pieces threatened by black
20. 3. Choosing a Representation for the Target Function
Thus, the learning program will represent V'(b) as a linear function of the form:
V'(b) = w0 + w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6
where w0 through w6 are numerical coefficients, or weights, to be chosen by the learning algorithm; each weight wi determines the relative importance of the corresponding board feature xi.
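A minimal sketch of this linear representation; the weights and feature values below are made up for illustration:

```python
# Sketch of the linear board-evaluation function V'(b). The six feature
# values x1..x6 would come from inspecting an actual board; here they
# are passed in directly.

def v_hat(weights, features):
    """V'(b) = w0 + w1*x1 + ... + w6*x6 for feature vector (x1..x6)."""
    w0, rest = weights[0], weights[1:]
    return w0 + sum(w * x for w, x in zip(rest, features))

weights = [0.5] * 7                  # w0..w6, all set to 0.5
features = [3, 0, 1, 0, 0, 0]        # 3 black pieces, 1 black king
print(v_hat(weights, features))      # 0.5 + 0.5*3 + 0.5*1 = 2.5
```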
21. Design So Far
T: Checkers
P: percent of games won in world tournament
E: games played against self
V: Board -> ℝ
Target Function Representation:
V´(b) = w0 + w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6
22. 4. Choose Function Approximation Algorithm
In order to learn the target function f we require a set of training
examples, each describing a specific board state b and the training
value Vtrain(b) for b.
In other words, each training example is an ordered pair of the form <b, Vtrain(b)>.
For instance, the following training example describes a board state b in
which black has won the game (note x2 = 0 indicates that red has no
remaining pieces) and for which the target function value Vtrain(b) is
therefore +100.
<(x1=3,x2=0,x3=1,x4=0,x5=0,x6=0),+100> because x2=0
a) Estimating Training Values:
b) Adjusting the weights:
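For adjusting the weights, the classic choice with this checkers example is the LMS (least mean squares) rule: wi <- wi + eta * (Vtrain(b) - V'(b)) * xi, where eta is a small learning rate. A sketch with made-up numbers:

```python
# One LMS weight update: each weight moves in proportion to its feature
# value and the prediction error. x0 is fixed at 1 for the bias weight w0.

def lms_update(weights, features, v_train, eta=0.1):
    x = [1] + list(features)                       # x0 = 1 for the bias
    v_hat = sum(w * xi for w, xi in zip(weights, x))
    error = v_train - v_hat                        # Vtrain(b) - V'(b)
    return [w + eta * error * xi for w, xi in zip(weights, x)]

# board with 3 black pieces, 1 black king; training value +100 (black won)
w = lms_update([0.0] * 7, [3, 0, 1, 0, 0, 0], v_train=100)
print(w)  # bias and the weights of nonzero features move toward the target
```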
23. 5. The Final Design
The final design of our checkers learning system can be naturally described by four distinct program modules that represent the central components in many learning systems. These four modules are:
1. Performance System: Solves the performance task using the learned target function(s). It takes an instance of a new problem as input and produces a trace of its solution (history) as output.
2. Critic: Take history of problem as input and produce a set of training examples of
target function as output.
3. Generalizer: Take training examples as input and produce estimate of target function
as output hypothesis. It generalizes from specific training examples, hypothesizing a
general function that covers all examples.
4. Experiment Generator: Take current hypothesis (currently learned function) as
input and outputs a new problem (i.e., initial board state) for Performance System to
explore. Its role is to pick new practice problems that will maximize the learning rate
of the overall system.
24. 5. The Final Design
Fig. Final Design of checkers learner problem
25. What is Machine Learning?
A branch of artificial intelligence, concerned with the
design and development of algorithms that allow
computers to evolve behaviors based on empirical data.
As intelligence requires knowledge, it is necessary for
the computers to acquire knowledge.
“Machine learning refers to a system capable of the
autonomous acquisition and integration of knowledge.”
https://www.youtube.com/watch?v=Cx5aNwnZYDc
https://www.youtube.com/watch?v=YhSeTEumjVA
https://www.youtube.com/watch?v=ZoemTySxFso
26. Machine Learning Paradigms
rote learning
learning by being told (advice-taking)
learning from examples (induction)
learning by analogy
speed-up learning
concept learning
clustering
discovery
27. Why Machine Learning?
No human experts
industrial/manufacturing control
mass spectrometer analysis, drug design, astronomic discovery
Black-box human expertise
face/handwriting/speech recognition
driving a car, flying a plane
Rapidly changing phenomena
credit scoring, financial modeling
diagnosis, fraud detection
Need for customization/personalization
personalized news reader
movie/book recommendation
Recent progress in algorithms and theory
Growing Flood of online data
Computational power is available
30. Data Science vs. Machine Learning
1. Data Science is a field about processes and systems to extract knowledge from structured and semi-structured data. / Machine Learning is a field of study that gives computers the capability to learn without being explicitly programmed.
2. Data Science needs the entire analytics universe. / Machine Learning is a combination of machine and data science.
3. Data Science is the branch that deals with data. / Machines utilize data science techniques to learn about the data.
4. Data in Data Science may or may not have evolved from a machine or mechanical process. / Machine Learning uses various techniques, such as regression and supervised clustering.
5. Data Science, as a broader term, not only focuses on algorithms and statistics but also takes care of data processing. / Machine Learning is focused only on algorithms and statistics.
6. Data Science is a broad term for multiple disciplines. / Machine Learning fits within data science.
7. Data Science involves many operations: data gathering, data cleaning, data manipulation, etc. / Machine Learning is of three types: supervised learning, unsupervised learning, and reinforcement learning.
8. Example: Netflix uses Data Science technology. / Example: Facebook uses Machine Learning technology.
31. Tools used for AI,ML and Deep Learning
32. Continued…
1. Tensorflow
TensorFlow is an open-source software library used for numerical computation with data flow graphs. It emerged from the dedicated efforts of engineers and researchers working on the Google Brain Team. The flexible architecture of TensorFlow allows you to deploy computation to multiple GPUs or CPUs in a server, mobile device, or desktop by using just a single API.
2. IBM Watson
IBM has been a pioneer in the field of Artificial Intelligence, having worked on this technology for a very long time. The company has its own AI platform named Watson, which houses numerous AI tools for both business users and developers. Watson is available as a set of open APIs, through which users can access a lot of starter kits and sample code. Users can use them to build virtual agents and cognitive search engines. Moreover, the cherry on the cake for Watson is its chatbot-building platform, which is aimed at beginners and requires little machine learning skill.
3. Caffe
Caffe is a deep learning C++ framework that has been developed with modularity, expression, and speed in mind. Its focus is on convolutional networks for computer vision applications.
33. Continued…
4. Deeplearning4j
Deeplearning4j is regarded as the first open-source, commercial-grade, distributed deep learning library developed for Scala and Java. Its easy-to-use infrastructure makes it a panacea for non-researchers. The most fascinating quality of DL4J is that it can import neural net models from many major frameworks via Keras, including Theano, Caffe, and TensorFlow.
5. Torch
Torch is also an open-source machine learning library, used by many giant IT firms including Yandex, IBM, Idiap Research Institute, and Facebook AI Research Group. It can also be described as a scientific computing framework and a script language based on the Lua programming language. After its successful execution on web platforms, Torch has also been extended for use on iOS and Android.
35. Training and Testing
[Figure: the universal set of data (unobserved) is split into a training set (observed via data acquisition) and a testing set (unobserved until practical usage).]
36. Training and Testing
Training is the process of making the system able to learn.
No free lunch rule:
Training set and testing set come from the same distribution
We need to make some assumptions or introduce a bias
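The train/test separation above can be sketched as a simple hold-out split; the function name and test fraction are illustrative:

```python
# A minimal train/test split: hold out a fraction of the data, train on
# the rest, and evaluate only on the held-out part. A fixed seed keeps
# the split reproducible.

import random

def train_test_split(data, test_fraction=0.25, seed=42):
    items = list(data)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * (1 - test_fraction))
    return items[:cut], items[cut:]

data = list(range(20))
train, test = train_test_split(data)
print(len(train), len(test))  # 15 5
```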
38. EXAMPLES OF ML
•Personalization: Online services like Amazon/Netflix use AI to personalize our experience. They learn from our own and other users’ previous purchases and recommend relevant content for us.
•Image recognition: ML can be used for face detection
in an image. There is a separate category for each
person in a database of several people.
•Medical diagnoses: ML is trained to recognize
cancerous tissues.
39. •Speech Recognition: It is the translation of spoken words into text. It is used in voice searches and more. Voice user interfaces include voice dialing, call routing, and appliance control. (Also natural language processing.)
•Data mining: The application of ML methods to large
databases.
•Fraud detection: Banks use AI to determine strange
activity on our account. Unexpected activity, such as
foreign transactions, could be flagged by the algorithm.
41. KDD Process
Selection: Obtain data from various sources.
Preprocessing: Cleanse data.
Transformation: Convert to common format. Transform
to new format.
Data Mining: Obtain desired results.
Interpretation/Evaluation: Present results to user in
meaningful manner.
42. KDD Process: Several Key Steps
Many people treat data mining as a synonym for another popularly used term, Knowledge
Discovery from Data, or KDD. Alternatively, others view data mining as simply an essential
step in the process of knowledge discovery. Knowledge discovery as a process
is depicted in Figure 1.4 and consists of an iterative sequence of the following steps:
1. Data cleaning (to remove noise and inconsistent data)
2. Data integration (where multiple data sources may be combined)
3. Data selection (where data relevant to the analysis task are retrieved from the
database)
4. Data transformation (where data are transformed or consolidated into forms
appropriate for mining by performing summary or aggregation operations, for instance)
5. Data mining (an essential process where intelligent methods are applied in order to
extract data patterns)
6. Pattern evaluation (to identify the truly interesting patterns representing knowledge based on some interestingness measures)
7. Knowledge presentation (where visualization and knowledge representation
techniques are used to present the mined knowledge to the user)
Steps 1 to 4 are different forms of data preprocessing, where the data are prepared for mining. The data mining step may interact with the user or a knowledge base. The interesting patterns are presented to the user and may be stored as new knowledge in the knowledge base.
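Steps 1–4 can be sketched as a toy preprocessing pipeline; the record fields and function names here are hypothetical:

```python
# Toy sketch of KDD steps 1-4 over lists of dict records.

def clean(records):
    # step 1, data cleaning: drop records with missing values
    return [r for r in records if all(v is not None for v in r.values())]

def integrate(*sources):
    # step 2, data integration: combine multiple sources
    return [r for src in sources for r in src]

def select(records, fields):
    # step 3, data selection: keep only fields relevant to the task
    return [{f: r[f] for f in fields} for r in records]

def transform(records):
    # step 4, data transformation: consolidate (here: average income)
    incomes = [r["income"] for r in records]
    return sum(incomes) / len(incomes)

a = [{"age": 35, "income": 40000}, {"age": None, "income": 50000}]
b = [{"age": 28, "income": 60000}]
prepared = select(clean(integrate(a, b)), ["income"])
print(transform(prepared))  # 50000.0
```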
43. •Email filtering: Email services use AI to filter
incoming emails. Users can train their spam filters
by marking emails as spam.
•Prediction: ML can be used in prediction systems.
Considering the loan example, to compute the
probability of a fault, the system will need to
classify the available data in groups.
•Computer vision, Computational biology, Robot
control, Handwriting recognition
45. History of ML
1950 — Alan Turing creates the “Turing Test” to determine if a computer has
real intelligence. To pass the test, a computer must be able to fool a human into
believing it is also human.
1952 — Arthur Samuel wrote the first computer learning program. The program
was the game of checkers, and the IBM computer improved at the game the more it
played, studying which moves made up winning strategies and incorporating those
moves into its program.
1957 — Frank Rosenblatt designed the first neural network for computers (the perceptron), which simulates the thought processes of the human brain.
1967 — The “nearest neighbor” algorithm was written, allowing computers to
begin using very basic pattern recognition. This could be used to map a route for
traveling salesmen, starting at a random city but ensuring they visit all cities during a
short tour.
1979 — Students at Stanford University invent the “Stanford Cart” which can
navigate obstacles in a room on its own.
1981 — Gerald Dejong introduces the concept of Explanation Based Learning
(EBL), in which a computer analyses training data and creates a general rule it can
follow by discarding unimportant data.
46. History of ML
1985 — Terry Sejnowski invents NetTalk, which learns to pronounce words
the same way a baby does.
1990s — Work on machine learning shifts from a knowledge-driven
approach to a data-driven approach. Scientists begin creating programs for
computers to analyze large amounts of data and draw conclusions — or “learn” —
from the results.
1997 — IBM’s Deep Blue beats the world champion at chess.
2006 — Geoffrey Hinton coins the term “deep learning” to explain new
algorithms that let computers “see” and distinguish objects and text in images
and videos.
2010 — The Microsoft Kinect can track 20 human features at a rate of 30
times per second, allowing people to interact with the computer via movements
and gestures.
2011 — IBM’s Watson beats its human competitors at Jeopardy.
2011 — Google Brain is developed, and its deep neural network can learn to
discover and categorize objects much the way a cat does.
47. History of ML
2012 – Google’s X Lab develops a machine learning algorithm that is able to
autonomously browse YouTube videos to identify the videos that contain cats.
2014 – Facebook develops DeepFace, a software algorithm that is able to
recognize or verify individuals on photos to the same level as humans can.
2015 – Amazon launches its own machine learning platform.
2015 – Microsoft creates the Distributed Machine Learning Toolkit, which
enables the efficient distribution of machine learning problems across multiple
computers.
2015 – Over 3,000 AI and Robotics researchers, endorsed by Stephen
Hawking, Elon Musk and Steve Wozniak (among many others), sign an open letter
warning of the danger of autonomous weapons which select and engage targets
without human intervention.
2016 – Google’s artificial intelligence algorithm beats a professional player at
the Chinese board game Go, which is considered the world’s most complex board
game and is many times harder than chess. The AlphaGo algorithm developed by
Google DeepMind managed to win five games out of five in the Go competition.
48. Some Issues in Machine Learning
What algorithms can approximate functions well (and when)?
How does number of training examples influence accuracy?
How does complexity of hypothesis representation impact it?
How does noisy data influence accuracy?
What are the theoretical limits of learnability?
How can prior knowledge of learner help?
What clues can we get from biological learning systems?
How can systems alter their own representations?
Understanding Which Processes Need Automation.
Lack of Quality Data.
Inadequate Infrastructure.
Implementation.
Lack of Skilled Resources.
52. TYPES OF ML
[Figure: machine learning as “using data for answering questions”, split into a training phase and a predicting phase.]
53. Supervised vs. Unsupervised Learning
Supervised learning (classification)
Supervision: The training data (observations, measurements, etc.)
are accompanied by labels indicating the class of the observations
New data is classified based on the training set
No new class is generated
Unsupervised learning (clustering)
The class labels of the training data are unknown
Given a set of measurements, observations, etc. with the aim of
establishing the existence of classes or clusters in the data
New classes can be generated.
54. 1. SUPERVISED LEARNING
• Supervised Learning Algorithms are the ones that involve direct
supervision of the operation.
• The developer labels sample data and sets strict boundaries upon which the algorithm operates.
• Learn through examples of which we know the desired output (what we
want to predict).
• The primary purpose of supervised learning is to scale the scope of data and to make predictions about unavailable, future, or unseen data based on labeled sample data.
55. 1. SUPERVISED LEARNING
• It is a spoon-fed version of machine learning:
you select what kind of information (samples) to “feed” the algorithm;
you specify what kind of results are desired (for example “yes/no” or “true/false”).
• Example:
Is this a cat or a dog?
Are these emails spam or not?
Predict the market value of houses, given the square meters, number
of rooms, neighborhood, etc.
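The house-price example can be sketched as a one-variable least-squares fit from labeled examples; all numbers below are made up:

```python
# Minimal supervised learning: fit price = a * sqm + b from labeled
# (input, label) pairs, then predict for an unseen input.

def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx          # slope and intercept

sqm = [50, 80, 100, 120]           # inputs: square meters
price = [100, 160, 200, 240]       # labels: price in thousands
a, b = fit_line(sqm, price)
print(round(a * 90 + b))           # predicted price for a 90 m^2 house: 180
```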
59. TYPES OF SUPERVISED LEARNING
• Classification separates the data, Regression fits the data.
60. TYPES OF SUPERVISED LEARNING
I. Classification (Categorical Target Variable) –
• Classification is the process where incoming data is labeled based on past data samples; the algorithm is trained to recognize certain types of objects and categorize them accordingly.
• The system has to know how to differentiate types of information and perform optical character, image, or binary recognition (deciding whether a particular bit of data is compliant or non-compliant with specific requirements, in a “yes” or “no” manner).
• e.g. Medical Imaging.
61. TYPES OF SUPERVISED LEARNING
II. Regression (Continuous Target Variable)
• Regression is the process of identifying patterns and
calculating the predictions of continuous outcomes.
• The system has to understand the numbers, their values,
grouping (for example, heights and widths), etc.
• eg. Housing Price Prediction
62. PROS & CONS OF SUPERVISED LEARNING
PROS
• It allows the system to collect and produce data from previous experience.
• It is more trustworthy than unsupervised learning, which can be computationally complex and less accurate in some instances.
CONS
• Concrete examples are required for training classifiers.
• Decision boundaries can be overtrained in the absence of the right examples.
• Difficulty in classifying big data.
63. EXAMPLE OF SUPERVISED
LEARNING ALGORITHMS
• Linear Regression
• k-Nearest Neighbor
• Naive Bayes
• Decision Trees
• Support Vector Machine (SVM)
• Random Forest
• Neural Networks (Deep learning)
64. Classification by Decision Tree Induction (DTI)
DTI is the learning of decision trees from class-labeled training tuples.
A decision tree is a flowchart-like tree structure, where each internal node (non-leaf node) denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (or terminal node) holds a class label. The topmost node is the root node.
Why are DT classifiers so popular?
The construction of DT classifiers does not require any domain knowledge or parameter setting, and is therefore appropriate for exploratory knowledge discovery.
DTs can handle high-dimensional data.
Their representation of acquired knowledge in tree form is intuitive and generally easy for humans to assimilate.
They have good accuracy.
They may be used in medicine, manufacturing, production, financial analysis, astronomy, and molecular biology.
65. Output: A Decision Tree for “buys_computer”
[Figure: decision tree. The root node tests age?; the branch “<=30” leads to a student? node (no → no, yes → yes); the branch “31..40” leads directly to yes; the branch “>40” leads to a credit rating? node (fair → yes, excellent → no).]
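The buys_computer tree can be hand-coded as nested conditionals. This is a sketch of how the learned tree classifies a tuple, not of how the tree is induced:

```python
# Hand-coded version of the buys_computer decision tree:
# each `if` is an internal node's attribute test; each return is a leaf.

def buys_computer(age, student, credit_rating):
    if age <= 30:
        return "yes" if student == "yes" else "no"
    if age <= 40:                      # the 31..40 branch
        return "yes"
    # age > 40: decide on credit rating
    return "yes" if credit_rating == "fair" else "no"

print(buys_computer(25, "yes", "fair"))      # yes
print(buys_computer(45, "no", "excellent"))  # no
```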
66. Bayesian Classification: Why?
A statistical classifier: performs probabilistic prediction, i.e., predicts class membership probabilities (the probability that a given tuple belongs to a particular class).
Foundation: Based on Bayes’ Theorem given by Thomas Bayes
Performance: A simple Bayesian classifier, naïve Bayesian classifier,
has comparable performance with decision tree and selected neural
network classifiers.
Class Conditional Independence : Naïve Bayesian Classifiers assume
that the effect of an attribute value on a given class is independent of the
values of the other attributes. This assumption is called class conditional
independence.
Incremental: Each training example can incrementally
increase/decrease the probability that a hypothesis is correct — prior
knowledge can be combined with observed data
Standard: Even when Bayesian methods are computationally
intractable, they can provide a standard of optimal decision making
against which other methods can be measured
Bayesian Belief Network: are graphical models that allow the
representation of dependencies among subsets of attributes
67. Naïve Bayesian Classification
Naïve Bayes classifiers use all the attributes
Two assumptions:
–Attributes are equally important
– Attributes are statistically independent
i.e., knowing the value of one attribute
says nothing about the value of another
Equally important & independence assumptions
are never correct in real-life datasets
68. Bayesian Theorem: Basics
Let X be a data sample (“evidence”): class label is unknown
Let H be a hypothesis that X belongs to class C
E.g., our world of tuples is confined to customers described by the attributes age and income. X is a 35-year-old customer with an income of $40,000.
Classification is to determine P(H|X), the probability that the hypothesis holds given
the observed data sample X. P(H|X) reflects the probability that customer X will buy a
computer given that we know the customer’s age and income.
P(H) (prior probability), the initial probability
E.g., X will buy computer, regardless of age, income, …
P(X): prior probability of X, i.e., the probability that the sample data is observed (that a person from our set of customers is 35 years old and earns $40,000).
P(X|H) (likelihood): the probability of observing the sample X, given that the hypothesis holds
E.g., Given that X will buy computer, the prob. that X is 31..40, medium income
69. Bayesian Theorem
Given training data X, posteriori probability of a hypothesis H,
P(H|X), follows the Bayes theorem
Informally, this can be written as
posteriori = likelihood x prior/evidence
Predicts X belongs to Ci iff the probability P(Ci|X) is the
highest among all the P(Ck|X) for all the k classes
Practical difficulty: require initial knowledge of many
probabilities, significant computational cost
P(H|X) = P(X|H) P(H) / P(X)
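Applied numerically to the customer example, with made-up probabilities:

```python
# Bayes' theorem: P(H|X) = P(X|H) * P(H) / P(X).

def posterior(p_x_given_h, p_h, p_x):
    return p_x_given_h * p_h / p_x

# Hypothetical numbers: P(buys) = 0.6,
# P(age=35, income=40k | buys) = 0.2, P(age=35, income=40k) = 0.15
print(posterior(0.2, 0.6, 0.15))  # 0.8
```

A classifier would compute this posterior for each class Ci and predict the class with the highest P(Ci|X).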
70. Artificial Neural Networks
Artificial Neural Networks (ANN) were started by psychologists and neurobiologists to develop and test computational analogues of neurons.
Other names: connectionist learning, prediction by neural networks, adaptive networks, neural computation, parallel distributed processing, collective computation.
Artificial neural network components:
Units: A neural network is composed of a number of nodes, or units. A unit is a metaphor for the nerve cell body.
Links: Units are connected by links. Links represent synaptic connections from one unit to another.
Weight: Each link has a numeric weight.
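A single unit can be sketched as a weighted sum of its inputs followed by a step activation; the weights, inputs, and bias below are illustrative:

```python
# One artificial unit: weighted sum over incoming links, then a
# step activation that fires (1) when the sum exceeds zero.

def unit(inputs, weights, bias):
    s = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if s > 0 else 0       # step activation

print(unit([1, 0], [0.7, 0.4], bias=-0.5))  # 1: 0.7 - 0.5 > 0, unit fires
print(unit([0, 0], [0.7, 0.4], bias=-0.5))  # 0: only the bias, no firing
```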
71. Genetic Algorithms (GA)
Genetic Algorithm: based on an analogy to biological evolution
An initial population is created consisting of randomly generated rules
Each rule is represented by a string of bits
E.g., if A1 and ¬A2 then C2 can be encoded as 100
If an attribute has k > 2 values, k bits can be used
Based on the notion of survival of the fittest, a new population is formed to
consist of the fittest rules and their offsprings
The fitness of a rule is represented by its classification accuracy on a set of
training examples
Offsprings are generated by crossover and mutation
The process continues until a population P evolves in which each rule satisfies a prespecified fitness threshold
Slow but easily parallelizable
72. Genetic Algorithms
A Genetic Algorithm (GA) is a computational model
consisting of five parts:
A starting set of individuals, P.
Crossover: technique to combine two parents to
create offspring.
Mutation: randomly change an individual.
Fitness: determine the best individuals.
Algorithm which applies the crossover and
mutation techniques to P iteratively using the
fitness function to determine the best
individuals in P to keep.
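The loop described above can be sketched in Python. As a stand-in for classification accuracy, fitness here simply counts 1-bits; this choice, and all the parameters, are illustrative assumptions:

```python
import random

# A minimal GA sketch: bit-string individuals, fitness, crossover,
# mutation, and iterative selection of the fittest.
random.seed(0)

def fitness(rule):
    return rule.count("1")               # stand-in for classification accuracy

def crossover(p1, p2):
    cut = random.randrange(1, len(p1))   # one-point crossover
    return p1[:cut] + p2[cut:]

def mutate(rule, rate=0.1):
    # flip each bit with a small probability
    return "".join(b if random.random() > rate else ("1" if b == "0" else "0")
                   for b in rule)

population = ["".join(random.choice("01") for _ in range(8)) for _ in range(10)]
for _ in range(30):                      # evolve for a fixed number of generations
    population.sort(key=fitness, reverse=True)
    parents = population[:5]             # survival of the fittest
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(5)]
    population = parents + children

best = max(population, key=fitness)
```

Because the top half of each generation is kept unchanged, the best fitness in the population can never decrease from one generation to the next.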
73. What is the Support Vector Machine?
A “Support Vector Machine” (SVM) is a
supervised machine learning algorithm that can be
used for both classification and regression
challenges. However, it is mostly used in
classification problems. In the SVM algorithm, we
plot each data item as a point in n-dimensional
space (where n is the number of features you have),
with the value of each feature being the value of a
particular coordinate. Then, we perform
classification by finding the hyper-plane that
best differentiates the two classes.
Support vectors are simply the coordinates of
individual observations. The SVM classifier is a
frontier (hyper-plane/line) that best segregates
the two classes.
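The hyper-plane idea can be sketched in Python: once an SVM has learned a weight vector w and bias b, a point x is classified by which side of the hyper-plane w·x + b = 0 it falls on. The w and b below are assumed values for illustration, not the output of real SVM training:

```python
# Classify a point by the side of the hyper-plane w.x + b = 0 it lies on.
# The weights and bias are assumed values, not a trained SVM model.
def classify(x, w, b):
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return +1 if score >= 0 else -1

w, b = [1.0, -1.0], 0.0               # hyper-plane: x1 - x2 = 0
label_a = classify([2.0, 1.0], w, b)  # point on the x1 > x2 side
label_b = classify([1.0, 3.0], w, b)  # point on the x1 < x2 side
```

Actual SVM training chooses w and b so that the margin between the hyper-plane and the closest points (the support vectors) is maximized.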
74. 2. UNSUPERVISED LEARNING
• Unsupervised learning feeds on unlabeled data.
• Supervised Learning needs to know the results and sort
out the data, whereas in unsupervised machine learning
algorithms the desired results are unknown and yet to
be defined.
• Since no teacher is provided, no training supervision is given to
the machine. The machine must therefore find the hidden structure
in the unlabeled data by itself.
75. 2. UNSUPERVISED LEARNING
• The unsupervised machine learning algorithm is used
for:
exploring the structure of the information;
extracting valuable insights;
detecting patterns;
descriptive modeling.
• E.g., I have photos and want to put them into 20 groups.
77. TYPES OF UNSUPERVISED LEARNING
I. Clustering(Target Variable not available) –
• It is an exploration of data used to segment it into meaningful
groups (i.e., clusters) based on their internal patterns, without
prior knowledge of the groups' characteristics.
• The groups are defined by the similarity of individual data
objects to one another, and also by their dissimilarity from the rest.
• E.g., customer segmentation: grouping customers by purchasing
behavior.
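One common clustering algorithm, k-means, can be sketched in plain Python. The 1-D points and starting centers below are illustrative:

```python
# A minimal k-means sketch: alternately assign points to their nearest
# center and move each center to the mean of its cluster.
def kmeans(points, centers, iters=10):
    clusters = [[] for _ in centers]
    for _ in range(iters):
        # assignment step: each point joins its nearest center's cluster
        clusters = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda i: (p - centers[i]) ** 2)
            clusters[i].append(p)
        # update step: move each center to the mean of its cluster
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.7]      # two obvious groups
centers, clusters = kmeans(points, [0.0, 10.0])
```

No labels are needed: the grouping emerges purely from the distances between the data points, which is exactly the unsupervised setting described above.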
78. TYPES OF UNSUPERVISED LEARNING
II. Association(Target Variable not available) –
• An association rule learning problem is where you want to
discover rules that describe large portions of your data, such as
people that buy X also tend to buy Y.
E.g., Market Basket Analysis
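The support/confidence computation behind association rules can be sketched in Python on a toy basket dataset; all items and transactions are illustrative:

```python
# A minimal association-rule sketch: support and confidence for the
# rule {X} -> {Y} over toy market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
]

def support(itemset):
    # fraction of transactions containing every item in the itemset
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(lhs, rhs):
    # of the transactions containing lhs, the fraction also containing rhs
    return support(lhs | rhs) / support(lhs)

s = support({"bread", "milk"})       # how often both appear together
c = confidence({"bread"}, {"milk"})  # "people who buy bread also buy milk"
```

Algorithms such as Apriori and FP-Growth exist to find all itemsets above a support threshold efficiently; this sketch only shows what the two measures mean.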
79. EXAMPLE OF UNSUPERVISED
LEARNING ALGORITHMS
•PCA
•t-SNE
•k-means
•DBSCAN
•Apriori algorithm
• FP-Growth
Dimensionality reduction: There is a lot of noise in the incoming
data. Machine learning algorithms use dimensionality reduction to
remove this noise while distilling the relevant information.
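The idea can be sketched in plain Python: project 2-D points onto the direction of greatest variance, which is the core of PCA. The data points are illustrative, and a real pipeline would use a library implementation:

```python
# A minimal PCA-style sketch: reduce 2-D points to 1-D by projecting
# onto the leading eigenvector of the covariance matrix.
def pca_1d(points):
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # entries of the 2x2 covariance matrix
    cxx = sum((x - mx) ** 2 for x, _ in points) / n
    cyy = sum((y - my) ** 2 for _, y in points) / n
    cxy = sum((x - mx) * (y - my) for x, y in points) / n
    # power iteration to find the direction of greatest variance
    vx, vy = 1.0, 0.0
    for _ in range(50):
        nx, ny = cxx * vx + cxy * vy, cxy * vx + cyy * vy
        norm = (nx * nx + ny * ny) ** 0.5
        vx, vy = nx / norm, ny / norm
    # project each mean-centered point onto that direction (2-D -> 1-D)
    return [(x - mx) * vx + (y - my) * vy for x, y in points]

projected = pca_1d([(1, 1), (2, 2.1), (3, 2.9), (4, 4.2)])
```

The two correlated coordinates collapse into one coordinate along the main trend of the data, discarding the off-trend "noise" direction.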
80. REINFORCEMENT LEARNING
• Reinforcement learning is about taking suitable action to
maximize reward in a particular situation.
• It uses exploration/exploitation: an action takes place, its
consequences are observed, and the next action takes the results
of the first action into account.
• In supervised learning, the training data comes with an answer key,
so the model is trained with the correct answers themselves. In
reinforcement learning, by contrast, there is no answer key; the
agent decides what to do to perform the given task. In the absence
of a training dataset, it is bound to learn from its own experience.
81. REINFORCEMENT LEARNING
• Agent is an assumed entity which performs actions in an
environment to gain some reward.
• Environment is the scenario the agent has to face; it gives
feedback via a positive or negative reward signal.
• State (s) is the current situation returned by the environment.
82. REINFORCEMENT LEARNING
• Two main types of reward signals are:
A positive reward signal encourages continuing performance of a
particular sequence of actions.
A negative reward signal penalizes certain activities and urges the
algorithm to correct itself so that it stops incurring penalties.
• However, the function of reward signal may vary
depending on the nature of information.
• Overall, the system tries to maximize positive rewards and
minimize the negatives.
https://www.youtube.com/watch?v=KiHdKynXDtw
83. REINFORCEMENT LEARNING
• Input: the initial state from which the model starts.
• Output: there are many possible outputs, since there is a variety
of solutions to a particular problem.
• Training: training is based on the input; the model returns a state,
and the user decides to reward or punish the model based on its output.
• The model continues to learn.
• The best solution is decided based on the maximum reward.
84. REINFORCEMENT LEARNING
Various Practical applications of Reinforcement
Learning –
• RL can be used in robotics for industrial automation.
• RL can be used in machine learning and data processing.
• RL can be used to create training systems that provide
custom instruction and materials according to the
requirements of students.
85. REINFORCEMENT LEARNING
There are two important learning models in
reinforcement learning:
• Markov Decision Process
• Q learning
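Q-learning, the second model listed, can be sketched with a tabular agent on a toy 5-state chain where moving right eventually reaches a rewarding terminal state. The environment, rewards, and hyperparameters below are all illustrative assumptions:

```python
import random

# A minimal tabular Q-learning sketch on a 5-state chain environment.
# Reaching the rightmost state gives reward 1; everything else gives 0.
random.seed(1)
N_STATES, ACTIONS = 5, [0, 1]            # action 0 = left, 1 = right
Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.5, 0.9, 0.2    # learning rate, discount, exploration

def step(s, a):
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s2 == N_STATES - 1 else 0.0
    return s2, reward

for _ in range(200):                     # episodes
    s = 0
    while s != N_STATES - 1:
        # epsilon-greedy: mostly exploit the best known action, sometimes explore
        a = random.choice(ACTIONS) if random.random() < epsilon else \
            max(ACTIONS, key=lambda a: Q[s][a])
        s2, r = step(s, a)
        # Q-learning update: move Q(s,a) toward r + gamma * max_a' Q(s',a')
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

policy = [max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES - 1)]
```

After training, the greedy policy reads the learned Q-table: in each state, the action with the larger Q-value is preferred, with no answer key ever provided.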
88. TYPES OF ML

UNSUPERVISED LEARNING (Data Driven: identify clusters)
• Clustering
  SVD
  PCA
  K-Means
• Dimensionality Reduction
  Text Mining
  Face Recognition
  Big Data Visualization
  Image Recognition
• Association Analysis
  Apriori
  FP-Growth
• Hidden Markov Model

REINFORCEMENT LEARNING (Learn from errors)
• Dynamic Programming
• Monte Carlo Tree Search (MCTS)
• Heuristic Methods
• Q-Learning
• Deep Adversarial Networks
• Temporal Difference (TD)
• Asynchronous Actor-Critic Agents (A3C)

SUPERVISED LEARNING (Task Driven: predict next values)
• Regression
  Linear
  Polynomial
• Decision Tree
• Random Forest
• Classification
  KNN
  Trees
  Logistic Regression
  Naive Bayes
  SVM
89. STEPS TO SOLVE A MACHINE LEARNING PROBLEM
1. Data Gathering: collect data from various sources
2. Data Preprocessing: clean data to have homogeneity
3. Feature Engineering: making your data more useful
4. Algorithm Selection & Training: selecting the right machine learning model
5. Making Predictions: evaluate the model
90. 1. Data Gathering
• Might depend on human work-
Manual labeling for supervised learning.
Domain knowledge. Maybe even experts.
• May come for free, or “sort of”
E.g., Machine Translation.
• The more the better: Some algorithms need large amounts
of data to be useful (e.g., neural networks).
• Quantity and quality of data dictate model accuracy.
91. 2. Data Preprocessing
• Is there anything wrong with the data?
Missing values
Outliers
Bad encoding (for text)
Wrongly-labeled examples
Biased data
• Do I have many more
samples of one class than the rest?
• Need to fix/remove data?
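Two of the fixes above can be sketched in Python: fill missing values with the column mean and flag outliers by their distance from the mean. The data and the 1.5-standard-deviation threshold are illustrative assumptions:

```python
# A minimal preprocessing sketch: mean-imputation for missing values
# and a simple deviation-based outlier flag. Threshold is illustrative.
def clean(values):
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    filled = [mean if v is None else v for v in values]     # impute missing
    std = (sum((v - mean) ** 2 for v in filled) / len(filled)) ** 0.5
    outliers = [v for v in filled if abs(v - mean) > 1.5 * std]
    return filled, outliers

filled, outliers = clean([1.0, 2.0, None, 1.5, 40.0])
```

Flagged values would then be inspected by hand: an extreme value might be a measurement error to remove, or a genuine but rare observation to keep.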
92. 3. Feature Engineering
• A feature is an individual measurable property of a
phenomenon being observed.
• Our inputs are represented by a set of features.
• To classify spam email, features could be:
Number of words that have been ch4ng3d like this.
Language of the email (0=English, 1=Spanish).
Number of emojis.
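Extracting those example features can be sketched in Python. The tiny hard-coded emoji set and the binary language flag are simplifying assumptions for illustration:

```python
# A minimal feature-extraction sketch for the three spam features above.
EMOJIS = {"🙂", "😀", "🎉"}              # tiny illustrative emoji set

def features(text, language):
    words = text.split()
    # words that mix digits and letters, like "ch4ng3d"
    changed = sum(1 for w in words
                  if any(c.isdigit() for c in w) and any(c.isalpha() for c in w))
    emojis = sum(1 for c in text if c in EMOJIS)
    lang = 0 if language == "English" else 1
    return [changed, lang, emojis]

x = features("Gr3at offer 🎉 just for you", "English")
```

The resulting feature vector is what the classifier actually sees; the raw email text never reaches the learning algorithm directly.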
93. 3. Feature Engineering
• Extract more information from existing data-
Make it more useful
With good features, most algorithms can learn faster
• Requires thought and knowledge of the data
• Two steps:
Variable transformation (e.g., dates into weekdays,
normalizing)
Feature creation (e.g., n-grams for texts, if word is
capitalized to detect names, etc.)
95. 4. Algorithm Selection & Training
• Goal of training: making the correct prediction as often as possible.
• Incremental improvement:
• Use of metrics for evaluating performance and comparing
solutions.
• Hyperparameter tuning (A hyperparameter is a parameter whose value
is used to control the learning process)
97. Type of data in clustering analysis
Interval-Scaled Attributes
Binary Attributes
Nominal Attributes
Ordinal Attributes
Ratio-Scaled Attributes
Attributes of Mixed Type
98. Data Types
Interval-Scaled Attributes
Continuous measurements on a roughly linear scale. Example:
Height scale:
1. The scale ranges over the metre or foot scale.
2. Heights need to be standardized, as different scales can be used to
express the same absolute measurement.
Weight scale:
1. The scale ranges over the kilogram or pound scale.
[Figure: a weight scale marked from 20 kg to 120 kg]
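The standardization mentioned for the height scale can be sketched as a z-score transformation, so that metre- and foot-based measurements become comparable. The sample heights are illustrative:

```python
# A minimal standardization sketch: convert raw measurements to
# z-scores (mean 0, standard deviation 1).
def standardize(values):
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / std for v in values]

z = standardize([1.60, 1.70, 1.80])   # heights in metres
```

The same heights recorded in feet would produce the same z-scores, which is exactly why standardization removes the dependence on the chosen unit.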
99. Binary Variables
A contingency table for binary data (objects i and j, each described by p
binary variables):

                 Object j
                   1       0      sum
  Object i   1     a       b      a+b
             0     c       d      c+d
            sum   a+c     b+d      p

Distance measure for symmetric binary variables:
  d(i, j) = (b + c) / (a + b + c + d)
Distance measure for asymmetric binary variables:
  d(i, j) = (b + c) / (a + b + c)
Jaccard coefficient (similarity measure for asymmetric binary variables):
  simJaccard(i, j) = a / (a + b + c)
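The measures above can be sketched in Python by counting the contingency-table entries a, b, c, d directly from two binary vectors; the example vectors are illustrative:

```python
# Compute the symmetric distance, asymmetric distance, and Jaccard
# similarity for two binary vectors via their contingency-table counts.
def binary_distances(i, j):
    a = sum(1 for x, y in zip(i, j) if x == 1 and y == 1)   # both 1
    b = sum(1 for x, y in zip(i, j) if x == 1 and y == 0)   # i=1, j=0
    c = sum(1 for x, y in zip(i, j) if x == 0 and y == 1)   # i=0, j=1
    d = sum(1 for x, y in zip(i, j) if x == 0 and y == 0)   # both 0
    symmetric = (b + c) / (a + b + c + d)
    asymmetric = (b + c) / (a + b + c)
    jaccard = a / (a + b + c)
    return symmetric, asymmetric, jaccard

sym, asym, jac = binary_distances([1, 1, 0, 0], [1, 0, 1, 0])
```

The asymmetric variants simply drop the 0-0 matches (d), which is appropriate when a shared absence carries no information, e.g. two patients both lacking a rare symptom.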
100. Nominal / Categorical Variables
A generalization of the binary variable in that it can take more
than 2 states, e.g., red, yellow, blue, green
Method 1: Simple matching
m: # of matches, p: total # of variables
Method 2: use a large number of binary variables
creating a new binary variable for each of the M nominal
states
d(i, j) = (p - m) / p
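Simple matching can be sketched in Python; the example objects are illustrative:

```python
# Simple matching distance for nominal variables:
# d(i, j) = (p - m) / p, with p variables and m matching values.
def nominal_distance(i, j):
    p = len(i)
    m = sum(1 for x, y in zip(i, j) if x == y)
    return (p - m) / p

d = nominal_distance(["red", "small", "round"], ["red", "large", "round"])
```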
101. Ratio-Scaled Variables
(Source: Data Mining: Concepts and Techniques)
Ratio-scaled variable: a positive measurement on a nonlinear
scale, approximately at exponential scale, such as Ae^(Bt) or Ae^(-Bt)
Methods:
treat them like interval-scaled variables—not a good
choice! (why?—the scale can be distorted)
apply logarithmic transformation
yif = log(xif)
treat them as continuous ordinal data and treat their rank as
interval-scaled
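The logarithmic transformation can be sketched in Python: applying log to an exponential-scale variable turns equal ratios into equal steps, i.e. a roughly linear scale:

```python
import math

# Log-transform a ratio-scaled (exponential-growth) variable so it
# becomes roughly linear: y_if = log(x_if). The values are illustrative.
raw = [10, 100, 1000, 10000]             # each value 10x the previous
transformed = [math.log(x) for x in raw]
```

After the transform, consecutive values differ by a constant log(10), so interval-scaled distance measures become meaningful again.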
102. Machine Learning vs. Deep Learning
1. Machine Learning is a superset of Deep Learning; Deep Learning is a
subset of Machine Learning.
2. Machine Learning typically works with structured data; Deep Learning
represents data quite differently, using artificial neural networks (ANNs).
3. Machine Learning is an evolution of AI; Deep Learning is an evolution
of Machine Learning, essentially machine learning made "deep" through
many network layers.
4. Machine Learning works with thousands of data points; Deep Learning
works with big data, i.e., millions of data points.
5. Machine Learning outputs are numerical values, such as a classification
score; Deep Learning outputs can be anything from numerical values to
free-form elements such as free text and sound.
6. Machine Learning uses various types of automated algorithms that model
functions and predict future actions from data; Deep Learning uses neural
networks that pass data through processing layers to interpret data
features and relations.
7. Machine Learning algorithms are directed by data analysts to examine
specific variables in data sets; Deep Learning algorithms are largely
self-directed in their data analysis once they are put into production.
8. Machine Learning is widely used to stay competitive and learn new
things; Deep Learning solves complex machine learning problems.