Lecture - 1
CSEE-4142 & PHDCS-834: Machine Learning
Introduction
• Analytics is the systematic computational analysis of data or statistics.
• It is used for the discovery, interpretation, and communication of meaningful patterns in data.
• It also entails applying data patterns towards effective decision-making.
• It can be valuable in areas rich with recorded information.
• Analytics relies on the simultaneous application of statistics, computer programming and operations research to quantify performance.
Data Analysis vs Data Analytics
• Data analysis focuses on the process of examining past data through collection, inspection, modelling and questioning.
• It is a subset of data analytics, which combines multiple data analysis processes to focus on why an event happened and what may happen in the future based on past data.
• Data analytics is used to inform larger organizational decisions.
• Data analytics uses extensive computing skills, mathematics, statistics, descriptive techniques and predictive models to gain valuable knowledge from data.
Techniques and Tools of Analytics
• Analytics is a collection of techniques and tools for creating value from data.
• The techniques include concepts such as Artificial Intelligence (AI), Machine Learning (ML) and Deep Learning (DL) algorithms.
• Artificial Intelligence refers to algorithms and systems that exhibit human-like intelligence.
• Machine Learning is a subset of AI that can learn to perform a task from extracted data and/or models.
• Deep Learning is a subset of Machine Learning that imitates the functioning of the human brain to solve problems.
Relationship between AI, ML and DL: Deep Learning is a subset of Machine Learning, which is itself a subset of Artificial Intelligence.
Introduction to Machine Learning
• Machine learning (ML) is an area of computer science concerned with teaching computers to perform tasks by learning from experience.
• Machine Learning deals with systems that are trained from data rather than being explicitly programmed.
• It is centred on the creation of algorithms that can learn from past experience and data, as opposed to a computer being pre-programmed to carry out a task in a specific manner.
• Put simply, if a computer program improves with experience, then we can say that it has learned.
Definition
• Tom Mitchell, one of the pioneers of machine learning, defines machine learning as: “A computer program is said to learn from experience ‘E’, with respect to some class of tasks ‘T’ and performance measure ‘P’, if its performance at tasks in ‘T’, as measured by ‘P’, improves with experience ‘E’.”
• Stanford University defines it as: “Machine learning is the science of getting computers to act without being explicitly programmed”.
• McKinsey et al. state that “Machine learning is based on algorithms that can learn from data without relying on rules-based programming”.
Example-1: Handwriting Recognition
Task T : Recognising and classifying handwritten words within images
Performance P : Percent of words correctly classified
Training experience E: A dataset of handwritten words with given classifications
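As a minimal illustration (not part of the original slides), the performance measure P here is simply classification accuracy; the word labels below are hypothetical:

```python
# Performance measure P for Example-1: percent of words correctly classified.
# The labels below are hypothetical, used only to show the calculation.
true_labels = ["cat", "dog", "cat", "bird", "dog"]
predicted   = ["cat", "dog", "bird", "bird", "cat"]

correct = sum(t == p for t, p in zip(true_labels, predicted))
P = 100.0 * correct / len(true_labels)
print(f"P = {P:.1f}% of words correctly classified")  # P = 60.0%
```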
Example-2: Driving Robot Training
Task T : Driving on highways using vision sensors
Performance measure P : Average distance traveled before an error
Training experience E: A sequence of images and steering commands recorded
while observing a human driver
Example-3: Chess Learning
Task T : Playing chess
Performance measure P : Percent of games won against opponents
Training experience E: Playing practice games against itself
Basic components of learning process
• The learning process can be divided into four components:
• data storage
• abstraction
• generalization
• evaluation
Fig 1. Components of Learning Process
Data Storage
• Facilities for storing and retrieving huge amounts of data are an important
component of the learning process. Humans and computers alike utilize data
storage as a foundation for advanced reasoning.
• In human beings, data is stored in the brain and retrieved using electrochemical signals.
• Computers use hard disk drives, flash memory, random access memory and similar
devices to store data and retrieve data.
Abstraction
• The second component of the learning process is known as abstraction.
• Abstraction is the process of extracting knowledge about stored data. This involves
creating general concepts about the data as a whole.
• The creation of knowledge involves application of known models and creation of
new models.
• The process of fitting a model to a dataset is known as training.
• When the model has been trained, the data is transformed into an abstract form that
summarizes the original information.
Generalization
• The third component of the learning process is known as generalization.
• The term generalization describes the process of turning the knowledge about
stored data into a form that can be utilized for future action.
• These actions are to be carried out on tasks that are similar, but not identical, to those that have been seen before.
• In generalization, the goal is to discover those properties of the data that will be
most relevant to future tasks.
Evaluation
• Evaluation is the last component of the learning process.
• It is the process of giving feedback to the user to measure the utility of the learned
knowledge.
• This feedback is then utilized to improve the whole learning process.
Applications of machine learning
• In retail business, machine learning is used to study consumer behaviour.
• In finance, banks analyze their past data to build models to use in credit applications, fraud detection, and the stock market.
• In manufacturing, learning models are used for optimization, control, and
troubleshooting.
• In medicine, learning programs are used for medical diagnosis.
Applications of machine learning (cont.)
• In telecommunications, call patterns are analyzed for network optimization and
maximizing the quality of service.
• It is used to find solutions to many problems in vision, speech recognition, and
robotics.
• Machine learning methods are applied in the design of computer-controlled
vehicles
• Machine learning methods have been used to develop programmes for playing games
Understanding data
• Unit of observation
 Unit of observation means the smallest entity with measurable properties of interest for a
study.
 Examples:
o A person, an object or a thing
o A time point (second, minute, day, month, year etc.)
o A geographic region
o A measurement (Height, weight etc.)
• Examples
 An “example” is an instance of the unit of observation for which properties have
been recorded. An “example” is also referred to as an “instance”, or “case” or
“record.”
Understanding data (cont.)
• Features
 A “feature” is a recorded property or a characteristic of examples. It is also referred to as an “attribute” or a “variable.”
• Examples and features are generally collected in a “matrix format”.
Table 1. Examples and Features
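As a rough sketch of this matrix format (the actual values of Table 1 are not reproduced here, so the used-car records below are hypothetical; pandas is assumed only for illustration):

```python
import pandas as pd

# Each row is an example (an instance of the unit of observation: one car);
# each column is a feature (a recorded property of that example).
cars = pd.DataFrame({
    "year":         [2018, 2015, 2020],
    "price":        [14000, 9000, 21000],
    "mileage":      [42000, 78000, 12000],
    "model":        ["SE", "LX", "SE"],
    "color":        ["red", "blue", "black"],
    "transmission": ["auto", "manual", "auto"],
})
print(cars.shape)   # (3, 6): 3 examples by 6 features
```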
Understanding data (cont.)
• Consider the problem of developing an algorithm for detecting cancer. In this study we note the
following.
 The units of observation are the patients.
 The examples are patients investigated for cancer (both positive and negative).
 The following attributes of the patients may be chosen as the features:
o gender
o age
o blood pressure
o the findings of the pathology report after a biopsy
Understanding data (cont.)
• Spam e-mail detection
 The unit of observation could be an e-mail message.
 The examples would be specific messages (both spam and genuine email).
 The features might consist of the words used in the messages.
Different forms of features (data)
• Numeric
 If a feature represents a characteristic measured in numbers, it is called a numeric feature.
o Example: “year”, “price” and “mileage” (ref. to Table-1)
• Categorical or nominal
 A categorical feature is an attribute that can take on one of a limited, and usually fixed,
number of possible values on the basis of some qualitative property. A categorical feature is
also called a nominal feature.
o Example: “model”, “color” and “transmission” (ref. to Table-1)
• Ordinal data
 This denotes a nominal variable with categories falling in an ordered list.
o Examples include clothing sizes such as small, medium, and large
o measurement of customer satisfaction on a scale from “not at all happy” to “very happy.”
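A small sketch of how these three kinds of features are often represented in code (the values are hypothetical; pandas is assumed only for illustration):

```python
import pandas as pd

# Numeric feature: measured in numbers, usable directly.
mileage = pd.Series([42000, 78000, 12000])

# Nominal (categorical) feature: a fixed set of unordered values.
color = pd.Categorical(["red", "blue", "black"], categories=["red", "blue", "black"])

# Ordinal feature: categories with a meaningful order.
size = pd.Categorical(["small", "large", "medium"],
                      categories=["small", "medium", "large"], ordered=True)
print(size.codes)   # [0 2 1] -- integer codes that preserve the ordering
```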
Mathematical Basics: Probability Theory
• Broadly speaking, probability theory is the mathematical study of uncertainty
• It plays a central role in machine learning and pattern recognition, as the design of learning algorithms often relies on probabilistic assumptions about the data.
Probability Space
• When we speak about probability, we often refer to the probability of an event of
uncertain nature taking place
• Therefore, in order to discuss probability theory formally, we must first specify what the possible events are to which we would like to attach probabilities.
• Formally, a probability space is defined by the triple (Ω, ℱ, P), where
  Ω is the space of possible outcomes, or outcome space;
  ℱ ⊆ 2^Ω is the space of measurable events, or event space;
  P is the probability measure (probability distribution) that maps an event E ∈ ℱ to a real value between 0 and 1.
Probability Space (cont.)
Given the outcome space Ω, there are some restrictions as to which subsets of 2^Ω can be considered as an event space ℱ:
(i) The trivial event Ω and the empty event ∅ are in ℱ.
(ii) The event space ℱ is closed under union: if α, β ∈ ℱ, then α ∪ β ∈ ℱ.
(iii) The event space ℱ is closed under complement: if α ∈ ℱ, then Ω − α ∈ ℱ.
Probability Space (cont.)
• Given the event space ℱ, the probability measure P must satisfy certain axioms:
(i) Non-negativity: for all α ∈ ℱ, p(α) ≥ 0.
(ii) Trivial event: p(Ω) = 1.
(iii) Additivity: for all α, β ∈ ℱ with α ∩ β = ∅, p(α ∪ β) = p(α) + p(β).
Random variable
• Random variables play an important role in probability theory.
• The most important fact is that random variables are not variables but functions that map outcomes (of the outcome space) to real values.
• Random variables are denoted by capital letters.
Example:
• Consider the process of throwing a die. Let X be a random variable that depends on the outcome of the throw.
• A natural choice for X would be to map the outcome i to the value i, that is, mapping the event of throwing a ‘one’ to 1.
• Suppose we have another random variable Y that maps all outcomes to 0.
• Suppose we have yet another random variable Z that maps each even outcome i to 2i and each odd outcome i to −i.
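A minimal sketch of the three random variables above as plain Python functions on the outcome space of a die:

```python
# Random variables are functions from outcomes to real values.
outcomes = [1, 2, 3, 4, 5, 6]                  # outcome space of one die throw

def X(i): return i                             # outcome i is mapped to the value i
def Y(i): return 0                             # every outcome is mapped to 0
def Z(i): return 2 * i if i % 2 == 0 else -i   # even outcomes -> 2i, odd -> -i

for i in outcomes:
    print(i, X(i), Y(i), Z(i))
```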
Random variable (cont.)
• In another sense, random variables allow us to abstract away the formal notion of event space, as we can define random variables that capture the appropriate events.
• For example, consider the event space of odd and even die throws. We could have a random variable that takes on the value 1 if outcome i is even and 0 otherwise.
• This type of binary random variable is very common and is known as an indicator variable, as it indicates whether a certain event happened or not.
Joint Probability
• Joint probability is a statistical measure that calculates the likelihood of two events occurring together at the same point in time.
• We denote the probability of X taking the value a and Y taking the value b by p(X = a, Y = b) or p_{X,Y}(a, b).
• Example: Let the random variable X be defined on the outcome space Ω of a die throw, so that

  $p_X(1) = p_X(2) = \cdots = p_X(6) = \tfrac{1}{6}$

Let Y be an indicator variable that takes the value 1 if a coin flip turns up heads and 0 if tails.
Assuming both the die and the coin are fair, the joint probability distribution of X and Y is given below.
Joint Probability
P X=1 X=2 X=3 X=4 X=5 X=6 𝒑𝒑𝒙𝒙(𝒊𝒊)
Y=0 1
12
1
12
1
12
1
12
1
12
1
12
1
2
Y=1 1
12
1
12
1
12
1
12
1
12
1
12
1
2
𝒑𝒑𝒚𝒚(𝒋𝒋) 1
6
1
6
1
6
1
6
1
6
1
6
Here , 𝑝𝑝 𝑋𝑋 = 1, 𝑌𝑌 = 0 =
1
6
∗
1
2
=
1
12
𝒑𝒑𝒙𝒙 𝒊𝒊 , 𝒑𝒑𝒚𝒚(𝒋𝒋) are called marginal probability
Definition: Marginal Probability Distribution
• The marginal distribution refers to the probability distribution of a random
variable on its own
• Given a joint distribution, say over random variables X and Y, we can find the
marginal distribution of X or that of Y
Joint Probability (cont.)
• If X and Y are discrete random variables and p(x, y) is the value of their joint probability distribution at (x, y), then the functions given by

  $p(x) = \sum_{b \in \mathrm{Val}(Y)} p(x, y = b)$   and   $p(y) = \sum_{a \in \mathrm{Val}(X)} p(x = a, y)$

are the marginal distributions of X and Y, respectively.
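A short numeric sketch of the die-and-coin joint distribution above and its marginals (NumPy assumed for illustration):

```python
import numpy as np

# Joint distribution of a fair die (X = 1..6) and a fair coin (Y = 0 or 1):
# every cell equals 1/6 * 1/2 = 1/12, as in the table above.
joint = np.full((2, 6), 1.0 / 12)   # rows: Y = 0, 1; columns: X = 1..6

p_X = joint.sum(axis=0)             # marginal of X: sum over Y -> [1/6, ..., 1/6]
p_Y = joint.sum(axis=1)             # marginal of Y: sum over X -> [1/2, 1/2]
print(p_X, p_Y, joint.sum())        # total probability mass is 1.0
```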
Conditional Distribution
• Conditional distributions specify the distribution of a random variable when the value of another random variable is known, or more generally when some events are known to be true.
• Formally, the conditional probability of X = a given Y = b is defined as

  $p(X = a \mid Y = b) = \frac{p(X = a,\, Y = b)}{p(Y = b)}$

• This is not defined when p(Y = b) = 0.
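Continuing the same die-and-coin example as a sketch, the conditional distribution p(X = a | Y = 0) can be computed directly from the joint table:

```python
import numpy as np

joint = np.full((2, 6), 1.0 / 12)    # die-and-coin joint distribution from above
p_Y0 = joint[0].sum()                # p(Y = 0) = 1/2
cond = joint[0] / p_Y0               # p(X = a | Y = 0) for a = 1..6
print(cond)                          # six values of 1/6: X is unaffected by Y
```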
Some Important Theorems on Probability
Theorem-1: If A and B are mutually exclusive events, then p(A ∪ B) = p(A) + p(B).
Theorem-2: If two events A and B are exhaustive and mutually exclusive, then p(A) + p(B) = 1.
Theorem-3: For any event A, p(A′) = 1 − p(A).
Theorem-4: If event A implies event B (A → B, i.e. A ⊂ B), then
  (i) p(A) ≤ p(B)
  (ii) p(B − A) = p(B) − p(A)
Theorem-5: If A and B are events that are not necessarily mutually exclusive, then p(A ∪ B) = p(A) + p(B) − p(A ∩ B).
Independence & Chain Rule
Independence
• In probability theory, independence means that the distribution of a random variable does not change on learning the value of another variable.
• Mathematically, if p(A) = p(A | B), then A and B are independent.
Example: The results of successive tosses of a coin are independent.
Chain Rule
The chain rule is often used to evaluate the joint probability of some random variables, and is especially useful when there is conditional independence across variables:

  $P(x_1, x_2, x_3, \ldots, x_n) = p(x_1)\, p(x_2 \mid x_1) \cdots p(x_n \mid x_1, x_2, \ldots, x_{n-1})$
Bayes Rule
• Bayes' rule allows us to compute the conditional probability p(x | y) from p(y | x), in a sense inverting the conditioning:

  $p(x \mid y) = \frac{p(y \mid x)\, p(x)}{p(y)}$

• p(x | y) is called the posterior; this is what we are trying to estimate. For example, the “probability of having cancer given that the person is a smoker”.
• p(y | x) is called the likelihood; this is the probability of observing the new evidence given our initial hypothesis. In the above example, this would be the “probability of being a smoker given that the person has cancer”.
• p(x) is called the prior; this is the probability of our hypothesis without any additional information. In the above example, this would be the “probability of having cancer”.
• p(y) is called the marginal likelihood; this is the total probability of observing the evidence. In the above example, this would be the “probability of being a smoker”. In many applications of Bayes' rule, this is ignored, as it mainly serves as a normalization constant.
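A minimal numeric sketch of Bayes' rule using the cancer/smoker example; all of the probabilities below are hypothetical and chosen only to illustrate the calculation:

```python
p_cancer = 0.01                # prior p(x): probability of having cancer
p_smoker_given_cancer = 0.30   # likelihood p(y|x)
p_smoker = 0.20                # marginal likelihood p(y)

# posterior p(x|y) = p(y|x) * p(x) / p(y)
p_cancer_given_smoker = p_smoker_given_cancer * p_cancer / p_smoker
print(round(p_cancer_given_smoker, 4))   # 0.015
```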
Probability Distribution
• A probability distribution describes how the total probability is distributed over the possible values of a random variable.
• It tells us what the possible values of a random variable X are and how probabilities are assigned to those values.
• The set of possible values of a random variable, together with their respective probabilities, is called the probability distribution of the random variable.
• A probability distribution can be continuous or discrete.
• The probability distribution of a continuous random variable is called a continuous probability distribution, and the probability distribution of a discrete random variable is called a discrete probability distribution.
PMF & CDF
Probability Mass Function (PMF)
The probability mass function (pmf) of a discrete random variable X is a function, denoted by p(x) or f(x), that satisfies:
i) p(x) ≥ 0 for all values x of X, and
ii) $\sum_i p(x_i) = 1$,
where p(x) = p(X = x) is the probability that the random variable X takes the value x.
Cumulative Distribution Function (CDF)
For any real number x, the function F(x) = p(X ≤ x) is known as the cumulative distribution function (cdf), or simply the distribution function.
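A minimal sketch of a pmf and its cdf for a fair six-sided die:

```python
pmf = {x: 1 / 6 for x in range(1, 7)}          # p(x) >= 0 and the values sum to 1
assert abs(sum(pmf.values()) - 1.0) < 1e-12

def cdf(x):
    """F(x) = p(X <= x)."""
    return sum(p for value, p in pmf.items() if value <= x)

print(cdf(3))     # 0.5
print(cdf(10))    # 1.0
```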
Mean and Variance of a Random Variable
• The mean of a random variable X, denoted by µ, describes where the population distribution of X is centered.
• The mean (or expectation) of a discrete random variable X having pmf p(x) is defined as

  $\mu = E[X] = \sum_x x\, p(x),$

provided the sum exists (i.e. it is absolutely convergent: $\sum_x |x|\, p(x) < \infty$); otherwise we say that the mean does not exist.
• The variance σ² of a discrete random variable X is defined as

  $\sigma^2 = V(X) = E[(X - E[X])^2] = E[(X - \mu)^2] = \sum_x (x - \mu)^2\, p(x)$

or, equivalently,

  $\sigma^2 = V(X) = E[X^2] - (E[X])^2 = \sum_x x^2\, p(x) - \mu^2.$

• The standard deviation σ of a random variable is the positive square root of its variance, i.e. $\sigma = +\sqrt{\sigma^2}$.
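A short sketch computing the mean, variance and standard deviation of a fair die directly from the formulas above:

```python
pmf = {x: 1 / 6 for x in range(1, 7)}

mu = sum(x * p for x, p in pmf.items())                     # E[X] = 3.5
var = sum((x - mu) ** 2 * p for x, p in pmf.items())        # E[(X - mu)^2] ~ 2.9167
var_alt = sum(x * x * p for x, p in pmf.items()) - mu ** 2  # E[X^2] - mu^2, same value
sigma = var ** 0.5                                          # standard deviation ~ 1.708
print(mu, var, var_alt, sigma)
```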
Median and Mode of a Random Variable
• The median of a random variable X is the value below (or above) which half of the total probability lies.
• Thus, if F(x) is the cumulative distribution function of a random variable X and $\tilde{\mu}$ is the median, then

  $F(\tilde{\mu}) = \tfrac{1}{2}$

or, more generally,

  $p(X \le \tilde{\mu}) \ge \tfrac{1}{2}$ and $p(X \ge \tilde{\mu}) \ge \tfrac{1}{2}$.

• The mode of a random variable X is the value which occurs with the highest probability, i.e. the value of X at which the pmf f(x) is maximum.
• Thus, if f(x) is the pmf of a random variable X and $\hat{\mu}$ is the mode, then

  $f(\hat{\mu}) \ge f(x), \;\; \forall x.$

• The mode need not be unique.
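A small sketch of finding the median and mode of a discrete random variable from its pmf; the pmf below is hypothetical:

```python
pmf = {0: 0.1, 1: 0.2, 2: 0.4, 3: 0.3}

# Median: a value m with p(X <= m) >= 1/2 and p(X >= m) >= 1/2.
cumulative, median = 0.0, None
for x in sorted(pmf):
    cumulative += pmf[x]
    if cumulative >= 0.5:
        median = x
        break

# Mode: the value with the highest probability (need not be unique).
mode = max(pmf, key=pmf.get)
print(median, mode)   # 2 2
```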
Conditional Expectation
• The conditional expectation of X, given that the event A has happened, is defined as

  $E[X \mid A] = \sum_{x \mid A} x\, p(X = x \mid A),$

where $x \mid A$ indicates the values of X that are favourable to A.
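A minimal sketch of this conditional expectation for a fair die, with A taken to be the event “the outcome is even”:

```python
pmf = {x: 1 / 6 for x in range(1, 7)}
A = {2, 4, 6}                                       # the event "outcome is even"

p_A = sum(pmf[x] for x in A)                        # p(A) = 1/2
E_X_given_A = sum(x * (pmf[x] / p_A) for x in A)    # sum over x favourable to A
print(E_X_given_A)                                  # 4.0
```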
Probability Density Function
• To define a continuous distribution, we make use of a probability density function (PDF).
• A probability density function f is a non-negative, integrable function such that

  $\int_{\mathrm{Val}(X)} f(x)\, dx = 1.$

• The probability of a random variable X distributed according to a PDF f is computed as follows:

  $P(a \le X \le b) = \int_a^b f(x)\, dx.$

• This implies that the probability of a continuously distributed random variable taking on any single given value is zero.
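A rough numerical check of these two properties for a standard normal density, using a fine Riemann sum instead of exact integration (NumPy assumed):

```python
import numpy as np

f = lambda x: np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)   # standard normal PDF

x = np.linspace(-10.0, 10.0, 200_001)
dx = x[1] - x[0]
print(np.sum(f(x) * dx))            # ~1.0 : total probability mass

a, b = -1.0, 1.0
mask = (x >= a) & (x <= b)
print(np.sum(f(x[mask]) * dx))      # ~0.683 : P(-1 <= X <= 1)
```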
Skewness and Kurtosis
• Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution,
or data set, is symmetric if it looks the same to the left and right of the center point.
• Kurtosis is a measure of whether the data are heavy-tailed or light-tailed relative to a normal
distribution. That is, data sets with high kurtosis tend to have heavy tails, or outliers. Data
sets with low kurtosis tend to have light tails, or lack of outliers. A uniform distribution
would be the extreme case.
• A heavy tailed distribution has a tail that’s heavier than an exponential
distribution (Bryson, 1974).
• In other words, a distribution that is heavy tailed goes to zero slower than one with lighter
tails.
• Heavy tailed distributions tend to have many outliers. An outlier is a data point that differs
significantly from other observations
• The histogram is an effective graphical technique for showing both the skewness and kurtosis of a data set.
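A small sketch of estimating skewness and (excess) kurtosis from data via standardized moments; the sample below is a hypothetical right-skewed (exponential) sample:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.exponential(scale=1.0, size=100_000)   # right-skewed, heavy right tail

centered = data - data.mean()
std = data.std()
skewness = np.mean(centered ** 3) / std ** 3              # ~2 for an exponential
excess_kurtosis = np.mean(centered ** 4) / std ** 4 - 3   # ~6 for an exponential
print(skewness, excess_kurtosis)
```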
Skewness and Kurtosis (cont.)
Symmetrical and Asymmetrical Distribution
• A symmetric distribution is a type of distribution where the left side of the distribution mirrors
the right side.
• The normal distribution is symmetric. It is also a unimodal distribution (it has one peak).
• Distributions don’t have to be unimodal to be symmetric. They can be bimodal (two peaks) or multimodal (many peaks). A bimodal distribution is symmetric if its two halves are mirror images of each other.
• In a symmetric distribution, the mean, mode and median all fall at the same point. The mode is
the most common number and it matches with the highest peak
• An exception is the bimodal distribution. The mean and median are still in the center, but there
are two modes: one on each peak.
Symmetrical and Asymmetrical Distribution (cont.)
• If one tail is longer than another, the distribution is skewed. These distributions are sometimes
called asymmetric or asymmetrical distributions as they don’t show any kind of symmetry.
• A left-skewed distribution has a long left tail. Left-skewed distributions are also
called negatively-skewed distributions. That’s because there is a long tail in the negative
direction on the number line. The mean is also to the left of the peak.
• A right-skewed distribution has a long right tail. Right-skewed distributions are also called positively-skewed distributions. That’s because there is a long tail in the positive direction on the number line. The mean is also to the right of the peak.
Symmetrical and Asymmetrical Distribution (cont.)
• A left-skewed, negative distribution will have the mean to the left of the median.
• A right-skewed distribution will have the mean to the right of the median.
• Real-life distributions are usually skewed.
• If the skewness is high, many statistical techniques don’t work well.
• As a result, transformations such as taking logarithms are often used.
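A brief sketch of both points: a right-skewed sample has its mean to the right of its median, and a log transform makes it far more symmetric (the lognormal sample below is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
sample = rng.lognormal(mean=10.0, sigma=1.0, size=100_000)   # strongly right-skewed

print(np.mean(sample) > np.median(sample))       # True: mean lies right of the median

logged = np.log(sample)                          # log transform
print(abs(np.mean(logged) - np.median(logged)))  # close to 0: roughly symmetric now
```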
Some important Distribution: Binomial
• A binomial experiment is one that possesses the following properties:
1) The experiment consists of n repeated trials
2) Each trial results in an outcome that may be classified as a success or a failure (hence the
name, binomial)
3) The probability of a success, denoted by p, remains constant from trial to trial and repeated
trials are independent.
• The number of successes X in n trials of a binomial experiment is called a binomial random
variable.
• The probability distribution of the random variable X is called a binomial distribution, and
is given by the formula:
𝑃𝑃 𝑋𝑋 =
𝑛𝑛
𝑥𝑥
𝑝𝑝𝑥𝑥(1 − 𝑝𝑝)𝑛𝑛−𝑥𝑥, 𝑤𝑤𝑤𝑤𝑤𝑤𝑤𝑤𝑤 𝑥𝑥 = 0, 1, 2, … , 𝑛𝑛
• 𝑃𝑃 𝑋𝑋 gives the probability of successes in n binomial trials
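A minimal sketch of the binomial pmf using the formula above (math.comb gives the binomial coefficient):

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) * p**x * (1 - p)**(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.5
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
print(probs[5])      # ~0.246: probability of exactly 5 successes in 10 trials
print(sum(probs))    # ~1.0: the pmf sums to one
```

Setting n = 1 recovers the Bernoulli distribution of the next slide: binomial_pmf(1, 1, p) equals p and binomial_pmf(0, 1, p) equals 1 − p.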
Some important Distribution: Bernoulli
• The Bernoulli distribution is a discrete distribution having two possible outcomes, labelled n = 0 and n = 1, in which n = 1 (“success”) occurs with probability p and n = 0 (“failure”) occurs with probability q = 1 − p, where 0 < p < 1.
• It therefore has probability mass function

  $p(n) = \begin{cases} 1 - p & \text{for } n = 0 \\ p & \text{for } n = 1 \end{cases}$

which can also be written as

  $p(n) = p^n (1 - p)^{1 - n}.$

• The Bernoulli distribution is a special case of the binomial distribution with n = 1.
Some important Distribution: Poisson
• The Poisson distribution is a very useful distribution that deals with the arrival of events.
• It is a discrete probability distribution. It measures the probability of a number of events happening over a fixed period of time, given a fixed average rate of occurrence, and assuming that the events take place independently of the time since the last event.
• It is parameterized by the average arrival rate λ.
• The probability mass function (the probability of observing k events over the time period) is given by

  $P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}$
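A minimal sketch of the Poisson pmf from the formula above, with a hypothetical average arrival rate λ = 3:

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) = exp(-lambda) * lambda**k / k!"""
    return exp(-lam) * lam ** k / factorial(k)

lam = 3.0
print([round(poisson_pmf(k, lam), 4) for k in range(6)])   # P(X = 0..5)
print(sum(poisson_pmf(k, lam) for k in range(100)))        # ~1.0
```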
Some important Distribution: Gaussian
• A random variable X whose distribution has the shape of a normal curve is called a normal random variable.
• Such a random variable X is said to be normally distributed with mean μ and standard deviation σ if its probability density function is given by

  $f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$

• The normal distribution is also called the Gaussian distribution.
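A minimal sketch of the normal (Gaussian) density written out from the formula above:

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu=0.0, sigma=1.0):
    """f(x) = exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * sqrt(2 * pi))."""
    return exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

print(normal_pdf(0.0))           # ~0.3989: peak of the standard normal density
print(normal_pdf(2.0, mu=2.0))   # the peak always occurs at x = mu
```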
Some important Distribution: Gaussian (cont.)
• The Gaussian distribution is the most versatile distribution in probability theory and appears in a wide range of contexts.
• Whenever you measure things like people's height, weight, salary, opinions or votes, the probability distribution is very often Gaussian.
• It can be used to approximate the binomial distribution when the number of experiments is large, and the Poisson distribution when the average arrival rate is high.
• In many cases we deal with multivariate Gaussian distributions.
• A k-dimensional multivariate Gaussian distribution is parameterized by (μ, Σ), where μ is a vector of means in $\mathbb{R}^k$ and Σ is a covariance matrix in $\mathbb{R}^{k \times k}$. In other words, $\Sigma_{ii} = \mathrm{var}(X_i)$ and $\Sigma_{ij} = \mathrm{cov}(X_i, X_j)$.
• The probability density function is defined by

  $f(x) = \frac{1}{\sqrt{(2\pi)^k \, |\Sigma|}} \exp\!\left(-\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu)\right)$
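A short sketch of the k-dimensional Gaussian density from the formula above (NumPy assumed; the 2-dimensional mean and covariance below are hypothetical):

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Density of a k-dimensional Gaussian N(mu, Sigma) evaluated at the point x."""
    k = len(mu)
    diff = x - mu
    norm_const = 1.0 / np.sqrt((2 * np.pi) ** k * np.linalg.det(Sigma))
    return norm_const * np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])   # Sigma_ii = var(X_i), Sigma_ij = cov(X_i, X_j)
print(mvn_pdf(np.array([0.0, 0.0]), mu, Sigma))   # density at the mean
```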
More Related Content

Similar to Lecture-1.pdf

2019 DSA 105 Introduction to Data Science Week 3
2019 DSA 105 Introduction to Data Science Week 32019 DSA 105 Introduction to Data Science Week 3
2019 DSA 105 Introduction to Data Science Week 3Ferdin Joe John Joseph PhD
 
Data Analytics Life Cycle [EMC² - Data Science and Big data analytics]
Data Analytics Life Cycle [EMC² - Data Science and Big data analytics]Data Analytics Life Cycle [EMC² - Data Science and Big data analytics]
Data Analytics Life Cycle [EMC² - Data Science and Big data analytics]ssuser23e4f31
 
Bda life cycle slideshare
Bda life cycle   slideshareBda life cycle   slideshare
Bda life cycle slideshareSathyaseelanK1
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data scienceSpartan60
 
Data Science Introduction: Concepts, lifecycle, applications.pptx
Data Science Introduction: Concepts, lifecycle, applications.pptxData Science Introduction: Concepts, lifecycle, applications.pptx
Data Science Introduction: Concepts, lifecycle, applications.pptxsumitkumar600840
 
Analyzing the solutions of DEA through information visualization and data min...
Analyzing the solutions of DEA through information visualization and data min...Analyzing the solutions of DEA through information visualization and data min...
Analyzing the solutions of DEA through information visualization and data min...ertekg
 
Mba ii rm unit-4.1 data analysis & presentation a
Mba ii rm unit-4.1 data analysis & presentation aMba ii rm unit-4.1 data analysis & presentation a
Mba ii rm unit-4.1 data analysis & presentation aRai University
 
Data Science.pdf
Data Science.pdfData Science.pdf
Data Science.pdfWinduGata3
 
Mis system analysis and system design
Mis   system analysis and system designMis   system analysis and system design
Mis system analysis and system designRahul Hedau
 
best data science course institutes in Hyderabad
best data science course institutes in Hyderabadbest data science course institutes in Hyderabad
best data science course institutes in Hyderabadrajasrichalamala3zen
 
Data Science course in Hyderabad .
Data Science course in Hyderabad            .Data Science course in Hyderabad            .
Data Science course in Hyderabad .rajasrichalamala3zen
 
Data Science course in Hyderabad .
Data Science course in Hyderabad         .Data Science course in Hyderabad         .
Data Science course in Hyderabad .rajasrichalamala3zen
 
data science course in Hyderabad data science course in Hyderabad
data science course in Hyderabad data science course in Hyderabaddata science course in Hyderabad data science course in Hyderabad
data science course in Hyderabad data science course in Hyderabadakhilamadupativibhin
 
data science course training in Hyderabad
data science course training in Hyderabaddata science course training in Hyderabad
data science course training in Hyderabadmadhupriya3zen
 
data science course training in Hyderabad
data science course training in Hyderabaddata science course training in Hyderabad
data science course training in Hyderabadmadhupriya3zen
 

Similar to Lecture-1.pdf (20)

2019 DSA 105 Introduction to Data Science Week 3
2019 DSA 105 Introduction to Data Science Week 32019 DSA 105 Introduction to Data Science Week 3
2019 DSA 105 Introduction to Data Science Week 3
 
Data Analytics Life Cycle [EMC² - Data Science and Big data analytics]
Data Analytics Life Cycle [EMC² - Data Science and Big data analytics]Data Analytics Life Cycle [EMC² - Data Science and Big data analytics]
Data Analytics Life Cycle [EMC² - Data Science and Big data analytics]
 
Bda life cycle slideshare
Bda life cycle   slideshareBda life cycle   slideshare
Bda life cycle slideshare
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
 
Data Science Introduction: Concepts, lifecycle, applications.pptx
Data Science Introduction: Concepts, lifecycle, applications.pptxData Science Introduction: Concepts, lifecycle, applications.pptx
Data Science Introduction: Concepts, lifecycle, applications.pptx
 
Data Science and Analysis.pptx
Data Science and Analysis.pptxData Science and Analysis.pptx
Data Science and Analysis.pptx
 
Analyzing the solutions of DEA through information visualization and data min...
Analyzing the solutions of DEA through information visualization and data min...Analyzing the solutions of DEA through information visualization and data min...
Analyzing the solutions of DEA through information visualization and data min...
 
Mba ii rm unit-4.1 data analysis & presentation a
Mba ii rm unit-4.1 data analysis & presentation aMba ii rm unit-4.1 data analysis & presentation a
Mba ii rm unit-4.1 data analysis & presentation a
 
Data Science and Analytics
Data Science and Analytics Data Science and Analytics
Data Science and Analytics
 
EDA-Unit 1.pdf
EDA-Unit 1.pdfEDA-Unit 1.pdf
EDA-Unit 1.pdf
 
Data Science.pdf
Data Science.pdfData Science.pdf
Data Science.pdf
 
Mis system analysis and system design
Mis   system analysis and system designMis   system analysis and system design
Mis system analysis and system design
 
best data science course institutes in Hyderabad
best data science course institutes in Hyderabadbest data science course institutes in Hyderabad
best data science course institutes in Hyderabad
 
Data Science course in Hyderabad .
Data Science course in Hyderabad            .Data Science course in Hyderabad            .
Data Science course in Hyderabad .
 
Data Science course in Hyderabad .
Data Science course in Hyderabad         .Data Science course in Hyderabad         .
Data Science course in Hyderabad .
 
data science course in Hyderabad data science course in Hyderabad
data science course in Hyderabad data science course in Hyderabaddata science course in Hyderabad data science course in Hyderabad
data science course in Hyderabad data science course in Hyderabad
 
data science course training in Hyderabad
data science course training in Hyderabaddata science course training in Hyderabad
data science course training in Hyderabad
 
data science course training in Hyderabad
data science course training in Hyderabaddata science course training in Hyderabad
data science course training in Hyderabad
 
data science.pptx
data science.pptxdata science.pptx
data science.pptx
 
7 QC Tools Training
7 QC Tools Training7 QC Tools Training
7 QC Tools Training
 

Recently uploaded

BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxfenichawla
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...roncy bisnoi
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdfKamal Acharya
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdfSuman Jyoti
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueBhangaleSonal
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756dollysharma2066
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfKamal Acharya
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptDineshKumar4165
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...Call Girls in Nagpur High Profile
 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptMsecMca
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdfankushspencer015
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VDineshKumar4165
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfRagavanV2
 

Recently uploaded (20)

BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdf
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank  Design by Working Stress - IS Method.pdfIntze Overhead Water Tank  Design by Working Stress - IS Method.pdf
Intze Overhead Water Tank Design by Working Stress - IS Method.pdf
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.ppt
 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdf
 
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
(INDIRA) Call Girl Bhosari Call Now 8617697112 Bhosari Escorts 24x7
 

Lecture-1.pdf

  • 1. Lecture - 1 CSEE-4142 & PHDCS-834: Machine Learning
  • 2. CSEC-4142 & PHDCS-4 @2022 Introduction • Analytics is the systematic computational analysis of data or statistics. • It is used for the discovery, interpretation, and communication of meaningful patterns in data. • It also entails applying data patterns towards effective decision-making. • It can be valuable in areas rich with recorded information • Analytics relies on the simultaneous application of statistics, computer programming and operation research to qualify performance.
  • 3. CSEC-4142 & PHDCS-4 @2022 Data Analysis vs Data Analytics • Data analysis focuses on the process of examining past data through collection, inspection, modelling and questioning. • It is a subset of data analytics, which takes multiple data analysis processes to focus on why an event happened and what may happen in the future based on the previous data. • Data analytics is used to formulate larger organization decisions. • Data analytics use extensive computer skills, mathematics, statistics, descriptive techniques and predictive models to gain valuable knowledge from data through analytics.
  • 4. CSEC-4142 & PHDCS-4 @2022 Techniques and Tools of Analytics • Analytics is a collection of techniques and tools for creating value from data. • The Techniques included concepts such as Artificial Intelligence (AI), Machine Learning (ML) and Deep Learning (DL) algorithms • Artificial Intelligence is the algorithms and systems that exhibit human like intelligence • Machine learning is a subset of AI that can learn to perform a task with extracted data and or models • Deep Learning is a subset of machine learning that imitate the functioning of human brain to solve problem
  • 5. CSEC-4142 & PHDCS-4 @2022 Relationship between AI, ML and DL (?)
  • 6. CSEC-4142 & PHDCS-4 @2022 Introduction to Machine Learning • Machine learning (ML) is an area in computer science which involves teaching computers to do things naturally by teaching through experience. • Machine Learning deals with systems that are trained from data rather than being explicitly programmed. • It is centred on the creation of algorithms that can learn from past experience and data, as opposed to a computer being pre-programmed to carry out a task in a specific manner. • Put simply, if a computer program improves with experience then we can say that it has learned.
  • 7. CSEC-4142 & PHDCS-4 @2022 Definition • Tom Mitchell, one of the patrons of machine learning define machine learning as “A computer program is said to learn from experience ‘E’, with respect to some class of tasks ‘T’ and performance measure ‘P’ if its performance at tasks in ‘T’ as measured by ‘P’ improves with experience ‘E’. • Stanford University define it as “Machine learning is the science of getting computers to act without being explicitly programmed”. • McKinsey et al state that “Machine learning is based on algorithm that can learn from data without relying on rule based program”
  • 8. CSEC-4142 & PHDCS-4 @2022 Example-1: Handwriting Recognition Task T : Recognising and classifying handwritten words within images Performance P : Percent of words correctly classified Training experience E: A dataset of handwritten words with given classifications
  • 9. CSEC-4142 & PHDCS-4 @2022 Example-2: Driving Robot Training Task T : Driving on highways using vision sensors Performance measure P : Average distance traveled before an error Training experience E: A sequence of images and steering commands recorded while observing a human driver
  • 10. CSEC-4142 & PHDCS-4 @2022 Example-3: Chess Learning Task T : Playing chess Performance measure P : Percent of games won against opponents Training experience E: Playing practice games against itself
  • 11. CSEC-4142 & PHDCS-4 @2022 Basic components of learning process • The learning process can be divided into four components: • data storage • abstraction • generalization • evaluation Fig 1. Components of Learning Process
  • 12. CSEC-4142 & PHDCS-4 @2022 Data Storage • Facilities for storing and retrieving huge amounts of data are an important component of the learning process. Humans and computers alike utilize data storage as a foundation for advanced reasoning. • In human being, the data is stored in the brain and data is retrieved using electrochemical signals. • Computers use hard disk drives, flash memory, random access memory and similar devices to store data and retrieve data.
  • 13. CSEC-4142 & PHDCS-4 @2022 Abstraction • The second component of the learning process is known as abstraction. • Abstraction is the process of extracting knowledge about stored data. This involves creating general concepts about the data as a whole. • The creation of knowledge involves application of known models and creation of new models. • The process of fitting a model to a dataset is known as training. • When the model has been trained, the data is transformed into an abstract form that summarizes the original information.
  • 14. CSEC-4142 & PHDCS-4 @2022 Generalization • The third component of the learning process is known as generalization. • The term generalization describes the process of turning the knowledge about stored data into a form that can be utilized for future action. • These actions are to be carried out on tasks that are similar, but not identical, to those what have been seen before. • In generalization, the goal is to discover those properties of the data that will be most relevant to future tasks.
  • 15. CSEC-4142 & PHDCS-4 @2022 Evaluation • Evaluation is the last component of the learning process. • It is the process of giving feedback to the user to measure the utility of the learned knowledge. • This feedback is then utilized to improve in the whole learning process.
  • 16. CSEC-4142 & PHDCS-4 @2022 Applications of machine learning • In retail business, machine learning is used to study consumer behaviour. • In finance, banks analyze their past data to build models to use in credit application fraud detection and the stock market. • In manufacturing, learning models are used for optimization, control, and troubleshooting. • In medicine, learning programs are used for medical diagnosis.
  • 17. CSEC-4142 & PHDCS-4 @2022 Applications of machine learning (cont.) • In telecommunications, call patterns are analyzed for network optimization and maximizing the quality of service. • It is used to find solutions to many problems in vision, speech recognition, and robotics. • Machine learning methods are applied in the design of computer-controlled vehicles • Machine learning methods have been used to develop programmes for playing games
  • 18. CSEC-4142 & PHDCS-4 @2022 Understanding data • Unit of observation  Unit of observation means the smallest entity with measurable properties of interest for a study.  Examples: o A person, an object or a thing o A time point (second, minute, day, month, year etc.) o A geographic region o A measurement (Height, weight etc.) • Examples  An “example” is an instance of the unit of observation for which properties have been recorded. An “example” is also referred to as an “instance”, or “case” or “record.”
  • 19. CSEC-4142 & PHDCS-4 @2022 Understanding data (cont.) • Features  A “feature” is a recorded property or a characteristic of examples. It is also referred to as “attribute”, or “variable” or “feature.” • Examples and features are generally collected in a “matrix format”. Table 1. Examples and Features
  • 20. CSEC-4142 & PHDCS-4 @2022 Understanding data (cont.) • Consider the problem of developing an algorithm for detecting cancer. In this study we note the following.  The units of observation are the patients.  The examples are patients investigated for cancer (both positive and negative).  The following attributes of the patients may be chosen as the features: o gender o age o blood pressure o the findings of the pathology report after a biopsy
  • 21. CSEC-4142 & PHDCS-4 @2022 Understanding data (cont.) • Spam e-mail detection  The unit of observation could be an e-mail messages.  The examples would be specific messages (both spam and genuine email).  The features might consist of the words used in the messages.
  • 22. CSEC-4142 & PHDCS-4 @2022 Different forms of features (data) • Numeric  If a feature represents a characteristic measured in numbers, it is called a numeric feature. o Example: “year”, “price” and “mileage” (ref. to Table-1) • Categorical or nominal  A categorical feature is an attribute that can take on one of a limited, and usually fixed, number of possible values on the basis of some qualitative property. A categorical feature is also called a nominal feature. o Example: “model”, “color” and “transmission” (ref. to Table-1) • Ordinal data  This denotes a nominal variable with categories falling in an ordered list. o Examples include clothing sizes such as small, medium, and large o measurement of customer satisfaction on a scale from “not at all happy” to “very happy.”
  • 23. CSEC-4142 & PHDCS-4 @2022 Mathematical Basics: Probability Theory • Broadly speaking, probability theory is the mathematical study of uncertainty • It plays the central role in machine learning and pattern recognition as the design of learning algorithms often relies on probability assumption of data
  • 24. CSEC-4142 & PHDCS-4 @2022 Probability Space • When we speak about probability, we often refer to the probability of an event of uncertain nature taking place • Therefore, in order to discuss probability theory formally, we must first classify what the possible events are to which we would like to attach probability. • Formally, a probability space is defined by the triple (Ω, Ƒ, p), where Ω is the space of possible outcomes or outcome space. ℱ ⊆ 2Ω is the space of measurable events or event space. P is the probability measure or probability distribution that maps an event 𝐸𝐸 ∈ ℱ to a real value between 0 to 1.
  • 25. CSEC-4142 & PHDCS-4 @2022 Probability Space Given the outcome space Ω, there are some restrictions as to what subset of 2Ω can be considered as event space Ƒ: (i) Trivial event Ω and the empty event 𝜙𝜙 is in Ƒ (ii) The event space Ƒ is closed under union, that 𝛼𝛼, 𝛽𝛽 ∈ ℱ then 𝛼𝛼 ∪ 𝛽𝛽 ∈ ℱ (iii) The event space ℱ is closed under complement, i.e. if α ∈ ℱ then Ω − 𝛼𝛼 ∈ ℱ
  • 26. CSEC-4142 & PHDCS-4 @2022 Probability Space (cont.) • Given the event space Ƒ, the probability measures P must satisfy certain axioms: i) Non-negativity: For all 𝛼𝛼 ∈ ℱ, 𝑝𝑝(𝛼𝛼) ≥ 0 ii) Trivial event: 𝑝𝑝 Ω = 1 iii) Additivity : For all 𝛼𝛼, 𝛽𝛽 ∈ ℱ and 𝛼𝛼 ∩ 𝛽𝛽 = 𝜙𝜙, P 𝛼𝛼 ∪ 𝛽𝛽 = 𝑝𝑝 𝛼𝛼 + 𝑝𝑝 𝛽𝛽
  • 27. CSEC-4142 & PHDCS-4 @2022 Random variable • Random variables play important role in probability theory • The most important fact is the random variables are not variables but functions, that map outcomes (or outcome space) to real values. • Random variables are denoted by capital letter Example: • Consider the process of throwing a dice. Let X be a random variable that depends on the outcome of the throw • A natural choice for X would be to map the outcome 𝑖𝑖 to the value 𝑖𝑖, that is mapping the event of throwing an ‘one’ to 1. • Suppose we have another random variable Y that maps all outcome to 0 • Suppose we have yet another random variable Z that maps all even outcome to 2𝑖𝑖 and odd outcomes to (-𝑖𝑖).
  • 28. CSEC-4142 & PHDCS-4 @2022 Random variable (cont.) • In other sense, random variables allow us to abstract away the formal notation of event space, as we can define random variables that captures the appropriate events. • For example, consider the event space of odd and even dice throw. We could have a random variable that takes on value 1 if outcome 𝑖𝑖 is even and 0 otherwise. • These type of binary random variables are very common and known as indicator variable as they indicates whether certain event happened or not.
  • 29. CSEC-4142 & PHDCS-4 @2022 Joint Probability • Joint probability is a statistical measure that calculates the likelihood of two events occurring together at the same point of time • We denote the probability of X taking a and Y taking b by (p(X=a, Y=b) or 𝑝𝑝𝑥𝑥,𝑦𝑦 𝑎𝑎, 𝑏𝑏 • Example. Let random variable X be defined on the outcome space Ω of a dice throw 𝑝𝑝𝑥𝑥 1 = 𝑝𝑝𝑥𝑥 2 = 𝑝𝑝𝑥𝑥 3 … . = 𝑝𝑝𝑥𝑥 6 = 1 6 Let Y be an indicator variable that takes a value 1 if a coin flip turns head and 0 if tail Assuming both dice and coin are fair, the probability distribution of X and Y is given by
  • 30. CSEC-4142 & PHDCS-4 @2022 Joint Probability P X=1 X=2 X=3 X=4 X=5 X=6 𝒑𝒑𝒙𝒙(𝒊𝒊) Y=0 1 12 1 12 1 12 1 12 1 12 1 12 1 2 Y=1 1 12 1 12 1 12 1 12 1 12 1 12 1 2 𝒑𝒑𝒚𝒚(𝒋𝒋) 1 6 1 6 1 6 1 6 1 6 1 6 Here , 𝑝𝑝 𝑋𝑋 = 1, 𝑌𝑌 = 0 = 1 6 ∗ 1 2 = 1 12 𝒑𝒑𝒙𝒙 𝒊𝒊 , 𝒑𝒑𝒚𝒚(𝒋𝒋) are called marginal probability Definition: Marginal Probability Distribution • The marginal distribution refers to the probability distribution of a random variable on its own • Given a joint distribution, say over random variables X and Y, we can find the marginal distribution of X or that of Y
  • 31. CSEC-4142 & PHDCS-4 @2022 Joint Probability • If X and Y are discrete random variables and p(𝑥𝑥, 𝑦𝑦) is the value of their joint probability distribution at (𝑥𝑥, 𝑦𝑦) the function given by 𝑝𝑝 𝑥𝑥 = � 𝑏𝑏∈𝑣𝑣𝑣𝑣𝑣𝑣(𝑦𝑦) 𝑝𝑝(𝑥𝑥, 𝑦𝑦 = 𝑏𝑏) and 𝑝𝑝 𝑦𝑦 = � 𝑎𝑎∈𝑣𝑣𝑣𝑣𝑣𝑣(𝑥𝑥) 𝑝𝑝(𝑥𝑥 = 𝑎𝑎, 𝑦𝑦) 𝑝𝑝 𝑥𝑥 and 𝑝𝑝 𝑦𝑦 are the marginal distribution of X and Y respectively.
  • 32. CSEC-4142 & PHDCS-4 @2022 Conditional Distribution • Conditional distributions specify the distribution of a random variable when the value of another random variable is known or more generally when some events are know to be true. • Formally, conditional probability of X=a given Y=b is defined as: 𝑝𝑝 𝑋𝑋 = 𝑎𝑎 𝑌𝑌 = 𝑏𝑏 = 𝑝𝑝(𝑋𝑋 = 𝑎𝑎, 𝑌𝑌 = 𝑏𝑏) 𝑝𝑝(𝑌𝑌 = 𝑏𝑏) • This is not defined when 𝑝𝑝 𝑌𝑌 = 𝑏𝑏 = 0
  • 33. CSEC-4142 & PHDCS-4 @2022 Some Important Theorems on Probability Theorem-1: If A and B are mutually exclusive events, then 𝑝𝑝 𝐴𝐴 ∪ 𝐵𝐵 = 𝑝𝑝 𝐴𝐴 + 𝑝𝑝(𝐵𝐵) Theorem-2: If two events A and B are exhaustive and mutually exclusive, then p 𝐴𝐴 + 𝑝𝑝 𝐵𝐵 = 1 Theorem-3: For any event A, p 𝐴𝐴′ = 1 − 𝑝𝑝(𝐴𝐴) Theorem-4: If event A implies event B, then (i)𝑝𝑝 𝐴𝐴 ≤ 𝑝𝑝(𝐵𝐵) (ii) 𝑝𝑝 𝐵𝐵 − 𝐴𝐴 = 𝑝𝑝 𝐵𝐵 − 𝑝𝑝(𝐴𝐴) Theorem-5: Suppose A and B are events that are not necessarily mutually exclusive, then 𝑝𝑝 𝐴𝐴 ∪ 𝐵𝐵 = 𝑝𝑝 𝐴𝐴 + 𝑝𝑝 𝐵𝐵 − 𝑝𝑝 𝐴𝐴 ∩ 𝐵𝐵 𝐴𝐴 → 𝐵𝐵 i. e. , 𝐴𝐴 ⊂ 𝐵𝐵
  • 34. CSEC-4142 & PHDCS-4 @2022 Independence & Chain Rule Independence • In probability theory, independence means that the distribution of a random variable does not change on learning t he value of another variable • Mathematically if 𝑝𝑝 𝐴𝐴 = 𝑝𝑝 𝐴𝐴 𝐵𝐵 Then A and B are independent. Example: The result of successive tosses of a coin is independent Chain Rule The Chain Rule is often used to evaluate the joint probability of some random variables and is specially useful when there are conditional independence across variables 𝑃𝑃 𝑥𝑥1, 𝑥𝑥2, 𝑥𝑥3, … . , 𝑥𝑥𝑛𝑛 = p(𝑥𝑥1).p(𝑥𝑥2|𝑥𝑥1) … 𝑝𝑝(𝑥𝑥𝑛𝑛|𝑥𝑥1𝑥𝑥2 … 𝑥𝑥𝑛𝑛−1)
  • 35. CSEC-4142 & PHDCS-4 @2022 Bayes Rule • The Bayes Rule allows us to compute the conditional probability 𝑝𝑝 𝑥𝑥 𝑦𝑦 from 𝑝𝑝 𝑦𝑦 𝑥𝑥 in a sense inverting the condition. 𝑝𝑝 𝑥𝑥 𝑦𝑦 = 𝑝𝑝 𝑦𝑦 𝑥𝑥 𝑝𝑝(𝑥𝑥) 𝑝𝑝(𝑦𝑦) • 𝒑𝒑 𝒙𝒙 𝒚𝒚 is called the posterior; this is what we are trying to estimate. For example, “probability of having cancer given that the person is a smoker”. • 𝒑𝒑 𝒚𝒚 𝒙𝒙 is called the likelihood; this is the probability of observing the new evidence, given our initial hypothesis. In the above example, this would be the “probability of being a smoker given that the person has cancer”. • 𝒑𝒑(𝒙𝒙) is called the prior; this is the probability of our hypothesis without any additional prior information. In the above example, this would be the “probability of having cancer”. • 𝒑𝒑(𝒚𝒚) is called the marginal likelihood; this is the total probability of observing the evidence. In the above example, this would be the “probability of being a smoker”. In many applications of Bayes Rule, this is ignored, as it mainly serves as normalization.
  • 36. CSEC-4142 & PHDCS-4 @2022 Probability Distribution • The probability distribution describes how the total probability is distributed over the possible values of a random variable • It tells us what the possible values of a random variable X are and how the probability are assigned to those values. • The set of possible values of a random variable together with their respective probability is called the probability distribution of the random variable. • Probability distribution can be continuous or random. • The probability distribution of a continuous random variable is called continuous probability distribution and the probability distribution of a random variable is called discrete probability distribution
CSEC-4142 & PHDCS-4 @2022
PMF & CDF
Probability Mass Function (PMF)
• The probability mass function (pmf) of a discrete random variable X is a function, denoted by p(x) or f(x), that satisfies:
  i) p(x) ≥ 0 for all values of X, and
  ii) Σ_i p(x_i) = 1
  where p(x) = p(X = x) is the probability that the random variable X takes the value x.
Cumulative Distribution Function (CDF)
• For any real number x, the function F(x) = p(X ≤ x) is known as the cumulative distribution function (cdf), or simply the distribution function.
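A small sketch (not from the slides): the pmf and cdf of a fair six-sided die.

from fractions import Fraction

pmf = {k: Fraction(1, 6) for k in range(1, 7)}   # p(X = k) = 1/6 for k = 1..6
assert sum(pmf.values()) == 1                    # the pmf sums to 1

def cdf(x):
    # F(x) = p(X <= x)
    return sum(p for k, p in pmf.items() if k <= x)

print(cdf(3))   # 1/2
print(cdf(6))   # 1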
CSEC-4142 & PHDCS-4 @2022
Mean and Variance of a Random Variable
• The mean of a random variable X, denoted by μ, describes where the population distribution of X is centered.
• The mean, or expectation, of a discrete random variable X having pmf p(x) is defined as
  μ = E(X) = Σ_x x·p(x)
  provided the sum exists; otherwise we say that the mean does not exist. "The sum exists" means it is absolutely convergent, i.e. Σ_x |x|·p(x) < ∞.
• The variance (σ²) of a discrete random variable X is defined as
  σ² = V(X) = E[(X − E(X))²] = E[(X − μ)²] = Σ_x (x − μ)² p(x)
  or, equivalently,
  σ² = V(X) = E(X²) − (E(X))² = Σ_x x² p(x) − μ²
• The standard deviation (σ) of a random variable is the positive square root of its variance, i.e. σ = +√σ².
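A minimal sketch (illustrative pmf, not from the slides) computing the mean and checking that both variance formulas agree:

pmf = {0: 0.2, 1: 0.5, 2: 0.3}                                   # hypothetical pmf

mu = sum(x * p for x, p in pmf.items())                           # E(X)
var_def = sum((x - mu) ** 2 * p for x, p in pmf.items())          # E[(X − μ)²]
var_alt = sum(x * x * p for x, p in pmf.items()) - mu ** 2        # E(X²) − μ²
sigma = var_def ** 0.5

print(mu, var_def, var_alt, sigma)   # 1.1 0.49 0.49 0.7 (up to floating-point rounding)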
CSEC-4142 & PHDCS-4 @2022
Median and Mode of a Random Variable
• The median of a random variable X is the value below (or above) which half of the total probability lies.
• Thus, if F(x) is the cumulative distribution function of a random variable X and μ̄ is the median, then F(μ̄) = 1/2, or we may write p(X ≤ μ̄) ≥ 1/2 and p(X ≥ μ̄) ≥ 1/2.
• The mode of a random variable X is the value which occurs with the highest probability, i.e. the value of X at which the pmf f(x) is maximum.
• Thus, if f(x) is the pmf of a random variable X and μ̂ is the mode, then f(μ̂) ≥ f(x) for all x.
• The mode is not necessarily unique.
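A short sketch (same illustrative pmf) locating the mode and a median of a discrete random variable:

pmf = {0: 0.2, 1: 0.5, 2: 0.3}                    # hypothetical pmf

mode = max(pmf, key=pmf.get)                      # value with the highest probability
print(mode)                                       # 1

def is_median(m):
    # p(X <= m) >= 1/2 and p(X >= m) >= 1/2
    return (sum(p for x, p in pmf.items() if x <= m) >= 0.5 and
            sum(p for x, p in pmf.items() if x >= m) >= 0.5)

print([x for x in pmf if is_median(x)])           # [1]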
CSEC-4142 & PHDCS-4 @2022
Conditional Expectation
• The conditional expectation of X, given that the event A has happened, is defined as
  E(X | A) = Σ_{x|A} x·p(X = x | A)
  where (X = x | A) indicates the values of X that are favourable to A.
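A small sketch (not from the slides): the conditional expectation of a fair die roll X given the event A = "X is even".

from fractions import Fraction

pmf = {k: Fraction(1, 6) for k in range(1, 7)}
A = {2, 4, 6}

p_A = sum(pmf[k] for k in A)                      # p(A) = 1/2
cond_exp = sum(k * pmf[k] / p_A for k in A)       # E(X | A) = (2 + 4 + 6) / 3
print(cond_exp)                                   # 4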
CSEC-4142 & PHDCS-4 @2022
Probability Density Function
• To define a continuous distribution, we make use of a probability density function (PDF).
• A probability density function f is a non-negative, integrable function such that
  ∫_{val(X)} f(x) dx = 1
• The probability of a random variable X distributed according to a PDF f is computed as follows:
  P(a ≤ X ≤ b) = ∫_a^b f(x) dx
• This implies that the probability of a continuously distributed random variable taking on any given single value is zero.
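A hedged sketch checking these properties numerically for one example density, the exponential pdf f(x) = exp(−x) on x ≥ 0 (the choice of distribution is ours, not the slides'):

import math
from scipy.integrate import quad

f = lambda x: math.exp(-x)               # pdf of an Exponential(1) random variable

total, _ = quad(f, 0, math.inf)          # integral of f over val(X); should be 1
prob, _ = quad(f, 1, 2)                  # P(1 ≤ X ≤ 2)

print(round(total, 6))                                          # 1.0
print(round(prob, 6), round(math.exp(-1) - math.exp(-2), 6))    # both ≈ 0.232544
print(quad(f, 1.5, 1.5)[0])                                     # 0.0: a single value has zero probability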
CSEC-4142 & PHDCS-4 @2022
Skewness and Kurtosis
• Skewness is a measure of symmetry, or more precisely, the lack of symmetry. A distribution, or data set, is symmetric if it looks the same to the left and right of the center point.
• Kurtosis is a measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution. Data sets with high kurtosis tend to have heavy tails, or outliers; data sets with low kurtosis tend to have light tails, or a lack of outliers. A uniform distribution would be the extreme light-tailed case.
• A heavy-tailed distribution has a tail that is heavier than an exponential distribution (Bryson, 1974).
• In other words, a heavy-tailed distribution goes to zero more slowly than one with lighter tails.
• Heavy-tailed distributions tend to have many outliers. An outlier is a data point that differs significantly from other observations.
• The histogram is an effective graphical technique for showing both the skewness and kurtosis of a data set.
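A minimal sketch (synthetic samples, invented for illustration) computing sample skewness and excess kurtosis with SciPy:

import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(0)
symmetric = rng.normal(size=10_000)           # roughly symmetric, normal-like sample
right_skewed = rng.exponential(size=10_000)   # sample with a long right tail

print(round(skew(symmetric), 2), round(skew(right_skewed), 2))          # ≈ 0 and clearly positive
print(round(kurtosis(symmetric), 2), round(kurtosis(right_skewed), 2))  # excess kurtosis ≈ 0 and clearly positive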
CSEC-4142 & PHDCS-4 @2022
Skewness and Kurtosis (cont.)
CSEC-4142 & PHDCS-4 @2022
Symmetrical and Asymmetrical Distribution
• A symmetric distribution is a type of distribution where the left side of the distribution mirrors the right side.
• The normal distribution is symmetric. It is also a unimodal distribution (it has one peak).
• Distributions don't have to be unimodal to be symmetric. They can be bimodal (two peaks) or multimodal (many peaks). A bimodal distribution is still symmetric if its two halves are mirror images of each other.
• In a symmetric unimodal distribution, the mean, mode and median all fall at the same point. The mode is the most common value and corresponds to the highest peak.
• An exception is the bimodal distribution: the mean and median are still in the center, but there are two modes, one at each peak.
CSEC-4142 & PHDCS-4 @2022
Symmetrical and Asymmetrical Distribution (cont.)
• If one tail is longer than the other, the distribution is skewed. Such distributions are sometimes called asymmetric or asymmetrical distributions, as they don't show any kind of symmetry.
• A left-skewed distribution has a long left tail. Left-skewed distributions are also called negatively-skewed distributions, because there is a long tail in the negative direction on the number line. The mean is also to the left of the peak.
• A right-skewed distribution has a long right tail. Right-skewed distributions are also called positively-skewed distributions, because there is a long tail in the positive direction on the number line. The mean is also to the right of the peak.
CSEC-4142 & PHDCS-4 @2022
Symmetrical and Asymmetrical Distribution (cont.)
• A left-skewed (negative) distribution will have the mean to the left of the median.
• A right-skewed distribution will have the mean to the right of the median. Real-life distributions are usually skewed.
• If the skewness is high, many statistical techniques don't work well.
• As a result, transformations such as the logarithm are often applied to reduce skewness before analysis.
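A hedged sketch (synthetic data) showing a log transform reducing right skew; this is one common remedy, chosen here only as an example:

import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(1)
x = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)   # strongly right-skewed sample

print(round(skew(x), 2))            # large positive skewness
print(round(skew(np.log(x)), 2))    # ≈ 0: the log of a lognormal sample is (approximately) normal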
CSEC-4142 & PHDCS-4 @2022
Some Important Distributions: Binomial
• A binomial experiment is one that possesses the following properties:
  1) The experiment consists of n repeated trials.
  2) Each trial results in an outcome that may be classified as a success or a failure (hence the name, binomial).
  3) The probability of a success, denoted by p, remains constant from trial to trial, and the repeated trials are independent.
• The number of successes X in n trials of a binomial experiment is called a binomial random variable.
• The probability distribution of the random variable X is called a binomial distribution, and is given by the formula:
  P(X = x) = C(n, x) p^x (1 − p)^(n−x),  where x = 0, 1, 2, …, n
• P(X = x) gives the probability of x successes in n binomial trials.
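A short sketch (example numbers only): the probability of exactly 3 successes in n = 10 trials with p = 0.5, once from the formula and once with SciPy.

from math import comb
from scipy.stats import binom

n, p, x = 10, 0.5, 3
manual = comb(n, x) * p**x * (1 - p)**(n - x)   # C(n, x) p^x (1 − p)^(n−x)
print(round(manual, 6))                         # 0.117188
print(round(binom.pmf(x, n, p), 6))             # same value via scipy.stats.binom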
CSEC-4142 & PHDCS-4 @2022
Some Important Distributions: Bernoulli
• The Bernoulli distribution is a discrete distribution having two possible outcomes, labelled n = 0 and n = 1, in which n = 1 ("success") occurs with probability p and n = 0 ("failure") occurs with probability q = 1 − p, where 0 < p < 1.
• It therefore has probability mass function
  p(n) = 1 − p for n = 0, and p for n = 1
  which can also be written as p(n) = p^n (1 − p)^(1−n)
• The Bernoulli distribution is a special case of the Binomial distribution with n = 1.
CSEC-4142 & PHDCS-4 @2022
Some Important Distributions: Poisson
• The Poisson distribution is a very useful distribution that deals with the arrival of events.
• It is a discrete probability distribution that measures the probability of a given number of events happening over a fixed period of time, given a fixed average rate of occurrence and assuming the events take place independently of the time since the last event.
• It is parameterized by the average arrival rate λ.
• The probability mass function (the probability of observing k events over the time period) is given by
  P(X = k) = exp(−λ) λ^k / k!
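A minimal sketch (example rate only): the probability of observing k = 2 arrivals in a period with average rate λ = 3.

import math
from scipy.stats import poisson

lam, k = 3.0, 2
manual = math.exp(-lam) * lam**k / math.factorial(k)   # exp(−λ) λ^k / k!
print(round(manual, 6))                                # 0.224042
print(round(poisson.pmf(k, lam), 6))                   # same value via scipy.stats.poisson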
CSEC-4142 & PHDCS-4 @2022
Some Important Distributions: Gaussian
• A random variable X whose distribution has the shape of a normal curve is called a normal random variable.
• Such a random variable X is said to be normally distributed with mean μ and standard deviation σ if its probability density function is given by
  f(x) = (1 / (σ√(2π))) exp(−(x − μ)² / (2σ²))
• The normal distribution is also called the Gaussian distribution.
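A small sketch (example parameters) evaluating the normal pdf at a point both from the formula above and with SciPy:

import math
from scipy.stats import norm

mu, sigma, x = 0.0, 1.0, 1.0
manual = (1.0 / (sigma * math.sqrt(2 * math.pi))) * math.exp(-(x - mu) ** 2 / (2 * sigma ** 2))
print(round(manual, 6))                             # 0.241971
print(round(norm.pdf(x, loc=mu, scale=sigma), 6))   # same value via scipy.stats.norm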
CSEC-4142 & PHDCS-4 @2022
Some Important Distributions: Gaussian (cont.)
• The Gaussian distribution is the most versatile distribution in probability theory and appears in a wide range of contexts.
• Whenever you measure things like people's height, weight, salary, opinions or votes, the probability distribution is very often Gaussian.
• It can be used to approximate the binomial distribution when the number of experiments is large, and the Poisson distribution when the average arrival rate is high.
• In many cases we deal with multivariate Gaussian distributions.
• A k-dimensional multivariate Gaussian distribution is parameterized by (μ, Σ), where μ is a vector of means in ℝ^k and Σ is a covariance matrix in ℝ^(k×k); in other words, Σ_ii = var(X_i) and Σ_ij = cov(X_i, X_j).
• The probability density function is defined by
  f(x) = (1 / √((2π)^k |Σ|)) exp(−(1/2) (x − μ)^T Σ^(−1) (x − μ))
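A hedged sketch (example μ and Σ chosen for illustration) evaluating a 2-dimensional Gaussian density, comparing SciPy against the closed-form expression above:

import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 2.0]])   # Sigma_ii = var(X_i), Sigma_ij = cov(X_i, X_j)
x = np.array([1.0, -1.0])

# Density from the closed-form expression
k = len(mu)
diff = x - mu
norm_const = 1.0 / np.sqrt((2 * np.pi) ** k * np.linalg.det(Sigma))
manual = norm_const * np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)

print(round(float(manual), 6))
print(round(float(multivariate_normal(mean=mu, cov=Sigma).pdf(x)), 6))   # should match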