Probability Theory for Data Scientists

Probability for Data Scientists
Dr. Ferdin Joe John Joseph

Machine Learning
Machine Learning is an interdisciplinary field in Data Science that uses
• statistics
• probability
• algorithms
to learn from data and provide insights which can be used to build
intelligent applications.

Probability for Data Science
• Probability deals with predicting the likelihood of
future events, while statistics involves the
analysis of the frequency of past events.

Terminologies
• Event
• Random Variable
• Empirical Probability
• Theoretical Probability
• Joint Probability
• Conditional Probability

Event
• An event is a set of outcomes of an experiment to which a probability
is assigned.
• E represents the event.
• P(E) is the probability that the event E occurs.
• A situation where E might happen (success) or might not happen
(failure) is called a trial.

Event
• Example: pulling a colored ball out of a bag

Random Variable
• A variable that represents the outcome of a random experiment is
called a random variable.
• E.g. the result (head or tail) of tossing a coin

Random variable in tossing a coin
• If we toss a fair coin, the chances of getting a head or a tail are 50-50
• The probability of getting a head is ½ (50%), and the same holds for a tail
• The probability of any outcome ranges between 0 and 1

Empirical Probability
• Also known as practical probability
• It is the number of times the event occurs divided by the total
number of trials observed.
• If we run n trials and observe s successes, the empirical probability
of success is s/n.
• Example: toss a coin 4 times and observe H, H, H, T
• P(Head) = 3/4 = 0.75
• P(Tail) = 1/4 = 0.25
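
The s/n formula can be checked with a short simulation. This is only a minimal sketch: the helper name `empirical_probability` is introduced here for illustration, and the seed value is arbitrary.

```python
from random import Random


def empirical_probability(outcomes, event):
    """Return s/n: the fraction of observed outcomes equal to `event`."""
    successes = sum(1 for outcome in outcomes if outcome == event)
    return successes / len(outcomes)


# The four tosses from the example: H, H, H, T
tosses = ["H", "H", "H", "T"]
print(empirical_probability(tosses, "H"))  # 0.75
print(empirical_probability(tosses, "T"))  # 0.25

# With many simulated tosses of a fair coin the estimate settles near 0.5
rng = Random(1)  # arbitrary seed so the run is repeatable
many_tosses = [rng.choice("HT") for _ in range(10_000)]
print(empirical_probability(many_tosses, "H"))
```

Four tosses give a rough estimate; the larger run shows how the empirical value moves toward the probability expected for a fair coin.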

Theoretical Probability
• The number of ways a particular event can occur divided by the
total number of equally likely possible outcomes.
• A head can occur in one way and there are two possible outcomes
(head, tail). The true (theoretical) probability of a head is 1/2.

Exercise 1
A die is rolled. Find the probability that an even number is obtained.
Solution:
Let us first write the sample space S of the experiment.
S = {1,2,3,4,5,6}
Let E be the event "an even number is obtained" and write it down.
E = {2,4,6}
We now use the classical probability formula.
P(E) = n(E) / n(S) = 3 / 6 = 1 / 2
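
The same counting argument can be written in a few lines of Python. This is a sketch, assuming all outcomes are equally likely; the function name `classical_probability` is made up for this example.

```python
from fractions import Fraction


def classical_probability(event, sample_space):
    """P(E) = n(E) / n(S), valid when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))


S = {1, 2, 3, 4, 5, 6}              # sample space of one die roll
E = {x for x in S if x % 2 == 0}    # event: an even number, {2, 4, 6}

print(classical_probability(E, S))  # 1/2
```

`Fraction` keeps the result exact, so 3/6 is reduced to 1/2 rather than printed as a float.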

Exercise 2
Two coins are tossed. Find the probability that two heads are obtained.
Note: Each coin has two possible outcomes, H (heads) and T (tails).
Solution:
The sample space S is given by
S = {(H,H),(H,T),(T,H),(T,T)}
Let E be the event "two heads are obtained".
E = {(H,H)}
We use the classical probability formula.
P(E) = n(E) / n(S) = 1 / 4
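
For experiments with several steps, the sample space can be enumerated instead of written by hand. The sketch below uses `itertools.product` from the standard library; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All ordered outcomes of tossing two coins
sample_space = list(product("HT", repeat=2))
two_heads = [o for o in sample_space if o == ("H", "H")]

p_two_heads = Fraction(len(two_heads), len(sample_space))
print(sample_space)  # [('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T')]
print(p_two_heads)   # 1/4
```

Changing `repeat=2` to a larger number enumerates the outcomes for more coins in the same way.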

Exercise 3
A card is drawn at random from a deck of cards. Find the probability of
getting the 3 of diamonds.
Solution:
The sample space S of this experiment is the full deck of 52 cards:
13 ranks in each of the four suits.
Let E be the event "getting the 3 of diamonds". An examination of the
sample space shows that there is one "3 of diamonds", so that n(E) = 1
and n(S) = 52. Hence the probability of event E occurring is given by
P(E) = 1 / 52

Exercise 4
The blood groups of 200 people are distributed as follows:
50 have type A blood,
65 have type B blood,
70 have type O blood and
15 have type AB blood.
If a person from this group is selected at random, what is the
probability that this person has type O blood?
Solution:
We construct a frequency table for the blood groups as follows
Group   Frequency
A       50
B       65
O       70
AB      15
Total   200
We use the empirical probability formula
P(E) = Frequency for O blood / Total frequency
= 70 / 200 = 0.35

Classwork 1
What is the probability of rolling one die and getting a number
greater than 4?

Classwork 2
A customer wants to buy a loaf of bread and a can. The shop has 30
loaves of bread, including 5 from the previous day, and 20 cans with
unreadable expiration dates, of which one has expired. What is the
probability that the customer buys a fresh loaf and an unexpired can?

Classwork 3
What is the probability that, if we choose a group of three from 19
boys and 12 girls, we will have:
a) three boys
b) three girls
c) two boys and one girl?

Joint Probability
• The joint probability of events A and B, denoted P(A and B) or
P(A ∩ B), is the probability that events A and B both occur.
• If A and B are independent, P(A ∩ B) = P(A) · P(B)
• This product rule applies only if A and B are independent, which means
that the occurrence of A does not change the probability of B, and vice versa.
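
The product rule can be verified by enumeration. The sketch below assumes two fair dice and two example events chosen only for illustration (first die even, second die greater than 4); the helper `prob` is not part of the slides.

```python
from fractions import Fraction
from itertools import product

# 36 equally likely (first, second) outcomes of rolling two dice
sample_space = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of the outcomes that satisfy the predicate `event`."""
    favourable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favourable, len(sample_space))


def first_is_even(outcome):
    return outcome[0] % 2 == 0


def second_above_four(outcome):
    return outcome[1] > 4


p_a = prob(first_is_even)
p_b = prob(second_above_four)
p_a_and_b = prob(lambda o: first_is_even(o) and second_above_four(o))

print(p_a, p_b, p_a_and_b)     # 1/2 1/3 1/6
print(p_a_and_b == p_a * p_b)  # True
```

The two events concern different dice, so neither affects the other and the joint probability equals the product.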

Conditional Probability
• When A and B are not independent, it is often useful to compute the
conditional probability, P(A|B)
• The probability of A given that B occurred: P(A|B) = P(A ∩ B) / P(B)
• Similarly, P(B|A) = P(A ∩ B) / P(A)

• Rearranging the definition of P(B|A), the joint probability of A and B
can be written as
• P(A ∩ B) = P(A) · P(B|A)
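
A small enumeration makes the conditional probability formulas concrete. This sketch assumes one fair die with two example events picked for illustration: A = "the roll is even" and B = "the roll is greater than 3".

```python
from fractions import Fraction

S = set(range(1, 7))               # one roll of a fair die
A = {x for x in S if x % 2 == 0}   # even: {2, 4, 6}
B = {x for x in S if x > 3}        # greater than 3: {4, 5, 6}


def prob(event):
    """Classical probability of an event within S."""
    return Fraction(len(event), len(S))


def conditional(a, b):
    """P(a | b) = P(a ∩ b) / P(b)."""
    return prob(a & b) / prob(b)


print(prob(A))            # 1/2
print(conditional(A, B))  # 2/3
print(prob(A & B) == prob(A) * conditional(B, A))  # True
```

Knowing that B occurred raises the probability of A from 1/2 to 2/3, so these two events are not independent.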

Bayes Theorem
• Used in the Naïve Bayes classifier
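
Bayes' theorem follows from the two conditional probability formulas: P(A|B) = P(B|A) · P(A) / P(B). The sketch below applies it to a spam-filter style question. All numbers are hypothetical and chosen only to make the arithmetic easy to follow; the function name `bayes` is likewise an assumption.

```python
from fractions import Fraction


def bayes(p_b_given_a, p_a, p_b):
    """P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


# Hypothetical values, for illustration only
p_spam = Fraction(2, 10)             # P(spam)
p_word_given_spam = Fraction(6, 10)  # P(word appears | spam)
p_word_given_ham = Fraction(1, 10)   # P(word appears | not spam)

# Total probability that the word appears in a message
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

posterior = bayes(p_word_given_spam, p_spam, p_word)
print(p_word)     # 1/5
print(posterior)  # 3/5
```

A Naïve Bayes classifier repeats this kind of update for many features, under the simplifying assumption that the features are independent given the class.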

Types of Events
• Independent
• Mutually Exclusive

Independent Events
• Two or more events are independent if the occurrence of one does not
affect the probability of the others.

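Mutually Exclusive Events
• Two events are mutually exclusive if they cannot both occur in the same
trial, so P(A ∩ B) = 0.
• Mutually exclusive events with nonzero probability are not independent:
if one occurs, the other cannot.
• If two events are NOT independent, then we say that they are dependent.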
• Sampling may be done with replacement or without replacement.
• With replacement: If each member of a population is replaced after it is
picked, then that member has the possibility of being chosen more than
once. When sampling is done with replacement, then events are
considered to be independent, meaning the result of the first pick will not
change the probabilities for the second pick.
• Without replacement: When sampling is done without replacement, each
member of a population may be chosen only once. In this case, the
probabilities for the second pick are affected by the result of the first pick.
The events are considered to be dependent or not independent.

Sampling with replacement
• Suppose you pick three cards with replacement. The first card you
pick out of the 52 cards is the Q of spades. You put this card back,
reshuffle the cards and pick a second card from the 52-card deck. It is
the ten of clubs. You put this card back, reshuffle the cards and pick
a third card from the 52-card deck. This time, the card is the Q of
spades again. Your picks are {Q of spades, ten of clubs, Q of spades}.
You have picked the Q of spades twice. You pick each card from the
52-card deck.

Sampling without replacement
• Suppose you pick three cards without replacement. The first card you
pick out of the 52 cards is the K of hearts. You put this card aside and
pick the second card from the 51 cards remaining in the deck. It is the
three of diamonds. You put this card aside and pick the third card from
the remaining 50 cards in the deck. The third card is the J of spades.
Your picks are {K of hearts, three of diamonds, J of spades}. Because
you have picked the cards without replacement, you cannot pick the same
card twice.
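
Both sampling schemes are available in Python's standard `random` module. The following is a minimal sketch: the deck representation and the seed are arbitrary choices for illustration.

```python
from random import Random

rng = Random(1)  # arbitrary seed so the run is repeatable

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = [f"{rank} of {suit}" for suit in suits for rank in ranks]

# With replacement: every pick is made from all 52 cards,
# so the same card can appear more than once.
with_replacement = rng.choices(deck, k=3)

# Without replacement: a picked card is set aside,
# so the three cards are always distinct.
without_replacement = rng.sample(deck, k=3)

print(with_replacement)
print(without_replacement)
```

`choices` models independent picks, while `sample` models dependent picks because each draw changes what remains in the deck.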

Probability Distribution
• A probability distribution is a list of all of the possible outcomes of a
random variable along with their corresponding probability values.

Discrete Probability Distribution
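• A discrete random variable takes only separate, countable values. If we
consider 1 and 2 as outcomes of rolling a six-sided die, then we can't
have an outcome in between (e.g. an outcome of 1.5 is impossible).
• The function that gives the probability of each discrete outcome is
called a probability mass function (pmf).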
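
As a sketch of what a probability mass function looks like in code, the example below tabulates the distribution of the sum of two fair dice. The two-dice setting is an assumption made here to get a non-uniform table; it is not taken from the slides.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely rolls give each total
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# The pmf lists every possible outcome with its probability
pmf = {total: Fraction(n, 36) for total, n in sorted(counts.items())}

for total, p in pmf.items():
    print(total, p)

print(sum(pmf.values()))  # 1
```

Only the integers 2 to 12 appear as keys; a value such as 1.5 has no entry, and the probabilities add up to 1.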

Continuous Probability Distribution
• Sometimes we are concerned with the probabilities of random
variables that have continuous outcomes.
• E.g. the height of an adult picked at random from a population or the
amount of time that a taxi driver has to wait before their next job.
• When we use a probability function to describe a continuous
probability distribution we call it a probability density function
(commonly abbreviated as pdf).
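
With a continuous variable, probabilities come from the area under the pdf rather than from the pdf value itself. The sketch below uses `statistics.NormalDist` (Python 3.8+) and models height as normally distributed with a mean of 170 cm and a standard deviation of 10 cm. Both the choice of distribution and the parameters are hypothetical.

```python
from statistics import NormalDist

# Hypothetical model of adult height in cm (parameters are made up)
height = NormalDist(mu=170, sigma=10)

# The pdf gives a density, not a probability
density_at_mean = height.pdf(170)
print(density_at_mean)

# Probability of a height between 160 cm and 180 cm:
# the area under the pdf, computed from the cumulative distribution
p_between = height.cdf(180) - height.cdf(160)
print(round(p_between, 4))  # 0.6827
```

The probability of any single exact height is zero, which is why intervals are used for continuous outcomes.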

Central Limit Theorem
• The central limit theorem states that if you have a population with
mean μ and standard deviation σ and take sufficiently large random
samples from the population with replacement, then the distribution
of the sample means will be approximately normally distributed.
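
The theorem can be illustrated with a simulation. This sketch assumes the population is a fair die (mean 3.5, standard deviation about 1.708); the sample size, number of samples and seed are arbitrary choices.

```python
from random import Random
from statistics import mean, pstdev

rng = Random(1)     # arbitrary seed so the run is repeatable
sample_size = 50    # rolls per sample
num_samples = 2000  # number of samples drawn

# Mean of each sample of die rolls (rolling a die is sampling with replacement)
sample_means = [
    mean(rng.randint(1, 6) for _ in range(sample_size))
    for _ in range(num_samples)
]

print(mean(sample_means))    # close to the population mean, 3.5
print(pstdev(sample_means))  # close to 1.708 / sqrt(50), about 0.24
```

Although single rolls are uniformly distributed, a histogram of `sample_means` would show the bell shape centered on 3.5, with a spread that shrinks as the sample size grows.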

Normal Distribution
• Sample means approach this distribution under the Central Limit Theorem
• Also known as the bell curve

Genetic Algorithm
A genetic algorithm is a search heuristic that is inspired by Charles
Darwin's theory of natural evolution.
This algorithm reflects the process of natural selection where the fittest
individuals are selected for reproduction in order to produce offspring
of the next generation.

Phases of Genetic Algorithm
Initial population
Fitness function
Selection
Crossover
Mutation

Initial Population
The process begins with a set of individuals which is called a
Population. Each individual is a solution to the problem you want to
solve.
An individual is characterized by a set of parameters (variables) known
as Genes. Genes are joined into a string to form a Chromosome
(solution).
In a genetic algorithm, the set of genes of an individual is represented
using a string, in terms of an alphabet. Usually, binary values are used
(string of 1s and 0s). We say that we encode the genes in a
chromosome.
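
A binary-encoded population can be sketched as a list of bit lists. The population size, chromosome length and function name below are assumptions for illustration, not values from the slides.

```python
from random import Random

rng = Random(1)  # arbitrary seed so the run is repeatable


def random_chromosome(n_genes):
    """One individual: a list of n_genes randomly chosen bits."""
    return [rng.randint(0, 1) for _ in range(n_genes)]


# Hypothetical sizes: 4 individuals with 6 genes each
population = [random_chromosome(6) for _ in range(4)]

for chromosome in population:
    print(chromosome)
```

Each gene is drawn with probability 1/2 for 0 or 1, so the initial population is spread at random over the search space.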

Fitness Function
The fitness function determines how fit an individual is (the ability of
an individual to compete with other individuals).
It gives a fitness score to each individual.
The probability that an individual will be selected for reproduction is
based on its fitness score.

Selection
The idea of the selection phase is to select the fittest individuals and
let them pass their genes to the next generation.
Two pairs of individuals (parents) are selected based on their fitness
scores. Individuals with high fitness have a higher chance of being
selected for reproduction.
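
One common way to turn fitness scores into selection probabilities is fitness-proportional ("roulette wheel") selection. The sketch below is an assumption about how selection could be done, not necessarily the method used in the linked Java code. The toy fitness function (count of 1 bits) and the population are also hypothetical.

```python
from random import Random

rng = Random(1)  # arbitrary seed so the run is repeatable


def fitness(chromosome):
    """Toy fitness for illustration: the number of 1 bits."""
    return sum(chromosome)


population = [
    [1, 1, 0, 1, 1, 1],
    [0, 0, 0, 1, 0, 0],
    [1, 0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 1],
]

scores = [fitness(c) for c in population]  # [5, 1, 3, 4]
total = sum(scores)

# Probability of being selected is proportional to the fitness score
selection_probs = [score / total for score in scores]
print(selection_probs)

# Draw two parents; choices() samples with replacement,
# so the same individual may be picked twice.
parents = rng.choices(population, weights=scores, k=2)
print(parents)
```

The fittest individual is the most likely parent, but weaker individuals still have a nonzero chance, which helps keep diversity in the population.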

Crossover
Crossover is the most significant phase in a genetic algorithm. For each
pair of parents to be mated, a crossover point is chosen at random
from within the genes.
For example, consider the crossover point to be 3.

Crossover
• Offspring are created by exchanging the genes of the parents among
themselves until the crossover point is reached.
• The new offspring A5 and A6 are added to the population.
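
Single-point crossover with the crossover point at 3 can be sketched as follows. The parent bit strings are hypothetical, and the names `a1`, `a2`, `a5` and `a6` are used here only to mirror the offspring labels in the text.

```python
def single_point_crossover(parent_a, parent_b, point):
    """Exchange the genes before `point` between the two parents."""
    child_a = parent_b[:point] + parent_a[point:]
    child_b = parent_a[:point] + parent_b[point:]
    return child_a, child_b


# Hypothetical parents, chosen so the exchange is easy to see
a1 = [0, 0, 0, 0, 0, 0]
a2 = [1, 1, 1, 1, 1, 1]

a5, a6 = single_point_crossover(a1, a2, point=3)
print(a5)  # [1, 1, 1, 0, 0, 0]
print(a6)  # [0, 0, 0, 1, 1, 1]
```

In a full algorithm the point would be drawn at random for each pair, for example with `random.randint(1, len(a1) - 1)`.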

Probability in crossover
• Choosing which chromosome to perform crossover
• Choosing the pair to perform crossover
• Choosing the part of chromosome to perform crossover

Mutation
• In certain new offspring formed, some of their genes can be
subjected to a mutation with a low random probability.
• This implies that some of the bits in the bit string can be flipped.
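
Bit-flip mutation can be sketched as an independent coin toss per gene. The mutation rate of 0.05 and the function name `mutate` are assumptions for illustration.

```python
from random import Random


def mutate(chromosome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - gene if rng.random() < rate else gene for gene in chromosome]


rng = Random(1)  # arbitrary seed so the run is repeatable
child = [1, 1, 1, 0, 0, 0]

# Hypothetical low mutation rate: about 1 bit in 20 is flipped on average
mutated = mutate(child, rate=0.05, rng=rng)
print(mutated)
```

A low rate keeps most offspring unchanged while still occasionally introducing bits that crossover alone could not produce.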

Probability in mutation
• Choosing which chromosome to perform mutation
• Choosing whether to perform mutation or not
• Choosing the part of chromosome to perform mutation

Sample Java Code
https://github.com/ferdinjoe/Genetic-Algorithm

Probability usage in programming

# generate random floating point values
from random import seed
from random import random
# seed the random number generator so the run is repeatable
seed(1)
# generate 10 random numbers in the range [0, 1)
for _ in range(10):
    value = random()
    print(value)

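# generate random integer values
from random import seed
from random import randint
# seed the random number generator so the run is repeatable
seed(1)
# generate 10 integers between 0 and 10 (both ends included)
for _ in range(10):
    value = randint(0, 10)
    print(value)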

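# choose a random element from a list
from random import seed
from random import choice
# seed the random number generator so the run is repeatable
seed(1)
# prepare a sequence of the integers 0 to 19
sequence = list(range(20))
print(sequence)
# make 5 choices from the sequence (the same element may be chosen again)
for _ in range(5):
    selection = choice(sequence)
    print(selection)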

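# randomly shuffle a sequence
from random import seed
from random import shuffle
# seed the random number generator so the run is repeatable
seed(1)
# prepare a sequence of the integers 0 to 19
sequence = list(range(20))
print(sequence)
# randomly shuffle the sequence in place
shuffle(sequence)
print(sequence)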

Slides available at the link below
www.slideshare.net/ferdinjoe

More topics recommended to learn
• Queueing Theory
• Statistics
• Numerical Methods
• Discrete Mathematics
• Optimization problems in Operations Research
