Statistics and probability
Final Project
Name                 ID      Department
Shahwar Irshad       29787   BS(SE)
Syed Haseeb Hussain  30413   BS(SE)
Adnan Ahmad          30481   BS(SE)
Muhammad Ansar       30461   BS(SE)
Submitted to: Mr. Abrar Khalid
Submission Date: 29/05/2020
Measures of variation:
An average is an attempt to summarize a set of data using just one number. An average taken by
itself may not always be very meaningful. We also need a statistical cross-reference that measures
the spread of the data, because two sets can have the same mean and look very different in terms
of spread.
Example:
Set A: 10, 10, 11, 12, 12
Set B: 2, 4, 11, 18, 20
Both have a mean of 11.
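The claim about Set A and Set B can be checked with a minimal Python sketch. The variable names are illustrative and not part of the original slides, and spread is measured here simply as the largest value minus the smallest.

```python
from statistics import mean

set_a = [10, 10, 11, 12, 12]
set_b = [2, 4, 11, 18, 20]

# Both sets share the same mean ...
mean_a = mean(set_a)
mean_b = mean(set_b)

# ... but the distance between the largest and smallest value differs a lot.
spread_a = max(set_a) - min(set_a)
spread_b = max(set_b) - min(set_b)

print("means:", mean_a, mean_b)
print("largest minus smallest:", spread_a, spread_b)
```

The identical means hide the fact that Set B is far more spread out than Set A.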
Example 2:
A testing lab wishes to test two experimental brands of outdoor paint to see how long each will
last before fading. The testing lab makes 6 gallons of each paint to test. Since different chemical
agents are added to each group and only six cans are involved, these two groups constitute two
small populations. The results (in months) are shown. Find the mean of each group.
Brand A Brand B
10 35
60 45
50 30
30 35
40 40
20 25
Mean for brand A: μ = Σx / N = 210 / 6 = 35
Mean for brand B: μ = Σx / N = 210 / 6 = 35
Measures of variation are used to determine the scatter of values in a distribution. In this
presentation, we will consider six measures of variation:
1. Range.
2. Quartile deviation.
3. Mean deviation.
4. Variance.
5. Standard deviation.
6. The coefficient of variation.
1) Range:
The range is a measure of variation. It is the difference between the largest and smallest
values of a data distribution. It does not tell how much the other values vary from one another
or from the mean.
Range = H - L
Where:
H = the highest value.
L = the lowest value.
Example:
A testing lab wishes to test two experimental brands of outdoor paint to see how long each will
last before fading. The testing lab makes 6 gallons of each paint to test. Since different chemical
agents are added to each group and only six cans are involved, these two groups constitute two
small populations. The results (in months) are shown. Find the range of each group.
Brand A Brand B
10 35
60 45
50 30
30 35
40 40
20 25
Range for brand A: 60 - 10 = 50 months.
Range for brand B: 45 - 25 = 20 months.
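The range calculation for the two paint brands can be reproduced with a short Python sketch. The function name value_range is a hypothetical helper introduced only for this illustration.

```python
def value_range(values):
    """Range = highest value minus lowest value."""
    return max(values) - min(values)

# Months until fading, taken from the paint example.
brand_a = [10, 60, 50, 30, 40, 20]
brand_b = [35, 45, 30, 35, 40, 25]

range_a = value_range(brand_a)  # 60 - 10
range_b = value_range(brand_b)  # 45 - 25

print("Range for brand A:", range_a)
print("Range for brand B:", range_b)
```

Both brands have the same mean of 35 months, so the larger range of brand A is what separates them.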
2) Quartile Deviation:
Quartile deviation is a measure that describes the dispersion in terms of the distance
between selected observation points. The smaller the quartile deviation, the greater the
concentration in the middle half of the observations in the data set. It belongs to the
measures of variation that use percentiles, deciles, or quartiles.
Quartile deviation (QD) is half the difference between the upper quartile (Q3) and the
lower quartile (Q1) of a distribution. Q3 - Q1 is referred to as the interquartile range.
Formula:
QD = (Q3 - Q1) / 2, where Q1 and Q3 are the first and third quartiles and Q3 - Q1 is the
interquartile range.
Ungrouped Data Example:
33 56 74 82 51 48 65 81 52 71 85 50 67 83 68 38 58 77
45 62 79 43 59 79 41
Arrange the 25 entries from lowest to highest (read down each column).
33 48 58 68 79
38 50 59 71 81
41 51 62 74 82
43 52 65 77 83
45 56 67 79 85
(n = 25)
For the semi-interquartile range:
Since Q3 = P75 and Q1 = P25, we use P75 and P25.
For P75:
Cum. freq. of P75 = (75/100) × 25 = 18.75, or 19 when rounded to the nearest entry.
This means that P75 is the 19th entry.
Therefore, P75 = 77.
For P25:
Cum. freq. of P25 = (25/100) × 25 = 6.25, or 6 when rounded to the nearest entry.
This means that P25 is the 6th entry.
So P25 = 48.
Hence the semi-interquartile range = (77 - 48) / 2 = 14.5.
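The ungrouped calculation can be sketched in Python as follows. The helper percentile_by_position is hypothetical and follows the positional rule used in this example (position = j/100 × n, rounded to the nearest entry). Statistical libraries usually interpolate between entries instead, so their quartiles may differ slightly.

```python
import math

def percentile_by_position(values, j):
    """Return the j-th percentile as the entry at position j/100 * n,
    rounded to the nearest whole (1-based) position in the sorted data."""
    ordered = sorted(values)
    position = math.floor(j / 100 * len(ordered) + 0.5)  # round half up
    position = max(1, position)  # guard against position 0 for very small j
    return ordered[position - 1]

# The 25 unsorted entries from the example.
scores = [33, 56, 74, 82, 51, 48, 65, 81, 52, 71, 85, 50, 67,
          83, 68, 38, 58, 77, 45, 62, 79, 43, 59, 79, 41]

q1 = percentile_by_position(scores, 25)  # 6th entry
q3 = percentile_by_position(scores, 75)  # 19th entry
quartile_deviation = (q3 - q1) / 2

print("Q1 =", q1, "Q3 =", q3, "QD =", quartile_deviation)
```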
Grouped Data Example:
Class interval   f    cf
21-23            3    3
24-26            4    7
27-29            6    13
30-32            10   23
33-35            5    28
36-38            2    30
N = 30
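Note that Q3 - Q1 = P75 - P25. For grouped data, Pj = L + ((jN/100 - F) / f) × c, where L is
the lower boundary of the class containing Pj, f is the frequency of that class, F is the
cumulative frequency before that class, and c is the class width.
For P75:
Cum. freq. of P75 = (75/100) × 30 = 22.5, which falls in the class 30-32.
L = 29.5, f = 10, F = 13, c = 3, j = 75
P75 = 29.5 + ((22.5 - 13) / 10) × 3 = 32.35
For P25:
Cum. freq. of P25 = (25/100) × 30 = 7.5, which falls in the class 27-29.
L = 26.5, f = 6, F = 7, c = 3, j = 25
P25 = 26.5 + ((7.5 - 7) / 6) × 3 = 26.75
Finally, the interquartile range is P75 - P25 = 32.35 - 26.75 = 5.6, so the quartile
deviation is 5.6 / 2 = 2.8.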
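The grouped-data percentiles can be reproduced with the Python sketch below. The function grouped_percentile is a hypothetical helper. It assumes equal class widths and integer class limits one unit apart, so that the lower class boundary is the lower limit minus 0.5, which matches L = 29.5 and L = 26.5 in the example.

```python
def grouped_percentile(lower_limits, freqs, width, j):
    """P_j = L + ((j*N/100 - F) / f) * c for equal-width classes.

    L: lower boundary of the class containing P_j (assumed: limit - 0.5)
    F: cumulative frequency before that class
    f: frequency of that class, c: class width
    """
    n = sum(freqs)
    target = j * n / 100
    cumulative = 0
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= target:
            boundary = lower - 0.5
            return boundary + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("j must be between 0 and 100")

lower_limits = [21, 24, 27, 30, 33, 36]
freqs = [3, 4, 6, 10, 5, 2]

p75 = grouped_percentile(lower_limits, freqs, width=3, j=75)
p25 = grouped_percentile(lower_limits, freqs, width=3, j=25)
iqr = p75 - p25

print(round(p75, 2), round(p25, 2), round(iqr, 2))
```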
3) Mean Deviation:
The mean deviation, or average deviation, is the arithmetic mean of the absolute deviations
from the mean: MD = Σ |x - x̄| / N.
Example:
Calculate the mean deviation of the following distribution: 9 , 3 , 8 , 8 , 9 , 8 , 9 , 18.
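The slides do not show the working for this example, so the following Python sketch computes it under the usual definition, the mean of the absolute deviations from the arithmetic mean. The function name mean_deviation is illustrative.

```python
def mean_deviation(values):
    """Arithmetic mean of the absolute deviations from the mean."""
    m = sum(values) / len(values)
    return sum(abs(x - m) for x in values) / len(values)

data = [9, 3, 8, 8, 9, 8, 9, 18]

data_mean = sum(data) / len(data)  # 72 / 8
md = mean_deviation(data)          # (0 + 6 + 1 + 1 + 0 + 1 + 0 + 9) / 8

print("mean =", data_mean, "mean deviation =", md)
```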
For Grouped Data:
If the data is grouped in a frequency table, the expression for the mean deviation is
MD = Σ |x - x̄| · f / N, where x is the class midpoint, f is the class frequency, and N = Σf.
Example:
Calculate the mean deviation of the following distribution:
Class      x      f    x·f     |x - x̄|   |x - x̄|·f
[10, 15)   12.5   3    37.5    9.286     27.858
[15, 20)   17.5   5    87.5    4.286     21.430
[20, 25)   22.5   7    157.5   0.714     4.998
[25, 30)   27.5   4    110     5.714     22.856
[30, 35)   32.5   2    65      10.714    21.428
Total             21   457.5             98.57
Mean: x̄ = 457.5 / 21 = 21.786
Mean deviation: MD = 98.57 / 21 = 4.69
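The grouped table can be checked with a short Python sketch that uses the class midpoints and frequencies. The variable names are illustrative. Small differences from the table in the third decimal place come from the table rounding each deviation before multiplying.

```python
# Class midpoints and frequencies from the table.
midpoints = [12.5, 17.5, 22.5, 27.5, 32.5]
freqs = [3, 5, 7, 4, 2]

n = sum(freqs)                                               # total frequency
weighted_sum = sum(x * f for x, f in zip(midpoints, freqs))  # sum of x * f
grouped_mean = weighted_sum / n

# Sum of |x - mean| * f, then divide by the total frequency.
abs_dev_sum = sum(abs(x - grouped_mean) * f for x, f in zip(midpoints, freqs))
grouped_md = abs_dev_sum / n

print(n, weighted_sum, round(grouped_mean, 3))
print(round(abs_dev_sum, 2), round(grouped_md, 2))
```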
4) Variance
In probability theory and statistics, variance measures how far a set of numbers is spread out. A
variance of zero indicates that all the values are identical. Variance is always non-negative: a
small variance indicates that the data points tend to be very close to the mean (the expected
value) and hence to each other, while a high variance indicates that the data points are very
spread out around the mean and from each other.
It is important to distinguish between the variance of a population and the variance of a
sample. They have different notation, and they are computed differently. The variance of a
population is denoted by σ², and the variance of a sample by s².
Variance of a population
The variance of a population is defined by the following formula: σ² = Σ (Xi - μ)² / N, where σ²
is the population variance, μ is the population mean, Xi is the ith element from the population,
and N is the number of elements in the population.
Variance of a sample
The variance of a sample is defined by a slightly different formula: s² = Σ (xi - x̄)² / (n - 1),
where s² is the sample variance, x̄ is the sample mean, xi is the ith element from the sample,
and n is the number of elements in the sample. Using this formula, the variance of the sample is
an unbiased estimate of the variance of the population.
Example:
Suppose you want to find the variance of scores on a test. Suppose the scores are 67, 72, 85, 93
and 98.
Write down the formula for variance: σ² = Σ (x - µ)² / N
There are five scores in total, so N = 5.
The mean of the five scores is µ = (67 + 72 + 85 + 93 + 98) / 5 = 415 / 5 = 83.
Sum the squared deviations from the mean: Σ (x - µ)² = 256 + 121 + 4 + 100 + 225 = 706, so
σ² = 706 / 5
This is the variance for the dataset: σ² = 141.2
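The test-score example can be reproduced with the Python sketch below. It also shows what the sample formula (dividing by n - 1) would give if the same five scores were treated as a sample rather than a population. That second result is an illustration added here, not a figure from the slides, and the function names are hypothetical.

```python
def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)

scores = [67, 72, 85, 93, 98]

pop_var = population_variance(scores)  # 706 / 5
samp_var = sample_variance(scores)     # 706 / 4

print("population variance:", pop_var)
print("sample variance:", samp_var)
```

Dividing by n - 1 instead of N always gives the larger value, which offsets the tendency of a sample to understate the spread of its population.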
5) Standard Deviation
The standard deviation is a measure of how spread out numbers are. The symbol for the
population standard deviation is σ (the Greek letter sigma). The standard deviation is the
square root of the variance, and there is one formula for a population and one for a sample.
This is the essential idea of sampling. To find out information about the population (such as
the mean and standard deviation), we do not need to look at all members of the population; we
only need a sample. But when we take a sample, we lose some accuracy.
The Population Standard Deviation: σ = √( Σ (Xi - μ)² / N )
The Sample Standard Deviation: s = √( Σ (xi - x̄)² / (n - 1) )
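As a minimal sketch, the two formulas can be applied to the paint data from the earlier example. The slides describe the two brands as small populations, so the population figure is the relevant one there. The sample figure is shown only for comparison, and the function names are illustrative.

```python
import math

def population_std(values):
    """Square root of the population variance (divide by N)."""
    mu = sum(values) / len(values)
    return math.sqrt(sum((x - mu) ** 2 for x in values) / len(values))

def sample_std(values):
    """Square root of the sample variance (divide by n - 1)."""
    x_bar = sum(values) / len(values)
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (len(values) - 1))

# Months until fading, from the paint example; both means are 35.
brand_a = [10, 60, 50, 30, 40, 20]
brand_b = [35, 45, 30, 35, 40, 25]

sigma_a = population_std(brand_a)
sigma_b = population_std(brand_b)

print("population std:", round(sigma_a, 2), round(sigma_b, 2))
print("sample std:", round(sample_std(brand_a), 2), round(sample_std(brand_b), 2))
```

Brand A's standard deviation is much larger than brand B's although the means are equal, which is the point made in the introduction to this section.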
6) Coefficient of Variation:
The coefficient of variation (CV) is a statistical measure of the dispersion of data points in a
data series around the mean. It is the ratio of the standard deviation to the mean. The
coefficient of variation is a helpful statistic for comparing the degree of variation from one
data series to another, even if the means are considerably different from each other.
Coefficient of Variation Formula
When the value of the coefficient of variation is higher, the data has high variability and less
stability. When the value of the coefficient of variation is lower, the data has less
variability and high stability. The formula for the coefficient of variation is:
Coefficient of Variation = Standard Deviation / Mean
Example:
Find the coefficient of variation of 5, 10, 15, and 20.
Formula for the mean: x̄ = Σx / N
Σx = 50
x̄ = 50 / 4 = 12.5
x      x - x̄    (x - x̄)²
5      -7.5     56.25
10     -2.5     6.25
15     2.5      6.25
20     7.5      56.25
Σx = 50         Σ(x - x̄)² = 125
Formula for population standard deviation:
σ = √( Σ(x - x̄)² / N ) = √(125 / 4) = √31.25 = 5.59
Coefficient of variation = standard deviation / mean = 5.59 / 12.5
Coefficient of variation = 0.4472
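The worked example can be verified with a short Python sketch. It uses the population standard deviation (dividing by N), as the example does, and the function name coefficient_of_variation is illustrative.

```python
import math

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    mean = sum(values) / len(values)
    sigma = math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))
    return sigma / mean

data = [5, 10, 15, 20]
cv = coefficient_of_variation(data)

print("CV =", round(cv, 4))
```

The ratio has no units, so it can be compared across data series measured on different scales.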
Chebyshev’s Theorem:
Chebyshev’s theorem specifies the proportions of the spread in terms of the standard deviation
(for a distribution of any shape). The proportion of the data that lies within k standard
deviations of the mean will be at least 1 - 1/k², where k is a number greater than 1 (k is not
necessarily an integer).
Example
At least what percent of the data in a set should fall within 3 standard deviations of the mean?
1 - 1/k² = 1 - 1/3² = 1 - 1/9 = 8/9 ≈ 89%
So at least 89% of the values in the set fall within 3 standard deviations of the mean.
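The bound can be evaluated for any k greater than 1 with a small Python sketch. The function name chebyshev_lower_bound is a hypothetical helper, and the result is a guaranteed minimum proportion, not the exact proportion for a particular data set.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

bound_2 = chebyshev_lower_bound(2)      # 1 - 1/4
bound_3 = chebyshev_lower_bound(3)      # 1 - 1/9
bound_1_5 = chebyshev_lower_bound(1.5)  # k need not be an integer

print(f"k=2: at least {bound_2:.0%}")
print(f"k=3: at least {bound_3:.0%}")
print(f"k=1.5: at least {bound_1_5:.0%}")
```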