Table of Contents
STATISTICS
INTRODUCTION
FREQUENCY DISTRIBUTION
E.1
E.2
VISUAL REPRESENTATIONS: CHARTS AND GRAPHS
PIE CHART
E.3
BAR GRAPH
E.4
HISTOGRAM
E.5
FREQUENCY POLYGON
MEASURE OF CENTRAL TENDENCY
MEAN
E.6
E.7
MEDIAN
E.8
MODE
E.9
CUMULATIVE FREQUENCY CURVE (OGIVE)
E.10
APPLICATION
E.11
MEASURE OF DISPERSION
RANGE
INTERQUARTILE RANGE (IQR)
Outliers
Upper & Lower Fences
VARIANCE
E.12
STANDARD DEVIATION
MEAN DEVIATION
E.13
SUMMARY
PROBABILITY
INTRODUCTION
EXPERIMENTAL AND THEORETICAL PROBABILITY
E.14
E.15
ADDITION OF PROBABILITIES FOR MUTUALLY EXCLUSIVE AND NON-MUTUALLY EXCLUSIVE EVENTS
E.16
E.17
Probabilities with Replacement & without Replacement
MULTIPLICATION OF PROBABILITIES FOR INDEPENDENT EVENTS
E.18
SUMMARY
QUIZ
STATISTICS
INTRODUCTION
Decisions and predictions are usually based on quantitative and qualitative data. They are easy to
make when the data sends a clear message, but sometimes the message is obscured by variability.
Statistics is the branch of mathematics that describes variation in data by analyzing and
interpreting it. Data is collected first and then displayed in tabular form. Finally, we analyze
and interpret it to discover patterns and deviations from those patterns.
An analysis of quantitative data includes measures of shape, center, and spread. The shape of data
can be symmetric, skewed, bell-shaped, or flat. Center and spread are summarized by measures of
central tendency (mean, median, and mode) and measures of dispersion (range, interquartile range,
variance, and standard deviation). These parameters can be used to compare different
distributions numerically. Similarly, different plots are used to compare distributions
visually.
FREQUENCY DISTRIBUTION
Before quantitative data is analyzed, it is summarized in a table called a frequency distribution
table. An orderly list of the data values, together with the number of times each one occurs, is
called a frequency distribution. The table gets its name because the frequencies of the values in
the data are distributed across it.
E.1
Someone from the class collected the Statistics exam scores of 24 students. The exam is marked out
of 100. The results are as follows:
92 88 86 91 92 77 74 79 88 81 65 92
68 77 73 77 85 89 92 86 71 86 91 74
Step 1: To draw a frequency distribution table, first arrange the marks in ascending order under
the heading "Marks", writing each repeated mark only once.
Marks
65
68
71
73
74
77
79
81
85
86
88
89
91
92
Step 2: Now add a Frequency column next to the marks and, in it, write the number of times each
mark occurs in the given data.
Marks Frequency
65 1
68 1
71 1
73 1
74 2
77 3
79 1
81 1
85 1
86 3
88 2
89 1
91 2
92 4
This is our frequency table.
Step 3: To verify that you have drawn the table correctly, add the frequencies. Their sum should
equal the sample size, i.e. 24.
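The three steps above can also be sketched in code. The following is a minimal illustration, assuming Python and its standard `collections.Counter`; the variable names are chosen here for illustration and are not part of the original example.

```python
from collections import Counter

# Exam scores from E.1 (24 students).
scores = [92, 88, 86, 91, 92, 77, 74, 79, 88, 81, 65, 92,
          68, 77, 73, 77, 85, 89, 92, 86, 71, 86, 91, 74]

# Steps 1 and 2: count each distinct mark, then list the marks in ascending order.
frequency = Counter(scores)
table = sorted(frequency.items())

print("Marks  Frequency")
for mark, count in table:
    print(f"{mark:>5}  {count:>9}")

# Step 3: the frequencies must add up to the sample size.
total = sum(frequency.values())
print("Total:", total)
```

Sorting the counted items reproduces the ascending order of Step 1, and the final sum is the check described in Step 3.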
Apart from frequencies, a frequency table can have several other components. When the data set is
large, we sometimes group the values. Each group is called a class. The components of a class are
discussed below:
Class interval: It is the size of each class in a frequency distribution table. We use it when the
data has a large range. The data is divided into a number of groups, and each of these groups is
called a class interval. The class intervals are usually all of the same size. For example, in the
above example, the class intervals could be 65-74, 75-84, and 85-94.
Class limit: It separates one class from another in the frequency distribution table. The lower
class limit of a particular class is the smallest value of that class. Similarly, the upper class
limit of a particular class is the highest value of that class. In class interval 65-74, the lower
class limit is 65 and the upper class limit is 74.
Class mark: It is the middle value of the class interval. To calculate the class mark of a
particular class interval, add the lower and upper limits and divide the result by two, i.e.
\[ \text{Class Mark} = \frac{\text{Lower Limit} + \text{Upper Limit}}{2} \]
In the above example, the class mark of the 65-74 class interval is
\[ \frac{65 + 74}{2} = 69.5 \]
Class boundary: It marks the point where one class ends and the next begins. The lower class
boundary is found by subtracting 0.5 units from the lower class limit, and the upper class boundary
is found by adding 0.5 units to the upper class limit. In the above example, the class boundaries
for class interval 65-74 are 64.5-74.5.
Class width: It is the difference between the upper and lower boundaries of a class interval. For
class interval 65-74, the class width is 74.5 - 64.5 = 10.
E.2
Classify the table of E.1 (calculations are shown above):
Class Limit   Class Mark   Class Boundary   Frequency
65-74         69.5         64.5-74.5        6
75-84         79.5         74.5-84.5        5
85-94         89.5         84.5-94.5        13
The sum of the frequencies remains the same, i.e. equal to the sample size.
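The class mark, class boundaries, and class frequency of E.2 can be reproduced with a short sketch. It assumes Python, integer marks, and the 0.5-unit boundary adjustment described above; `describe_class` is a hypothetical helper name used only for this illustration.

```python
# Exam scores from E.1 and the class limits chosen in E.2.
scores = [92, 88, 86, 91, 92, 77, 74, 79, 88, 81, 65, 92,
          68, 77, 73, 77, 85, 89, 92, 86, 71, 86, 91, 74]
class_limits = [(65, 74), (75, 84), (85, 94)]


def describe_class(lower, upper, data):
    """Return (class mark, class boundaries, frequency) for one class."""
    mark = (lower + upper) / 2                 # middle value of the class
    boundaries = (lower - 0.5, upper + 0.5)    # limits widened by 0.5 units
    freq = sum(1 for x in data if lower <= x <= upper)
    return mark, boundaries, freq


rows = [describe_class(lo, hi, scores) for lo, hi in class_limits]

print("Class Limit  Class Mark  Class Boundary  Frequency")
for (lo, hi), (mark, (lb, ub), freq) in zip(class_limits, rows):
    print(f"{lo}-{hi}        {mark}        {lb}-{ub}       {freq}")
```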
VISUAL REPRESENTATIONS: CHARTS AND GRAPHS
Different charts and graphs can be drawn using the data in a frequency distribution table. The most
common visual representations are pie charts, bar graphs, histograms, and frequency polygons.
PIE CHART
A pie chart is used when you want to display the percentage of each category. It consists of
several sectors, each showing the size of a particular category. The sizes are the angles of the
sectors of a circle. A circle has 360 degrees, so if you are drawing the chart by hand, you can
find the angle of a particular category by taking that category's percentage of 360 degrees.
E.3
Draw a pie chart for example E.2.
Re-draw the table below.
Class Limit   Frequency
65-74         6
75-84         5
85-94         13
Total number of students in the sample = 24
Percentage of students who scored between 65 and 74 = $\frac{6}{24} \times 100 = 25\%$
Percentage of students who scored between 75 and 84 = $\frac{5}{24} \times 100 = 20.83\%$
Percentage of students who scored between 85 and 94 = $\frac{13}{24} \times 100 = 54.17\%$
Hence, class interval 85-94 will cover the largest area, and class 65-74 will cover slightly more
area than 75-84 in the pie chart.
To verify that you have drawn the chart correctly, add the percentages. Their sum must equal 100%.
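The percentages of E.3, and the sector angles needed to draw the chart by hand, can be checked with a small sketch. It assumes Python; the dictionary layout and names are illustrative only.

```python
# Class frequencies from E.2.
class_frequencies = {"65-74": 6, "75-84": 5, "85-94": 13}
total = sum(class_frequencies.values())

# Share of each class as a percentage and as an angle of the 360-degree circle.
percentages = {label: 100 * f / total for label, f in class_frequencies.items()}
angles = {label: 360 * f / total for label, f in class_frequencies.items()}

for label in class_frequencies:
    print(f"{label}: {percentages[label]:.2f}%  {angles[label]:.1f} degrees")

# Verification: percentages add to 100 and angles add to 360.
print(sum(percentages.values()), sum(angles.values()))
```

The angles work out to 90, 75, and 195 degrees, which together make the full circle.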
BAR GRAPH
A bar graph is drawn on the $x$-$y$ plane when you want to display relationships between
categorical data. The lengths of the bars represent the different values. Bars are either vertical
or horizontal. This graph is drawn when the data is non-continuous. It compares the sizes of
different categories.
E.4
Compare the frequencies of scores of different class intervals in E.2 using a bar graph.
[Figure: bar graph "Frequency of Scores" with bars for 65-74 (6), 75-84 (5), and 85-94 (13)]
HISTOGRAM
A histogram is a set of vertical bars drawn using the class intervals. The area of each bar is
proportional to the frequency of its class. By convention, the variable is kept on the horizontal
axis and the frequencies on the vertical axis. Histograms are drawn when the data is continuous, so
there are no gaps between the bars.
E.5
Draw a histogram for example E.2.
[Figure: histogram of E.2, "Frequency of scores" (axis from 0 to 14) for class intervals 65-74, 75-84, and 85-94]
FREQUENCY POLYGON
A frequency polygon is a variation of the histogram. The difference is that in a frequency polygon
the class mark, instead of the class interval, is kept on the horizontal axis, with the frequency,
as usual, on the vertical axis. The points are plotted and then joined by line segments. Finally,
the two ends of the line are joined to the $x$-axis.
Frequency polygons are not as accurate as histograms.
MEASURE OF CENTRAL TENDENCY
A measure of central tendency is a way of describing the center of a data set. There are
three measures of central tendency, which we will see below.
MEAN
The mean is defined as the numerical average of the given data. For raw (ungrouped) data, it can be
calculated using the formula:
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]
where $n$ is the sample size, $i$ is the index of an entry in the data, $x_i$ is the $i$th entry,
and $\sum_{i=1}^{n} x_i$ is the sum of all entries.
E.6
Calculate the mean of the data given below.
92 88 86 91 92 77 74 79 88 81 65 92
68 77 73 77 85 89 92 86 71 86 91 74
Using the formula, the mean is
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{1974}{24} = 82.25 \]
If the data is given in frequency form, then the mean can be calculated as:
\[ \bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} \]
where $f_i$ is the frequency of the value $x_i$.
Let's now calculate the mean from the frequency table we drew for the above example and see if we
get the same answer. For that, add another column, $f_i x_i$, to the table.
E.7
Find the mean of the above data using the frequency method and compare your answer to E.6.
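One way to check an answer to E.7 is the sketch below, which builds the $f_i x_i$ products from the frequency table and compares the result with the direct mean of E.6. It assumes Python; the names `sum_fx` and `sum_f` are illustrative.

```python
from collections import Counter

# Exam scores from E.1.
scores = [92, 88, 86, 91, 92, 77, 74, 79, 88, 81, 65, 92,
          68, 77, 73, 77, 85, 89, 92, 86, 71, 86, 91, 74]

# Frequency table: each distinct mark x with its frequency f.
frequency = Counter(scores)

# Frequency method: sum of f * x divided by sum of f.
sum_fx = sum(f * x for x, f in frequency.items())
sum_f = sum(frequency.values())
mean_from_table = sum_fx / sum_f

# Direct method from E.6 for comparison.
mean_direct = sum(scores) / len(scores)

print(sum_fx, sum_f, mean_from_table, mean_direct)
```

Both methods add up the same 24 marks, so they must agree; the frequency table only groups equal marks together.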
MEDIAN
The value that lies at the center of the distribution is called the median. To find the median, you
need to arrange the data in ascending order first. If the sample size is odd, the middle value is
the median, but if the sample size is even, the median is the average of the middle two
values.
E.8
For the data in E.1, find the median.
The sample size of that data is 24 (even).
Hence, the median is the average of the 12th and 13th values:
65 68 71 73 74 74 77 77 77 79 81 85
86 86 86 88 88 89 91 91 92 92 92 92
\[ \text{Median} = \frac{85 + 86}{2} = 85.5 \]
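The odd and even cases of the median rule can be sketched as follows. This is an illustrative Python function written for this example; the standard library's `statistics.median` is used only as a cross-check.

```python
import statistics

# Exam scores from E.1.
scores = [92, 88, 86, 91, 92, 77, 74, 79, 88, 81, 65, 92,
          68, 77, 73, 77, 85, 89, 92, 86, 71, 86, 91, 74]


def median(values):
    """Middle value for odd n, average of the two middle values for even n."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median(scores))             # uses the 12th and 13th ordered values
print(statistics.median(scores))  # library cross-check
```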
MODE
The value that occurs most frequently in the data is called the mode. There is no mode if no value
occurs more than once. If two values occur equally often, and more often than any other value,
then both of them are modes.
E.9
What is the mode in the above example?
In the example above, 92 is the mode because it occurs four times, more often than any other value.
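The three cases described for the mode (one mode, several modes, no mode) can be sketched in Python. The function `modes` is a hypothetical helper written for this illustration; it follows the convention stated above that data with no repeated value has no mode.

```python
from collections import Counter

# Exam scores from E.1.
scores = [92, 88, 86, 91, 92, 77, 74, 79, 88, 81, 65, 92,
          68, 77, 73, 77, 85, 89, 92, 86, 71, 86, 91, 74]


def modes(values):
    """Return all most-frequent values, or an empty list if nothing repeats."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # no value occurs more than once, so there is no mode
    return sorted(v for v, c in counts.items() if c == highest)


print(modes(scores))           # single mode
print(modes([1, 1, 2, 2, 3]))  # two modes
print(modes([1, 2, 3]))        # no mode
```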
CUMULATIVE FREQUENCY CURVE (OGIVE)
Cumulative frequency curves are drawn by keeping the upper class boundary on the $x$-axis and the
cumulative frequency on the $y$-axis. Cumulative frequencies are found by adding the frequencies
in turn. These curves are called ogives because of their typical S-shape.
E.10
Draw a cumulative frequency curve for the example E.2.
We need to add a class with 0 frequency so that the curve starts from the $x$-axis. Here is the
table from E.2 with a cumulative frequency column added.
Class Limit   Upper Class Boundary   Frequency   Cumulative Frequency
55-64         64.5                   0           0
65-74         74.5                   6           0+6=6
75-84         84.5                   5           6+5=11
85-94         94.5                   13          11+13=24
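The running totals in the table above can be produced with a short sketch, assuming Python and `itertools.accumulate`. The resulting pairs are the points that would be plotted on the ogive; the names are illustrative.

```python
from itertools import accumulate

# Upper class boundaries and frequencies from the E.10 table,
# including the extra empty class 55-64.
upper_boundaries = [64.5, 74.5, 84.5, 94.5]
frequencies = [0, 6, 5, 13]

# Cumulative frequency: add the frequencies in turn.
cumulative = list(accumulate(frequencies))

# Points of the ogive: (upper class boundary, cumulative frequency).
ogive_points = list(zip(upper_boundaries, cumulative))

for boundary, total in ogive_points:
    print(boundary, total)
```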
APPLICATION
Cumulative frequencies and their curves are useful in real life as well.
E.11
How many students scored under 85 in E.2? Also find the number of students who scored
94 or below.
To find the number of students who scored under 85, we simply add the frequencies of all the
classes below 85, i.e. 6 + 5 = 11.
Similarly, the number of students who scored 94 or below is 6 + 5 + 13 = 24.
Class Limit   Frequency   Cumulative Frequency
55-64         0           0
65-74         6           6
75-84         5           11
85-94         13          24
[Figure: Ogive Chart; x-axis: Upper Class Boundary (0 to 100), y-axis: Cumulative Frequency (0 to 30)]
If we want to find the number of students who scored 90 or below, we redraw this table using
different class intervals (61-65, 66-70, ..., 86-90, 91-95) and calculate the frequencies and
cumulative frequencies accordingly.
MEASURE OF DISPERSION
Sometimes a measure of central tendency is not enough to describe the data. We then make use
of variability to describe it further. Two data sets can have the same mean and still be
different. This difference can be determined using measures of dispersion. The main ones are
range, interquartile range, variance, mean deviation, and standard deviation.
RANGE
The range is the difference between the largest and smallest values in the data set. In our
example, the largest value is 92 and the smallest value is 65, hence the range is:
\[ 92 - 65 = 27 \]
INTERQUARTILE RANGE (IQR)
IQR is defined as the difference between the 75th percentile (third quartile, $Q_3$) and the 25th
percentile (first quartile, $Q_1$):
\[ IQR = Q_3 - Q_1 \]
where,
\[ Q_1 = \frac{n + 1}{4}\text{th observation} \]
\[ \text{Median} = Q_2 = \frac{2(n + 1)}{4} = \frac{n + 1}{2}\text{th observation} \]
\[ Q_3 = \frac{3(n + 1)}{4}\text{th observation} \]
IQR covers the middle 50% of the observations. A larger IQR means the middle of the distribution
is more widely spread.
The semi-interquartile range is half of the IQR:
\[ \frac{IQR}{2} = \frac{Q_3 - Q_1}{2} \]
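The quartile positions can be applied to the E.1 scores as a sketch. The text does not say what to do when a position such as 6.25 falls between two observations; the code below assumes linear interpolation between the neighbouring ordered values, which is one common convention. `value_at_position` is a hypothetical helper name.

```python
# Exam scores from E.1.
scores = [92, 88, 86, 91, 92, 77, 74, 79, 88, 81, 65, 92,
          68, 77, 73, 77, 85, 89, 92, 86, 71, 86, 91, 74]


def value_at_position(ordered, position):
    """Value at a 1-based, possibly fractional, position in sorted data.

    Assumption: fractional positions are resolved by linear interpolation.
    """
    whole = int(position)            # index of the observation just below
    fraction = position - whole      # how far to move toward the next one
    lower_value = ordered[whole - 1]
    if fraction == 0:
        return lower_value
    upper_value = ordered[whole]
    return lower_value + fraction * (upper_value - lower_value)


ordered = sorted(scores)
n = len(ordered)

q1 = value_at_position(ordered, (n + 1) / 4)      # 6.25th observation
q2 = value_at_position(ordered, 2 * (n + 1) / 4)  # 12.5th observation
q3 = value_at_position(ordered, 3 * (n + 1) / 4)  # 18.75th observation
iqr = q3 - q1
semi_iqr = iqr / 2

print(q1, q2, q3, iqr, semi_iqr)
```

Under this interpolation assumption, $Q_2$ agrees with the median of 85.5 found in E.8.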
Outliers
Outliers are values that are either much bigger or much smaller than the rest of the data. A data
value $x$ is marked as an outlier if it is larger than $Q_3$ by more than 1.5 times the IQR or
smaller than $Q_1$ by more than 1.5 times the IQR, i.e.
\[ x > Q_3 + 1.5(IQR) \]
or
\[ x < Q_1 - 1.5(IQR) \]
Upper & Lower Fences
The lower fence is the lower limit and the upper fence is the upper limit of the data. Any value
outside them is considered an outlier.
\[ \text{Lower Fence} = Q_1 - 1.5(IQR) \]
\[ \text{Upper Fence} = Q_3 + 1.5(IQR) \]
\[ \text{Outlier} < \text{Lower Fence} \quad \text{or} \quad \text{Outlier} > \text{Upper Fence} \]
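The fences can be computed for the E.1 scores with a short sketch. It assumes Python 3.8 or later and uses `statistics.quantiles` with `method="exclusive"`, on the assumption that this method corresponds to the $(n + 1)$-based quartile positions given earlier; the variable names are illustrative.

```python
import statistics

# Exam scores from E.1.
scores = [92, 88, 86, 91, 92, 77, 74, 79, 88, 81, 65, 92,
          68, 77, 73, 77, 85, 89, 92, 86, 71, 86, 91, 74]

# Quartiles (assumed to match the (n + 1)/4 positions with interpolation).
q1, q2, q3 = statistics.quantiles(scores, n=4, method="exclusive")
iqr = q3 - q1

# Fences: 1.5 times the IQR below Q1 and above Q3.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Any value outside the fences is flagged as an outlier.
outliers = [x for x in scores if x < lower_fence or x > upper_fence]

print(lower_fence, upper_fence, outliers)
```

For this data set every score lies between the fences, so no value is flagged.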
VARIANCE
Variance is the average of the squared differences from the mean. It tells how far the data is
spread out. A variance of 0 means no variability. The greater the differences among the values of
a data set, the greater the variance of that data set.
Population variance ($\sigma^2$) and sample variance ($s^2$) can be calculated as follows:
\[ \sigma^2 = \overline{(X - \mu)^2} = \frac{\sum (X - \mu)^2}{N} \quad \text{or} \quad \frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N} \]
\[ s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} \quad \text{or} \quad \frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1} \]
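STANDARD DEVIATION
Standard deviation is a measure of the spread of data about the mean. It is found by taking the
square root of the variance. It is often preferred because variance is expressed in squared units,
while standard deviation is in the same units as the data and can be read as a typical distance
from the mean.
Population standard deviation and sample standard deviation are defined as:
\[ \sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} \quad \text{or} \quad \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{N}}{N}} \]
\[ s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} \quad \text{or} \quad \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}} \]
For the above example, using the population formula with $N = 24$,
\[ \sigma = \sqrt{68.271} \approx 8.263 \]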
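The variance and standard deviation formulas can be checked against the E.1 scores with the sketch below. It assumes Python's standard `statistics` module, where `pvariance` and `pstdev` divide by $N$ and `variance` and `stdev` divide by $n - 1$; the shortcut form is computed by hand for comparison.

```python
import statistics

# Exam scores from E.1.
scores = [92, 88, 86, 91, 92, 77, 74, 79, 88, 81, 65, 92,
          68, 77, 73, 77, 85, 89, 92, 86, 71, 86, 91, 74]
n = len(scores)

# Population forms (divide by N).
population_variance = statistics.pvariance(scores)
population_sd = statistics.pstdev(scores)

# Sample forms (divide by n - 1).
sample_variance = statistics.variance(scores)
sample_sd = statistics.stdev(scores)

# Shortcut form: (sum of X^2 - (sum of X)^2 / N) / N.
shortcut = (sum(x * x for x in scores) - sum(scores) ** 2 / n) / n

print(round(population_variance, 3), round(population_sd, 3))
print(round(sample_variance, 3), round(sample_sd, 3))
print(round(shortcut, 3))
```

The value 68.271 quoted above corresponds to the population form; dividing by $n - 1 = 23$ instead gives a slightly larger sample variance.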
MEAN DEVIATION
Mean deviation is the mean of the absolute deviations from the mean, unlike variance, which is
the mean of the squared deviations. It measures how far, on average, the values are from the
center.
\[ \text{Mean Deviation} = \overline{|X - \mu|} = \frac{\sum |X - \mu|}{N} \]
The modulus sign means the absolute value of the differences.
E.13
What is the mean deviation of the data provided in E.1?
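One way to check an answer to E.13 is the following sketch of the mean deviation formula, assuming Python; `mean_deviation` is a hypothetical helper written for this illustration.

```python
# Exam scores from E.1.
scores = [92, 88, 86, 91, 92, 77, 74, 79, 88, 81, 65, 92,
          68, 77, 73, 77, 85, 89, 92, 86, 71, 86, 91, 74]


def mean_deviation(values):
    """Mean of the absolute deviations of the values from their mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)


# Sum of absolute deviations, shown separately to mirror the formula.
mean = sum(scores) / len(scores)
total_abs_dev = sum(abs(x - mean) for x in scores)

print(total_abs_dev, round(mean_deviation(scores), 3))
```

The absolute value keeps deviations above and below the mean from cancelling each other out.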
SUMMARY
A frequency distribution helps to analyze data and estimate the frequencies of a population based
on a given sample.
Class interval is the size of each class in a frequency distribution table.
Class limit separates one class from another in the frequency distribution table.
Class mark is the middle value of the class interval.
The values in a frequency distribution table help to draw charts and graphs.
A pie chart is used when you want to display the percentage of each category.
A bar graph is drawn on the $x$-$y$ plane when you want to display relationships between
categorical data.
A histogram is a set of vertical bars drawn using the class intervals.
A frequency polygon is drawn by keeping the class mark on the horizontal axis and the frequency on
the vertical axis.
Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
The middle value is called the median.
The value that occurs most frequently in the data is called the mode.
Cumulative frequency curves are drawn by keeping the upper class boundary on the $x$-axis and the
cumulative frequency on the $y$-axis.
Range is the difference between the largest and smallest values in a data set.
First quartile: $Q_1 = \frac{n+1}{4}$th observation
Median: $Q_2 = \frac{2(n+1)}{4} = \frac{n+1}{2}$th observation
Third quartile: $Q_3 = \frac{3(n+1)}{4}$th observation
Interquartile range: $IQR = Q_3 - Q_1$
Population variance: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
Sample variance: $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
Population standard deviation: $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
Sample standard deviation: $s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$
Mean Deviation: $\frac{\sum |X - \mu|}{N}$
PROBABILITY
INTRODUCTION
Probability is a measure of how likely an event is, that is, how often it is expected to happen
over many trials. Random processes are explained mathematically with the help of a probability
model. For example, flipping a coin, rolling a die, or drawing a card can each result in several
outcomes that have an equal chance of appearing. There is an equal chance of getting a head or a
tail when flipping a coin, an equal chance of getting each number from 1 to 6 when rolling a die,
and so on.
Sample points represent the outcomes in a probability model. These sample points combine to
make up events. The addition and multiplication rules are used to compute probabilities of events.
Two-way tables are constructed to examine independence and conditional probabilities, which are
used to interpret the events.
EXPERIMENTAL AND THEORETICAL PROBABILITY
Experimental probability is the probability of what actually happens, while theoretical
probability is what we predict will happen. Both probabilities are calculated in the same manner,
as the number of times the event occurs divided by the total number of outcomes, i.e.
\[ P(\text{event}) = \frac{\text{Number of times an event occurs}}{\text{Total number of outcomes}} \]
E.14
What is the theoretical probability of getting a head when tossing a coin 10 times?
If you plan to toss a coin 10 times, you expect it to land on heads 50% of the time and on tails
50% of the time. The following table displays the theoretical results.
Outcomes   Frequency
Heads      5
Tails      5
Total      10
Hence, the theoretical probability of getting heads is
\[ \frac{5}{10} = \frac{1}{2} \text{ or } 50\% \]
E.15
John tossed a coin 10 times and got the following results. What is the experimental probability of
getting a head?
Outcomes   Frequency
Heads      7
Tails      3
Total      10
The experimental probability of getting heads is
\[ \frac{7}{10} \text{ or } 70\% \]
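The contrast between E.14 and E.15 can be illustrated with a small sketch. It assumes Python, a fair coin simulated with the standard `random` module, and an arbitrary seed chosen only to make the run repeatable; `experimental_probability` is a hypothetical helper name.

```python
import random
from fractions import Fraction


def experimental_probability(outcomes, event):
    """Number of times the event occurred divided by the number of trials."""
    return Fraction(outcomes.count(event), len(outcomes))


# John's results from E.15: 7 heads and 3 tails.
johns_tosses = ["H"] * 7 + ["T"] * 3
p_john = experimental_probability(johns_tosses, "H")

# Theoretical probability of heads for a fair coin (E.14).
p_theoretical = Fraction(1, 2)

# Simulated experiment with many more tosses (seed is an arbitrary assumption).
rng = random.Random(0)
simulated = [rng.choice("HT") for _ in range(10_000)]
p_simulated = experimental_probability(simulated, "H")

print(p_john, p_theoretical, float(p_simulated))
```

With only 10 tosses the experimental value can differ noticeably from 1/2, while a long simulated run is expected to land much closer to the theoretical value.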
ADDITION OF PROBABILITIES FOR MUTUALLY EXCLUSIVE AND NON-MUTUALLY EXCLUSIVE EVENTS
Mutually exclusive events are events that cannot occur simultaneously. The probability that one or
the other of two mutually exclusive events occurs is found by adding their probabilities, using the
addition rule, i.e.
\[ P(A \text{ or } B) = P(A) + P(B) \]
E.16
What is the probability of getting a 3 or a 4 when rolling a six-sided die?
\[ P(3 \text{ or } 4) = P(3) + P(4) = \frac{1}{6} + \frac{1}{6} = \frac{2}{6} = \frac{1}{3} \]
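The addition rule for mutually exclusive events can be verified by listing the sample space of the die. This is a minimal sketch assuming Python and exact fractions; `probability` is a hypothetical helper that counts favourable outcomes.

```python
from fractions import Fraction

# Sample space of a fair six-sided die.
sample_space = range(1, 7)


def probability(event):
    """Favourable outcomes divided by total outcomes, for equally likely outcomes."""
    favourable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favourable, len(sample_space))


p_3 = probability(lambda o: o == 3)
p_4 = probability(lambda o: o == 4)
p_3_or_4 = probability(lambda o: o in (3, 4))

# A single roll cannot be both 3 and 4, so the probabilities simply add.
print(p_3, p_4, p_3 + p_4, p_3_or_4)
```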
For non-mutually exclusive events, the addition rule is slightly modified:
\[ P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B) \]
$P(A \text{ and } B)$ is the probability of the overlap of the two events, i.e. that both occur.
E.17
What is the probability of choosing an ace or a heart from a deck of 52 cards?
\[ P(\text{ace or heart}) = P(\text{ace}) + P(\text{heart}) - P(\text{ace of hearts}) \]
\[ P(\text{ace or heart}) = \frac{4}{52} + \frac{13}{52} - \frac{1}{52} = \frac{16}{52} = \frac{4}{13} \]
We subtract $P(\text{ace and heart})$ because the addition causes the ace of hearts to be counted twice.
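The double counting in E.17 can be seen by enumerating the deck. The sketch below assumes Python and a standard 52-card deck of 13 ranks in 4 suits; the rank and suit labels and the `probability` helper are illustrative choices.

```python
from fractions import Fraction
from itertools import product

# A standard deck: 13 ranks in each of 4 suits.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))


def probability(event):
    """Favourable cards divided by total cards, for one card drawn at random."""
    favourable = sum(1 for card in deck if event(card))
    return Fraction(favourable, len(deck))


p_ace = probability(lambda card: card[0] == "A")
p_heart = probability(lambda card: card[1] == "hearts")
p_ace_and_heart = probability(lambda card: card == ("A", "hearts"))

# Direct count of cards that are an ace or a heart (or both).
p_ace_or_heart = probability(lambda card: card[0] == "A" or card[1] == "hearts")

print(p_ace + p_heart - p_ace_and_heart, p_ace_or_heart)
```

Counting directly finds 16 qualifying cards, the same as 4 + 13 - 1 from the addition rule.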
The visual representation and formulas of both types of events are shown below:
Probabilities with Replacement & without Replacement
Probability with replacement is the probability of an event where the total quantity of a
particular item is given and each item is put back after it is chosen. For example, if there are
15 marbles in a bag, the total remains 15 for every draw.
SUMMARY
Probability is calculated by: $P(\text{event}) = \frac{\text{Number of times an event occurs}}{\text{Total number of outcomes}}$
Probability of mutually exclusive events: $P(A \text{ or } B) = P(A) + P(B)$
Probability of non-mutually exclusive events: $P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$
Probability of occurrence of two independent events: $P(A \text{ and } B) = P(A) \cdot P(B)$
QUIZ
1. The heights of five baseball players are recorded as 173 cm, 182 cm, 165 cm, 180 cm, and $x$
cm. What is the height of the fifth player if the mean height of these five players is 180 cm?
A. 176 cm
B. 180 cm
C. 185 cm
D. 192 cm
E. 200 cm
2. ESTIMATED NUMBER OF CALORIES BURNED
Person   Energy burned per hour (kilojoules)
A        3020
B        2960
C        3140
D        2860
E        3240
F        3000
G        3220
H        2890
I        2990
J        3100
The table above shows the calories burned (in kilojoules) by ten persons of different
weights when they jogged at different speeds for an hour.
What is the median number of calories burned by the ten people in an hour?
A. 2890 kilojoules
B. 3000 kilojoules
C. 3005 kilojoules
D. 3010 kilojoules
E. 3020 kilojoules
3. The Finance department of a company has published an annual report on the company's
performance. A simple random sample of the losses of the previous 6 years is
shown in the table below (in million dollars).
55,000 40,000 32,000
22,000 14,000 8,000
What is the standard deviation of this sample?
A. $957
B. $1338
C. $1580
D. $17433
E. None of these.
4. Consider a data set of 8 identical values $(z, z, \ldots, z, z)$. If two more values are added
to this set, $z - 2$ at the left end and $z + 2$ at the right end of the data set, which of the
following would change?
A. Mean
B. Mode
C. Range
D. Mean and Mode
E. Mean and Range
5.
[Figure: pie chart "Favorite hobbies at Atlanta High School" (100% = 300 Students): Diving 15%, Fishing 25%, Riding 20%, Hoteling 30%, Jogging 10%]
The pie chart above shows the favorite hobbies of students at Atlanta High School. What
is the ratio of the percentage of students who like fishing to the percentage
of students who like hoteling?
A. 1:2
B. 2:3
C. 5:6
D. 6:5
E. 9:5
6.
[Figure: histogram "Employment in Pakistan 1981-1985"; x-axis: Year (1981 to 1985), y-axis: People Employed (0 to 90000)]
An economist finds that employment in Pakistan varied considerably after the
1980s, as shown in the histogram above. What is the average percentage change in
employment from 1981 to 1985?
A. 37.5%
B. 45%
C. 54.5%
D. 60%
E. 66%
7. Consider the two sets below:
#1 = {A, C, E, Y, Z}
#2 = {B, D, M, N, O}
What is the probability of picking an E from set #1 and an N from set #2?
A. $\frac{1}{20}$
B. $\frac{1}{10}$
C. $\frac{9}{20}$
D. $\frac{1}{4}$
E. $\frac{1}{5}$
8. Suppose that a coin is flipped twice. What is the probability of getting a tail on the
first flip or a tail on the second flip (or both)?
A. 0.05
B. 0.25
C. 0.50
D. 0.75
E. 0.90
9. A die is rolled and a coin is tossed. What is the probability that you will get
neither a 3 on the die nor a tail on the coin (or both)?
A. $\frac{1}{21}$
B. $\frac{1}{4}$
C. $\frac{5}{12}$
D. $\frac{7}{12}$
E. $\frac{5}{6}$
10. A local surveying body found that 9 out of 10 people are facing depression in their
social life. If three people are chosen at random with replacement, what is the probability
that all three are dealing with depression?
A. 7.29
B. 6.39
C. 5.49
D. 4.59
E. 3.69