1. STAT2111
INTRODUCTION TO STATISTICS &
PROBABILITY
Instructor: Mr. HM Zahid
Lecturer,
Department of Information Sciences
University of Education, Lahore, Pakistan
PART 1 - DATA
3. PROBABILITY AND STATISTICS
◾ Statistics is the mathematical science behind the problem
“what can I know about a population if I’m unable to reach
every member?”
4. PROBABILITY AND STATISTICS
◾ If we could measure the height of every resident of Australia,
then we could make a statement about the average height of
Australians at the time we took our measurement.
◾ This is where random sampling comes in.
5. PROBABILITY AND STATISTICS
◾ If we take a reasonably sized random sample of Australians and
measure their heights, we can form a statistical inference about
the population of Australia.
◾ Probability helps us know how sure we are of our conclusions!
7. WHAT IS DATA
◾ Data = the collected observations we have about
something.
◾ Data can be continuous
"What is the stock
price?”
◾ or categorical
"What car has the best repair history?”
8. WHY DATA MATTERS
◾ Helps us understand things as they are:
"What relationships if any exist between two events?"
"Do people who eat an apple a day enjoy fewer doctor's visits
than those who don't?"
9. WHY DATA MATTERS
◾ Helps us predict future behavior to guide
business decisions:
"Based on a user's click history which ad is
more likely to bring them to our site?"
11. VISUALIZING DATA
◾ To a graph:
◾ The graph uncovers
two distinct trends
1. an increase in
passengers flying
over the years
2. a greater number of
passengers flying in
the summer
months.
Flight
s
14. LEVELS OF MEASUREMENTS
◾ Nominal
◾ Predetermined categories
◾ Can’t be sorted
◾ Animal classification (mammal, fish, or
reptile)
◾ Political party (PTI, PPP, or PMLN)
25. EQUATION EXAMPLE
◾ Formula for calculating a sample
mean:
◾ Read out loud:
◾ “𝒙 bar (the symbol for the sample mean) is equal to the sum (indicated
by the Greek letter sigma) of all the 𝒙-sub-𝒊 values in the series as 𝒊
goes from 1 to the number 𝒏 items in the series divided by 𝒏."
29. MEASUREMENTS OF DATA
◾ “What was the average return?”
Measures of Central Tendency
◾ “How far from the average did individual values
stray?”
Measures of Dispersion
30. MEASURES OF CENTRAL TENDENCY
(MEAN, MEDIAN, MODE)
◾ Describe the “location” of the
data
◾ Fail to describe the “shape” of the
data
◾ mean= “calculated average”
◾ median= “middle value”
34. MEAN VS. MEDIAN
◾ The mean can be influenced by outliers.
◾ The mean of {2,3,2,3,2,12} is 4
◾ The median is 2.5
◾ The median is much closer to most of the values
in the series!
36. MEASURES OF CENTRAL TENDENCY
◾ Data Set:
3,4,3,1,2,3,9,5,6,7,4,8
◾ Mean
◾ Median
1,2,3,3,3,4,4,5,6,
7,8,9
◾ Mode
Hence Answer =
4
The value 3 appears 3 times, and 4 appears 2 times and all other values appear
once.
Hence 3 is the mode
38. MEASURE OF DISPERSION
(RANGE,VARIANCE, STANDARD DEVIATION)
9 10 11 13 15 16 19 19 21 23 28 30 33 34 36
39
◾ In this sample the mean is 22.25
◾ How do we describe how “spread out” the
sample is?
40. VARIANCE
◾ Calculated as the sum of square distances from each
point to the mean
◾ There’s a difference between the SAMPLE variance
and the POPULATION variance
◾ subject to Bessel's correction (𝒏−𝟏)
43. STANDARD DEVIATION
◾ square root of the variance
◾ benefit: same units as the sample
◾ meaningful to talk about
“values that lie within one standard deviation of the
mean”
44. STANDARD DEVIATION
◾ 68-95-99.7
Rule
◾ The normal distribution is commonly associated with the
68-95-
99.7 rule
◾ 68% of the data is within 1 standard deviation (σ) of the mean
(μ),
◾ 95% of the data is within 2 standard deviations (σ) of the
mean (μ),
49. QUARTILES AND IQR
◾ The quartile concept is used to divide the data into
four parts.
◾ It is the same as median where it divides the given data
into two equal parts
◾ This quartile concept comes under the subject of
statistics which is a study of the collection of data
analyzing it, interpreting, presenting organized data.
50. QUARTILES AND IQR
◾ Another way to describe data is through quartiles
and the interquartile range (IQR)
◾ Has the advantage that every data point is
considered, not aggregated!
51. Quartile Formula
As mentioned above Quartile divides the data into 4 equal parts. This can
be represented visually by the below figure.
Quartile 1 lies between starting term and the middle term.
Quartile 2 lies between starting terms and last term i.e., Middle term.
Quartile 3 lies between quartile 2 and last term.
There is a separate formula for finding each quartile value. And in order
to find these quartile values first, sort the given number series data into
ascending order.
52. Quartile Formula
Steps to obtain quartile formula are as shown below as follows:
1. Sort the given data in ascending order.
2. Find respective quartile values/terms as per need from the below
formulae.
Where n is the total count of numbers in the given data. And this formula
is valid for even number
53. Examples: Question 1: Find the Quartile 1 for the given data 10, 30, 5, 12,
20, 40, 25, 15, 18.
Step 1: Sort the given data in ny order ( ascending order / descending
order)
5, 10, 12, 15, 18, 20, 25, 30, 40
Step 2: Find 1st Quartile [Note: These formulas are for odd number]
First Quartile = (frac{n + 1}{4})th
term
Second Quartile = (frac{n + 1}{2})th
term
Third Quartile = (frac{3(n + 1)}{4})th
term
Here n = 9 because there are total 9 numbers in the given data.
First Quartile = ((9 + 1)/4)th term
54. Examples: Question 1: Find the Quartile 1 for the given data 10, 30, 5, 12,
20, 40, 25, 15, 18.
2.5th term = 2nd term + (0.5) (3rd term - 2nd term)
= (10) + (0.5) (12 - 10)
= 10+1
= 11
The First Quartile value is 11.
55. Question 2: Find the Second Quartile for the data 10, 30, 5, 12, 20, 40, 25,
15, 18.
Sort the given data in the ascending order
5, 10, 12, 15, 18, 20, 25, 30, 40
Step 2: Find 2nd Quartile
Here i=2
Here n = 9 because there are total 9 numbers in the given data.
Second Quartile = 2(9)/4 th term
= (10/2)th term
= 5th term
5th term is 18
So the second Quartile value is 18.
56. QUARTILES AND IQR
◾ Consider the following series of 20 values:
9 10 10 11 13 15 16 19 19 21 23 28 30 33 34 36 44 45
47 60
1. Divide the series
2. Divide each subseries
3. These become
quartiles
59. FENCES & OUTLIERS
◾ What is considered an “outlier”?
◾ A common practice is to set a “fence” that is 1.5
times the width of the IQR
◾ Anything outside the fence is an outlier
◾ This is determined by the data, not an arbitrary
percentage!