 From the Latin word ‘status’, meaning ‘political
state’.
 Science of Uncertainty
 It deals with what could be, what might be, or
what probably is.
The basis to verify theories and laws in every
discipline.
Overall, it is a method which deals with
numerical facts and figures.
 The Indian army is going to grow by 9-10%
per annum over the coming 5 years.
 The male-female ratio in India is 972 as per
the 2001 census.
 The Indian population is growing by 2% every
year.
 Attendance of a student should be 75% for
appearing in exams.
And many more…
 What is Science?
 Originates from the Latin word “scientia”, meaning knowledge.
 Knowledge attained through study or
practice.
 Knowledge covering general truths or the
operation of general laws (especially as obtained and tested
through the scientific method) and concerned with the physical
world.
 Statistics is not a body of substantive
knowledge, but a body of methods for
obtaining knowledge.
 It is better regarded as a scientific method than
as a complete science.
Scientific method: classifies facts and establishes their mutual
relations through experimentation, observation, and logical
arguments from accepted postulates.
Research Process
 Population. Universe. The entire category under consideration. This
is the data which we have not completely examined but to which our
conclusions refer. The population size is usually indicated by a
capital N.
◦ Examples: every user of Twitter; all female users of Facebook.
 Sample. That portion of the population that is available, or to be
made available, for analysis. A good sample is representative of the
population. We will learn about probability samples and how they
provide assurance that a sample is indeed representative. The
sample size is shown as lower case n.
◦ If your company manufactures one million laptops, it might take a
sample of, say, 500 of them to test quality. The population size is N =
1,000,000 and the sample size is n = 500.
Introduction 7
Parameter. A characteristic of a population. The
population mean, µ and the population standard
deviation, σ, are two examples of population
parameters. If you want to determine the
population parameters, you have to take a
census of the entire population. Taking a census
is very costly.
Statistic. A statistic is a measure that is derived
from the sample data. For example, the sample
mean, X̅, and the sample standard deviation, s,
are statistics. They are used to estimate the
population parameters.
Example of statistical inference from quality control:
 GE manufactures LED bulbs and wants to know how
many are defective. Suppose one million bulbs a
year are produced in its new plant in Staten Island.
The company might sample, say, 500 bulbs to
estimate the proportion of defectives.
◦ N = 1,000,000 and n = 500
◦ If 5 out of 500 bulbs tested are defective, the sample
proportion of defectives will be 1% (5/500). This statistic
may be used to estimate the true proportion of defective
bulbs (the population proportion).
◦ In this case, the sample proportion is used to make
inferences about the population proportion.
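The arithmetic in the bulb example can be sketched in a few lines of Python. The variable names are illustrative, and scaling the sample proportion up to a yearly count of defective bulbs is an added step that the slide does not state; it is only a point estimate, not a certainty.

```python
# Numbers taken from the LED bulb example above.
population_size = 1_000_000    # N: bulbs produced per year
sample_size = 500              # n: bulbs tested
defective_in_sample = 5

# The sample proportion is the statistic used to estimate the population proportion.
sample_proportion = defective_in_sample / sample_size

# Assumption (not on the slide): apply the sample proportion to the whole population.
estimated_defectives = sample_proportion * population_size

print(f"sample proportion of defectives: {sample_proportion:.2%}")
print(f"estimated defective bulbs per year: {estimated_defectives:,.0f}")
```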
 Descriptive Statistics. Those statistics that summarize a
sample of numerical data in terms of averages and other
measures for the purpose of description, such as the mean
and standard deviation.
◦ Descriptive statistics, as opposed to inferential statistics, are not
concerned with the theory and methodology for drawing inferences
that extend beyond the particular set of data examined, in other
words from the sample to the entire population. All that we care about
are the summary measurements such as the average (mean).
◦ Thus, a teacher who gives a class, of say, 35 students, an exam is
interested in the descriptive statistics to assess the performance of
the class. What was the class average, the median grade, the standard
deviation, etc.? The teacher is not interested in making any inferences
to some larger population.
◦ This includes the presentation of data in the form of graphs, charts,
and tables.
 Primary data. This is data that has been compiled
by the researcher using such techniques as
surveys, experiments, depth interviews,
observation, focus groups.
 Types of surveys. A lot of data is obtained using
surveys. Each survey type has advantages and
disadvantages.
◦ Mail: lowest rate of response; usually the lowest cost
◦ Personally administered: can “probe”; most costly;
interviewer effects (the interviewer might influence the
response)
◦ Telephone: fastest
◦ Web: fast and inexpensive
 Secondary data. This is data that has been
compiled or published elsewhere, e.g., census
data.
◦ The trick is to find data that is useful. The data was
probably collected for some purpose other than
helping to solve the researcher’s problem at hand.
◦ Advantages: It can be gathered quickly and
inexpensively. It enables researchers to build on past
research.
◦ Problems: Data may be outdated. Variation in
definition of terms. Different units of measurement.
May not be accurate (e.g., census undercount).
 Nonprobability Samples – based on convenience or
judgment
◦ Convenience (or chunk) sample - students in a class, mall
intercept
◦ Judgment sample - based on the researcher’s judgment as to
what constitutes “representativeness” e.g., he/she might say these
20 stores are representative of the whole chain.
◦ Quota sample - interviewers are given quotas based on
demographics; for instance, they may each be told to interview
100 subjects – 50 males and 50 females. Of the 50, say, 10
nonwhite and 40 white.
 The problem with a nonprobability sample is that we do not
know how representative our sample is of the population.
 Probability Sample. A sample collected in such
a way that every element in the population has
a known chance of being selected.
 One type of probability sample is a Simple
Random Sample. This is a sample collected in
such a way that every element in the
population has an equal chance of being
selected.
 How do we collect a simple random sample?
◦ Use a table of random numbers or a random number
generator.
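A minimal Python sketch of drawing a simple random sample with a random number generator. The unit IDs, the fixed seed, and the sizes N = 1,000,000 and n = 500 (borrowed from the earlier laptop example) are illustrative assumptions.

```python
import random

rng = random.Random(42)             # fixed seed so the draw is repeatable
population = range(1, 1_000_001)    # hypothetical unit IDs 1..N

# Sampling without replacement: every unit has the same chance of selection.
sample_ids = rng.sample(population, k=500)

print(sample_ids[:10])
```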
 Other kinds of probability samples (beyond the
scope of this course).
◦ systematic random sample.
 Choose the first element randomly, then every kth
observation, where k = N/n
◦ stratified random sample.
 The population is sub-divided based on a characteristic
and a simple random sample is conducted within each
stratum
◦ cluster sample
 First take a random sample of clusters from the
population of clusters. Then take a simple random sample
within each cluster. Examples: election districts, orchards.
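The systematic and stratified designs above can be sketched as follows. The function names, the toy population of 1,000 units, and the "north"/"south" strata are hypothetical; using integer division for k = N/n is also an assumption for the case where N is not a multiple of n.

```python
import random

def systematic_sample(units, n, rng):
    """Random start among the first k units, then every k-th unit (k = N // n)."""
    k = len(units) // n
    start = rng.randrange(k)
    return [units[start + i * k] for i in range(n)]

def stratified_sample(strata, n_per_stratum, rng):
    """Simple random sample of a fixed size within each stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

rng = random.Random(7)
units = list(range(1000))                      # hypothetical population, N = 1000
systematic = systematic_sample(units, 50, rng)

strata = {"north": units[:600], "south": units[600:]}   # hypothetical strata
stratified = stratified_sample(strata, 25, rng)
```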
◦ Measures of Location
 Measures of central tendency: Mean; Median; Mode
 Measures of noncentral tendency - Quantiles
 Quartiles; Quintiles; Percentiles
◦ Measures of Dispersion
 Range
 Interquartile range
 Variance
 Standard Deviation
 Coefficient of Variation
◦ Measures of Shape
◦ Skewness
Descriptive Statistics I 17
 Measures of location place the data set on the scale
of real numbers.
 Measures of central tendency (i.e., central location)
help find the approximate center of the dataset.
 These include the mean, the median, and the
mode.
 The sample mean is the sum of all the observations
(∑Xi) divided by the number of observations (n):
$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
where ∑Xi = X1 + X2 + X3 + X4 + … + Xn
 Example. 1, 2, 2, 4, 5, 10. Calculate the mean.
Note: n = 6 (six observations)
∑Xi = 1 + 2 + 2 + 4 + 5 + 10 = 24
X̅ = 24 / 6 = 4.0
 The median is the middle value of the ordered data
 To get the median, we must first rearrange the
data into an ordered array (in ascending or
descending order). Generally, we order the data
from the lowest value to the highest value.
 Therefore, the median is the data value such that
half of the observations are larger and half are
smaller. It is also the 50th percentile (we will be
learning about percentiles in a bit).
 If n is odd, the median is the middle observation of
the ordered array. If n is even, it is midway between
the two central observations.
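A small Python sketch of the mean and of the odd/even rule for the median. The function names are illustrative; the data set 1, 2, 2, 4, 5, 10 is the one used in the mean example.

```python
def mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)

def median(values):
    """Middle value of the ordered data; midway between the two central values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

data = [1, 2, 2, 4, 5, 10]
print(mean(data))      # 4.0
print(median(data))    # 3.0, midway between 2 and 4
```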
 The mode is the value of the data that
occurs with the greatest frequency.
Example. 1, 1, 1, 2, 3, 4, 5
Answer. The mode is 1 since it occurs three times. The
other values each appear only once in the data set.
Example. 5, 5, 5, 6, 8, 10, 10, 10.
Answer. The mode is: 5, 10.
There are two modes. This is a bi-modal dataset.
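Both mode examples can be checked with Python's standard library. `statistics.multimode` returns every value that ties for the highest frequency, so it also handles the bi-modal case.

```python
from statistics import multimode

one_mode = multimode([1, 1, 1, 2, 3, 4, 5])
two_modes = multimode([5, 5, 5, 6, 8, 10, 10, 10])

print(one_mode)     # [1]
print(two_modes)    # [5, 10]
```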
 Quartiles split a set of ordered data into four parts.
◦ Imagine cutting a chocolate bar into four equal pieces… How
many cuts would you make? (yes, 3!)
 Q1 is the First Quartile
◦ 25% of the observations are smaller than Q1 and 75% of the
observations are larger
 Q2 is the Second Quartile
◦ 50% of the observations are smaller than Q2 and 50% of the
observations are larger. Same as the Median. It is also the 50th
percentile.
 Q3 is the Third Quartile
◦ 75% of the observations are smaller than Q3 and 25% of the
observations are larger
 Dispersion is the amount of spread, or
variability, in a set of data.
 Why do we need to look at measures of
dispersion?
 Consider this example:
A company is about to buy computer chips that must
have an average life of 10 years. The company has a
choice of two suppliers. Whose chips should they buy?
They take a sample of 10 chips from each of the
suppliers and test them. See the data on the next slide.
                     Supplier A chips    Supplier B chips
                     (life in years)     (life in years)
                           11                 170
                           11                   1
                           10                   1
                           10                 160
                           11                   2
                           11                 150
                           11                 150
                           11                 170
                           10                   2
                           12                 140
Mean (X̅)             10.8 years          94.6 years
Median                11 years            145 years
Std. deviation (s)    0.63 years          80.6 years
Range                 2 years             169 years
We see that supplier B’s chips have a longer average life. However, what
if the company offers a 3-year warranty? Then computers manufactured
using the chips from supplier A will have no returns, while using
supplier B will result in 4/10 or 40% returns.
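The summary figures for the two suppliers can be reproduced with Python's `statistics` module. The helper name `summarize` and the rule that a chip lasting fewer than 3 years counts as a warranty return are assumptions made for this sketch.

```python
import statistics

supplier_a = [11, 11, 10, 10, 11, 11, 11, 11, 10, 12]
supplier_b = [170, 1, 1, 160, 2, 150, 150, 170, 2, 140]
WARRANTY_YEARS = 3   # assumption: a chip failing before 3 years is returned

def summarize(lives):
    """Location and dispersion measures, plus the share of chips failing under warranty."""
    return {
        "mean": statistics.mean(lives),
        "median": statistics.median(lives),
        "stdev": statistics.stdev(lives),       # sample standard deviation (n - 1)
        "range": max(lives) - min(lives),
        "returns": sum(life < WARRANTY_YEARS for life in lives) / len(lives),
    }

summary_a = summarize(supplier_a)
summary_b = summarize(supplier_b)
print(summary_a)
print(summary_b)
```

The means alone favor supplier B; the standard deviation, range, and return rate show why supplier A is the safer choice.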
 We will study these five measures of
dispersion
◦ Range
◦ Interquartile Range
◦ Standard Deviation
◦ Variance
◦ Coefficient of Variation
 Range = Largest Value – Smallest Value
Example: 1, 2, 3, 4, 5, 8, 9, 21, 25, 30
Answer: Range = 30 – 1 = 29.
 The range is simple to use and to explain to
others.
 One problem with the range is that it is
influenced by extreme values at either end.
 IQR = Q3 – Q1
 Example (n = 15):
0, 0, 2, 3, 4, 7, 9, 12, 17, 18, 20, 22, 45, 56, 98
Q1 = 3, Q3 = 22
IQR = 22 – 3 = 19 (Range = 98)
 This is basically the range of the central 50% of
the observations in the distribution.
 Problem: The interquartile range does not take
into account the variability of the total data (only
the central 50%). We are “throwing out” half of
the data.
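A short Python check of the range and interquartile range for the n = 15 example. Quartile conventions differ between textbooks and software; the "exclusive" method of `statistics.quantiles` is chosen here because it reproduces the Q1 and Q3 shown on the slide for this data set, which is an assumption about the convention the slide uses.

```python
import statistics

data = [0, 0, 2, 3, 4, 7, 9, 12, 17, 18, 20, 22, 45, 56, 98]

# Three cut points split the ordered data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")

iqr = q3 - q1
data_range = max(data) - min(data)

print(q1, q2, q3)         # 3.0 12.0 22.0
print(iqr, data_range)    # 19.0 98
```

The single extreme value 98 drives the range, while the IQR ignores it.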
 The standard deviation, s, measures a kind of
“average” deviation about the mean. It is not really
the “average” deviation, even though we may think
of it that way.
 Why can’t we simply compute the average deviation
about the mean, if that’s what we want?
$\frac{\sum_{i=1}^{n} (X_i - \bar{X})}{n}$
 If you take a simple mean, and then add up the
deviations about the mean, as above, this sum will
be equal to 0. Therefore, a measure of “average
deviation” will not work.
 Instead, we use:
$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$
 This is the “definitional formula” for standard deviation.
 The standard deviation has lots of nice properties,
including:
◦ By squaring the deviation, we eliminate the problem of the
deviations summing to zero.
◦ In addition, this sum is a minimum. No other value subtracted
from X and squared will result in a smaller sum of the deviation
squared. This is called the “least squares property.”
 Note we divide by (n-1), not n. This will be referred to
as a loss of one degree of freedom.
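Two of the points above can be illustrated numerically: the deviations about the mean sum to zero, and the sum of squared deviations is smallest when taken about the mean. The data set is the one from the earlier mean example; the list of alternative centers is an arbitrary choice for this sketch.

```python
data = [1, 2, 2, 4, 5, 10]
mean = sum(data) / len(data)              # 4.0

# Deviations about the mean always cancel out.
deviations = [x - mean for x in data]
print(sum(deviations))                    # 0.0

def sum_of_squares(center):
    """Sum of squared deviations of the data about an arbitrary center."""
    return sum((x - center) ** 2 for x in data)

# Least squares property: among these candidate centers, the mean gives the minimum.
candidates = [3.0, 3.5, 4.0, 4.5, 5.0]
best_center = min(candidates, key=sum_of_squares)
print(best_center, sum_of_squares(best_center))   # 4.0 54.0
```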
Example. Two data sets, X and Y. Which of
the two data sets has greater variability?
Calculate the standard deviation for each.
We note that both sets of data have the
same mean:
X̅ = 3
Y̅ = 3
Xi    Yi
1     0
2     0
3     0
4     5
5     10
$s_X = \sqrt{\frac{10}{4}} = 1.58$
$s_Y = \sqrt{\frac{80}{4}} = 4.47$
[Check these results with your calculator.]
X     X̅     (X − X̅)    (X − X̅)²
1     3      −2         4
2     3      −1         1
3     3       0         0
4     3       1         1
5     3       2         4
             ∑ = 0      ∑ = 10
Y     Y̅     (Y − Y̅)    (Y − Y̅)²
0     3      −3         9
0     3      −3         9
0     3      −3         9
5     3       2         4
10    3       7         49
             ∑ = 0      ∑ = 80
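The calculations for X and Y can be checked with a short Python function that follows the definitional formula step by step. The function name `sample_std` is illustrative.

```python
import math

def sample_std(values):
    """Definitional formula: square root of the summed squared deviations over (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(v - mean) ** 2 for v in values]
    return math.sqrt(sum(squared_deviations) / (n - 1))

x = [1, 2, 3, 4, 5]
y = [0, 0, 0, 5, 10]

s_x = sample_std(x)
s_y = sample_std(y)
print(round(s_x, 2), round(s_y, 2))   # 1.58 4.47
```

Both data sets have a mean of 3, but Y is far more spread out.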
The variance, s², is the standard deviation (s)
squared. Conversely, $s = \sqrt{\text{variance}}$.
Definitional formula: $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$
Computational formula: $s^2 = \frac{\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}}{n-1}$
This is what computer software
(e.g., MS Excel or your calculator key) uses.
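A sketch showing that the definitional and computational formulas give the same variance for the Y data above. The function names are illustrative. As a side note, the computational form subtracts two large numbers, so in floating-point arithmetic it can lose precision when the mean is very large relative to the spread.

```python
def variance_definitional(values):
    """Sum of squared deviations about the mean, divided by (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def variance_computational(values):
    """Uses only the sum of the values and the sum of their squares."""
    n = len(values)
    sum_x = sum(values)
    sum_x_squared = sum(v * v for v in values)
    return (sum_x_squared - sum_x ** 2 / n) / (n - 1)

y = [0, 0, 0, 5, 10]
print(variance_definitional(y))     # 20.0
print(variance_computational(y))    # 20.0
```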
 A sample space is the set of all possible
outcomes of an experiment.
 A random variable is a rule for associating a
number with each element in a sample space.
 Suppose there are 8 balls in a bag. The
random variable X is the weight, in kg, of a
ball selected at random. Balls 1, 2 and 3
weigh 0.1kg, balls 4 and 5 weigh 0.15kg and
balls 6, 7 and 8 weigh 0.2kg
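The bag example can be written out in Python: the sample space is the set of balls, the random variable X maps each ball to its weight, and counting gives the probability of each value. This sketch assumes each ball is equally likely to be drawn; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction

# Random variable X: ball number (element of the sample space) -> weight in kg.
ball_weight_kg = {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.15, 5: 0.15, 6: 0.2, 7: 0.2, 8: 0.2}

# With equally likely balls, P(X = w) is the share of balls that weigh w.
counts = Counter(ball_weight_kg.values())
distribution = {w: Fraction(c, len(ball_weight_kg)) for w, c in counts.items()}

for weight, probability in distribution.items():
    print(f"P(X = {weight}) = {probability}")
```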
 There are two types of random variables:
◦ A Discrete random variable can take on only
specified, distinct values.
◦ A Continuous random variable can take on any
value within an interval.
 Thus, there are also two types of probability
distributions:
◦ Discrete probability distributions
◦ Continuous probability distributions
Probability Distributions 35
 A continuous probability distribution is described by a
probability density function. The probability is interpreted as
"area under the curve."
 1) The random variable takes on an infinite number of
values within a given interval.
 2) The probability that X equals any particular value is
0. Consequently, we talk about intervals. The
probability is equal to the area under the curve over the interval.
 3) The area under the whole curve = 1.
Normal Distribution 36
 Probabilities are obtained by getting the area
under the curve inside of a particular interval.
The area under the curve = the proportion of
times under identical (repeated) conditions that a
particular range of values will occur.
 3 Characteristics of the Normal distribution:
◦ It is symmetric about the mean μ.
◦ Mean = median = mode. [“bell-shaped” curve]
◦ f(X) decreases as X gets farther and farther away from
the mean. It approaches the horizontal axis asymptotically
as X → −∞ and as X → +∞. This means that there is always some
probability (area) for extreme values.
 The probability density function for the normal
distribution:
$f(X) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{X-\mu}{\sigma}\right)^2}$
f(X), the height of the curve, represents the relative
frequency at which the corresponding values
occur.
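The density formula can be coded directly, and a simple numerical integration illustrates the "area under the curve" idea. The function names, the midpoint-rule integration, and the choice of ±8 standard deviations as a stand-in for the whole curve are assumptions of this sketch.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the normal curve at x for the given mean and standard deviation."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def area_under_curve(a, b, mu=0.0, sigma=1.0, steps=20_000):
    """Approximate the area between a and b with the midpoint rule."""
    width = (b - a) / steps
    heights = (normal_pdf(a + (i + 0.5) * width, mu, sigma) for i in range(steps))
    return sum(heights) * width

print(normal_pdf(0.0))              # peak height of the standard normal curve
print(area_under_curve(-8, 8))      # very close to 1: essentially the whole curve
print(area_under_curve(-1, 1))      # about 0.68: within one standard deviation
```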
Note that the normal distribution is defined by
two parameters, μ and σ . You can draw a
normal distribution for any μ and σ
combination. There is one normal distribution,
Z, that is special. It has μ = 0 and σ = 1.
This is the Z distribution, also called the
standard normal distribution. It is one of the
infinitely many normal distributions we could have
selected.
 Any normal distribution can be converted into a standard normal
distribution by transforming the normal random variable into the
standard normal random variable:
$Z = \frac{X - \mu}{\sigma}$
 This is called standardizing the data. It will result in (transformed) data
with μ = 0 and σ = 1.
 The areas under the curve for the Standard Normal Distribution (Z) have
been computed and tabled. See, for example
http://www.statsoft.com/textbook/distribution-tables/#z
 Please note that you may find different tables for the Z-distribution. The
table we use here gives you the area under the curve from 0 to z. Some
books provide a slightly different table, one that gives you the area in
the tail. If you check the diagram that is usually shown above the table,
you can determine which table you have. In the table on the next slide,
the area from 0 to z is shaded so you know that you are getting the area
from 0 to z. Also, note that a table value can never be more than .5000.
The area from 0 to infinity is .5000.
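Standardizing and the two table conventions can be sketched with Python's `statistics.NormalDist`. The function names and the example values (X = 130 with μ = 100 and σ = 15) are hypothetical; the areas come from the standard normal cumulative distribution rather than from a printed table.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)

def standardize(x, mu, sigma):
    """Convert a normal value X into a standard normal value Z."""
    return (x - mu) / sigma

def area_zero_to_z(z):
    """Area under the standard normal curve from 0 to z (the table used here)."""
    return STANDARD_NORMAL.cdf(abs(z)) - 0.5

def tail_area(z):
    """Area beyond z (the convention used by some other tables)."""
    return 0.5 - area_zero_to_z(z)

z = standardize(130, mu=100, sigma=15)   # hypothetical values
print(z)                                 # 2.0
print(round(area_zero_to_z(z), 4))       # 0.4772
print(round(area_zero_to_z(1.96), 4))    # 0.475
print(round(tail_area(1.96), 4))         # 0.025
```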
 Estimation
 Hypothesis Testing
Both activities use sample statistics (for
example, X̅) to make inferences about a
population parameter (μ).
Estimation 41
 Why don’t we just use a single number (a point
estimate) like, say, X̅ to estimate a population
parameter, μ?
 The problem with using a single point (or value) is
that it will very probably be wrong. In fact, with a
continuous random variable, the probability that the
variable is equal to a particular value is zero. So,
P(X̅=μ) = 0.
 This is why we use an interval estimator.
 We can examine the probability that the interval
includes the population parameter.
 How wide should the interval be? That depends upon how
much confidence you want in the estimate.
 For instance, say you wanted a confidence interval
estimator for the mean income of a college graduate:
You might have        That the mean income is between
100% confidence       $0 and $∞
95% confidence        $35,000 and $41,000
90% confidence        $36,000 and $40,000
80% confidence        $37,500 and $38,500
…                     …
0% confidence         $38,000 (a point estimate)
 The wider the interval, the greater the confidence you will
have in it as containing the true population parameter μ.
 To construct a confidence interval estimator
of μ, we use:
X̅ ± Zα σ/√n, at (1 − α) confidence
where we get Zα from the Z table.
 When n≥30, we use s as an estimator of σ.
 To be more precise, the α is split in half since
we are constructing a two-sided confidence
interval. However, for the sake of simplicity,
we call the z-value Zα rather than Zα/2.
[Figure: standard normal curve with an area of α/2 in each tail, beyond −Zα/2 and +Zα/2]
 You work for a company that makes smart TVs,
and your boss asks you to determine with
certainty the exact life of a smart TV. She tells
you to take a random sample of 100 TVs.
 What is the exact life of a smart TV made by this
company?
Sample Evidence:
n = 100
X̅ = 11.50 years
s = 2.50 years
 Since your boss has asked for 100% confidence, the
only answer you can accurately provide is: -∞ to + ∞
years.
 After you are fired, perhaps you can get your job
back by explaining to your boss that statisticians
cannot work with 100% confidence if they are working
with data from a sample. If you want 100%
confidence, you must take a census. With a sample,
you can never be absolutely certain as to the value of
the population parameter.
 This is exactly what statistical inference is: Using
sample statistics to draw conclusions (e.g., estimates)
about population parameters.
n = 100
X̅ = 11.50 years
s = 2.50 years
at 95% confidence:
11.50 ± 1.96*(2.50/√100)
11.50 ± 1.96*(.25)
11.50 ± .49
The 95% CIE is: 11.01 years ---- 11.99 years
 We are 95% confident that the interval from 11.01
years to 11.99 years contains the true population
parameter, μ.
 Another way to put this: if we drew 100 samples and
built an interval from each by the same procedure
(same n and same α), we would expect about 95 of those
intervals to contain the population mean.
 Remember – the population parameter (μ ) is
fixed, it is not a random variable. Thus, it is
incorrect to say that there is a 95% chance that
the population mean will “fall” in this interval.
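The repeated-sampling interpretation can be illustrated with a small simulation. This sketch assumes a normally distributed population whose true mean and standard deviation are known to the simulation (μ = 11.5 and σ = 2.5 are hypothetical values, not facts about the smart TVs); it builds a 95% interval from each sample and counts how often the fixed μ is covered.

```python
import math
import random
import statistics

def coverage_rate(mu=11.5, sigma=2.5, n=100, trials=2000, z=1.96, seed=1):
    """Share of sample-based intervals X̅ ± z·s/√n that contain the fixed mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        sample_mean = statistics.fmean(sample)
        half_width = z * statistics.stdev(sample) / math.sqrt(n)
        if sample_mean - half_width <= mu <= sample_mean + half_width:
            hits += 1
    return hits / trials

rate = coverage_rate()
print(f"intervals containing the true mean: {rate:.1%}")
```

The interval changes from sample to sample while μ stays put, which is why the 95% describes the procedure rather than any single interval.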
The sample:
n = 100
X̅ = 18 years
s = 4 years
 Construct a confidence interval estimator
(CIE) of the true population mean life (µ), at
each of the following levels of confidence:
◦ (a)100% (b) 99% (c) 95% (d) 90% (e) 68%
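 In this problem we use s as an estimator of σ
(s² is an unbiased estimator of σ²: E(s²) = σ²).
 Estimated standard error of the mean: s/√n = 4/√100 = 0.4
 95% Confidence Interval Estimator: X̅ ± 1.96 (s/√n) = 18 ± 1.96 (0.4)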
(a) 100% Confidence
[α = 0, Zα = ∞]
100% CIE: −∞ years ↔ +∞ years
(b) 99% Confidence
α = .01, Zα = 2.575 (from Z table)
18 ± 2.575 (4/√100)
18 ± 1.03
99% CIE: 16.97 years ↔ 19.03 years
(c) 95% Confidence
α = .05, Zα = 1.96 (from Z table)
18 ± 1.96 (4/√100)
18 ± 0.78
95% CIE: 17.22 years ↔ 18.78 years
(d) 90% Confidence
α = .10, Zα = 1.645 (from Z table)
18 ± 1.645 (4/√100)
18 ± 0.66
90% CIE: 17.34 years ↔ 18.66 years
(e) 68% Confidence
α = .32, Zα =1.0 (from Z table)
18 ± 1.0 (4/√100)
18 ± 0.4
68% CIE: 17.60 years ↔ 18.40 years
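The five intervals can be reproduced in Python. This sketch obtains the z-value from `statistics.NormalDist().inv_cdf` instead of a printed Z table, so the z-values differ slightly from the rounded table values (for example about 2.576 instead of 2.575), but the intervals agree to two decimals. The function name `z_interval` is illustrative, and 100% confidence is handled as a special case because no finite z-value exists for it.

```python
import math
from statistics import NormalDist

def z_interval(sample_mean, s, n, confidence):
    """Two-sided interval X̅ ± z·s/√n, with α split equally between the tails."""
    if confidence >= 1.0:
        return (-math.inf, math.inf)
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * s / math.sqrt(n)
    return (sample_mean - half_width, sample_mean + half_width)

intervals = {
    level: z_interval(sample_mean=18, s=4, n=100, confidence=level)
    for level in (1.00, 0.99, 0.95, 0.90, 0.68)
}

for level, (low, high) in intervals.items():
    print(f"{level:.0%} CIE: {low:.2f} years to {high:.2f} years")
```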
 How can we keep the same level of confidence and
still construct a narrower CIE?
 Let’s look at the formula one more time: X̅ ± Zασ/√n
 The sample mean is in the center. The more
confidence you want, the higher the value of Z, the
larger the half-width of the interval.
 The larger the sample size, the smaller the half-
width, since we divide by √n.
 So, what can we do? If you want a narrower interval,
take a larger sample.
 What about a smaller standard deviation? Of course, this
depends on the variability of the population. However, a more
efficient sampling procedure (e.g., stratification) may help. That
topic is for a more advanced statistics course.
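The effect of sample size on the half-width can be tabulated quickly. The values z = 1.96 and s = 4 are taken from the previous problem; the sample sizes and the function name are illustrative.

```python
import math

def half_width(z, s, n):
    """Half-width of the interval X̅ ± z·s/√n."""
    return z * s / math.sqrt(n)

# Same confidence level (95%), same s, growing sample size.
widths = {n: half_width(z=1.96, s=4, n=n) for n in (100, 400, 1600)}

for n, width in widths.items():
    print(f"n = {n:5d}: half-width = {width:.3f} years")
```

Because n sits under a square root, quadrupling the sample size only halves the half-width.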
 Once you are working with a sample, not the
entire population, you cannot be 100% certain of
population parameters. If you need to know the
value of a parameter with certainty, take a census.
 The more confidence you want to have in the
estimator, the larger the interval is going to be.
 Traditionally, statisticians work with 95%
confidence. However, you should be able to use
the Z-table to construct a CIE at any level of
confidence.
SamSarthak3
 
Recycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part IIIRecycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part III
Aditya Rajan Patra
 
Unbalanced Three Phase Systems and circuits.pptx
Unbalanced Three Phase Systems and circuits.pptxUnbalanced Three Phase Systems and circuits.pptx
Unbalanced Three Phase Systems and circuits.pptx
ChristineTorrepenida1
 
Fundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptxFundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptx
manasideore6
 
Basic Industrial Engineering terms for apparel
Basic Industrial Engineering terms for apparelBasic Industrial Engineering terms for apparel
Basic Industrial Engineering terms for apparel
top1002
 
Heap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTS
Heap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTSHeap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTS
Heap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTS
Soumen Santra
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
fxintegritypublishin
 

Recently uploaded (20)

Forklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella PartsForklift Classes Overview by Intella Parts
Forklift Classes Overview by Intella Parts
 
ML for identifying fraud using open blockchain data.pptx
ML for identifying fraud using open blockchain data.pptxML for identifying fraud using open blockchain data.pptx
ML for identifying fraud using open blockchain data.pptx
 
Immunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary AttacksImmunizing Image Classifiers Against Localized Adversary Attacks
Immunizing Image Classifiers Against Localized Adversary Attacks
 
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdfTop 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
Top 10 Oil and Gas Projects in Saudi Arabia 2024.pdf
 
English lab ppt no titlespecENG PPTt.pdf
English lab ppt no titlespecENG PPTt.pdfEnglish lab ppt no titlespecENG PPTt.pdf
English lab ppt no titlespecENG PPTt.pdf
 
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
一比一原版(IIT毕业证)伊利诺伊理工大学毕业证成绩单专业办理
 
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
在线办理(ANU毕业证书)澳洲国立大学毕业证录取通知书一模一样
 
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdf
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdfTutorial for 16S rRNA Gene Analysis with QIIME2.pdf
Tutorial for 16S rRNA Gene Analysis with QIIME2.pdf
 
Standard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - NeometrixStandard Reomte Control Interface - Neometrix
Standard Reomte Control Interface - Neometrix
 
block diagram and signal flow graph representation
block diagram and signal flow graph representationblock diagram and signal flow graph representation
block diagram and signal flow graph representation
 
CW RADAR, FMCW RADAR, FMCW ALTIMETER, AND THEIR PARAMETERS
CW RADAR, FMCW RADAR, FMCW ALTIMETER, AND THEIR PARAMETERSCW RADAR, FMCW RADAR, FMCW ALTIMETER, AND THEIR PARAMETERS
CW RADAR, FMCW RADAR, FMCW ALTIMETER, AND THEIR PARAMETERS
 
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Hori...
 
Building Electrical System Design & Installation
Building Electrical System Design & InstallationBuilding Electrical System Design & Installation
Building Electrical System Design & Installation
 
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdfAKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
AKS UNIVERSITY Satna Final Year Project By OM Hardaha.pdf
 
Recycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part IIIRecycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part III
 
Unbalanced Three Phase Systems and circuits.pptx
Unbalanced Three Phase Systems and circuits.pptxUnbalanced Three Phase Systems and circuits.pptx
Unbalanced Three Phase Systems and circuits.pptx
 
Fundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptxFundamentals of Electric Drives and its applications.pptx
Fundamentals of Electric Drives and its applications.pptx
 
Basic Industrial Engineering terms for apparel
Basic Industrial Engineering terms for apparelBasic Industrial Engineering terms for apparel
Basic Industrial Engineering terms for apparel
 
Heap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTS
Heap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTSHeap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTS
Heap Sort (SS).ppt FOR ENGINEERING GRADUATES, BCA, MCA, MTECH, BSC STUDENTS
 
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdfHybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdf
 

Statistics

  • 1.
  • 2.
    • From the Latin word ‘status’, meaning ‘political state’.
    • The science of uncertainty.
    • It deals with what could be, what might be, or what probably is.
    • It is the basis for verifying theories and laws in every discipline.
    • Overall, it is a method that deals with numerical facts and figures.
  • 3.
    • The Indian army is going to grow by 9-10% per annum over the coming 5 years.
    • The sex ratio in India was 933 females per 1,000 males as per the 2001 census.
    • The Indian population is growing by 2% every year.
    • A student's attendance should be at least 75% to appear in exams.
    • And many more...
  • 4. What is science?
    • The word originates from the Latin “scientia”, meaning knowledge.
    • Knowledge attained through study or practice.
    • Knowledge covering general truths or the operation of general laws (especially as obtained and tested through the scientific method) and concerned with the physical world.
  • 5.
    • Statistics is not a body of substantive knowledge, but a body of methods for obtaining knowledge.
    • It is better regarded as a scientific method than as a complete science.
    • Scientific method: classifies facts and examines their mutual relations through experimentation, observation, and logical argument from accepted postulates.
  • 7.
    • Population (universe). The entire category under consideration. This is the data which we have not completely examined but to which our conclusions refer. The population size is usually indicated by a capital N.
      ◦ Examples: every user of Twitter; all female users of Facebook.
    • Sample. That portion of the population that is available, or to be made available, for analysis. A good sample is representative of the population. We will learn about probability samples and how they provide assurance that a sample is indeed representative. The sample size is shown as lowercase n.
      ◦ If your company manufactures one million laptops, it might take a sample of, say, 500 of them to test quality. The population size is N = 1,000,000 and the sample size is n = 500.
    Introduction 7
  • 8.
    • Parameter. A characteristic of a population. The population mean, µ, and the population standard deviation, σ, are two examples of population parameters. If you want to determine the population parameters, you have to take a census of the entire population. Taking a census is very costly.
    • Statistic. A statistic is a measure that is derived from the sample data. For example, the sample mean, X̅, and the sample standard deviation, s, are statistics. They are used to estimate the population parameters.
  • 10. Example of statistical inference from quality control:
    • GE manufactures LED bulbs and wants to know how many are defective. Suppose one million bulbs a year are produced in its new plant in Staten Island. The company might sample, say, 500 bulbs to estimate the proportion of defectives.
      ◦ N = 1,000,000 and n = 500
      ◦ If 5 out of 500 bulbs tested are defective, the sample proportion of defectives will be 1% (5/500). This statistic may be used to estimate the true proportion of defective bulbs (the population proportion).
      ◦ In this case, the sample proportion is used to make inferences about the population proportion.
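The bulb example can be sketched in a few lines of Python. The first part reproduces the slide's arithmetic; the second part is a hypothetical simulation that assumes a true defect rate of 1% and a fixed random seed, neither of which comes from the slides.

```python
import random

def sample_proportion(defective_count: int, sample_size: int) -> float:
    """Sample proportion of defectives, the statistic that estimates the population proportion."""
    return defective_count / sample_size

# The slide's numbers: 5 defective bulbs in a sample of n = 500.
p_hat = sample_proportion(5, 500)
print(f"sample proportion = {p_hat:.2%}")

# Hypothetical simulation (assumption): 1% of the N = 1,000,000 bulbs are defective.
N, n = 1_000_000, 500
defective_ids = 10_000          # assumed: bulbs numbered 0..9,999 are the defective ones
rng = random.Random(0)          # fixed seed so the run is repeatable
sample_ids = rng.sample(range(N), n)
simulated = sample_proportion(sum(i < defective_ids for i in sample_ids), n)
print(f"simulated sample proportion = {simulated:.2%}")
```

The simulated proportion will usually differ a little from 1%, which is the sampling error that inference has to account for.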
  • 11.
    • Descriptive statistics. Those statistics that summarize a sample of numerical data in terms of averages and other measures for the purpose of description, such as the mean and standard deviation.
      ◦ Descriptive statistics, as opposed to inferential statistics, are not concerned with the theory and methodology for drawing inferences that extend beyond the particular set of data examined, in other words from the sample to the entire population. All that we care about are the summary measurements, such as the average (mean).
      ◦ Thus, a teacher who gives an exam to a class of, say, 35 students is interested in descriptive statistics to assess the performance of the class. What was the class average, the median grade, the standard deviation, etc.? The teacher is not interested in making any inferences about some larger population.
      ◦ Descriptive statistics also include the presentation of data in the form of graphs, charts, and tables.
  • 12.
    • Primary data. This is data that has been compiled by the researcher using such techniques as surveys, experiments, depth interviews, observation, and focus groups.
    • Types of surveys. A lot of data is obtained using surveys. Each survey type has advantages and disadvantages.
      ◦ Mail: lowest rate of response; usually the lowest cost
      ◦ Personally administered: can “probe”; most costly; interviewer effects (the interviewer might influence the response)
      ◦ Telephone: fastest
      ◦ Web: fast and inexpensive
  • 13.
    • Secondary data. This is data that has been compiled or published elsewhere, e.g., census data.
      ◦ The trick is to find data that is useful. The data was probably collected for some purpose other than helping to solve the researcher’s problem at hand.
      ◦ Advantages: It can be gathered quickly and inexpensively. It enables researchers to build on past research.
      ◦ Problems: Data may be outdated. Definitions of terms may vary. Units of measurement may differ. Data may not be accurate (e.g., census undercount).
  • 14.
    • Nonprobability samples: based on convenience or judgment
      ◦ Convenience (or chunk) sample: students in a class, mall intercept
      ◦ Judgment sample: based on the researcher’s judgment as to what constitutes “representativeness”; e.g., he/she might say these 20 stores are representative of the whole chain.
      ◦ Quota sample: interviewers are given quotas based on demographics; for instance, they may each be told to interview 100 subjects, 50 males and 50 females, and within each group of 50, say, 10 nonwhite and 40 white.
    • The problem with a nonprobability sample is that we do not know how representative our sample is of the population.
  • 15.
    • Probability sample. A sample collected in such a way that every element in the population has a known chance of being selected.
    • One type of probability sample is a simple random sample. This is a sample collected in such a way that every element in the population has an equal chance of being selected.
    • How do we collect a simple random sample?
      ◦ Use a table of random numbers or a random number generator.
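As a minimal sketch of the random number generator approach, Python's standard `random.sample` draws without replacement so that each element is equally likely to be chosen. The population of serial numbers and the seed below are made up for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements so that every element has an equal chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)

laptops = list(range(1, 1001))      # hypothetical serial numbers 1..1000 (N = 1000)
sample = simple_random_sample(laptops, 10, seed=42)
print(sample)
```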
  • 16.
    • Other kinds of probability samples (beyond the scope of this course).
      ◦ Systematic random sample. Choose the first element randomly, then every kth observation, where k = N/n.
      ◦ Stratified random sample. The population is subdivided based on a characteristic, and a simple random sample is taken within each stratum.
      ◦ Cluster sample. First take a random sample of clusters from the population of clusters. Then take a simple random sample within each selected cluster. Examples: election districts, orchards.
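The first two designs can be illustrated with a short sketch. The function names, the population, and the strata are hypothetical, and k is taken as the integer part of N/n, which is an assumption for cases where N is not a multiple of n.

```python
import random

def systematic_sample(population, n, rng):
    """Pick a random start among the first k elements, then every k-th element (k = N // n)."""
    k = len(population) // n
    start = rng.randrange(k)
    return population[start::k][:n]

def stratified_sample(strata, n_per_stratum, rng):
    """Take a simple random sample of fixed size within each stratum."""
    return {name: rng.sample(members, n_per_stratum) for name, members in strata.items()}

rng = random.Random(1)
population = list(range(1000))                       # hypothetical population, N = 1000
sys_sample = systematic_sample(population, 50, rng)  # k = 1000 // 50 = 20

strata = {                                           # hypothetical strata
    "north": list(range(0, 600)),
    "south": list(range(600, 1000)),
}
strat_sample = stratified_sample(strata, 5, rng)
print(len(sys_sample), {name: len(s) for name, s in strat_sample.items()})
```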
  • 17.
    • Measures of Location
      ◦ Measures of central tendency: mean; median; mode
      ◦ Measures of noncentral tendency (quantiles): quartiles; quintiles; percentiles
    • Measures of Dispersion
      ◦ Range
      ◦ Interquartile range
      ◦ Variance
      ◦ Standard deviation
      ◦ Coefficient of variation
    • Measures of Shape
      ◦ Skewness
    Descriptive Statistics I 17
  • 18.
    • Measures of location place the data set on the scale of real numbers.
    • Measures of central tendency (i.e., central location) help find the approximate center of the data set.
    • These include the mean, the median, and the mode.
  • 19.
    • The sample mean is the sum of all the observations (∑Xi) divided by the number of observations (n):
      $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$, where $\sum_{i=1}^{n} X_i = X_1 + X_2 + X_3 + X_4 + \dots + X_n$
    • Example. 1, 2, 2, 4, 5, 10. Calculate the mean.
      Note: n = 6 (six observations)
      ∑Xi = 1 + 2 + 2 + 4 + 5 + 10 = 24
      X̅ = 24 / 6 = 4.0
  • 20.
    • The median is the middle value of the ordered data.
    • To get the median, we must first rearrange the data into an ordered array (in ascending or descending order). Generally, we order the data from the lowest value to the highest value.
    • Therefore, the median is the data value such that half of the observations are larger and half are smaller. It is also the 50th percentile (we will be learning about percentiles in a bit).
    • If n is odd, the median is the middle observation of the ordered array. If n is even, it is midway between the two central observations.
  • 21.
    • The mode is the value of the data that occurs with the greatest frequency.
      Example. 1, 1, 1, 2, 3, 4, 5
      Answer. The mode is 1 since it occurs three times. The other values each appear only once in the data set.
      Example. 5, 5, 5, 6, 8, 10, 10, 10
      Answer. The modes are 5 and 10. There are two modes, so this is a bimodal data set.
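The three measures of central tendency can be checked with Python's standard `statistics` module. This is only a sketch using the example data from the slides; `multimode` requires Python 3.8 or later.

```python
from statistics import mean, median, multimode

data = [1, 2, 2, 4, 5, 10]      # the example used for the mean
sample_mean = mean(data)        # 24 / 6
sample_median = median(data)    # n is even: midway between 2 and 4

modes_a = multimode([1, 1, 1, 2, 3, 4, 5])
modes_b = multimode([5, 5, 5, 6, 8, 10, 10, 10])   # two modes: a bimodal data set

print(sample_mean, sample_median, modes_a, modes_b)
```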
  • 22.
    • Quartiles split a set of ordered data into four equal parts.
      ◦ Imagine cutting a chocolate bar into four equal pieces. How many cuts would you make? (Yes, 3!)
    • Q1 is the first quartile.
      ◦ 25% of the observations are smaller than Q1 and 75% of the observations are larger.
    • Q2 is the second quartile.
      ◦ 50% of the observations are smaller than Q2 and 50% of the observations are larger. It is the same as the median, and it is also the 50th percentile.
    • Q3 is the third quartile.
      ◦ 75% of the observations are smaller than Q3 and 25% of the observations are larger.
  • 23.
    • Dispersion is the amount of spread, or variability, in a set of data.
    • Why do we need to look at measures of dispersion?
    • Consider this example: A company is about to buy computer chips that must have an average life of 10 years. The company has a choice of two suppliers. Whose chips should they buy? They take a sample of 10 chips from each of the suppliers and test them. See the data on the next slide.
  • 24. We see that supplier B’s chips have a longer average life. However, what if the company offers a 3-year warranty? Then computers manufactured using the chips from supplier A will have no returns, while using supplier B will result in 4/10, or 40%, returns.

      Supplier A chips (life in years)    Supplier B chips (life in years)
      11                                  170
      11                                  1
      10                                  1
      10                                  160
      11                                  2
      11                                  150
      11                                  150
      11                                  170
      10                                  2
      12                                  140
      X̅A = 10.8 years                     X̅B = 94.6 years
      MedianA = 11 years                  MedianB = 145 years
      sA = 0.63 years                     sB = 80.6 years
      RangeA = 2 years                    RangeB = 169 years
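The summary figures for the two suppliers can be recomputed from the raw chip lives. The sketch below uses the standard `statistics` module; the `summarize` helper and its dictionary keys are illustrative names, not part of the slides.

```python
from statistics import mean, median, stdev

supplier_a = [11, 11, 10, 10, 11, 11, 11, 11, 10, 12]
supplier_b = [170, 1, 1, 160, 2, 150, 150, 170, 2, 140]
WARRANTY_YEARS = 3

def summarize(lives):
    """Mean, median, sample standard deviation, range, and share failing within warranty."""
    return {
        "mean": mean(lives),
        "median": median(lives),
        "stdev": stdev(lives),                      # sample formula, divides by n - 1
        "range": max(lives) - min(lives),
        "returns": sum(x < WARRANTY_YEARS for x in lives) / len(lives),
    }

for name, lives in (("A", supplier_a), ("B", supplier_b)):
    print(name, {key: round(value, 2) for key, value in summarize(lives).items()})
```

Supplier B wins on the mean but has a far larger standard deviation and a 40% return rate, which is the point of looking at dispersion as well as location.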
  • 25.
    • We will study these five measures of dispersion:
      ◦ Range
      ◦ Interquartile range
      ◦ Standard deviation
      ◦ Variance
      ◦ Coefficient of variation
  • 26.
    • Range = Largest Value – Smallest Value
      Example: 1, 2, 3, 4, 5, 8, 9, 21, 25, 30
      Answer: Range = 30 – 1 = 29.
    • The range is simple to use and to explain to others.
    • One problem with the range is that it is influenced by extreme values at either end.
  • 27.
    • IQR = Q3 – Q1
    • Example (n = 15): 0, 0, 2, 3, 4, 7, 9, 12, 17, 18, 20, 22, 45, 56, 98
      Q1 = 3, Q3 = 22
      IQR = 22 – 3 = 19 (Range = 98)
    • This is basically the range of the central 50% of the observations in the distribution.
    • Problem: The interquartile range does not take into account the variability of the total data (only the central 50%). We are “throwing out” half of the data.
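The quartiles, range, and interquartile range for this data set can be sketched with `statistics.quantiles`. Note that several conventions for computing quartiles exist; the function's default ("exclusive") method happens to agree with the slide's values for this data set, but other tools may give slightly different cut points.

```python
from statistics import quantiles

data = [0, 0, 2, 3, 4, 7, 9, 12, 17, 18, 20, 22, 45, 56, 98]   # n = 15

q1, q2, q3 = quantiles(data, n=4)    # three cut points split the data into four parts
data_range = max(data) - min(data)
iqr = q3 - q1

print(q1, q2, q3)      # 3.0 12.0 22.0
print(data_range)      # 98
print(iqr)             # 19.0
```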
  • 28.
    • The standard deviation, s, measures a kind of “average” deviation about the mean. It is not really the “average” deviation, even though we may think of it that way.
    • Why can’t we simply compute the average deviation about the mean, if that’s what we want?
      $\frac{\sum_{i=1}^{n} (X_i - \bar{X})}{n}$
    • If you compute the mean and then add up the deviations about it, as above, the sum will always equal 0, because the positive and negative deviations cancel. Therefore, this measure of “average deviation” will not work.
  • 29.
    • Instead, we use:
      $s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$
    • This is the “definitional formula” for the standard deviation.
    • The standard deviation has lots of nice properties, including:
      ◦ By squaring the deviations, we eliminate the problem of the deviations summing to zero.
      ◦ In addition, this sum is a minimum. No value other than X̅, subtracted from each Xi and squared, will result in a smaller sum of squared deviations. This is called the “least squares property.”
    • Note that we divide by (n-1), not n. This will be referred to as a loss of one degree of freedom.
  • 30. Example. Two data sets, X and Y. Which of the two data sets has greater variability? Calculate the standard deviation for each. We note that both sets of data have the same mean: X̅ = 3 and Y̅ = 3. (continued…)

      Xi    Yi
      1     0
      2     0
      3     0
      4     5
      5     10
  • 31. $s_X = \sqrt{10/4} = 1.58$ and $s_Y = \sqrt{80/4} = 4.47$ [Check these results with your calculator.]

      X     X̅     (X-X̅)    (X-X̅)²
      1     3     -2       4
      2     3     -1       1
      3     3     0        0
      4     3     1        1
      5     3     2        4
                  ∑ = 0    ∑ = 10

      Y     Y̅     (Y-Y̅)    (Y-Y̅)²
      0     3     -3       9
      0     3     -3       9
      0     3     -3       9
      5     3     2        4
      10    3     7        49
                  ∑ = 0    ∑ = 80
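As a check on the hand calculation, the definitional formula can be written out step by step. The helper names below are illustrative; the result is compared against the standard library's `stdev`, which also divides by n - 1.

```python
from math import sqrt
from statistics import stdev

def deviations(values):
    """Deviations of each observation about the sample mean."""
    mean = sum(values) / len(values)
    return [x - mean for x in values]

def sample_std(values):
    """Definitional formula: sqrt( sum((x - mean)^2) / (n - 1) )."""
    squared = [d ** 2 for d in deviations(values)]
    return sqrt(sum(squared) / (len(values) - 1))

x = [1, 2, 3, 4, 5]
y = [0, 0, 0, 5, 10]

print(sum(deviations(x)), sum(deviations(y)))             # both sums of deviations are 0
print(round(sample_std(x), 2), round(sample_std(y), 2))   # 1.58 4.47
```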
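  • 32. The variance, s², is the standard deviation (s) squared. Conversely, $s = \sqrt{\text{variance}}$.
    Definitional formula: $s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$
    Computational formula: $s^2 = \frac{\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}}{n-1}$
    The computational formula needs only the running totals ∑Xi and ∑Xi², which makes it convenient for computer software (e.g., MS Excel) or a calculator key.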
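The two formulas are algebraically equivalent, which a small sketch can confirm on data set Y from the worked example. The function names are illustrative. With floating-point data of very large magnitude the computational form can lose precision, so this is a demonstration of the algebra rather than a recommendation for production code.

```python
from statistics import variance

def variance_definitional(values):
    """s^2 = sum((x - mean)^2) / (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def variance_computational(values):
    """s^2 = (sum(x^2) - (sum(x))^2 / n) / (n - 1)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

y = [0, 0, 0, 5, 10]
print(variance_definitional(y), variance_computational(y))   # 20.0 20.0
```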
  • 34.
    • A sample space is the set of all possible outcomes of an experiment.
    • A random variable is a rule for associating a number with each element in a sample space.
    • Suppose there are 8 balls in a bag. The random variable X is the weight, in kg, of a ball selected at random. Balls 1, 2 and 3 weigh 0.1 kg, balls 4 and 5 weigh 0.15 kg, and balls 6, 7 and 8 weigh 0.2 kg.
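For the bag of 8 balls, the probability distribution of X can be tabulated directly. The sketch assumes that "selected at random" means each ball is equally likely to be drawn; exact fractions are used to avoid rounding.

```python
from collections import Counter
from fractions import Fraction

# Weight in kg of each of the 8 balls (balls 1-3, balls 4-5, balls 6-8).
weights = ["0.1"] * 3 + ["0.15"] * 2 + ["0.2"] * 3

counts = Counter(weights)
# Assumption: every ball is equally likely, so P(X = w) = (balls of weight w) / 8.
distribution = {Fraction(w): Fraction(c, len(weights)) for w, c in counts.items()}

for weight, prob in sorted(distribution.items()):
    print(f"P(X = {float(weight)}) = {prob}")
```

X takes only three distinct values here, so it is a discrete random variable.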
  • 35.
    • There are two types of random variables:
      ◦ A discrete random variable can take on only specified, distinct values.
      ◦ A continuous random variable can take on any value within an interval.
    • Thus, there are also two types of probability distributions:
      ◦ Discrete probability distributions
      ◦ Continuous probability distributions
    Probability Distributions 35
  • 36.
    • A continuous probability distribution is described by a probability density function. The probability is interpreted as “area under the curve.”
      ◦ 1) The random variable takes on an infinite number of values within a given interval.
      ◦ 2) The probability that X equals any particular value is 0. Consequently, we talk about intervals. The probability equals the area under the curve over the interval.
      ◦ 3) The area under the whole curve equals 1.
    Normal Distribution 36
  • 37.
    • Probabilities are obtained by getting the area under the curve inside of a particular interval. The area under the curve equals the proportion of times, under identical (repeated) conditions, that a particular range of values will occur.
    • Three characteristics of the normal distribution:
      ◦ It is symmetric about the mean μ.
      ◦ Mean = median = mode. [“bell-shaped” curve]
      ◦ f(X) decreases as X gets farther and farther away from the mean. It approaches the horizontal axis asymptotically: -∞ < X < +∞. This means that there is always some probability (area) for extreme values.
  • 38.
    • The probability density function for the normal distribution:
      $f(X) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{X-\mu}{\sigma}\right)^2}$
    • f(X), the height of the curve, represents the relative frequency at which the corresponding values of X occur.
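The density formula translates directly into code. The sketch below implements it by hand (the function name is illustrative) and cross-checks one value against `statistics.NormalDist.pdf` from the standard library.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Height of the normal curve at x for mean mu and standard deviation sigma."""
    return (1.0 / (sigma * sqrt(2.0 * pi))) * exp(-0.5 * ((x - mu) / sigma) ** 2)

peak = normal_pdf(0.0, mu=0.0, sigma=1.0)          # highest point, at the mean
left = normal_pdf(-1.0, mu=0.0, sigma=1.0)
right = normal_pdf(1.0, mu=0.0, sigma=1.0)

print(round(peak, 4))                               # 0.3989
print(left == right)                                # symmetric about the mean
print(abs(normal_pdf(1.5, 2.0, 3.0) - NormalDist(2.0, 3.0).pdf(1.5)) < 1e-12)
```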
  • 39. Note that the normal distribution is defined by two parameters, μ and σ. You can draw a normal distribution for any combination of μ and σ. One normal distribution, Z, is special: it has μ = 0 and σ = 1. This is the Z distribution, also called the standard normal distribution. It is one of the infinitely many normal distributions we could have selected.
  • 40.
    • Any normal distribution can be converted into a standard normal distribution by transforming the normal random variable into the standard normal random variable:
      $Z = \frac{X - \mu}{\sigma}$
    • This is called standardizing the data. It will result in (transformed) data with μ = 0 and σ = 1.
    • The areas under the curve for the standard normal distribution (Z) have been computed and tabled. See, for example, http://www.statsoft.com/textbook/distribution-tables/#z
    • Please note that you may find different tables for the Z distribution. The table we use here gives you the area under the curve from 0 to z. Some books provide a slightly different table, one that gives you the area in the tail. If you check the diagram that is usually shown above the table, you can determine which table you have. In the table on the next slide, the area from 0 to z is shaded, so you know that you are getting the area from 0 to z. Also note that a table value can never be more than .5000, because the area from 0 to infinity is .5000.
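A "0 to z" table entry can be reproduced from the cumulative distribution function: it is the area to the left of z minus the .5000 that lies to the left of 0. The sketch below uses `statistics.NormalDist`; the function names and the example values (X = 130, μ = 100, σ = 15) are hypothetical.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Standardize x: how many standard deviations it lies from the mean."""
    return (x - mu) / sigma

def area_zero_to_z(z):
    """Area under the standard normal curve between 0 and z, as in a '0 to z' table."""
    return NormalDist().cdf(abs(z)) - 0.5

z = z_score(130, mu=100, sigma=15)       # hypothetical values, gives z = 2.0
print(z, round(area_zero_to_z(z), 4))    # 2.0 0.4772
print(round(area_zero_to_z(1.96), 4))    # 0.475
```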
  • 41.
    • Estimation
    • Hypothesis Testing
    Both activities use sample statistics (for example, X̅) to make inferences about a population parameter (μ).
    Estimation 41
  • 42.
    • Why don’t we just use a single number (a point estimate) like, say, X̅ to estimate a population parameter, μ?
    • The problem with using a single point (or value) is that it will very probably be wrong. In fact, with a continuous random variable, the probability that the variable is equal to a particular value is zero. So, P(X̅ = μ) = 0.
    • This is why we use an interval estimator.
    • We can examine the probability that the interval includes the population parameter.
  • 43.
    • How wide should the interval be? That depends upon how much confidence you want in the estimate.
    • For instance, say you wanted a confidence interval estimator for the mean income of a college graduate:

      You might have       That the mean income is between
      100% confidence      $0 and $∞
      95% confidence       $35,000 and $41,000
      90% confidence       $36,000 and $40,000
      80% confidence       $37,500 and $38,500
      …                    …
      0% confidence        $38,000 (a point estimate)

    • The wider the interval, the greater the confidence you will have in it as containing the true population parameter μ.
  • 44.
    • To construct a confidence interval estimator of μ with (1-α) confidence, we use:
      $\bar{X} \pm Z_{\alpha} \frac{\sigma}{\sqrt{n}}$
      where we get Zα from the Z table.
    • When n ≥ 30, we use s as an estimator of σ.
  • 45.
    • To be more precise, α is split in half since we are constructing a two-sided confidence interval. However, for the sake of simplicity, we call the z-value Zα rather than Zα/2.
    [Figure: standard normal curve with an area of α/2 in each tail, below -Zα/2 and above +Zα/2.]
  • 46.
    • You work for a company that makes smart TVs, and your boss asks you to determine with certainty the exact life of a smart TV. She tells you to take a random sample of 100 TVs.
    • What is the exact life of a smart TV made by this company?
      Sample evidence: n = 100, X̅ = 11.50 years, s = 2.50 years
  • 47.
    • Since your boss has asked for 100% confidence, the only answer you can accurately provide is: -∞ to +∞ years.
    • After you are fired, perhaps you can get your job back by explaining to your boss that statisticians cannot work with 100% confidence if they are working with data from a sample. If you want 100% confidence, you must take a census. With a sample, you can never be absolutely certain as to the value of the population parameter.
    • This is exactly what statistical inference is: using sample statistics to draw conclusions (e.g., estimates) about population parameters.
  • 48. n = 100, X̅ = 11.50 years, s = 2.50 years
    At 95% confidence:
      11.50 ± 1.96 × (2.50/√100)
      11.50 ± 1.96 × (0.25)
      11.50 ± 0.49
    The 95% confidence interval estimator (CIE) is: 11.01 years to 11.99 years
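The same calculation as a short sketch. The function name is illustrative, and the z-value of 1.96 is passed in explicitly, as it would be read from the Z table.

```python
from math import sqrt

def z_confidence_interval(x_bar, s, n, z):
    """Return (lower, upper) for x_bar ± z * s / sqrt(n)."""
    half_width = z * s / sqrt(n)
    return x_bar - half_width, x_bar + half_width

lower, upper = z_confidence_interval(x_bar=11.50, s=2.50, n=100, z=1.96)
print(f"{lower:.2f} to {upper:.2f} years")   # 11.01 to 11.99 years
```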
  • 49.
    • We are 95% confident that the interval from 11.01 years to 11.99 years contains the true population parameter, μ.
    • Another way to put this: if we drew 100 samples and constructed an interval from each by the same procedure (same n and same α), about 95 of those intervals would contain the population mean.
    • Remember: the population parameter (μ) is fixed; it is not a random variable. Thus, it is incorrect to say that there is a 95% chance that the population mean will “fall” in this interval.
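The repeated-sampling interpretation can be illustrated with a small simulation. Everything here is an assumption for illustration: a normal population with μ = 11.5 and σ = 2.5, 1,000 repeated samples, and a fixed seed. The share of intervals that contain μ should come out close to 0.95.

```python
import random
from math import sqrt
from statistics import fmean, stdev

def coverage(mu, sigma, n, z, trials, seed=0):
    """Share of simulated samples whose interval x_bar ± z*s/sqrt(n) contains mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        half_width = z * stdev(sample) / sqrt(n)
        if abs(fmean(sample) - mu) <= half_width:
            hits += 1
    return hits / trials

# Hypothetical population: mu = 11.5 years, sigma = 2.5 years.
observed = coverage(mu=11.5, sigma=2.5, n=100, z=1.96, trials=1000)
print(observed)
```

Note that μ never changes between trials; only the interval moves from sample to sample.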
  • 50. The sample: n = 100, X̅ = 18 years, s = 4 years
    • Construct a confidence interval estimator (CIE) of the true population mean life (µ) at each of the following levels of confidence:
      ◦ (a) 100% (b) 99% (c) 95% (d) 90% (e) 68%
  • 51.
    • In this problem we use the sample standard deviation s as an estimator of σ (s² is an unbiased estimator of σ²: E(s²) = σ²), so σ ≈ s = 4 years.
    • 95% confidence interval estimator: X̅ ± 1.96 (s/√n) = 18 ± 1.96 (4/√100)
  • 52.
    (a) 100% confidence [α = 0, Zα = ∞]
      100% CIE: −∞ years ↔ +∞ years
    (b) 99% confidence: α = .01, Zα = 2.575 (from Z table)
      18 ± 2.575 (4/√100)
      18 ± 1.03
      99% CIE: 16.97 years ↔ 19.03 years
    (c) 95% confidence: α = .05, Zα = 1.96 (from Z table)
      18 ± 1.96 (4/√100)
      18 ± 0.78
      95% CIE: 17.22 years ↔ 18.78 years
  • 53.
    (d) 90% confidence: α = .10, Zα = 1.645 (from Z table)
      18 ± 1.645 (4/√100)
      18 ± 0.66
      90% CIE: 17.34 years ↔ 18.66 years
    (e) 68% confidence: α = .32, Zα = 1.0 (from Z table)
      18 ± 1.0 (4/√100)
      18 ± 0.4
      68% CIE: 17.60 years ↔ 18.40 years
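Parts (b) through (e) can be reproduced without a printed table by asking the standard normal distribution for the z-value that leaves α/2 in each tail. This is a sketch using `statistics.NormalDist.inv_cdf`; the helper name is illustrative. The 100% case is left out because the corresponding z-value is infinite, and the computed z-values (for example 2.576 and 0.994) differ slightly from the rounded table values used on the slides.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_z(confidence):
    """z-value leaving (1 - confidence) / 2 in each tail of the standard normal curve."""
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

x_bar, s, n = 18.0, 4.0, 100
intervals = {}
for confidence in (0.99, 0.95, 0.90, 0.68):
    z = two_sided_z(confidence)
    half_width = z * s / sqrt(n)
    intervals[confidence] = (x_bar - half_width, x_bar + half_width)
    print(f"{confidence:.0%}: z = {z:.3f}, {x_bar - half_width:.2f} to {x_bar + half_width:.2f} years")
```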
  • 54.
    • How can we keep the same level of confidence and still construct a narrower CIE?
    • Let’s look at the formula one more time: X̅ ± Zα σ/√n
    • The sample mean is in the center. The more confidence you want, the higher the value of Z and the larger the half-width of the interval.
    • The larger the sample size, the smaller the half-width, since we divide by √n.
    • So, what can we do? If you want a narrower interval, take a larger sample.
    • What about a smaller standard deviation? Of course, this depends on the variability of the population. However, a more efficient sampling procedure (e.g., stratification) may help. That topic is for a more advanced statistics course.
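The effect of the sample size on the half-width can be seen by evaluating Zα σ/√n for a few values of n. The sketch reuses s = 4 and z = 1.96 from the earlier exercise; the sample sizes are chosen for illustration. Because of the square root, quadrupling n only halves the half-width.

```python
from math import sqrt

def half_width(z, sigma, n):
    """Half-width of the interval x_bar ± z * sigma / sqrt(n)."""
    return z * sigma / sqrt(n)

widths = {n: half_width(1.96, 4.0, n) for n in (100, 400, 1600)}
for n, w in widths.items():
    print(f"n = {n:>4}: ± {w:.3f} years")
```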
  • 55.
    • Once you are working with a sample, not the entire population, you cannot be 100% certain of population parameters. If you need to know the value of a parameter with certainty, take a census.
    • The more confidence you want to have in the estimator, the larger the interval is going to be.
    • Traditionally, statisticians work with 95% confidence. However, you should be able to use the Z table to construct a CIE at any level of confidence.