BASIC STATISTICAL ANALYSIS USING EXCEL

What is Data?
Data is a collection of facts or
information from which
conclusions may be drawn.

Types of Data
Qualitative or attribute data - the
characteristic being studied is non-numeric.
E.g.: gender, religious affiliation, state of birth, country
represented, words, images, videos
Quantitative data - the characteristic being
studied is numeric.
E.g.: time (in seconds) for a 400 m race, prize money won
by a tennis player, or number of boundaries scored in
a match.

Quantitative data
Quantitative variables can be classified as either discrete or continuous.
Discrete variables can assume only certain values, usually counts.
E.g.: the number of goals in a football match, or the number of wickets taken by
a bowler in a cricket match (0, 1, 2, 3, …).
Continuous variables can assume any value within a specified range.
E.g.: the height of an athlete or the weight of a boxer.

Summary of Types of variables
Searching out published research results in libraries (or the
internet)
• This is an important early step of research
• The research process always includes synthesis and analysis
• But, just reviewing of literature is not research

Levels of Measurements
• Categorical: Nominal, Ordinal
• Scale: Interval, Ratio

Nominal-Level Data
Properties:
• Observations of a qualitative
variable can only be classified
and counted.
• There is no particular order
to the labels.

Ordinal-Level Data
Properties:
• Data classifications are
represented by sets of labels
or names (high, medium,
low) that have relative
values.
• Because of the relative
values, the data classified
can be ranked or ordered.

Interval-Level Data
Properties:
• Data classifications are ordered
according to the amount of the
characteristic they possess.
• Equal differences in the
characteristic are represented
by equal differences in the
measurements.

Ratio-Level Data
• Practically all quantitative data is recorded on the ratio level
of measurement.
• Ratio level is the “highest” level of measurement.
Properties:
• Data classifications are ordered according to the amount of the
characteristics they possess.
• Equal differences in the characteristic are represented by equal differences
in the numbers assigned to the classifications.
• The zero point is the absence of the characteristic and the ratio between
two numbers is meaningful.

Summary of Levels of Measurements

Why Know the Level of Measurement of Data?
• The level of measurement of the data dictates the calculations
that can be done to summarize and present the data.
• It also determines which statistical tests should be performed on
the data.

Descriptive & Inferential Statistics
• Descriptive statistics uses the data to
provide descriptions of the population
/ sample, either through numerical
calculations or graphs or tables.
• Inferential statistics makes inferences
and predictions about a population
based on a sample of data taken from
the population.

Descriptive Statistics
Summarizing Data:
–Central Tendency (or Groups’ “Middle Values”)
• Mean
• Median
• Mode
–Variation (or Summary of Differences Within Groups)
• Range
• Interquartile Range
• Variance
• Standard Deviation
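
As a rough illustration of the measures listed above, the following sketch computes each of them with Python's standard `statistics` module. The data set `runs` is hypothetical, and the Excel function names in the comments are the usual counterparts rather than something the slides prescribe.

```python
import statistics

# Hypothetical sample: runs scored by a batsman in 9 innings (illustrative only).
runs = [12, 45, 45, 30, 8, 67, 45, 22, 30]

# Central tendency
mean = statistics.mean(runs)      # Excel counterpart: AVERAGE
median = statistics.median(runs)  # Excel counterpart: MEDIAN
mode = statistics.mode(runs)      # Excel counterpart: MODE.SNGL

# Variation
data_range = max(runs) - min(runs)
# quantiles() uses the "exclusive" method by default (comparable to QUARTILE.EXC).
q1, q2, q3 = statistics.quantiles(runs, n=4)
iqr = q3 - q1
variance = statistics.variance(runs)  # sample variance, divides by n - 1 (VAR.S)
std_dev = statistics.stdev(runs)      # sample standard deviation (STDEV.S)

print(f"mean={mean:.2f} median={median} mode={mode}")
print(f"range={data_range} IQR={iqr} variance={variance:.2f} sd={std_dev:.2f}")
```

Other quartile conventions exist (for example the "inclusive" method), so an interquartile range computed elsewhere may differ slightly from this one.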

Choosing Summary Statistics
Which average and measure of spread?
Type of data                  Average   Measure of spread
Scale, normally distributed   Mean      Standard deviation
Scale, skewed                 Median    Interquartile range
Categorical, ordinal          Median    Interquartile range
Categorical, nominal          Mode      None
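
The choice of average and spread can also be written as a small lookup. This is only a sketch of the rule on this slide; the function name `choose_summary` and its arguments are hypothetical.

```python
def choose_summary(level, skewed=False):
    """Return (average, measure of spread) for a level of measurement.

    level  -- "nominal", "ordinal", "interval" or "ratio"
    skewed -- only relevant for scale (interval/ratio) data
    """
    level = level.lower()
    if level == "nominal":
        return ("mode", None)  # no measure of spread for nominal data
    if level == "ordinal":
        return ("median", "interquartile range")
    if level in ("interval", "ratio"):  # scale data
        if skewed:
            return ("median", "interquartile range")
        return ("mean", "standard deviation")
    raise ValueError(f"unknown level of measurement: {level}")


print(choose_summary("ratio"))
print(choose_summary("ratio", skewed=True))
print(choose_summary("nominal"))
```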

Pivot Table
• A PivotTable is an extremely powerful tool that you can use to
slice and dice data.
• Summary tables can be created from hundreds of thousands of
data points.
• The tables can be changed dynamically to enable you to find
the different perspectives of the data.
• Graphical presentation can be made easily by Pivot Charts
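
To make the idea of a summary table concrete, here is a minimal sketch of what a PivotTable with a "Sum" aggregation computes, written in plain Python instead of Excel. The records and the helper `pivot_sum` are hypothetical and are not taken from the workshop data file.

```python
from collections import defaultdict

# Hypothetical raw records: (state, sport, medals). Illustrative data only.
records = [
    ("Gujarat", "Boxing", 3),
    ("Gujarat", "Tennis", 1),
    ("Maharashtra", "Boxing", 2),
    ("Gujarat", "Boxing", 4),
    ("Maharashtra", "Tennis", 5),
]


def pivot_sum(rows):
    """Sum the value column for every (row label, column label) pair."""
    table = defaultdict(lambda: defaultdict(int))
    for row_label, col_label, value in rows:
        table[row_label][col_label] += value
    return table


pivot = pivot_sum(records)
for state in sorted(pivot):
    print(state, dict(pivot[state]))
```

Swapping the row and column labels in `records` gives a different perspective on the same data, which is what dragging fields between the Rows and Columns areas does in Excel.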

Pivot Table
• PivotTableDataOnline.xlsx

Inferential Statistics
The methods of inferential statistics are
• the estimation of parameter(s)
• testing of Statistical hypothesis

Parameter and Statistics
• A measure calculated from population data is called a
parameter.
• A measure calculated from sample data is called a statistic.
Measure                   Parameter   Statistic
Size                      N           n
Mean                      μ           x̄
Standard deviation        σ           s
Proportion                P           p
Correlation coefficient   ρ           r
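
The distinction shows up directly in the formulas: the population standard deviation σ divides by N, while the sample standard deviation s divides by n − 1. The sketch below uses made-up weights; both lists are hypothetical.

```python
import statistics

# Hypothetical population: weights (kg) of all 8 boxers in a club.
population = [60, 62, 65, 68, 70, 72, 75, 80]
# Hypothetical sample of 4 boxers drawn from that population.
sample = [62, 68, 72, 80]

N, n = len(population), len(sample)

mu = statistics.mean(population)       # parameter: population mean
sigma = statistics.pstdev(population)  # parameter: divides by N (Excel: STDEV.P)

x_bar = statistics.mean(sample)        # statistic: sample mean
s = statistics.stdev(sample)           # statistic: divides by n - 1 (Excel: STDEV.S)

print(f"N={N}, mu={mu}, sigma={sigma:.2f}")
print(f"n={n}, x_bar={x_bar}, s={s:.2f}")
```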

Estimation
Estimation is a process whereby we select a random sample from
a population and use a sample statistic to estimate a population
parameter.
There are two ways for estimation:
• Point Estimation
• Interval Estimation

Point Estimate
Point Estimate – A sample statistic used to estimate the exact
value of a population parameter.
• A point estimate is a single value and has the advantage of
being very precise, but it gives no information about its
reliability.
• The probability that a single sample statistic is exactly equal to
the parameter value is extremely small. For this reason a point
estimate is rarely reported on its own.

Interval Estimate
Confidence interval (interval estimate) – A range of values
defined by the confidence level within which the population
parameter is estimated to fall.
• The interval estimate is less precise, but gives more
confidence.
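
As an illustration, the sketch below computes a point estimate and a z-based interval estimate of a population mean, x̄ ± z·s/√n. It assumes a large sample (n ≥ 30) so that the normal approximation is reasonable. The data and the function name `mean_confidence_interval` are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(sample, confidence=0.95):
    """Large-sample (z-based) confidence interval for the population mean."""
    n = len(sample)
    x_bar = mean(sample)  # point estimate of the population mean
    s = stdev(sample)     # sample standard deviation
    # z value that leaves (1 - confidence) / 2 in each tail of the normal curve
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Hypothetical 400 m race times in seconds (35 runners, illustrative only).
times = [50 + (i % 7) * 0.5 for i in range(35)]

low, high = mean_confidence_interval(times, 0.95)
print(f"point estimate: {mean(times):.2f}")
print(f"95% interval estimate: ({low:.2f}, {high:.2f})")
```

Raising the confidence level to 99% widens the interval: more confidence is bought with less precision.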

Statistical Hypothesis
A statistical hypothesis is an assumption or logical statement
about a parameter of the population.
E.g.
• The average income of Indians in 2018 is Rs. 10,500.
• The proportion of diabetic patients in Gujarat is not more than 10%.
• Faculty members in Gujarat score higher on IQ tests than faculty members in
Maharashtra.

Null hypothesis
The null hypothesis is the statistical hypothesis that is assumed to be true and
tested for possible rejection. It is denoted by H0.
• If the null hypothesis assigns a specific value to the parameter,
it is called a simple hypothesis.
E.g. μ = 280, P = 0.10
• If the null hypothesis assigns a set of values to the parameter,
it is called a composite hypothesis.
E.g. μ ≥ 280, P ≤ 0.10

Alternative Hypothesis
A statistical hypothesis which is complementary to the null
hypothesis is called the alternative hypothesis. It is denoted by H1.

Testing of Hypothesis
The procedure to decide whether to accept or reject the null
hypothesis is called Testing of hypothesis.

Type I and Type II Error
• The error of rejecting a true null hypothesis is called a Type I
error. The probability of a Type I error is denoted by α.
α = P(Reject H0 | H0 is true)
• The error of accepting a false null hypothesis is called a Type II
error. The probability of a Type II error is denoted by β.
β = P(Accept H0 | H0 is false)

Type I and Type II Error
Decision     H0 is true      H0 is false
Accept H0    No error        Type II error
Reject H0    Type I error    No error

Level of Significance
The predetermined value of the probability of a Type I error is called the
level of significance. It is denoted by α.
The most commonly used levels of significance are 1% and 5%.
Interpretation: a 5% level of significance means that, when the null hypothesis is
true, the test will wrongly reject it in about 5 out of 100 samples.

Critical Region
The area of the probability curve corresponding to α is called the
critical region.
i.e. the area under the normal curve in which a true null hypothesis
is rejected is called the area of rejection or critical region.
The remaining region under the normal curve is called the acceptance
region.
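
For a z test, the boundary of the critical region can be read from the standard normal distribution. The sketch below finds the cutoffs for α = 0.05. It assumes a standard normal test statistic, and the helper `in_critical_region` is a hypothetical name.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # mean 0, standard deviation 1

# Two-tailed test: alpha is split equally between both tails.
two_tailed_cutoff = std_normal.inv_cdf(1 - alpha / 2)
# One-tailed (right-tailed) test: all of alpha lies in one tail.
one_tailed_cutoff = std_normal.inv_cdf(1 - alpha)


def in_critical_region(z, cutoff=two_tailed_cutoff):
    """True if z falls in the two-tailed critical (rejection) region."""
    return abs(z) > cutoff


print(f"two-tailed cutoff: +/-{two_tailed_cutoff:.3f}")
print(f"one-tailed cutoff: {one_tailed_cutoff:.3f}")
print(in_critical_region(2.4), in_critical_region(-1.2))
```

In Excel the same cutoffs are commonly obtained with `NORM.S.INV`, for example `=NORM.S.INV(0.975)` for the two-tailed 5% case.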

Power of Test
The probability of rejecting a false null hypothesis is called the
power of the test.
It is denoted by 1 − β.
i.e. 1 − β = P(Reject H0 | H0 is false)
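
A small worked sketch of the power calculation follows, for a right-tailed z test of H0: μ = μ0 against H1: μ > μ0. It assumes a known standard deviation σ and a specific true mean μ1. All numbers and the function name `z_test_power` are hypothetical.

```python
import math
from statistics import NormalDist


def z_test_power(mu0, mu1, sigma, n, alpha=0.05):
    """Power (1 - beta) of a right-tailed z test when the true mean is mu1."""
    se = sigma / math.sqrt(n)  # standard error of the sample mean
    # H0 is rejected when the sample mean exceeds this cutoff.
    cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se
    # beta = P(accept H0 | true mean is mu1) = P(sample mean <= cutoff)
    beta = NormalDist(mu1, se).cdf(cutoff)
    return 1 - beta


# Hypothetical setting: H0 says mu = 280, but the true mean is 290.
for n in (16, 36, 64):
    print(n, round(z_test_power(mu0=280, mu1=290, sigma=20, n=n), 3))
```

With everything else fixed, a larger sample gives a more powerful test. When μ1 equals μ0, the function returns α, because rejecting H0 is then a Type I error.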

Test Statistics
As a rule of thumb, if the sample size is 30 or more, it is called a large
sample; if it is less than 30, it is called a small sample.
A different test statistic is used for testing of hypothesis depending on
the size of the sample.
• For a large sample, the test statistic z is used.
• For a small sample, the test statistic t is used.

Steps of Testing of Hypothesis
• Step 1: Set up the null hypothesis
• Step 2: Set up the alternative hypothesis
• Step 3: Calculate the test statistic
• Step 4: Determine the table (critical) value of the test statistic
• Step 5: Conclusion
– If |test statistic| ≤ table value, the null hypothesis is accepted
– If |test statistic| > table value, the null hypothesis is rejected
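
The five steps can be sketched for the simplest case, a two-tailed large-sample z test of a single mean. The summary figures are hypothetical, and the function `one_sample_z_test` is an illustration, not a routine from the Analysis ToolPak. The absolute value of z is compared with the table value because the test is two-tailed.

```python
import math
from statistics import NormalDist


def one_sample_z_test(x_bar, mu0, s, n, alpha=0.05):
    """Two-tailed z test of H0: mu = mu0 against H1: mu != mu0 (large sample)."""
    # Steps 1 and 2: H0: mu = mu0, H1: mu != mu0
    # Step 3: calculate the test statistic
    z = (x_bar - mu0) / (s / math.sqrt(n))
    # Step 4: determine the table value for the chosen level of significance
    table_value = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 5: conclusion
    decision = "accept H0" if abs(z) <= table_value else "reject H0"
    return z, table_value, decision


# Hypothetical summary data: n = 36, sample sd = 20, H0: mu = 280.
print(one_sample_z_test(x_bar=285, mu0=280, s=20, n=36))
print(one_sample_z_test(x_bar=290, mu0=280, s=20, n=36))
```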

Testing of Hypothesis
1. Testing single mean
2. Testing significant difference between two means
3. Testing single proportion
4. Testing significant difference between two proportions
5. Testing single standard deviation
6. Testing two standard deviations
7. Testing means for more than two samples
8. Testing standard deviations for more than two samples
9. Testing proportion for more than two samples
10. Testing for non-normal populations

Decision Tree for deciding Test

P value
• P-value: the probability that the test statistic would take a value as
extreme as, or more extreme than, the observed test statistic when H0
is true
• The smaller the P-value, the stronger the
evidence against H0
• For a typical analysis using the standard α = 0.05 cutoff, the null
hypothesis is
- rejected when p ≤ 0.05 and
- not rejected when p > 0.05.
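
For a z statistic, the two-sided P-value is the area in both tails beyond the observed value. The sketch below assumes a standard normal test statistic; the function name `two_sided_p_value` is hypothetical.

```python
from statistics import NormalDist


def two_sided_p_value(z):
    """P(|Z| >= |z|) for a standard normal test statistic Z."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


alpha = 0.05
for z in (1.5, 1.96, 3.0):
    p = two_sided_p_value(z)
    verdict = "reject H0" if p <= alpha else "do not reject H0"
    print(f"z={z}: p={p:.4f} -> {verdict}")
```

In Excel the same quantity can be obtained with `=2*(1-NORM.S.DIST(ABS(z),TRUE))`, where `z` stands for the cell holding the test statistic.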

Loading Data Analysis Tool Pack
• Go to Excel Options. (In Excel 2010 or 2013, click
the File tab and select Options. In Excel 2007,
click the Office button and select Excel Options.)
• Click the Add-Ins category.
• Click the Go button toward the bottom.
• Check the Analysis ToolPak item, as shown in
the figure, and click OK.

Handout & Data
• Data Analysis Tool Pack.xlsx
• Handouts for Excel Data Analysis Tool Pack.docx
