By Dr. Atcharaporn Khoomtong
Elementary statistics 1
 Introduction
 Statistical methodology
 Step of scientific research
 Important parametric tests
 Important nonparametric tests
 Example using Excel program
 Using Excel for Statistics in Gateway
Cases – Office 2007
Most people become familiar with probability and statistics through radio, television, newspapers, and magazines. For example, the following statements were found in newspapers:
• Eating 10 grams (g) of fiber a day reduces the risk of heart attack by 14%.
• Thirty minutes of exercise two or three times each week can raise HDL levels by 10 to 15%.
 Statistics is used to analyze the results of
surveys and as a tool in scientific research to
make decisions based on controlled
experiments.
 Other uses of statistics include operations
research, quality control, estimation and
prediction.
Statistics, as the basis of data analysis, is concerned with two basic types of problems:
(1) summarizing, describing, and exploring the data. This problem is covered by descriptive statistics.
(2) using sampled data to infer the nature of the process which produced the data. This problem is covered by inferential statistics.
• Statistics plays an important role in the description of mass phenomena.
• Data are organized and summarized for clear presentation and ease of communication.
• Data may come from studies of populations or samples.
• Descriptive statistics offers methods to summarize a collection of data. These methods may be numerical or graphical, both of which have their own advantages and disadvantages.
 Inferential statistics is used to draw
conclusions about a data set.
 Usually this means drawing inferences about
a population from a sample either by
estimating some relationships or by testing
some hypothesis.
A Population is the
set of all possible
states of a random
variable. The size of
the population may
be either infinite or
finite.
A Sample is a subset
of the population; its
size is always finite.
Descriptive Statistics
 Graphical
 Arrange data in tables
 Bar graphs and pie charts
 Numerical
 Percentages
 Averages
 Range
 Relationships
 Correlation coefficient
 Regression analysis
Inferential Statistics
 Confidence interval
 Compare means of two
samples
 t Test
 F -Test
 Compare means from
three samples
 Pre/post (LSD,DMRT)
 ANOVA = analysis of
variance
 F -Test
 Another important aspect of data analysis is the Data,
which can be of two different types:
 qualitative data ex. Sex, color, smell, taste etc.
 quantitative data ex. Height, weight, percentage etc.
 Qualitative data does not contain quantitative
information.
 Qualitative data can be classified into categories.
Type of scale | Possible statements | Allowed operators | Examples
nominal scale | identity, countable | =, ≠ | colors, phone numbers, feelings
ordinal scale | identity, less than/greater than relations, countable | =, ≠, <, > | soccer league table, military ranks, energy efficiency classes
interval scale | identity, less than/greater than relations, equality of differences | =, ≠, <, >, +, - | dates (years), temperature in Celsius, IQ scale
ratio scale | identity, less than/greater than relations, equality of differences, equality of ratios, zero point | =, ≠, <, >, +, -, *, / | velocities, lengths, temperature in Kelvin, age
Collecting the necessary facts → Analyzing the facts → Making decisions → Carrying out decisions → Assessing the results
Descriptive Statistics
Inferential Statistics
• Mode = the most frequent value
• Median = the value of the middle point of the ordered measurements
• Mean = the average (balancing point of the distribution)
• Variance = the average of the squared deviations of all the population measurements from the population mean
• Standard deviation = the square root of the variance
Descriptive formula (population variance):
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
Inferential formula (sample variance), called the “unbiased estimator of the population value”:
$$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
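The difference between the descriptive (divide by N) and inferential (divide by n - 1) formulas can be checked numerically. The sketch below is only an illustration in Python, which the slides themselves do not use; it takes the five profit margins from this deck's worked example and relies on the standard-library `statistics` module.

```python
import statistics

# Profit margins (%) of the five-company population from the deck's example.
margins = [8, 10, 15, 12, 5]

mu = statistics.mean(margins)              # mean of the five values
pop_var = statistics.pvariance(margins)    # descriptive formula: divides by N
pop_sd = statistics.pstdev(margins)        # square root of the population variance
sample_var = statistics.variance(margins)  # inferential formula: divides by n - 1

print(f"mean = {mu}")
print(f"population variance = {pop_var}, population sd = {pop_sd:.3f}")
print(f"variance with the n - 1 divisor = {sample_var}")
```

Treating the same five values as a sample rather than a population gives the larger n - 1 estimate, which is why the two formulas should not be mixed.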
Population of profit margins for five companies:
8%, 10%, 15%, 12%, 5%
$$\mu = \frac{8 + 10 + 15 + 12 + 5}{5} = \frac{50}{5} = 10\%$$
$$\sigma^2 = \frac{(8-10)^2 + (10-10)^2 + (15-10)^2 + (12-10)^2 + (5-10)^2}{5} = \frac{(-2)^2 + 0^2 + 5^2 + 2^2 + (-5)^2}{5} = \frac{4 + 0 + 25 + 4 + 25}{5} = \frac{58}{5} = 11.6$$
$$\sigma = \sqrt{11.6} = 3.406\%$$
• Hypothesis = an assumption or supposition to be proved or disproved, e.g. “automobile A is performing as well as automobile B.”
• Null hypothesis (H0) = expresses no difference
• Alternative hypothesis (H1)
H0: μ = 0 (often said “H naught”; the hypothesized value can be any number)
Later: H0: μ1 = μ2
H0: μ = 0; Null Hypothesis
HA: μ ≠ 0; Alternative Hypothesis
Type I error (α): reject H0 | H0 true
Type II error (β): accept (fail to reject) H0 | H1 true
Calculated F value is greater than the critical F value: significant >>> reject H0
Calculated F value is lower than the critical F value: non-significant >>> accept H0
α = significance level
1 - β = power
Decision from data | Truth: H0 correct | Truth: HA correct
Decide H0 (“fail to reject H0”) | 1 - α, true negative | β, false negative
Decide HA (“reject H0”) | α, false positive | 1 - β, true positive
Z-test
T-test
F-test
Z-test
is based on the normal probability distribution and is used for judging the significance of several statistical measures, particularly the mean (n > 30).
The z-test is generally used for comparing the mean of a sample to some hypothesized mean for the population in the case of a large sample.
T-test
is based on the t-distribution and is considered an appropriate test for judging the significance of a sample mean, or of the difference between the means of two samples, in the case of small sample(s) when the population variance is not known (in which case we use the variance of the sample as an estimate of the population variance).
The t-test is needed mainly for small sample(s) when the population variance is unknown; for large samples it gives nearly the same result as the z-test.
Unknown variance, under H0:
$$t = \frac{\bar{X} - \mu_0}{s / \sqrt{n}} \sim t_{(n-1)}$$
Critical values: statistics books or computer.
The t-distribution is approximately normal for degrees of freedom (df) > 30.
F-test
is based on the F-distribution and is used to compare the variances of two independent samples. This test is also used in the context of analysis of variance (ANOVA) for judging the significance of differences among more than two sample means at one and the same time.
The test statistic, F, is calculated and compared with its critical value (found in the F-ratio tables for the degrees of freedom of the greater and smaller variances at the specified level of significance) for accepting or rejecting the null hypothesis.
ANOVA tables:
for a one-way ANOVA with N observations and T treatments.
Source      df               SS      MS                    F
treatment   (T-1)            SStrt   MStrt = SStrt/(T-1)   MStrt/MSerr
error       by subtraction   SSerr   MSerr = SSerr/dferr
total       (N-1)            SStot
Finally, you (or the PC) consult tables or otherwise obtain the probability of obtaining an F value at least this large, given the df for treatment and error.
1: Calculate N, Σx, Σx² for the whole dataset.
2: Find the correction factor
CF = (Σx * Σx) / N
3: Find the total sum of squares for the data
SStot = Σ(xi²) – CF
4: Add up the totals for each treatment in turn (Xt.), then calculate the treatment sum of squares
SStrt = Σt(Xt. * Xt.)/r – CF
where Xt. = sum of all values within treatment t, and r is the number of observations that went into that total.
5: Draw up the ANOVA table, getting the error terms by subtraction.
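The correction-factor steps above can be sketched in Python as follows. This is only an illustration, not the deck's own method (the slides use Excel); the function name `one_way_anova` is hypothetical, and the data are the orange weights for suppliers A, B, and C from the case study later in this deck, for which the slides report F = 0.889.

```python
def one_way_anova(groups):
    """One-way ANOVA via the correction factor; returns (F, df_treatment, df_error)."""
    values = [x for group in groups for x in group]
    n_total = len(values)                        # step 1: N and the sums
    cf = sum(values) ** 2 / n_total              # step 2: correction factor
    ss_total = sum(x * x for x in values) - cf   # step 3: total sum of squares
    # step 4: treatment sum of squares from the per-treatment totals
    ss_trt = sum(sum(group) ** 2 / len(group) for group in groups) - cf
    ss_err = ss_total - ss_trt                   # step 5: error term by subtraction
    df_trt = len(groups) - 1
    df_err = (n_total - 1) - df_trt
    f_value = (ss_trt / df_trt) / (ss_err / df_err)
    return f_value, df_trt, df_err


# Orange weights from the case study (suppliers A, B, C).
oranges = {
    "A": [150, 151, 152, 153, 154],
    "B": [148, 150, 152, 154, 156],
    "C": [146, 148, 150, 152, 154],
}

f_value, df_trt, df_err = one_way_anova(list(oranges.values()))
print(f"F = {f_value:.3f} with ({df_trt}, {df_err}) degrees of freedom")
```

The resulting F is then compared with the tabulated critical value for the same degrees of freedom.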
Completely Randomized Design (CRD)
Randomized Complete Block Design (RBD)
Latin Square (LQ)
Treatments
Replication
Degrees of freedom (df)
LSD: Least Significant Difference
DMRT: Duncan’s New Multiple Range Test
 Most people have difficulties in determining
whether a model is linear or non-linear.
 Before discussing the issues of linear vs. non-
linear systems, let's have a short look at
some examples, displaying several types of
discrimination lines between two classes:
[Figure: three example discrimination lines between two classes, labeled linear or non-linear]
 Here's the answer: linear models are linear
in the parameters which have to be
estimated, but not necessarily in the
independent variables.
 This explains why the middle of the three
figures above shows a linear discrimination
line between the two classes, although the
line is not linear in the sense of a straight
line.
 When calculating a regression model, we are
interested in a measure of the usefulness of
the model.
 There are several ways to do this, one of
them being the coefficient of determination
(also sometimes called goodness of fit).
 The concept behind this coefficient is to
calculate the reduction of the error of
prediction when the information provided by
the x values is included in the calculation.
 Thus the coefficient of determination specifies
the amount of sample variation in y explained
by x.
 For simple linear regression the coefficient of
determination is simply the square of the
correlation coefficient between Y and X .
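[Figure: scale of the correlation coefficient from -1 to +1. Values near -1 indicate a strong negative linear relationship, 0 indicates no linear relationship, and values near +1 indicate a strong positive linear relationship.]
• The correlation coefficient, also called Pearson's product-moment correlation after Karl Pearson, is calculated by
$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}}$$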
The correlation coefficient may take any value between -1.0 and +1.0.
Assumptions:
- linear relationship between x and y
- continuous random variables
- both variables must be normally distributed
- the (x, y) observation pairs must be independent of one another
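As a rough illustration of Pearson's product-moment correlation, the sketch below computes r in plain Python. The helper name `pearson_r` is hypothetical, and the data are the storage times and off-flavor scores from this deck's regression example, so that r squared can be compared with the coefficient of determination of 0.85 reported there.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Storage time (months) and off-flavor scores from the regression example.
months = [0, 1, 2, 3, 4, 6]
scores = [1.5, 2, 2, 3, 2.5, 3.5]

r = pearson_r(months, scores)
print(f"r = {r:.3f}, r squared = {r * r:.2f}")
```

For simple linear regression, squaring r gives the coefficient of determination, which is why the two numbers agree.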
χ² test
is based on the chi-square distribution and, as a parametric test, is used for comparing a sample variance to a theoretical population variance:
$$\chi^2 = \frac{\sigma_s^2}{\sigma_p^2}\,(n - 1)$$
where
$\sigma_s^2$ = variance of the sample;
$\sigma_p^2$ = variance of the population;
(n – 1) = degrees of freedom,
n being the number of items in the sample.
 In quality control, there are situations when
we need to know whether a sample mean lies
within the confidence limits of the entire
population. This can be accomplished by
using t-distribution to determine confidence
limits for a population mean using a selected
probability.
EXAMPLE I
We will use the Excel function TINV() to obtain the critical value of the t-distribution.
Ten cans of sliced pineapple were removed at random from a population of 1000 cans. The drained weights of the contents were measured as 410.5, 411.4, 410.4, 412.6, 411.9, 411.5, 412.5, 411.4, 411.5, and 410.1 g. Determine the 95% confidence limits for the mean of the entire population.
We will first calculate the average of the ten data values using the AVERAGE() function. Next we will determine the standard deviation of the sample using the STDEV() function. Then we will use the following expression to estimate the lower and upper limits of the population mean:
$$\bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$$
Discussion:
The results show that the 95% confidence lower
and upper limits for the population mean are
410.78 and 411.98, respectively.
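The same confidence limits can be reproduced outside Excel. The sketch below is an illustration in Python, not part of the original worksheet; because the standard library has no t quantile function, the critical value 2.262 (two-tailed, 5%, 9 degrees of freedom) is taken from a t table as an assumption, and it is the quantity that `TINV(0.05, 9)` returns in Excel.

```python
import math
import statistics

# Drained weights (g) of the ten sampled cans.
weights = [410.5, 411.4, 410.4, 412.6, 411.9, 411.5, 412.5, 411.4, 411.5, 410.1]

n = len(weights)
mean = statistics.mean(weights)   # Excel: AVERAGE()
s = statistics.stdev(weights)     # Excel: STDEV(), the n - 1 sample standard deviation

# Assumed table value: two-tailed 5% critical t for n - 1 = 9 degrees of freedom.
T_CRIT = 2.262

half_width = T_CRIT * s / math.sqrt(n)
lower, upper = mean - half_width, mean + half_width
print(f"95% confidence limits: {lower:.2f} to {upper:.2f}")
```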
When a sample is taken from a large population and analyzed for selected data, statistical analysis is helpful in obtaining estimates for the total population from which the sample was obtained.
EXAMPLE II
In this worksheet, we will use Excel's built-in data analysis techniques to determine various statistical descriptors for the sample and the population.
 A sample of 10 breads is obtained from a
conveyor belt exiting a baking oven. The
breads are analyzed for color by comparing
them with a standard color chart. The values
recorded, in customized color units, are as
follows: 34, 33, 36,37, 31, 32, 38, 33, 34,
and 35. Estimate the mean, variance,
and standard deviation of the population.
Case study: Color data
We will use the Data Analysis capability of Excel to determine the descriptive statistics for the given data. First, make sure that Data Analysis... is available under the Data menu. If it is not available, see the next slide for details on how to add this analysis package.
 Click Microsoft Office Button , and Then
Click Excel Options
 Click Add-ins. In Manage Box, Select Excel
Add-ins
 Click Go
 In the Add-Ins Available Box, Select Analysis
ToolPak Check Box and Click OK. (If ToolPak
Is Not Listed, Click Browse to Locate It.)
Step 1 Open a new worksheet expanded to full size.
Step 2 In cells A2:A11, type the text labels and data values.
Step 3 Choose the menu items Data, Data Analysis .... A dialog box will open as shown.
Step 4 Double click on Descriptive Statistics.
Step 5 In the edit box for Input Range:, type the range of cells as $A$2:$A$11.
Step 6 Select the radio button Columns.
Step 7 In Output Range, type A13. Click OK.
Step 8 Excel will calculate the descriptive statistics and display the results in cells A13:B28.
The results indicate that the sample mean is 34.3. The estimated standard deviation of the population is 2.214, and the estimated variance of the population is 4.9.
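For readers without the Analysis ToolPak, several of the quantities in the Descriptive Statistics output can be approximated with Python's standard library. This is a sketch only; the dictionary keys mirror labels in Excel's output table, and the data are the ten bread color scores from this example.

```python
import math
import statistics

# Color scores of the ten sampled breads (customized color units).
colors = [34, 33, 36, 37, 31, 32, 38, 33, 34, 35]

sd = statistics.stdev(colors)  # n - 1 divisor, as in Excel's STDEV()

summary = {
    "Mean": statistics.mean(colors),
    "Standard Error": sd / math.sqrt(len(colors)),
    "Median": statistics.median(colors),
    "Standard Deviation": sd,
    "Sample Variance": statistics.variance(colors),
    "Range": max(colors) - min(colors),
    "Minimum": min(colors),
    "Maximum": max(colors),
    "Sum": sum(colors),
    "Count": len(colors),
}

for label, value in summary.items():
    print(f"{label}: {value:.3f}")
```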
 t  (difference between samples) / (variability)
 Excel will automatically calculate t-values to
compare:
 Means of two datasets with equal variances
 Means of two datasets with unequal variances
 Two sets of paired data
 abs(t-score) < abs(t-critical): accept H0
 Insufficient evidence to prove that observed
differences reflect real, significant differences
• A researcher wishes to test whether the mean heavy-metal content of soil differs after a war threat versus before it. The research hypothesis is that the mean after the war threat will exceed the mean before the war threat.
EXAMPLE III
Use Excel to help test the hypothesis for the difference in population means.
Step 1 Open a new worksheet expanded to full size.
Step 2 In cells B5:C19, type the text labels and data values.
The null and alternative hypotheses to be tested are:
$$H_o: \mu_1 - \mu_2 = 0.0$$
$$H_A: \mu_1 - \mu_2 \neq 0.0$$
Step 3 Choose the menu items Data, Data Analysis .... A dialog box will open as shown.
Step 4 Double click on t-Test: Two-Sample Assuming Equal Variances.
Hypothesized Mean Difference: change this if you want to know whether the means of the two samples differ by at least some specified amount.
The p value for the two-tailed test is .007, which is less than .05, so we reject the null hypothesis.
The p value for the one-tailed test is .003, which is less than .05, so we reject the null hypothesis.
t > tcritical (two-tail), so the mean of sample #1 is significantly different from the mean of sample #2.
t > tcritical (one-tail), so the mean of sample #1 is significantly larger than the mean of sample #2.
 In hypothesis testing, it is sometimes not
possible to use the same judges for testing different treatments, although it would be desirable to use the same judges to evaluate samples obtained from different treatments.
• In such cases, we have a completely randomized design and can use single-factor ANOVA.
We can test to see whether the treatments had any influence on the judges' scores; in other words, do the means of the treatments differ?
EXAMPLE IV
• Consider the weights of oranges from three different suppliers A, B, and C. Five oranges from each supplier were randomly sampled and weighed. The following weights were obtained:
Case study: Weight of oranges data
A     B     C
150   148   146
151   150   148
152   152   150
153   154   152
154   156   154
For each treatment (supplier), five oranges were weighed, so the design was completely randomized. Calculate the F value to determine whether the means of the three treatments are significantly different.
 We will use a single factor analysis of variance
available in Excel. We will determine the critical F
value at a probability of 0.95 (the 5% significance level).
 These computations will allow us to determine
if the means between the three different
treatments are significantly different.
 First make sure that the Data Analysis...
Command is available under menu item Data.
Step 1 Open a new worksheet expanded to full size.
Step 2 In cells A4:C8, type the text labels and data values.
Step 3 Choose the menu items Data, Data Analysis .... A dialog box will open as shown.
Step 4 Double click on Anova: Single Factor.
The results show that the F value is 0.889. The critical F value at the 5% level is 3.885.
This indicates that for the example problem the calculated F value is lower than the critical value at the 5% level. Thus, we can say that there is no significant difference among the mean weights (P > 0.05).
 When we are interested in evaluating samples
for sensory characteristics using same judges
with samples obtained from multiple
treatments, analysis of variance for a two-
factor design without replication is useful.
 This analysis helps in determining if there are
significant differences among the various
treatments as well as whether any significant
differences exist among the judges themselves.
EXAMPLE V
• Three types of ice cream were evaluated by 11 judges. The judges assigned the following scores.
Judge Ice Cream A Ice Cream B Ice Cream C
A 16 14 15
B 17 15 17
C 16 16 16
D 18 14 16
E 16 14 14
F 17 16 17
G 18 14 15
H 16 15 16
I 17 14 14
J 18 13 16
K 17 15 15
 We will use the built-in analysis pack
available in the Excel command called Data
Analysis ....
 Three sets of results will be obtained for the
5% level
Step 1 Open a new worksheet expanded to full size.
Step 2 In cells A3:D13, type the text labels and data values.
Step 3 Choose the menu items Data, Data Analysis .... A dialog box will open.
Step 4 Double click on Anova: Two-Factor Without Replication. A new dialog box will open.
Step 5 Type entries in edit boxes as shown.
Step 6 The results will be displayed in cells
 The difference among ice cream types is
determined by examining the F values. The F
value is calculated as 19.73. This value is
greater than 3.49 for the 5% level,
 The ice cream types are significantly
different at p<0.001.
 For judges, the calculated F value is 1.36.
This value is lower than the critical F values
of 2.35 at the 5 % level.
 The judges showed no significant difference
in their mean scores.
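The two F values quoted above can be reproduced by hand. The following Python sketch is an illustration under the usual two-factor, no-replication layout (judges as rows, ice cream types as columns), using a correction factor for the sums of squares; the variable names are assumptions, and the slides themselves obtain these numbers from Excel.

```python
# Scores from the table: judge -> (ice cream A, ice cream B, ice cream C).
scores = {
    "A": (16, 14, 15), "B": (17, 15, 17), "C": (16, 16, 16), "D": (18, 14, 16),
    "E": (16, 14, 14), "F": (17, 16, 17), "G": (18, 14, 15), "H": (16, 15, 16),
    "I": (17, 14, 14), "J": (18, 13, 16), "K": (17, 15, 15),
}

rows = list(scores.values())
n_judges, n_types = len(rows), len(rows[0])
values = [x for row in rows for x in row]

cf = sum(values) ** 2 / len(values)                      # correction factor
ss_total = sum(x * x for x in values) - cf
ss_judges = sum(sum(row) ** 2 for row in rows) / n_types - cf
type_totals = [sum(row[j] for row in rows) for j in range(n_types)]
ss_types = sum(t * t for t in type_totals) / n_judges - cf
ss_error = ss_total - ss_judges - ss_types               # error by subtraction

df_types, df_judges = n_types - 1, n_judges - 1
df_error = df_types * df_judges
ms_error = ss_error / df_error

f_types = (ss_types / df_types) / ms_error
f_judges = (ss_judges / df_judges) / ms_error
print(f"F (ice cream types) = {f_types:.2f}, F (judges) = {f_judges:.2f}")
```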
 Simple regression analysis involves determining
the statistical relationship between two
variables. One of the uses of such analysis is in
predicting one variable on the basis of the
other.
EXAMPLE VI
We will use the regression analysis available in the Add-in package in Excel to determine the linear regression between two variables.
The example concerns the change in off-flavor with storage time in a frozen vegetable. Sensory scores obtained at 0, 1, 2, 3, 4, and 6 months were 1.5, 2, 2, 3, 2.5, and 3.5, respectively. Assuming that these data can be linearly correlated, determine the regression coefficient and predict the off-flavor score at 5 months of storage.
Case study: Sensory scores data
We will use the Regression package available as an add-in in Excel to obtain the required statistical relationships. We assume that a linear relationship exists between the off-flavor score and time (in months) with the equation
y = mx + b,
where y is the off-flavor score, x is time in months, m is the slope, and b is the intercept.
Step 1 Open a new worksheet expanded to full size.
Step 2 In cells A4:B9, enter the text labels and data values.
Step 3 Choose the menu items Data, Data Analysis .... A dialog box will open.
Step 4 Double click on Regression.
Step 5 A new dialog box will open. Enter the range of cells for Y and X as shown. Check the boxes for Residuals and Line Fit Plots. Click OK.
The results will be displayed. The annotations on the regression output read as follows:
- Significance F: probability of getting this value of F by randomly sampling from a normally distributed population. A low value means the model (rather than random variability) explains most of the variation in the data.
- F: ratio of the variability explained by the model to the leftover variability. A high number means the model explains most of the variation in the data.
- R Square: the fraction of the variation in y that is explained by variation in x (about 85% in this example, since r² = 0.85). The remainder may be random error, or may be explained by some factor other than x.
- Lower 95% and Upper 95%: confidence limits on the slope and intercept.
- P-value: probability of getting a slope or intercept this much different from zero by randomly sampling from a normally distributed population.
- Fitted line: y = 0.31x + 1.58
• The r² value is calculated as 0.85, and the standard error is 0.318. The intercept is 1.5786 and the slope is 0.3143.
• The linear equation is y = 0.31x + 1.58. The residual output gives the predicted values for the off-flavor score at different time intervals. These data are also shown in the chart.
• The predicted and calculated values are shown. The predicted value at 5 months of storage is calculated as 3.13 from the rounded equation (3.15 with the unrounded slope and intercept).
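The regression figures for the off-flavor example can be checked with an ordinary least-squares fit written out by hand. The Python sketch below is an illustration rather than the deck's own procedure (the slides use Excel's Regression tool); it uses the six storage times and sensory scores given in the example and also shows the prediction at 5 months with unrounded coefficients.

```python
# Storage time (months) and off-flavor scores from the example.
months = [0, 1, 2, 3, 4, 6]
scores = [1.5, 2, 2, 3, 2.5, 3.5]

n = len(months)
mean_x = sum(months) / n
mean_y = sum(scores) / n

sxx = sum((x - mean_x) ** 2 for x in months)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(months, scores))
syy = sum((y - mean_y) ** 2 for y in scores)

slope = sxy / sxx                       # m in y = mx + b
intercept = mean_y - slope * mean_x     # b in y = mx + b

# Residual sum of squares, coefficient of determination, standard error of the fit.
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(months, scores))
r_squared = 1 - sse / syy
std_error = (sse / (n - 2)) ** 0.5

predicted_5 = intercept + slope * 5
print(f"y = {slope:.4f}x + {intercept:.4f}")
print(f"r squared = {r_squared:.2f}, standard error = {std_error:.3f}")
print(f"predicted score at 5 months = {predicted_5:.2f}")
```

With unrounded coefficients the 5-month prediction comes out slightly above the 3.13 quoted on the slide, which is obtained from the rounded equation y = 0.31x + 1.58.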
 Statistics
- Descriptive Statistics
- Histograms
- Hypothesis Testing
- Scatter Plots
- Regression Analysis
 Click Data/Data Analysis (Far Right) /Descriptive
Statistics & OK.
 Put Checkmarks on Summary Statistics, 95% or
99% Confidence Interval, & Labels in First Row
Boxes.
 Move Cursor to Input Range Window, Highlight
Data to Analyze including Labels, & Click OK.
 Your Data will Appear on New Worksheet.
 Widen Columns by Clicking Home/Format/AutoFit
Column Width.
 Click Data/Data Analysis/Histogram & OK.
 Put Checkmarks on Chart Output & New Worksheet
Boxes.
 Move Cursor to Input Range Window, Highlight Data
Going into Histogram.
 Move Cursor to Input Bin Range, Highlight Data
Showing Upper Value of Each Bin & Click OK.
 Histogram will be on New Worksheet. You May
Lengthen it by Clicking Blank Space in Window, Moving
Cursor to Window Bottom Line & Holding Down Mouse
Button as You Pull Down Window.
 Go to Sheet One.
 Click Data/Data Analysis/ and the Appropriate
Statistical Test. Then Click OK.
 On New Window Check Labels Box and Put
Cursor on Variable 1 Range.
 Highlight Variable 1 Data Including Label.
 Put Cursor on Variable 2 Range & Highlight
Variable 2 Data (Including Label). Then Click OK.
 Click Home/Format/AutoFit/Column Width
 Go to Sheet One.
 Highlight Data (Be Sure X Values are in
Left Column and Y Values are in Right
Column).
 Click Insert/Scatter. Pull down menu and
click Upper Left Icon.
 Click a Datum Point on Chart with Right
Mouse Key, Add Trendline, & Click Linear.
 Go to Sheet One.
 Click Data/Data Analysis (On Far Right)
/Regression & Click OK.
 On New Window Check Labels Box and Put
Cursor on X Range.
 Highlight X Data Including Label.
 Put Cursor on Y Range & Highlight Y Data
(Including Label), Then Click OK.
 Click Home/Format/AutoFit Column Width.

More Related Content

What's hot

Frequency Distributions
Frequency DistributionsFrequency Distributions
Frequency Distributionsjasondroesch
 
frequency distribution table
frequency distribution tablefrequency distribution table
frequency distribution tableMonie Ali
 
4 measures of variability
4  measures of variability4  measures of variability
4 measures of variabilityDr. Nazar Jaf
 
Descriptive and Inferential Statistics
Descriptive and Inferential StatisticsDescriptive and Inferential Statistics
Descriptive and Inferential StatisticsDanica Antiquina
 
Introduction to Statistics (Part -I)
Introduction to Statistics (Part -I)Introduction to Statistics (Part -I)
Introduction to Statistics (Part -I)YesAnalytics
 
Presentation on "Measure of central tendency"
Presentation on "Measure of central tendency"Presentation on "Measure of central tendency"
Presentation on "Measure of central tendency"muhammad raza
 
Multiple Linear Regression II and ANOVA I
Multiple Linear Regression II and ANOVA IMultiple Linear Regression II and ANOVA I
Multiple Linear Regression II and ANOVA IJames Neill
 
Das20502 chapter 1 descriptive statistics
Das20502 chapter 1 descriptive statisticsDas20502 chapter 1 descriptive statistics
Das20502 chapter 1 descriptive statisticsRozainita Rosley
 
INFERENTIAL STATISTICS: AN INTRODUCTION
INFERENTIAL STATISTICS: AN INTRODUCTIONINFERENTIAL STATISTICS: AN INTRODUCTION
INFERENTIAL STATISTICS: AN INTRODUCTIONJohn Labrador
 
Basic Concepts of Statistics - Lecture Notes
Basic Concepts of Statistics - Lecture NotesBasic Concepts of Statistics - Lecture Notes
Basic Concepts of Statistics - Lecture NotesDr. Nirav Vyas
 
Normal and standard normal distribution
Normal and standard normal distributionNormal and standard normal distribution
Normal and standard normal distributionAvjinder (Avi) Kaler
 
What is Statistics
What is StatisticsWhat is Statistics
What is Statisticssidra-098
 
Kolmogorov smirnov
Kolmogorov smirnovKolmogorov smirnov
Kolmogorov smirnovRaquel Cruz
 
Lecture11 spearman rank correlation part-2-with tied ranks
Lecture11 spearman rank correlation part-2-with tied ranksLecture11 spearman rank correlation part-2-with tied ranks
Lecture11 spearman rank correlation part-2-with tied ranksDr Rajeev Kumar
 

What's hot (20)

Frequency distribution
Frequency distributionFrequency distribution
Frequency distribution
 
Statistics:Fundamentals Of Statistics
Statistics:Fundamentals Of StatisticsStatistics:Fundamentals Of Statistics
Statistics:Fundamentals Of Statistics
 
Math 102- Statistics
Math 102- StatisticsMath 102- Statistics
Math 102- Statistics
 
Frequency Distributions
Frequency DistributionsFrequency Distributions
Frequency Distributions
 
frequency distribution table
frequency distribution tablefrequency distribution table
frequency distribution table
 
4 measures of variability
4  measures of variability4  measures of variability
4 measures of variability
 
Quartile
QuartileQuartile
Quartile
 
Descriptive and Inferential Statistics
Descriptive and Inferential StatisticsDescriptive and Inferential Statistics
Descriptive and Inferential Statistics
 
Introduction to Statistics (Part -I)
Introduction to Statistics (Part -I)Introduction to Statistics (Part -I)
Introduction to Statistics (Part -I)
 
Presentation on "Measure of central tendency"
Presentation on "Measure of central tendency"Presentation on "Measure of central tendency"
Presentation on "Measure of central tendency"
 
Multiple Linear Regression II and ANOVA I
Multiple Linear Regression II and ANOVA IMultiple Linear Regression II and ANOVA I
Multiple Linear Regression II and ANOVA I
 
Das20502 chapter 1 descriptive statistics
Das20502 chapter 1 descriptive statisticsDas20502 chapter 1 descriptive statistics
Das20502 chapter 1 descriptive statistics
 
INFERENTIAL STATISTICS: AN INTRODUCTION
INFERENTIAL STATISTICS: AN INTRODUCTIONINFERENTIAL STATISTICS: AN INTRODUCTION
INFERENTIAL STATISTICS: AN INTRODUCTION
 
Basic Concepts of Statistics - Lecture Notes
Basic Concepts of Statistics - Lecture NotesBasic Concepts of Statistics - Lecture Notes
Basic Concepts of Statistics - Lecture Notes
 
Measures Of Central Tendencies
Measures Of Central TendenciesMeasures Of Central Tendencies
Measures Of Central Tendencies
 
Stat topics
Stat topicsStat topics
Stat topics
 
Normal and standard normal distribution
Normal and standard normal distributionNormal and standard normal distribution
Normal and standard normal distribution
 
What is Statistics
What is StatisticsWhat is Statistics
What is Statistics
 
Kolmogorov smirnov
Kolmogorov smirnovKolmogorov smirnov
Kolmogorov smirnov
 
Lecture11 spearman rank correlation part-2-with tied ranks
Lecture11 spearman rank correlation part-2-with tied ranksLecture11 spearman rank correlation part-2-with tied ranks
Lecture11 spearman rank correlation part-2-with tied ranks
 

Similar to elementary statistic

Elementary statistics for Food Indusrty
Elementary statistics for Food IndusrtyElementary statistics for Food Indusrty
Elementary statistics for Food IndusrtyAtcharaporn Khoomtong
 
ders 5 hypothesis testing.pptx
ders 5 hypothesis testing.pptxders 5 hypothesis testing.pptx
ders 5 hypothesis testing.pptxErgin Akalpler
 
SPSS statistics - get help using SPSS
SPSS statistics - get help using SPSSSPSS statistics - get help using SPSS
SPSS statistics - get help using SPSScsula its training
 
Chi square and t tests, Neelam zafar & group
Chi square and t tests, Neelam zafar & groupChi square and t tests, Neelam zafar & group
Chi square and t tests, Neelam zafar & groupNeelam Zafar
 
Medical Statistics Part-II:Inferential statistics
Medical Statistics Part-II:Inferential  statisticsMedical Statistics Part-II:Inferential  statistics
Medical Statistics Part-II:Inferential statisticsRamachandra Barik
 
Statistical Significance Tests.pptx
Statistical Significance Tests.pptxStatistical Significance Tests.pptx
Statistical Significance Tests.pptxAldofChrist
 
Analyzing experimental research data
Analyzing experimental research dataAnalyzing experimental research data
Analyzing experimental research dataAtula Ahuja
 
Anova by Hazilah Mohd Amin
Anova by Hazilah Mohd AminAnova by Hazilah Mohd Amin
Anova by Hazilah Mohd AminHazilahMohd
 
LEARNING OUTCOMESKnow what descriptive statistics are an.docx
LEARNING OUTCOMESKnow what descriptive statistics are an.docxLEARNING OUTCOMESKnow what descriptive statistics are an.docx
LEARNING OUTCOMESKnow what descriptive statistics are an.docxsmile790243
 
Statistical tests of significance and Student`s T-Test
Statistical tests of significance and Student`s T-TestStatistical tests of significance and Student`s T-Test
Statistical tests of significance and Student`s T-TestVasundhraKakkar
 
Chi square test social research refer.ppt
Chi square test social research refer.pptChi square test social research refer.ppt
Chi square test social research refer.pptSnehamurali18
 
MPhil clinical psy Non-parametric statistics.pptx
MPhil clinical psy Non-parametric statistics.pptxMPhil clinical psy Non-parametric statistics.pptx
MPhil clinical psy Non-parametric statistics.pptxrodrickrajamanickam
 
ChandanChakrabarty_1.pdf
ChandanChakrabarty_1.pdfChandanChakrabarty_1.pdf
ChandanChakrabarty_1.pdfDikshathawait
 
Areas In Statistics
Areas In StatisticsAreas In Statistics
Areas In Statisticsguestc94d8c
 
QNT 275 Exceptional Education - snaptutorial.com
QNT 275   Exceptional Education - snaptutorial.comQNT 275   Exceptional Education - snaptutorial.com
QNT 275 Exceptional Education - snaptutorial.comDavisMurphyB22
 
Qnt 275 Enhance teaching / snaptutorial.com
Qnt 275 Enhance teaching / snaptutorial.comQnt 275 Enhance teaching / snaptutorial.com
Qnt 275 Enhance teaching / snaptutorial.comBaileya33
 
QNT 275 Inspiring Innovation / tutorialrank.com
QNT 275 Inspiring Innovation / tutorialrank.comQNT 275 Inspiring Innovation / tutorialrank.com
QNT 275 Inspiring Innovation / tutorialrank.comBromleyz33
 
abdi research ppt.pptx
abdi research ppt.pptxabdi research ppt.pptx
abdi research ppt.pptxAbdetaBirhanu
 

elementary statistic

  • 1. Elementary statistics, by Dr. Atcharaporn Khoomtong. Outline: Introduction; Statistical methodology; Steps of scientific research; Important parametric tests; Important nonparametric tests; Examples using Excel; Using Excel for Statistics in Gateway Cases (Office 2007).
  • 2. Most people become familiar with probability and statistics through radio, television, newspapers, and magazines. For example, the following statements were found in newspapers: (a) Eating 10 grams (g) of fiber a day reduces the risk of heart attack by 14%. (b) Thirty minutes of exercise two or three times each week can raise HDLs by 10 to 15%. Statistics is used to analyze the results of surveys and as a tool in scientific research to make decisions based on controlled experiments. Other uses of statistics include operations research, quality control, estimation, and prediction.
  • 3. Statistics, as the basis of data analysis, is concerned with two basic types of problems: (1) summarizing, describing, and exploring the data, which is covered by descriptive statistics; and (2) using sampled data to infer the nature of the process that produced the data, which is covered by inferential statistics.
  • 4. Descriptive statistics: Statistics plays an important role in the description of mass phenomena. Data are organized and summarized for clear presentation and ease of communication. Data may come from studies of populations or samples. Descriptive statistics offers methods to summarize a collection of data; these methods may be numerical or graphical, and each kind has its own advantages and disadvantages. Inferential statistics is used to draw conclusions about a data set. Usually this means drawing inferences about a population from a sample, either by estimating some relationship or by testing some hypothesis. A population is the set of all possible states of a random variable; its size may be either infinite or finite. A sample is a subset of the population; its size is always finite.
  • 5. Descriptive statistics:
    Graphical: arranging data in tables; bar graphs and pie charts.
    Numerical: percentages; averages; range.
    Relationships: correlation coefficient; regression analysis.
    Inferential statistics:
    Confidence interval.
    Compare means of two samples: t-test; F-test.
    Compare means from three samples: pre/post (LSD, DMRT); ANOVA (analysis of variance); F-test.
    Another important aspect of data analysis is the data themselves, which can be of two different types: qualitative data (e.g., sex, color, smell, taste) and quantitative data (e.g., height, weight, percentage). Qualitative data do not contain quantitative information, but they can be classified into categories.
  • 6. Types of measurement scale (possible statements; allowed operators; examples):
    Nominal scale: identity, countable; =, ≠; colors, phone numbers, feelings.
    Ordinal scale: identity, less than/greater than relations, countable; =, ≠, <, >; soccer league table, military ranks, energy efficiency classes.
    Interval scale: identity, less than/greater than relations, equality of differences; =, ≠, <, >, +, -; dates (years), temperature in Celsius, IQ scale.
    Ratio scale: identity, less than/greater than relations, equality of differences, equality of ratios, zero point; =, ≠, <, >, +, -, *, /; velocities, lengths, temperature in Kelvin, age.
    Steps of scientific research: collecting the necessary facts; analyzing the facts; making decisions; carrying out decisions; assessing the results. (The slide relates these steps to descriptive statistics and inferential statistics.)
  • 7. Mode = the most frequent value. Median = the value at the middle point of the ordered measurements. Mean = the average (the balancing point of the distribution). Variance = the average of the squared deviations of all the population measurements from the population mean. Standard deviation = the square root of the variance. Descriptive formula (population variance): $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$. Inferential formula (sample variance): $S^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$, called the "unbiased estimator of the population value".
  • 8. Population of profit margins for five companies: 8%, 10%, 15%, 12%, 5%. Mean: $\mu = \frac{8 + 10 + 15 + 12 + 5}{5} = \frac{50}{5} = 10\%$. Variance: $\sigma^2 = \frac{(8-10)^2 + (10-10)^2 + (15-10)^2 + (12-10)^2 + (5-10)^2}{5} = \frac{(-2)^2 + 0^2 + 5^2 + 2^2 + (-5)^2}{5} = \frac{4 + 0 + 25 + 4 + 25}{5} = \frac{58}{5} = 11.6$. Standard deviation: $\sigma = \sqrt{11.6} = 3.406\%$. Hypothesis = an assumption or supposition to be proved or disproved, e.g., "automobile A is performing as well as automobile B."
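The profit-margin arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name `population_stats` is made up for illustration, and it uses the descriptive (population) formula that divides by N.

```python
import math

# Profit margins (in percent) for the five companies on the slide.
margins = [8, 10, 15, 12, 5]


def population_stats(values):
    """Return (mean, population variance, population standard deviation)."""
    n = len(values)
    mean = sum(values) / n
    # Descriptive formula: average squared deviation from the population mean.
    variance = sum((x - mean) ** 2 for x in values) / n
    return mean, variance, math.sqrt(variance)


mean, variance, std_dev = population_stats(margins)
print(f"mean = {mean:.1f}%, variance = {variance:.1f}, std dev = {std_dev:.3f}%")
```

Dividing by n - 1 instead of N would give the inferential (sample) variance from the previous slide.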
  • 9. Null hypothesis ($H_0$) = expresses no difference, e.g., $H_0: \mu = 0$ (often said "H naught"; the value may be any number) or, later, $H_0: \mu_1 = \mu_2$. Alternative hypothesis ($H_1$ or $H_A$), e.g., $H_A: \mu \neq 0$. Type I error (α): rejecting $H_0$ when $H_0$ is true. Type II error (β): accepting $H_0$ when $H_1$ is true.
  • 10. If the calculated F value is greater than the critical F value, the result is significant: reject H0. If the calculated F value is lower than the critical F value, the result is not significant: accept H0. α = significance level; 1 - β = power. Decision table (truth versus decision from the data):
    Decide H0 ("fail to reject H0"): if H0 is correct, true negative (1 - α); if HA is correct, false negative (β).
    Decide HA ("reject H0"): if H0 is correct, false positive (α); if HA is correct, true positive (1 - β).
  • 11. Parametric tests covered: z-test, t-test, F-test. The z-test is based on the normal probability distribution and is used for judging the significance of several statistical measures, particularly the mean. It is generally used for comparing the mean of a sample to some hypothesized mean for the population in the case of a large sample (n > 30).
  • 12. The t-test is based on the t-distribution and is considered an appropriate test for judging the significance of a sample mean, or of the difference between the means of two samples, in the case of small sample(s) when the population variance is not known (in which case the variance of the sample is used as an estimate of the population variance). With unknown variance, under $H_0$ the test statistic is $t = \frac{\bar{X} - \mu_0}{s / \sqrt{n}} \sim t_{(n-1)}$. Critical values come from statistics books or a computer; the t-distribution is approximately normal for degrees of freedom (df) > 30. The F-test is based on the F-distribution and is used to compare the variances of two independent samples. It is also used in the context of analysis of variance (ANOVA) for judging the significance of more than two sample means at one and the same time. The test statistic, F, is calculated and compared with its critical value (from the F-ratio tables for the degrees of freedom of the greater and smaller variances at the specified level of significance) in order to accept or reject the null hypothesis.
  • 13. ANOVA table for a one-way ANOVA with N observations and T treatments:
    Treatment: df = T - 1; SS = SStrt; MS = SStrt/(T - 1); F = MStrt/MSerr.
    Error: df by subtraction; SS = SSerr; MS = SSerr/dferr.
    Total: df = N - 1.
    Finally, you (or the PC) consult tables or otherwise obtain the probability of obtaining this F value given the df for treatment and error.
    Calculation steps:
    1: Calculate N, Σx, and Σx² for the whole dataset.
    2: Find the correction factor CF = (Σx * Σx)/N.
    3: Find the total sum of squares for the data = Σ(xi²) - CF.
    4: Add up the totals for each treatment in turn (Xt.), then calculate the treatment sum of squares SStrt = Σt(Xt. * Xt.)/r - CF, where Xt. = sum of all values within treatment t, and r is the number of observations that went into that total.
    5: Draw up the ANOVA table, getting the error terms by subtraction.
  • 14. Experimental designs: Completely Randomized Design (CRD), Randomized Complete Block Design (RBD), and Latin Square (LQ), each described by its treatments, replications, and degrees of freedom (df). Mean comparisons: LSD (Least Significant Difference) and DMRT (Duncan's New Multiple Range Test). Most people have difficulty determining whether a model is linear or non-linear. Before discussing the issues of linear vs. non-linear systems, let's have a short look at some examples displaying several types of discrimination lines between two classes. [Figure: three example discrimination lines between two classes, labeled linear or non-linear.]
  • 15. Here's the answer: linear models are linear in the parameters that have to be estimated, but not necessarily in the independent variables. This explains why the middle of the three figures above shows a linear discrimination line between the two classes, although the line is not linear in the sense of a straight line. When calculating a regression model, we are interested in a measure of the usefulness of the model. There are several ways to measure this, one of them being the coefficient of determination (also sometimes called goodness of fit). The concept behind this coefficient is to calculate the reduction in the error of prediction when the information provided by the x values is included in the calculation.
  • 16. Thus the coefficient of determination specifies the proportion of sample variation in y explained by x. For simple linear regression, the coefficient of determination is simply the square of the correlation coefficient between Y and X. The correlation coefficient ranges from -1 (strong negative linear relationship) through 0 (no linear relationship) to +1 (strong positive linear relationship). The correlation coefficient, also called Pearson's product-moment correlation after Karl Pearson, is calculated by $r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}$. It may take any value between -1.0 and +1.0. Assumptions: a linear relationship between x and y; continuous random variables; both variables must be normally distributed; x and y must be independent of each other.
  • 17. χ² test: The χ² test is based on the chi-square distribution and, as a parametric test, is used for comparing a sample variance to a theoretical population variance: $\chi^2 = \frac{s^2}{\sigma^2}(n - 1)$, where $s^2$ = variance of the sample; $\sigma^2$ = variance of the population; (n - 1) = degrees of freedom, n being the number of items in the sample.
  • 18. In quality control, there are situations when we need to know whether a sample mean lies within the confidence limits of the entire population. This can be accomplished by using the t-distribution to determine confidence limits for a population mean at a selected probability. EXAMPLE I: We will use the Excel function TINV() to obtain the critical value of the t-distribution.
  • 19. Ten cans of sliced pineapple were removed at random from a population of 1000 cans. The drained weights of the contents were measured as 410.5, 411.4, 410.4, 412.6, 411.9, 411.5, 412.5, 411.4, 411.5, and 410.1 g. Determine the 95% confidence limits for the mean of the entire population. We will first calculate the average of the ten data values using the AVERAGE() function. Next we will determine the sample standard deviation using the STDEV() function. Then we will use the following expression to estimate the lower and upper limits of the population mean: $\bar{x} \pm t \cdot s / \sqrt{n}$.
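The pineapple calculation can be reproduced outside Excel as a rough cross-check. The sketch below assumes the usual t-based limits, mean ± t·s/√n, and hardcodes the two-tailed 5% critical value for 9 degrees of freedom (about 2.262 from standard t tables, which is what Excel's TINV(0.05, 9) returns) instead of computing it. The names `confidence_limits` and `T_CRIT_95_DF9` are illustrative.

```python
import math
import statistics

# Drained weights (g) of the ten sampled cans, from the slide.
weights = [410.5, 411.4, 410.4, 412.6, 411.9, 411.5, 412.5, 411.4, 411.5, 410.1]

# Assumption: two-tailed 5% critical t value for n - 1 = 9 degrees of freedom,
# taken from a standard t table rather than computed here.
T_CRIT_95_DF9 = 2.262


def confidence_limits(sample, t_crit):
    """Return (lower, upper) limits for the population mean: mean +/- t*s/sqrt(n)."""
    n = len(sample)
    mean = statistics.mean(sample)    # counterpart of Excel AVERAGE()
    s = statistics.stdev(sample)      # counterpart of Excel STDEV(): divides by n - 1
    margin = t_crit * s / math.sqrt(n)
    return mean - margin, mean + margin


lower, upper = confidence_limits(weights, T_CRIT_95_DF9)
print(f"95% confidence limits: {lower:.2f} g to {upper:.2f} g")
```

For a different sample size or confidence level, the critical value has to be looked up again for the matching degrees of freedom.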
  • 20. Discussion: The results show that the 95% confidence lower and upper limits for the population mean are 410.78 and 411.98, respectively. When a sample is taken from a large population and analyzed for selected data, statistical analysis is helpful in obtaining estimates for the total population from which the sample was obtained. EXAMPLE II: In this worksheet, we will use Excel's built-in data analysis techniques to determine various statistical descriptors for the sample and the population.
  • 21. Case study (color data): A sample of 10 breads is obtained from a conveyor belt exiting a baking oven. The breads are analyzed for color by comparing them with a standard color chart. The values recorded, in customized color units, are as follows: 34, 33, 36, 37, 31, 32, 38, 33, 34, and 35. Estimate the mean, variance, and standard deviation of the population. We will use the Data Analysis capability of Excel to determine the descriptive statistics for the given data. First, make sure that Data Analysis... is available under the Data menu. If it is not available, see the next slide for details on how to add this analysis package.
  • 22. To add the Analysis ToolPak: Click the Microsoft Office Button, and then click Excel Options. Click Add-ins; in the Manage box, select Excel Add-ins. Click Go. In the Add-Ins Available box, select the Analysis ToolPak check box and click OK. (If ToolPak is not listed, click Browse to locate it.) Step 1: Open a new worksheet expanded to full size. Step 2: In cells A2:A11, type the text labels and data values.
  • 23. Step 3: Choose the menu items Data, Data Analysis.... A dialog box will open as shown. Step 4: Double-click on Descriptive Statistics. Step 5: In the edit box for Input Range, type the range of cells as $A$2:$A$11. Step 6: Select the radio button Columns. Step 7: In the output range, type A13. Click OK. Step 8: Excel will calculate the descriptive statistics and display the results in cells A13:B28. The results indicate that the sample mean is 34.3, the sample standard deviation (the estimate of the population standard deviation) is 2.214, and the sample variance (the estimate of the population variance) is 4.9.
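As a sanity check on the Excel output for the bread color data, the same three descriptors can be computed with Python's standard `statistics` module. This is a sketch only; `variance` and `stdev` there use the n - 1 (inferential) formula, matching what Excel's Descriptive Statistics tool reports.

```python
import statistics

# Color scores (customized color units) for the 10 sampled breads, from the slide.
color_scores = [34, 33, 36, 37, 31, 32, 38, 33, 34, 35]

sample_mean = statistics.mean(color_scores)
sample_variance = statistics.variance(color_scores)  # divides by n - 1
sample_std_dev = statistics.stdev(color_scores)      # square root of the above

print(f"mean = {sample_mean:.1f}")
print(f"variance = {sample_variance:.1f}")
print(f"standard deviation = {sample_std_dev:.3f}")
```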
  • 24. t = (difference between samples) / (variability). Excel will automatically calculate t-values to compare: means of two datasets with equal variances; means of two datasets with unequal variances; two sets of paired data. If abs(t-score) < abs(t-critical), accept H0: there is insufficient evidence that the observed differences reflect real, significant differences. EXAMPLE III: A researcher wishes to test whether the mean heavy-metal content of soil after a war threat differs from the mean before the war threat. The research hypothesis is that the mean after the war threat will exceed the mean before the war threat. Use Excel to help test the hypothesis for the difference in population means.
  • 25. Step 1: Open a new worksheet expanded to full size. Step 2: In cells B5:C19, type the text labels and data values. The null and alternative hypotheses to be tested are $H_0: \mu_1 - \mu_2 = 0.0$ and $H_A: \mu_1 - \mu_2 \neq 0.0$. Step 3: Choose the menu items Data, Data Analysis.... A dialog box will open as shown. Step 4: Double-click on t-Test: Two-Sample Assuming Equal Variances.
  • 26. In the dialog box, change the hypothesized mean difference if you want to know whether the means of the two samples differ by at least some specified amount. The p-value for the two-tailed test is .007, which is less than .05, so we reject the null hypothesis. The p-value for the one-tailed test is .003, which is less than .05, so we reject the null hypothesis. t > t-critical (two-tail), so the mean of sample #1 is significantly different from the mean of sample #2. t > t-critical (one-tail), so the mean of sample #1 is significantly larger than the mean of sample #2.
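The slides do not reproduce the soil measurements, so the statistic behind Excel's equal-variance t-test can only be sketched with made-up numbers. The function below implements the standard pooled-variance two-sample t statistic; the name `pooled_t` and the `after`/`before` values are hypothetical and are not the data from the example.

```python
import math
import statistics


def pooled_t(sample1, sample2):
    """Two-sample t statistic assuming equal variances; returns (t, degrees of freedom)."""
    n1, n2 = len(sample1), len(sample2)
    var1 = statistics.variance(sample1)
    var2 = statistics.variance(sample2)
    df = n1 + n2 - 2
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / std_err
    return t, df


# Hypothetical heavy-metal readings, for illustration only.
after = [12.1, 13.4, 11.8, 14.0, 12.9]
before = [10.2, 11.1, 10.8, 11.5, 10.4]

t_value, df = pooled_t(after, before)
print(f"t = {t_value:.3f} with {df} degrees of freedom")
```

The resulting t would then be compared with the one-tail or two-tail critical value for `df` degrees of freedom, as in the decision rule above.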
  • 27. In hypothesis testing, it is sometimes not possible to use the same judges for testing different treatments, although it would be desirable to use the same judges to evaluate samples obtained from different treatments. In such cases, we have a completely randomized design. EXAMPLE IV: Using single-factor ANOVA, we can test whether the treatments had any influence on the judges' scores; in other words, do the treatment means differ? Case study (weight of oranges data): Consider the weights of oranges from three different suppliers, A, B, and C. Five oranges were randomly sampled from each supplier and weighed. The following weights were obtained:
    A    B    C
    150  148  146
    151  150  148
    152  152  150
    153  154  152
    154  156  154
  • 28. For each treatment, 5 samples were weighed, giving 5 replicates per treatment; the design was completely randomized. Calculate the F value to determine whether the means of the three treatments are significantly different. We will use the single-factor analysis of variance available in Excel and compare the F value with the critical value at the 95% confidence level (α = 0.05). These computations will allow us to determine whether the means of the three treatments are significantly different. First make sure that the Data Analysis... command is available under the menu item Data.
  • 29. Step 1: Open a new worksheet expanded to full size. Step 2: In cells A4:C8, type the text labels and data values. Step 3: Choose the menu items Data, Data Analysis.... A dialog box will open as shown. Step 4: Double-click on Anova: Single Factor.
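As a cross-check on the Excel single-factor ANOVA, the orange weights can be run through the correction-factor method from the one-way ANOVA slide. This is a minimal sketch, with `one_way_anova` as an illustrative name; it returns only the F ratio and the sums of squares, not a p-value.

```python
# Orange weights by supplier, from the case-study table.
oranges = {
    "A": [150, 151, 152, 153, 154],
    "B": [148, 150, 152, 154, 156],
    "C": [146, 148, 150, 152, 154],
}


def one_way_anova(groups):
    """One-way ANOVA via the correction-factor method; returns a dict of results."""
    all_values = [x for values in groups.values() for x in values]
    n_total = len(all_values)
    # Step 2: correction factor CF = (sum of x)^2 / N.
    cf = sum(all_values) ** 2 / n_total
    # Step 3: total sum of squares.
    ss_total = sum(x * x for x in all_values) - cf
    # Step 4: treatment sum of squares from the treatment totals.
    ss_trt = sum(sum(values) ** 2 / len(values) for values in groups.values()) - cf
    # Step 5: error terms by subtraction.
    ss_err = ss_total - ss_trt
    df_trt = len(groups) - 1
    df_err = n_total - len(groups)
    f_value = (ss_trt / df_trt) / (ss_err / df_err)
    return {"ss_trt": ss_trt, "ss_err": ss_err,
            "df_trt": df_trt, "df_err": df_err, "F": f_value}


result = one_way_anova(oranges)
print(f"F({result['df_trt']}, {result['df_err']}) = {result['F']:.3f}")
```

The F ratio is then compared with the tabulated critical value for (2, 12) degrees of freedom at the chosen significance level.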
  • 30. The results show that the F value is 0.889. The critical F value at the 5% level is 3.885. The calculated F value is therefore lower than the critical value at the 5% level, so we can say that there is no significant difference among the treatment means (P > 0.05). EXAMPLE V: When we are interested in evaluating samples for sensory characteristics using the same judges with samples obtained from multiple treatments, analysis of variance for a two-factor design without replication is useful. This analysis helps in determining whether there are significant differences among the various treatments, as well as whether any significant differences exist among the judges themselves.
  • 31. Three types of ice cream were evaluated by 11 judges. The judges assigned the following scores:
    Judge  Ice Cream A  Ice Cream B  Ice Cream C
    A      16           14           15
    B      17           15           17
    C      16           16           16
    D      18           14           16
    E      16           14           14
    F      17           16           17
    G      18           14           15
    H      16           15           16
    I      17           14           14
    J      18           13           16
    K      17           15           15
    We will use the built-in analysis pack available in the Excel command called Data Analysis.... Three sets of results will be obtained for the 5% level.
  • 32. Step 1: Open a new worksheet expanded to full size. Step 2: In cells A3:D13, type the text labels and data values. Step 3: Choose the menu items Data, Data Analysis.... A dialog box will open. Step 4: Double-click on Anova: Two-Factor Without Replication. A new dialog box will open. Step 5: Type entries in the edit boxes as shown. Step 6: The results will be displayed in the output cells.
  • 33. The difference among ice cream types is determined by examining the F values. The calculated F value is 19.73, which is greater than the critical F value of 3.49 at the 5% level; the ice cream types are significantly different at p < 0.001. For judges, the calculated F value is 1.36, which is lower than the critical F value of 2.35 at the 5% level; the judges showed no significant difference in their mean scores.
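The two F values quoted for the ice cream example can be re-derived by hand. The sketch below extends the correction-factor method to a two-factor layout without replication (rows = judges, columns = ice cream types); `two_factor_anova` is an illustrative name, and the critical values still have to come from F tables.

```python
# Scores per judge for ice creams A, B, C, from the table above.
scores = {
    "A": (16, 14, 15), "B": (17, 15, 17), "C": (16, 16, 16), "D": (18, 14, 16),
    "E": (16, 14, 14), "F": (17, 16, 17), "G": (18, 14, 15), "H": (16, 15, 16),
    "I": (17, 14, 14), "J": (18, 13, 16), "K": (17, 15, 15),
}


def two_factor_anova(table):
    """Two-factor ANOVA without replication; returns F for rows and for columns."""
    rows = list(table.values())
    n_rows, n_cols = len(rows), len(rows[0])
    all_values = [x for row in rows for x in row]
    cf = sum(all_values) ** 2 / len(all_values)          # correction factor
    ss_total = sum(x * x for x in all_values) - cf
    ss_rows = sum(sum(row) ** 2 for row in rows) / n_cols - cf
    col_totals = [sum(row[j] for row in rows) for j in range(n_cols)]
    ss_cols = sum(t * t for t in col_totals) / n_rows - cf
    ss_err = ss_total - ss_rows - ss_cols                # error by subtraction
    df_rows, df_cols = n_rows - 1, n_cols - 1
    df_err = df_rows * df_cols
    ms_err = ss_err / df_err
    return {
        "F_rows": (ss_rows / df_rows) / ms_err,
        "F_cols": (ss_cols / df_cols) / ms_err,
        "df_rows": df_rows, "df_cols": df_cols, "df_err": df_err,
    }


anova = two_factor_anova(scores)
print(f"Ice cream types: F({anova['df_cols']}, {anova['df_err']}) = {anova['F_cols']:.2f}")
print(f"Judges:          F({anova['df_rows']}, {anova['df_err']}) = {anova['F_rows']:.2f}")
```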
  • 34. Simple regression analysis involves determining the statistical relationship between two variables. One of the uses of such analysis is in predicting one variable on the basis of the other. EXAMPLE VI: We will use the regression analysis available in the Add-in package in Excel to determine the linear regression between two variables. Case study (sensory scores data): Off-flavor was scored against storage time in a frozen vegetable. Sensory scores obtained at 0, 1, 2, 3, 4, and 6 months were 1.5, 2, 2, 3, 2.5, and 3.5, respectively. Assuming that these data can be linearly correlated, determine the regression coefficient and predict the off-flavor score at 5 months of storage.
  • 35. We will use the Regression package, available as an Add-in item in Excel, to obtain the required statistical relationships. We assume that a linear relationship exists between the off-flavor score and time (in months), with the equation y = mx + b, where y is the off-flavor score, x is time in months, m is the slope, and b is the intercept. Step 1: Open a new worksheet expanded to full size. Step 2: In cells A4:B9, enter the text labels and data values.
  • 36. Step 3: Choose the menu items Data, Data Analysis.... A dialog box will open. Step 4: Double-click on Regression. Step 5: A new dialog box will open. Enter the ranges of cells for Y and X as shown. Check the boxes for Residuals and Line Fit Plots. Click OK. The results will be displayed. Notes on the regression output:
    Significance F: probability of getting this value of F by randomly sampling from a normally distributed population. A low value means the model (rather than random variability) explains most of the variation in the data.
    F: ratio of the variability explained by the model to the leftover variability. A high number means the model explains most of the variation in the data.
    R Square: about 85% of the variation in y is explained by variation in x. The remainder may be random error, or may be explained by some factor other than x.
    Lower and upper 95%: confidence limits on the slope and intercept.
    P-value: probability of getting a slope or intercept this much different from zero by randomly sampling from a normally distributed population.
    Fitted line: y = 0.31x + 1.58.
  • 37. The r² value is calculated as 0.85, and the standard error is 0.318. The intercept is 1.5786 and the slope is 0.3143, so the linear equation is y = 0.31x + 1.58. The residual output gives the predicted values for the off-flavor score at different time intervals. These data are also shown in the chart, where the predicted and observed values are plotted together. The predicted value at 5 months of storage is 3.13 using the rounded equation (3.15 with the unrounded slope and intercept).
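The regression figures above can be reproduced with ordinary least squares written out by hand. This is a sketch rather than a replacement for Excel's Regression tool: `fit_line` is an illustrative name, and only the slope, intercept, r², and standard error are computed.

```python
import math

# Storage time (months) and off-flavor scores, from the case study.
months = [0, 1, 2, 3, 4, 6]
off_flavor = [1.5, 2, 2, 3, 2.5, 3.5]


def fit_line(x, y):
    """Least-squares fit of y = m*x + b; returns (m, b, r_squared, std_error)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    m = sxy / sxx
    b = mean_y - m * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    # Residual sum of squares and standard error of the estimate (n - 2 df).
    sse = syy - m * sxy
    std_error = math.sqrt(sse / (n - 2))
    return m, b, r_squared, std_error


slope, intercept, r_squared, std_error = fit_line(months, off_flavor)
predicted_5 = slope * 5 + intercept

print(f"y = {slope:.4f}x + {intercept:.4f}, r^2 = {r_squared:.2f}, SE = {std_error:.3f}")
print(f"predicted off-flavor score at 5 months: {predicted_5:.2f}")
```

Rounding the slope and intercept to two decimals before predicting shifts the 5-month value slightly, which is why the rounded equation gives a marginally lower figure.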
  • 38. Statistics: Descriptive Statistics; Histograms; Hypothesis Testing; Scatter Plots; Regression Analysis.
  • 39. Loading the Analysis ToolPak: Click the Microsoft Office Button, and then click Excel Options. Click Add-ins; in the Manage box, select Excel Add-ins. Click Go. In the Add-Ins Available box, select the Analysis ToolPak check box and click OK. (If ToolPak is not listed, click Browse to locate it.) Descriptive statistics: Click Data/Data Analysis (far right)/Descriptive Statistics and OK. Put checkmarks in the Summary Statistics, 95% or 99% Confidence Interval, and Labels in First Row boxes. Move the cursor to the Input Range window, highlight the data to analyze including labels, and click OK. Your data will appear on a new worksheet. Widen the columns by clicking Home/Format/AutoFit Column Width.
  • 40. Histograms: Click Data/Data Analysis/Histogram and OK. Put checkmarks in the Chart Output and New Worksheet boxes. Move the cursor to the Input Range window and highlight the data going into the histogram. Move the cursor to Input Bin Range, highlight the data showing the upper value of each bin, and click OK. The histogram will be on a new worksheet. You may lengthen it by clicking blank space in the window, moving the cursor to the window's bottom line, and holding down the mouse button as you pull the window down. Hypothesis testing: Go to Sheet One. Click Data/Data Analysis and the appropriate statistical test, then click OK. In the new window, check the Labels box and put the cursor on Variable 1 Range. Highlight the Variable 1 data including the label. Put the cursor on Variable 2 Range and highlight the Variable 2 data (including the label), then click OK. Click Home/Format/AutoFit Column Width.
  • 41. Scatter plots: Go to Sheet One. Highlight the data (be sure the X values are in the left column and the Y values are in the right column). Click Insert/Scatter, pull down the menu, and click the upper-left icon. Right-click a data point on the chart, choose Add Trendline, and click Linear. Regression analysis: Go to Sheet One. Click Data/Data Analysis (on far right)/Regression and click OK. In the new window, check the Labels box and put the cursor on X Range. Highlight the X data including the label. Put the cursor on Y Range and highlight the Y data (including the label), then click OK. Click Home/Format/AutoFit Column Width.