Shahid Lecture-4-MKAG1273

MAL1303: STATISTICAL HYDROLOGY
Hypothesis Test
Dr. Shamsuddin Shahid
Department of Hydraulics and Hydrology
Faculty of Civil Engineering, Universiti Teknologi Malaysia
Room No.: M46-332; Phone: 07-5531624; Mobile: 0182051586
Email: sshahid@utm.my
11/23/2015 Shamsuddin Shahid, FKA, UTM

How can we solve it?
Groundwater depth (m) data are collected from two aquifers, X and Y. We
want to know whether the groundwater depth in both aquifers is the same
or not.

After using a new technique, groundwater yield has increased
significantly. How can we prove it?

Environmental activists claim that, after the introduction of
fertilizer-based agriculture, the groundwater quality of the area has
deteriorated. Is it possible to prove?

Is it the solution?
Sixteen (16) randomly selected river discharge measurements of two
rivers are collected. From the mean of the discharge data, River-B
appears to have a higher discharge than River-A. Is it possible to say
that the discharge of River-B is higher than that of River-A?

Is it the solution?
Interval of Mean Discharge
For River-A at 95% level of confidence:
30.2 ≤ µA ≤ 215.5
For River-B at 95% level of confidence:
60.4 ≤ µB ≤ 190.7
Because the two intervals overlap, River-A and River-B can have the
same mean discharge value.

One tailed Test:
Rejection region for
Ha: µ > 520 (or µ < 520) when α = .025
Two tailed Test:
Rejection region for
Ha: µ ≠ 520 when α = .025

Comparing two sets of data

Hypothesis Tests
One important use of hypothesis tests is to evaluate and
compare groups of data. Statistical tests are the most
quantitative ways to determine whether hypotheses can be
substantiated, or whether they must be modified or
rejected outright.
Hypothesis tests have at least two advantages over educated
opinion:
1. They ensure that every analyst of a data set using the
same methods will arrive at the same result.
2. They present a measure of the strength of the evidence
(the p-value).

Structure of Hypothesis Tests
1) Choose the appropriate test.
2) Establish the null and alternate hypotheses.
3) Decide on an acceptable error rate α.
4) Compute the test statistic from the data.
5) Compute the p-value.
6) Reject the null hypothesis if p ≤ α.

Selection of Appropriate Test
There are a large number of hypothesis tests. They are classified
based on
1. The measurement scales of the data
2. Distribution of the data
If the measurement scales are interval/ratio and the data distribution is
normal, we use parametric hypothesis tests.
If the measurement scales are not interval/ratio (such as ordinal or
categorical), or are interval/ratio but the data are not normally distributed,
then we use non-parametric hypothesis tests.

Null Hypothesis and Alternative Hypothesis
The 'null' often refers to the common view of something, while the
alternative hypothesis is what the researcher really thinks is the cause
of a phenomenon. The null hypothesis is a hypothesis which the
researcher tries to disprove, reject or nullify.
The null hypothesis, denoted as H0
The alternative hypothesis, denoted as Ha

Example: Null and Alternative Hypothesis
Want to test whether the mean can be 190?
Ho: µ = 190 when α = 0.05 [Null hypothesis: mean value can be 190]
Ha: µ ≠ 190 when α = 0.05 [Alternative hypothesis: mean value cannot be 190]
Comparing two population means, µ1 and µ2:
Null Hypothesis, H0: µ1 = µ2.
The alternative hypothesis, H1: µ1 ≠ µ2 (two-tailed t test),
H1: µ1 < µ2 (one-tailed t test),
or
H1: µ1 > µ2 (one-tailed t test).

A Simple Example
Permeability of groundwater is found to vary very widely in an area. One
hundred (n = 100) permeability measurements are made in the area. The
calculated mean permeability of the 100 measurements is 190. For some
engineering purpose we need to know whether groundwater permeability
in the area can have a mean value of 180 or not. We want to determine it
at the 95% level of confidence.
Ho: µ = 180 when α = 0.05 [Null hypothesis: mean value can be 180]
Ha: µ ≠ 180 when α = 0.05 [Alternative hypothesis: mean value cannot be 180]

A Simple Example
At 95% level of confidence:
Accepted Region =
Result: 180 cannot be the mean permeability in the region.
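The accepted region for this example did not survive in the slide text. As a rough sketch of how such a region could be computed, the snippet below builds a large-sample (z-based) interval around the sample mean and checks whether a candidate mean falls inside it. The standard deviation of 40 is a purely hypothetical value chosen for illustration, not the lecture's figure, and `mean_interval` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist


def mean_interval(sample_mean, sample_sd, n, confidence=0.95):
    """Two-sided z-interval for a population mean (large-sample approximation)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    half_width = z * sample_sd / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


# sample_sd=40 is HYPOTHETICAL; the lecture does not state the standard deviation.
low, high = mean_interval(sample_mean=190, sample_sd=40, n=100)
candidate = 180
rejected = not (low <= candidate <= high)
print(f"accepted region: {low:.2f} to {high:.2f}")
print("candidate mean rejected:", rejected)
```

With a different standard deviation the region widens or narrows, so the decision depends on the actual spread of the measurements.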

Comparing Two sets of Data: Student t-test
Underlying assumptions made in using the t test to compare
two population means:
1. The underlying distributions for both populations are
normal.
2. The variances of the two populations are approximately
equal:
σ1² ≈ σ2²

Null Hypothesis
The null hypothesis, denoted as H0, is expressed as follows for the
t-test comparing two population means, µ1 and µ2:
H0: µ1 = µ2.
Alternative Hypothesis
The alternative hypothesis, denoted as H1, is expressed as one of
the following for the t test comparing two population means, µ1 and
µ2:
H1: µ1 ≠ µ2 (two-tailed t test),
H1: µ1 < µ2 (one-tailed t test),
or
H1: µ1 > µ2 (one-tailed t test).

Student t-test: Comparing two sets of data
Standard Error in Mean
t-statistic estimated using:
Where,
n1 is the number of xi observations, n2 is the number of yi observations,
Sx² is the sample variance of xi, Sy² is the sample variance of yi,
x(bar) is the sample average for xi, and y(bar) is the sample average for yi
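The formula itself is missing from the slide text. A minimal sketch, assuming the usual pooled-variance form that goes with the equal-variance assumption and n1 + n2 − 2 degrees of freedom, could look like this. The function name `pooled_t` and the sample values are illustrative assumptions.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(x, y):
    """Two-sample t statistic with pooled variance (assumes equal variances)."""
    n1, n2 = len(x), len(y)
    sx2, sy2 = variance(x), variance(y)  # sample variances (n - 1 denominator)
    # Pooled variance weights each sample variance by its degrees of freedom.
    sp2 = ((n1 - 1) * sx2 + (n2 - 1) * sy2) / (n1 + n2 - 2)
    standard_error = sqrt(sp2 * (1 / n1 + 1 / n2))
    t_value = (mean(x) - mean(y)) / standard_error
    return t_value, n1 + n2 - 2


# Hypothetical data, for illustration only.
t_value, dof = pooled_t([1, 2, 3], [2, 3, 4])
print(round(t_value, 4), dof)
```

The returned degrees of freedom are the ones needed to look up the critical t value in a table.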

Making Decision
1. Once the t-statistic has been computed, we compare the estimated t
value with the critical t value given in a table for the t distribution.
2. If the estimated t value (its absolute value for a two-sided test) is
greater than the critical t value in the t table associated with a
significance level of α (one-sided t test) or α/2 (two-sided t test), we
can reject the null hypothesis.
3. Thus, we compare our t value to the t distribution table entry for:
t(α, n1 + n2 − 2) (one-sided)
or
t(α/2, n1 + n2 − 2) (two-sided)
where α is the level of significance (equal to 1 – level of
confidence), and n1 and n2 are the number of samples from each
of the two populations being compared.

Student t-test: Example
Groundwater samples collected near an underground mining area before
mining started and after mining are given below. Many scientists
anticipate that the concentration of Chemical-X in groundwater has
increased due to the mining. Is it true?
Null Hypothesis, H0: µ1 = µ2
[No change in groundwater quality]
Alternative Hypothesis, H1: µ1 ≠ µ2
[Groundwater quality has changed]

Student t-test: Example
t(calculated) = 0.7968
Degrees of freedom
= n1 + n2 − 2
= 16 + 14 − 2 = 28
At α = 0.05 (two-tailed)
t(critical) = t(0.025, 28) = 2.0484
t(calculated) < t(critical)
Decision: The null hypothesis cannot be rejected at the 95% level of
confidence.

ANalysis Of VAriance (ANOVA)
Analysis of variance (ANOVA) is a method for testing the hypothesis
that there is no difference between two or more population means
(usually at least three).
Why can the t-test not be applied?
• The t-test, which is based on the standard error of the difference
between two means, can only be used to test differences
between two means.
• With more than two means, we could compare each mean with
each other mean using t-tests. However, conducting multiple t-tests
inflates the overall chance of a false rejection and is NOT RECOMMENDED.

Analysis of Variance (ANOVA)
Three groups tightly spread about their respective means; the variability
within each group is relatively small.
Three groups with the same means as in the previous figure, but the
variability within each group is much larger.
ANOVA examines the difference between the groups as well as the
difference within a group.

Assumptions of ANOVA
1. The observations are sampled independently and the groups
under consideration are independent. Selection of one
sample has no effect on another.
2. Each of the populations is normally distributed.
3. Population variances are equal (homogeneity of variance).

Calculating an ANOVA
Calculating an ANOVA means that we want to calculate the F
statistic. There are six steps to calculating the F statistic:
1. Calculation of "sum of squares" between the groups
2. Calculation of "sum of squares" within the groups
3. Determination of the degrees of freedom for each
4. Calculation of "mean square between" and "mean square within"
5. Calculation of the F ratio (or F statistic)
6. Making a decision

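Calculating an ANOVA
Mean Square Between (MSB) = sum of squares between / degrees of freedom between
Mean Square Within (MSW) = sum of squares within / degrees of freedom within
F-statistic = MSB / MSW
A larger F-statistic means more variation between the groups
compared to within the groups. A larger F-statistic supports the
conclusion that the groups come from different populations.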

Calculation of Degrees of Freedom
Degrees of freedom between (DFB) and degrees of freedom
within (DFW) can be calculated in the following way:
DFB = No. of groups − 1
DFW = Total no. of observations − No. of groups
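The six steps can be sketched in a few lines of Python. This is a minimal illustration under the definitions above, not a replacement for a statistics package; `one_way_anova` and the sample groups are assumed names and hypothetical data.

```python
def one_way_anova(groups):
    """Return sums of squares, degrees of freedom, mean squares and F."""
    n_obs = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_obs
    group_means = [sum(g) / len(g) for g in groups]

    # Steps 1 and 2: sums of squares between and within the groups.
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ssw = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)

    # Step 3: degrees of freedom.
    dfb = len(groups) - 1
    dfw = n_obs - len(groups)

    # Steps 4 and 5: mean squares and the F ratio.
    msb, msw = ssb / dfb, ssw / dfw
    return {"SSB": ssb, "SSW": ssw, "DFB": dfb, "DFW": dfw,
            "MSB": msb, "MSW": msw, "F": msb / msw}


# Hypothetical data, for illustration only.
result = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(result)
```

Step 6, the decision, compares the returned F with a critical value F(α; DFB, DFW) read from an F table.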

Example ANOVA Test

Hypotheses
We may test the
Null Hypothesis: There is no difference in
groundwater depth among the three catchments
against the
Alternative Hypothesis: The groundwater depth of at
least one pair of catchments is not equal

Sum of Squares Between (SSB)
SSB = 38.798

Total Sum of Squares (TSS)
Total sum of squares = Sum of squares between (SSB) + Sum of squares within (SSW)
TSS = 44.735

Sum of Squares Within (SSW)
Total sum of squares (TSS) =
Sum of squares between (SSB) + Sum of squares within (SSW)
Therefore,
SSW = TSS – SSB
= 44.735 – 38.798
= 5.937

Determine Degrees of Freedom
Between group degrees of freedom (BDF)
= Number of groups – 1
= 3 – 1
= 2
Within group degrees of freedom (WDF)
= Total number of observations – Number of groups
= 30 – 3
= 27

Mean Squares
Between Group Mean Square
= SSB / BDF
= 38.798 / 2
= 19.399
Within Group Mean Square
= SSW / WDF
= 5.937 / 27
= 0.2199

F-Statistics
F = Between Group Mean Square / Within Group Mean Square
= 19.399 / 0.2199
= 88.2
F (0.05; 2,27) = 3.36
F(calculated) > F(critical). Therefore, we can reject the null hypothesis.
Important:
The F statistic does not tell us which groups are different; it
only says whether the mean values differ significantly among the
groups. In this case, it only says that groundwater depth differs
significantly among the catchments.
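The arithmetic of this worked example can be retraced as follows, using only the SSB, TSS, group count and observation count quoted in the example. The variable names are illustrative.

```python
# Values quoted in the worked example.
ssb, tss = 38.798, 44.735
n_groups, n_obs = 3, 30

ssw = tss - ssb                      # sum of squares within
bdf = n_groups - 1                   # between-group degrees of freedom
wdf = n_obs - n_groups               # within-group degrees of freedom
ms_between = ssb / bdf
ms_within = ssw / wdf
f_value = ms_between / ms_within

print(round(ssw, 3), bdf, wdf)
print(round(ms_between, 3), round(ms_within, 4), round(f_value, 1))
```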

One-way and Two-way ANOVA
When there is only one qualitative variable that denotes the groups and only
one measurement variable (quantitative), a one-way ANOVA is carried out. The
purpose of one-way ANOVA is to find out whether data from several groups have
a common mean. That is, to determine whether the groups are actually different
in the measured characteristic.
The purpose of two-way ANOVA is to test the effects of two independent
variables on several groups. One-way ANOVA and two-way ANOVA differ in that
the groups in two-way ANOVA have two categories of defining characteristics
instead of one.
Suppose sediment samples are collected from three different areas. The contents of
two minerals (A & B) are measured for each sample. We want to see whether the
samples differ from area to area as well as between the two minerals.

Chi-square Test of Normality

Normsdist(z) [Excel Function]
Normsdist(-1) = 0.158655
Normsdist(0) – Normsdist(-1) = 0.341345
Normsdist(1) – Normsdist(0) = 0.341345
Normsdist(2) – Normsdist(1) = 0.135905
Expected Frequency = n x [probability of z-value occurring in that class
interval]
Example = 12 x 0.158655 = 1.903863
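The same class probabilities can be reproduced outside Excel. The sketch below uses Python's standard-library `NormalDist` as a stand-in for Normsdist; the class edges and n = 12 are taken from the slide, while the variable names are illustrative.

```python
from statistics import NormalDist

phi = NormalDist().cdf          # standard normal CDF, same role as Normsdist(z)

n = 12                          # number of measurements in the example
edges = [-1, 0, 1, 2]           # z-values bounding the class intervals

p_below_first_edge = phi(edges[0])
p_between = [phi(b) - phi(a) for a, b in zip(edges, edges[1:])]
expected_first_class = n * p_below_first_edge

print(round(p_below_first_edge, 6))
print([round(p, 6) for p in p_between])
print(round(expected_first_class, 6))
```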

Example: (2 – 1.903863)² / 1.903863
= 0.004855
Chi-square (calculated) = 0.09292
Chi-square (critical) (alpha, df) = ?
Degrees of Freedom (df) = m – k – 1
Where m is the number of classes (here 4)
We estimated y(bar) and s, so k = 2
Therefore, df = 4 – 2 – 1 = 1
Chi-square (0.05, 1) = 3.841459
Chi-square (calculated) < Chi-square (critical)
The null hypothesis cannot be rejected.
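A short sketch of the pieces used on this slide: the contribution of one class to the chi-square statistic, the degrees of freedom, and the comparison with the critical value. Only the first class's observed and expected frequencies appear in the slide text, so the total statistic and the critical value are entered as quoted there. The helper name `chi_square_term` is an assumption.

```python
def chi_square_term(observed, expected):
    """Contribution of a single class to the chi-square statistic."""
    return (observed - expected) ** 2 / expected


first_term = chi_square_term(observed=2, expected=1.903863)

m, k = 4, 2                 # number of classes, number of estimated parameters
df = m - k - 1

chi_calculated = 0.09292    # sum over all classes, as quoted on the slide
chi_critical = 3.841459     # chi-square(0.05, 1), as quoted on the slide
reject_null = chi_calculated > chi_critical

print(round(first_term, 6), df, reject_null)
```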

We can conclude that the measurements have come from a normal
distribution at the 95% level of confidence.

Parametric and Non-parametric Tests

Mann-Whitney U-Test
Computational Steps
1. Two samples are taken.
2. The data are put into order, based on size.
3. Data can be ranked from highest to lowest or lowest to highest
values.
4. Calculate the Mann-Whitney U statistic:
U = n1n2 + n1(n1+1)/2 – R1

Example of Mann-Whitney U-test
Two-tailed null hypothesis that there is no difference between
transmissivity in two aquifers
Ho: Aquifer-A and Aquifer-B have the same transmissivity.
HA: The transmissivity of Aquifer-A and Aquifer-B is not the same.

Transmissivity  Transmissivity  Rank of     Rank of
of Aquifer-A    of Aquifer-B    Aquifer-A   Aquifer-B
193             175             1           7
188             173             2           8
185             168             3           10
183             165             4           11
180             163             5           12
178                             6
170                             9
n1 = 7          n2 = 5          R1 = 30     R2 = 48

Example of Mann-Whitney U test
U1 = n1n2 + n1(n1+1)/2 – R1
U1 = (7)(5) + (7)(8)/2 – 30
U1 = 35 + 28 – 30
U1 = 33
U2 = n1n2 + n2(n2+1)/2 – R2
U2 = (7)(5) + (5)(6)/2 – 48
U2 = 35 + 15 – 48
U2 = 2
Check: U1 + U2 = 33 + 2 = 35 = n1n2
U = minimum (U1, U2) = 2
U(0.05, 7, 5) = 5
The calculated U is smaller than the critical value. Therefore, Ho is rejected.
We can say at the 95% level of confidence that the transmissivity of the
two aquifers is different.
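As a cross-check, the ranking and the U formula can be scripted. In this sketch each rank sum is paired with the size of the sample it was computed from, and ranks run from highest to lowest as in the table. The function names are illustrative, and ties (none occur here) would receive average ranks.

```python
def average_ranks(values, reverse=False):
    """Map each value to its rank; tied values share the average rank."""
    ordered = sorted(values, reverse=reverse)
    ranks, i = {}, 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        ranks[ordered[i]] = (i + 1 + j) / 2   # mean of positions i+1 .. j
        i = j
    return ranks


def mann_whitney_u(a, b):
    """Return (U_a, U_b, R_a, R_b) with ranks assigned from highest to lowest."""
    ranks = average_ranks(a + b, reverse=True)
    r_a = sum(ranks[v] for v in a)
    r_b = sum(ranks[v] for v in b)
    n_a, n_b = len(a), len(b)
    u_a = n_a * n_b + n_a * (n_a + 1) / 2 - r_a
    u_b = n_a * n_b + n_b * (n_b + 1) / 2 - r_b
    return u_a, u_b, r_a, r_b


aquifer_a = [193, 188, 185, 183, 180, 178, 170]
aquifer_b = [175, 173, 168, 165, 163]
u_a, u_b, r_a, r_b = mann_whitney_u(aquifer_a, aquifer_b)
print(r_a, r_b, u_a, u_b, min(u_a, u_b))
```

The two U values always add up to the product of the sample sizes, which is a quick way to catch an arithmetic slip.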

Kruskal-Wallis Test
• The Kruskal-Wallis test is a nonparametric (distribution-free) test,
which is used to compare three or more groups of sample data.
• The Kruskal-Wallis test is used when the assumptions of ANOVA are not met.
In ANOVA, we assume that each group is normally distributed. In the
Kruskal-Wallis test, we make no assumption about the distribution. So
the Kruskal-Wallis test is a distribution-free test.
• If the normality assumptions are met, then the Kruskal-Wallis test is not as
powerful as ANOVA.
• The Kruskal-Wallis test was developed by Kruskal and Wallis jointly
and is named after them.

Steps of Kruskal-Wallis Test
1. Arrange the data of all samples in a single series in ascending order.
2. Assign ranks to them in ascending order. In the case of a repeated
value, assign ranks by averaging their rank positions.
3. Separate the ranks by sample and sum them as R1, R2, R3, etc.
4. To calculate the value of the Kruskal-Wallis test, apply the following
formula:
H = [12 / (n(n + 1))] × Σ (Ri² / ni) – 3(n + 1)
Where,
H = Kruskal-Wallis test statistic
n = total number of observations in all samples
ni = number of observations in sample i
Ri = sum of the ranks of sample i
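The four steps can be sketched as below, assuming the usual form of the statistic, H = 12/(n(n+1)) · Σ(Ri²/ni) − 3(n+1), without a tie correction. The function name and the sample data are illustrative assumptions, not the lecture's data.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H from a list of samples (no tie correction)."""
    # Steps 1 and 2: pool, sort ascending, and assign average ranks to ties.
    pooled = sorted(v for g in groups for v in g)
    rank_of, i = {}, 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2
        i = j

    # Step 3: rank sum Ri for each sample.
    rank_sums = [sum(rank_of[v] for v in g) for g in groups]

    # Step 4: the H statistic.
    n = len(pooled)
    total = sum(r ** 2 / len(g) for r, g in zip(rank_sums, groups))
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)


# Hypothetical data, for illustration only.
h_value = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h_value, 2))
```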

Kruskal-Wallis Test
Calculation of Degrees of Freedom:
Degrees of freedom = k – 1, where k is the number of groups; each group
should contain more than 5 observations.
The Kruskal-Wallis test statistic approximately follows a chi-square
distribution.
Value of Kruskal-Wallis test H < the chi-square table value:
The null hypothesis cannot be rejected. The samples come from the
same population.
Value of Kruskal-Wallis test H > the chi-square table value: The
null hypothesis is rejected. The samples come from different
populations.

Kruskal-Wallis Test: Example
Example: Groundwater depth in three catchments (A, B, C) is
measured. Is there any variation in groundwater depth among the three
catchments?

Example: Cont..
H = 9.84
Degrees of Freedom = No. of groups – 1
= 3 – 1
= 2
H(critical) = 5.99 at α = 0.05
H (calculated) > H (critical)
Null hypothesis rejected.
Result: A significant difference exists in the groundwater depth of the
three catchments.

Chi-square Table

Nonparametric Methods
• Mann-Whitney-Wilcoxon Test
• Kruskal-Wallis Test
• Sign Test
• Wilcoxon Signed-Rank Test
• Run Test

Example: Sign Test
As part of research, studies were carried out to measure whether the
new method proposed by you (Method-A) can remove arsenic from
water better than the well-known existing method (Method-B). A
total of 36 case studies were conducted. The obtained result is given
below. Do the data shown below indicate a significant difference
between the two methods?
18 found Method-A is better (+ sign recorded)
12 found Method-B is better (– sign recorded)
6 cases: both methods gave similar results (no sign recorded)
The analysis is based on a sample size of 18 + 12 = 30.

Example
Hypotheses
H0: No preference for one method over the other exists
Ha: A preference for one method over the other exists
Rejection Rule
Reject H0 if the cumulative binomial probability is less than the chosen
significance level (such as 0.05)
• Test Statistic
BINOMDIST(12, 30, 0.5, TRUE) = 0.1808 (cumulative probability of 12 or
fewer "–" signs in 30 cases)
• Conclusion
Do not reject H0. There is insufficient evidence in the sample to
conclude that a difference in methods exists.
We could reject if successes were 20 and failures 10 (cumulative
probability: 0.0494).
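The cumulative binomial probability behind a sign test can be computed directly. This is a minimal sketch that returns the one-tailed probability of seeing as few as (or fewer than) the less frequent sign when both signs are equally likely; doubling it for a two-tailed test is a common convention but an assumption here. The function names are illustrative.

```python
from math import comb


def binomial_cdf(k, n, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def sign_test_probability(plus, minus):
    """One-tailed cumulative probability of the less frequent sign (ties dropped)."""
    n = plus + minus
    return binomial_cdf(min(plus, minus), n)


p_18_12 = sign_test_probability(plus=18, minus=12)
p_20_10 = sign_test_probability(plus=20, minus=10)
print(round(p_18_12, 4), round(p_20_10, 4))
```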

Example: Sign Test - Prevalence of one mineral
Problem
As part of a study, we want to see whether the concentration of
Mineral-A is higher than that of Mineral-B in a place. We have
collected 14 samples and measured the concentrations of Mineral-A
and Mineral-B in the samples. Is there any difference in the
concentration of the minerals in the samples?

Example: Prevalence of one mineral
• Test Statistic
Yes = 11, No = 3, Cumulative Binomial Value = 0.029
• Conclusion
The binomial value is less than 0.05. Therefore, reject H0 at the 95% level
of confidence.
Decision: There is sufficient evidence in the sample to conclude that the
concentration of one mineral is higher than that of the other.

Example: Wilcoxon Signed-Rank Test
This test is the nonparametric alternative to the parametric matched-sample
test.
As part of a study, we want to see whether the concentration of Mineral-A is
higher than that of Mineral-B in a place. We have collected 10 samples and
measured the concentrations of Mineral-A and Mineral-B in the samples. Is
there any difference in the concentration of the minerals in the samples?

Wilcoxon Signed-Rank Test
• Preliminary Steps of the Test
• Compute the differences between the paired observations.
• Discard any differences of zero.
• Rank the absolute values of the differences from lowest to
highest. Tied differences are assigned the average ranking
of their positions.
• Give the ranks the sign of the original difference in the data.
• Sum the signed ranks individually ("+" together and "–"
together).
• Wilcoxon statistic W = minimum ("+" rank sum; "–" rank sum)
• Compare the calculated value to the Wilcoxon tabulated value.
• If the calculated value is less than the tabulated value, reject the
null hypothesis.
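The preliminary steps can be sketched in code as follows. The function name `wilcoxon_w` and the paired values are illustrative assumptions, and the comparison with the tabulated critical value is left to the reader because it requires a Wilcoxon table.

```python
def wilcoxon_w(a, b):
    """Return (sum of '+' ranks, sum of '-' ranks, W) for paired samples."""
    # Differences between paired observations, with zero differences discarded.
    diffs = [x - y for x, y in zip(a, b) if x != y]

    # Rank absolute differences from lowest to highest; ties share average ranks.
    ordered = sorted(abs(d) for d in diffs)
    rank_of, i = {}, 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        rank_of[ordered[i]] = (i + 1 + j) / 2
        i = j

    # Sum the ranks separately by the sign of the original difference.
    plus = sum(rank_of[abs(d)] for d in diffs if d > 0)
    minus = sum(rank_of[abs(d)] for d in diffs if d < 0)
    return plus, minus, min(plus, minus)


# Hypothetical paired measurements, for illustration only.
mineral_a = [5, 3, 8, 6, 7]
mineral_b = [4, 5, 5, 2, 7]
plus, minus, w = wilcoxon_w(mineral_a, mineral_b)
print(plus, minus, w)
```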

Example: Wilcoxon Signed-Rank Test
+ Rank = 49.5; – Rank = 5.5;
W = Minimum (+ Rank; – Rank) = 5.5
H0: The concentrations of the minerals are the same.
Ha: The concentrations of the minerals are not the same.

Wilcoxon Critical Value Table
W = 5.5
N = 10
W(calculated) < W(critical)
Important Note: If W(calculated) is less than the critical table value,
then the null hypothesis is rejected.
Decision:
Reject H0. There is sufficient evidence in the sample to conclude that a
difference exists in mineral concentration.

Run Test
• The runs test is used to test for serial randomness: whether or not
observations occur in a random sequence in time or over space.
• The runs test is used for nominal data.
• In hydrological studies, the runs test is most often used
to determine whether observations are random or
follow some pattern.

Run Test
For example, we have sampled the occurrence of some hydrological
disaster in every year, resulting in the data set:
Where A denotes a "No Disaster" year and B denotes a "Disaster" year. We are
interested in determining whether the order of the disastrous years is
random or not. In some cases, a phenomenon follows some
pattern, like below:

Run Test
Unlike other tests, there is no equation for the runs test unless the
sample size of either group is greater than 30. One only needs to
count the number of runs (u), a run being an unbroken series of the same
nominal value when counting from left to right.

Run Test: Example (Two tailed)
Flood years in a place during the last twenty-one years (1990-2010)
are given in the table below. Different studies have reported that
climate change has caused an increase in flood frequency in recent
years. We want to check whether this is true in the place of
our interest.

Run Test: Example (Two tailed)
YNYNNYNNYNYYYNYYYNYYY
• Hypothesis
H0: The occurrence of flood is random.
Ha: The occurrence of flood is not random.
• Computation of Test
n1 = 13 ← there are 13 occurrences of flood.
n2 = 8 ← there are 8 occurrences of no flood.
u = 13 ← there are 13 runs.
• Decision
At α = 0.05, u(critical) = 6, 16 ← there are 2 critical
values of u; if the calculated value falls between
them, H0 cannot be rejected.
Since 6 < 13 < 16, H0 cannot be rejected.
The occurrence of flood years is random.
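Counting the occurrences and the runs by hand is error-prone, so a short script can serve as a check. The sketch below uses the Y/N sequence from this example and the two critical values quoted on the slide; the variable names are illustrative.

```python
from itertools import groupby

sequence = "YNYNNYNNYNYYYNYYYNYYY"   # flood (Y) / no flood (N), 1990-2010

n1 = sequence.count("Y")             # years with flood
n2 = sequence.count("N")             # years without flood
# groupby yields one group per unbroken stretch of identical symbols, i.e. per run.
u = sum(1 for _ in groupby(sequence))

lower, upper = 6, 16                 # critical values quoted on the slide
cannot_reject_h0 = lower < u < upper

print(n1, n2, u, cannot_reject_h0)
```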

Critical Values for Run Test

Run Test: Example (One tailed)
If a one-tailed runs test is used, we can determine whether the data
are random, non-random due to clustering, or non-random due
to uniformity.
• u has two critical values:
If u < the lower u(critical), then the data are non-random due to
clustering.
If u > the upper u(critical), then the data are non-random due to
uniformity.
If u falls between the lower and upper u(critical), then the data are
random.
