T-Test
R. Sahoo
Student’s t Test
The theoretical work on the t distribution was done by William
Sealy Gosset and published in 1908. He published his findings
under the pen name “Student” because he was working for
Guinness, Son & Company, the Dublin brewery in Ireland, and the
company did not permit employees to publish research
findings under their own names. That is why the test is called
Student’s t Test.
Key Takeaways
• A t-test is an inferential statistic used to determine whether
there is a statistically significant difference between
the means of two groups.
• The t-test is used for hypothesis testing in statistics.
• Calculating a t-test requires three fundamental data
values: the difference between the mean values of the
data sets, the standard deviation of each group, and the
number of data values in each group.
• A t-test can be dependent (paired) or independent.
Uses of t-test / Application
• The sample size is small (conventionally less than 30).
• The population standard deviation (σ) is unknown.
• To compare the means of two groups or two samples.
• The population from which the sample is drawn is
(approximately) normally distributed.
• To test whether the correlation coefficient in the population is zero.
Type of t-test
Paired t-test
Investigate whether there is a difference
within a group between two points in time
(within-subjects).
If the groups come from a single population (e.g.,
measuring before and after an experimental treatment),
perform a paired t-test. This is a within-subjects design.
Independent t-test
Investigate whether there is a difference
between two groups (between-subjects).
If the groups come from two different populations (e.g.,
two different species, or people from two separate
cities), perform a two-sample t-test (independent t-test).
This is a between-subjects design.
One-sample t-test
Investigate whether there is a difference
between a group and a standard value, or
whether a subgroup belongs to a
population.
If there is one group being compared against a standard
value (e.g., comparing the acidity of a liquid to a neutral
pH of 7), perform a one-sample t-test.
One-tailed or two-tailed t-test
If you only care whether the two populations are
different from one another, perform a two-tailed t-test.
If you want to know whether one population mean is
specifically greater than, or specifically less than, the
other, perform a one-tailed t-test.
Which t-test to Use
The following flowchart can be used to determine which
t-test to use based on the characteristics of the sample
sets. The key items to consider include the similarity of
the sample records, the number of data records in each
sample set, and the variance of each sample set.
[Flowchart: which t-test to use]
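The three items listed above can be turned into a small decision helper. This is only a sketch: the flowchart itself is not part of the text, so the rule below (paired if the records are similar, pooled if the sizes or variances match, unequal-variance otherwise) is an assumption, and the function name and return strings are hypothetical.

```python
def choose_t_test(similar_records, same_sample_size, same_variance):
    """Suggest a t-test from three yes/no answers (assumed decision rule)."""
    if similar_records:
        # Same subjects measured twice, or matched pairs.
        return "paired t-test"
    if same_sample_size or same_variance:
        # Independent groups that can share one pooled variance estimate.
        return "equal-variance two-sample t-test"
    # Independent groups with different sizes and different variances.
    return "unequal-variance two-sample t-test"


# Before/after measurements on the same group.
print(choose_t_test(True, True, True))
# Two unrelated groups of different size and spread.
print(choose_t_test(False, False, False))
```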
t-test Formula
Calculating a t-test requires three fundamental data values:
the difference between the mean values of the data sets (the
mean difference), the standard deviation of each group, and
the number of data values in each group.
For a single sample compared against a population mean:
$$ t = \frac{\bar{x} - \mu}{s / \sqrt{n}} $$
where
x̄ = mean of sample
μ = mean of population
n = sample size
s = standard deviation of sample
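As a rough illustration, the one-sample formula can be computed with Python's standard library. The function name and the data are hypothetical and chosen only to show the arithmetic.

```python
import math
import statistics


def one_sample_t(sample, mu):
    """Return t = (sample mean - mu) / (s / sqrt(n))."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 in the denominator)
    return (x_bar - mu) / (s / math.sqrt(n))


# Hypothetical measurements compared against a population mean of 3.
print(one_sample_t([2, 4, 6, 8], mu=3))
```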
The t-score
The t score is the ratio of the difference between
two groups to the difference within the groups.
• A larger t score = more difference between groups.
• A smaller t score = more similarity between groups.
Understanding P-values
• The p value is a number, calculated from a statistical
test, that describes how likely you are to have found a
set of observations at least as extreme as yours if the
null hypothesis were true.
• P values are used in hypothesis testing to help decide
whether to reject the null hypothesis. The smaller the
p value, the stronger the evidence against the null
hypothesis.
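A p value for a t score can be approximated without any statistics package by integrating the density of the t distribution numerically. The sketch below uses Simpson's rule; the function names, the `tails` parameter and the step count are assumptions made for illustration.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def p_value(t, df, tails=2, steps=2000):
    """Approximate p value for an observed t score.

    tails=1 returns the upper-tail probability P(T > t);
    tails=2 returns P(|T| > |t|). steps must be even.
    """
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3   # P(0 < T < |t|) by Simpson's rule
    upper = 0.5 - area     # P(T > |t|), using the symmetry of the distribution
    if tails == 2:
        return 2 * upper
    return upper if t >= 0 else 1 - upper


# A t score equal to the one-tailed 5% table value for 11 degrees of freedom.
print(round(p_value(1.796, 11, tails=1), 3))
```

A p value below the chosen significance level (for example 0.05) leads to rejecting the null hypothesis.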
Degrees of Freedom
Degrees of freedom are the maximum number of logically
independent values that are free to vary in a data sample.
For a single sample, degrees of freedom are calculated by
subtracting one from the number of items within the data sample.
Formula:
$$ Df = N - 1 $$
where:
Df = Degrees of freedom
N = Sample size
How is the t-distribution table used
The t-distribution table is available in one-tail and two-tail
formats. The former is used for assessing cases that
have a fixed value or range with a clear direction, either
positive or negative. For instance, what is the probability
of the output value remaining below -3, or of getting more
than seven when rolling a pair of dice? The latter is used
for range-bound analysis, such as asking whether the coordinates
fall between -2 and +2.
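Reading the table amounts to a lookup by degrees of freedom and by number of tails. The sketch below holds only a three-row excerpt at the 5 per cent level (standard table values); the dictionary layout and function name are hypothetical.

```python
# Excerpt of a t-distribution table at the 5% significance level.
# Keys are degrees of freedom; "one" and "two" are the tail formats.
T_TABLE_5_PERCENT = {
    10: {"one": 1.812, "two": 2.228},
    11: {"one": 1.796, "two": 2.201},
    12: {"one": 1.782, "two": 2.179},
}


def critical_value(df, tails):
    """Look up the 5% critical t value for the given df and tail format."""
    if df not in T_TABLE_5_PERCENT:
        raise KeyError(f"df={df} is not in this excerpt of the table")
    return T_TABLE_5_PERCENT[df][tails]


print(critical_value(11, "one"))  # directional (one-tail) question
print(critical_value(11, "two"))  # range-bound (two-tail) question
```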
Example
Alok Restaurant, near the railway station at Cuttack, has been
having average sales of 500 tea cups per day. Because of the
development of a bus stand nearby, it expects its sales to increase.
During the first 12 days after the start of the bus stand, the daily
sales were as follows: 550, 570, 490, 615, 505, 580, 570, 460, 600,
580, 530, and 526.
On the basis of this sample information, can one conclude that
Alok Restaurant’s sales have increased? Use the 5 per cent level of
significance.
Solution: Taking the null hypothesis that sales average 500 tea cups
per day and that they have not increased unless proved otherwise, we can write:
H0 : μ = 500 cups per day
Ha : μ > 500 (as we want to conclude that sales have increased).
As the sample size is small and the population standard deviation is
not known, we shall use the t-test, assuming a normal population, and shall
work out the test statistic t as:
$$ t = \frac{\bar{X} - \mu_0}{s_1 / \sqrt{n}} $$
(To find X̄ and s1 we make the following computations):
S. No.    Xi     Xi − X̄    (Xi − X̄)²
1         550       2           4
2         570      22         484
3         490     -58        3364
4         615      67        4489
5         505     -43        1849
6         580      32        1024
7         570      22         484
8         460     -88        7744
9         600      52        2704
10        580      32        1024
11        530     -18         324
12        526     -22         484
n = 12    ∑Xi = 6576          ∑(Xi − X̄)² = 23978
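The columns of the computation table can be reproduced in a few lines. This is a sketch using the sales figures from the example; the variable names are hypothetical.

```python
sales = [550, 570, 490, 615, 505, 580, 570, 460, 600, 580, 530, 526]

n = len(sales)
total = sum(sales)                         # sum of Xi
mean = total / n                           # X-bar
deviations = [x - mean for x in sales]     # Xi - X-bar
squared = [d ** 2 for d in deviations]     # (Xi - X-bar)^2
sum_sq = sum(squared)

# Print one row per day, then the column totals.
for i, (x, d, q) in enumerate(zip(sales, deviations, squared), start=1):
    print(f"{i:>2} {x:>5} {d:>6.0f} {q:>6.0f}")
print(f"n = {n}, sum Xi = {total}, sum of squared deviations = {sum_sq:.0f}")
```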
Hence,
$$ \bar{X} = \frac{\sum X_i}{n} = \frac{6576}{12} = 548 $$
and
$$ s_1 = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}} = \sqrt{\frac{23978}{12 - 1}} \approx 46.69 $$
$$ t = \frac{548 - 500}{46.69 / \sqrt{12}} = \frac{48}{13.48} \approx 3.56 $$
Degrees of freedom = n − 1 = 12 − 1 = 11
As Ha is one-sided, we shall determine the rejection region by
applying a one-tailed test (in the right tail, because Ha is of the
“more than” type) at the 5 per cent level of significance. Using the
table of the t-distribution for 11 degrees of freedom, it comes to:
R : t > 1.796
The observed value of t is about 3.56, which is in the rejection region.
Thus H0 is rejected at the 5 per cent level of significance, and we
can conclude that the sample data indicate that Alok Restaurant’s
sales have increased.
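The whole worked example can be checked in code. In this sketch the critical value 1.796 is taken from the text rather than computed, and the variable names are hypothetical. At full precision the t score comes out close to 3.56.

```python
import math
import statistics

sales = [550, 570, 490, 615, 505, 580, 570, 460, 600, 580, 530, 526]
mu_0 = 500          # hypothesised mean under H0
t_critical = 1.796  # one-tailed, 5% level, 11 degrees of freedom (from the table)

n = len(sales)
x_bar = statistics.mean(sales)
s = statistics.stdev(sales)                      # uses n - 1 in the denominator
t_observed = (x_bar - mu_0) / (s / math.sqrt(n))
df = n - 1

# Right-tailed test: reject H0 when the observed t exceeds the critical value.
reject_h0 = t_observed > t_critical

print(f"mean = {x_bar}, s = {s:.2f}, t = {t_observed:.2f}, df = {df}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```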
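Different t-test Formulae
To test the significance of the mean of a random sample
$$ t = \frac{(\bar{x} - \mu)\sqrt{n}}{s}, \qquad s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} $$
Confidence Interval Estimate (α level)
One-tailed test
$$ \bar{x} \pm t_{\alpha} \frac{s}{\sqrt{n}} $$
Two-tailed test
$$ \bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}} $$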
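The two-tailed interval can be sketched as follows, reusing the tea-sales data from the example. The critical value 2.201 is the standard two-tailed table value at the 5 per cent level for 11 degrees of freedom; it is passed in by hand here, and the function name is hypothetical.

```python
import math
import statistics


def confidence_interval(sample, t_crit):
    """Return (lower, upper) for x-bar +/- t_crit * s / sqrt(n)."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)
    margin = t_crit * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin


sales = [550, 570, 490, 615, 505, 580, 570, 460, 600, 580, 530, 526]
low, high = confidence_interval(sales, t_crit=2.201)
print(f"95% confidence interval: {low:.2f} to {high:.2f}")
```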
To test the difference between the means of two samples
(independent samples)
$$ t = \frac{\bar{x}_1 - \bar{x}_2}{s} \sqrt{\frac{n_1 n_2}{n_1 + n_2}} $$
$$ s = \sqrt{\frac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}} $$
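A minimal sketch of the independent-samples statistic follows. It assumes the usual pooled standard deviation, whose denominator is n1 + n2 − 2 (which is also the degrees of freedom). The function name and data are hypothetical.

```python
import math


def independent_t(sample1, sample2):
    """Return (t, df) for two independent samples using a pooled s."""
    n1, n2 = len(sample1), len(sample2)
    mean1 = sum(sample1) / n1
    mean2 = sum(sample2) / n2
    ss1 = sum((x - mean1) ** 2 for x in sample1)  # squared deviations, group 1
    ss2 = sum((x - mean2) ** 2 for x in sample2)  # squared deviations, group 2
    df = n1 + n2 - 2
    s = math.sqrt((ss1 + ss2) / df)               # pooled standard deviation
    t = (mean1 - mean2) / s * math.sqrt(n1 * n2 / (n1 + n2))
    return t, df


# Hypothetical scores for two unrelated groups.
t, df = independent_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(f"t = {t:.4f}, df = {df}")
```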
To test the difference between the means of two
samples (dependent samples / matched pairs)
$$ t = \frac{\bar{d}\sqrt{n}}{s}, \qquad s = \sqrt{\frac{\sum (d - \bar{d})^2}{n - 1}} $$
d̄ = mean of the differences
n = number of pairs
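The matched-pair statistic works on the per-pair differences only. A small sketch, with a hypothetical function name and hypothetical before/after values:

```python
import math
import statistics


def paired_t(before, after):
    """Return (t, df) for matched pairs, with d = after - before."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    d = [a - b for a, b in zip(after, before)]
    n = len(d)
    d_bar = statistics.mean(d)
    s = statistics.stdev(d)  # standard deviation of the differences (n - 1)
    return d_bar * math.sqrt(n) / s, n - 1


# Hypothetical measurements on the same four subjects.
t, df = paired_t(before=[10, 12, 14, 16], after=[12, 15, 15, 20])
print(f"t = {t:.3f}, df = {df}")
```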
Testing the significance of an observed
correlation coefficient
$$ t = \frac{r}{\sqrt{1 - r^2}} \sqrt{n - 2} $$
n = number of paired observations in the sample
r = correlation coefficient
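The correlation formula is a one-line computation. In this sketch the function name and the values r = 0.6 and n = 18 are hypothetical; the test has n − 2 degrees of freedom.

```python
import math


def correlation_t(r, n):
    """Return (t, df) for testing whether a correlation r differs from zero."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    t = r / math.sqrt(1 - r ** 2) * math.sqrt(n - 2)
    return t, n - 2


t, df = correlation_t(r=0.6, n=18)
print(f"t = {t:.2f}, df = {df}")
```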
