The Paired Sample T Test is used to determine whether the mean of a dependent variable (for example, weight, anxiety level, salary, or reaction time) is the same in two related groups. It is particularly useful for measuring results before and after a particular event, action, or process change.
What is the Paired Sample T Test and How is it Beneficial to Business Analysis?
1. Master the Art of Analytics
A Simplistic Explainer Series For Citizen Data Scientists
Journey Towards Augmented Analytics
3. Basic Terminologies
Sample data is a subset of the population data, used to represent the entire group as a whole.
For instance, if we want to find the average value of all cars in the United States, it is impractical to assess the value of every car, add these numbers, and divide by the total number of cars.
Instead, we can randomly select some of the cars, say 200, get the value of each of these 200 cars, and find the average of these 200 numbers.
The values of these 200 randomly selected cars are called sample data for the values of all cars in the United States (the population data).
Two popular sampling techniques, simple random sampling and stratified sampling, are explained in the annexure section.
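The car example above can be sketched in a few lines of Python. This is only an illustration: the population is made-up data generated on the spot, and the helper name `sample_mean` is an assumption, not something from the slides.

```python
import random
from statistics import mean

def sample_mean(population, sample_size, seed=None):
    """Estimate the population mean from a simple random sample."""
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)  # draws without replacement
    return mean(sample), sample

# Hypothetical population: 100,000 made-up car values (not real data).
rng = random.Random(42)
car_values = [rng.uniform(2_000, 60_000) for _ in range(100_000)]

estimate, sample = sample_mean(car_values, sample_size=200, seed=1)
print(f"population mean:     {mean(car_values):.0f}")
print(f"sample mean (n=200): {estimate:.0f}")
```

The two printed means should be close, which is the point of using 200 sampled cars instead of every car.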
4. Basic Terminologies
P-value: In a Paired Sample T Test, the p-value indicates whether there is a statistically significant difference between the two samples.
The p-value can be checked against different thresholds, depending on the confidence level desired, and the inference made accordingly.
For instance, for a confidence level of 95% (5% chance of error), check the p-value against a threshold of 0.05.
If the p-value < 0.05, the difference is significant and the treatment has been effective; otherwise, the difference is not significant and the treatment has not been shown to be effective.
Similarly, for a confidence level of 98% (2% chance of error), check the p-value against a threshold of 0.02, and so on for other confidence levels.
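The threshold rule described above can be written as a small helper. This is a minimal sketch; the function name `is_significant` and the p-value of 0.03 used in the example are hypothetical.

```python
def is_significant(p_value, confidence_level=0.95):
    """Return True when the p-value is below the threshold (1 - confidence level)."""
    if not 0 < confidence_level < 1:
        raise ValueError("confidence_level must be between 0 and 1")
    if not 0 <= p_value <= 1:
        raise ValueError("p_value must be between 0 and 1")
    # Round to avoid floating-point noise such as 1 - 0.95 = 0.050000000000000044.
    threshold = round(1 - confidence_level, 10)
    return p_value < threshold

# Hypothetical p-value checked at two confidence levels.
print(is_significant(0.03, confidence_level=0.95))  # True  (0.03 < 0.05)
print(is_significant(0.03, confidence_level=0.98))  # False (0.03 >= 0.02)
```

The same p-value can therefore be significant at one confidence level and not at a stricter one.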
5. Introduction
• The Paired Sample T Test is used to determine whether the mean of a dependent variable (e.g., weight, anxiety level, salary, reaction time) is the same in two related groups (e.g., two groups of participants that are measured at two different "time points" or that undergo two different "conditions")
• Thus the classic use of the paired t-test is to evaluate the before and after of some treatment
• Examples:
• Understand whether there was a difference in managers' salaries before and after undertaking a PhD (i.e., the dependent variable would be "salary", and the two related groups would be the salaries at the two "time points": "before" and "after" undertaking the PhD)
• Measure the blood pressure of patient A, give the patient something (pharmaceutical, exercise, tilapia) to reduce blood pressure, then measure the blood pressure of patient A again. Repeat for patients B, C, D, ... In this case, the "before" and "after" data are paired by patient
6. Example : Input
Let’s conduct the Paired Sample T Test on the following two variables: a time dimension containing months, and a measure.
Month      Value
January    90
February   95
March      80
April      78
May        75
June       70
Month is the time dimension used to divide the data into two groups; Value is the dependent variable.
Let’s say measure values before April belong to the ‘before’ or ‘pre’ sample, and values from April onward belong to the ‘after’ or ‘post’ sample.
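The split into pre and post samples described above might look like this in Python. The helper name `split_pre_post` is an assumption made for illustration; the data are the six rows from the table.

```python
MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

def split_pre_post(rows, cut_month):
    """Split (month, value) rows into values before cut_month and values from it onward."""
    cut = MONTHS.index(cut_month)
    pre = [value for month, value in rows if MONTHS.index(month) < cut]
    post = [value for month, value in rows if MONTHS.index(month) >= cut]
    return pre, post

rows = [("January", 90), ("February", 95), ("March", 80),
        ("April", 78), ("May", 75), ("June", 70)]

pre, post = split_pre_post(rows, "April")
print(pre)   # [90, 95, 80]
print(post)  # [78, 75, 70]
```

Each pre value is then paired with the post value in the same position (January with April, February with May, March with June).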
7. Example : Output
Pre sample mean 88.3
Post sample mean 74.3
Mean Difference 14.0
P-value 0.044
At 95% confidence level (5% chance of error):
As the p-value of 0.044 is less than 0.05, there is a statistically significant difference between the means of the pre and post sample values
The treatment has had a significant effect
At 98% confidence level (2% chance of error):
As the p-value of 0.044 is greater than 0.02, the difference between the means of the pre and post samples is not statistically significant
The treatment cannot be said to have been effective at this level
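For readers who want to reproduce such an output, here is a minimal sketch of a two-sided paired t-test using only the Python standard library. The function names are assumptions, and the p-value is obtained by numerically integrating the Student's t density with Simpson's rule; in practice a library routine such as `scipy.stats.ttest_rel` would normally be used instead. The inputs are the six values from the example, paired in order.

```python
import math
from statistics import mean, stdev

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=20000):
    """Two-sided p-value: 1 minus the area under the density between -|t| and |t|."""
    b = abs(t)
    if b == 0:
        return 1.0
    h = b / steps  # steps is even, as Simpson's rule requires
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return max(0.0, 1 - 2 * area_0_to_t)

def paired_t_test(before, after):
    """Return (mean difference, t statistic, two-sided p-value) for paired samples."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same number of data points")
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    d_bar = mean(diffs)
    t = d_bar / (stdev(diffs) / math.sqrt(n))
    return d_bar, t, two_sided_p(t, n - 1)

before = [90, 95, 80]   # January to March
after = [78, 75, 70]    # April to June
d_bar, t, p = paired_t_test(before, after)
print(f"mean difference = {d_bar:.1f}")  # 14.0
print(f"t statistic     = {t:.2f}")      # 4.58
print(f"p-value         = {p:.3f}")      # 0.044
```

With only three pairs the test has very little power, so a result this close to the 0.05 threshold should be read with caution.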
11. Sample Output 3: Outliers
Outliers are data values that differ greatly from the majority of a set of data
12. Limitations
• Can be applied to only two samples (one measure, plus one time dimension or sequence ID that decides the cut point for dividing the measure values into pre and post samples)
• The number of data points should be at least 30
13. General Applications
• Medicine: Has a particular medicine or treatment been effective?
• Marketing: Have sales increased after a particular campaign?
• Manufacturing: Has the cycle time or the number of defects reduced after a particular process change?
• Logistics: Has the transit time from supplier to customer reduced after a route change?
14. Use case 1
Business problem:
• A manufacturing unit manager wants to know whether there is a statistically significant difference in cycle time before and after a particular process change
• Here the dependent variable would be ‘cycle time values’
Business benefit:
• Once the test is completed, a p-value is generated which indicates whether there is a statistically significant difference between the cycle times at the two time points
• Based on this value, the manager can easily conclude whether the process change has had a significant impact on cycle time
15. Use case 1 : Input Dataset
Let’s say the process change was in effect from 16/8/17 to 19/8/17.
Hence the cycle time values for these dates are treated as the post sample, and the values from 12/8/17 to 15/8/17 as the pre sample, because both samples should have an equal number of data points.
Time point   Cycle time (minutes)
12/8/17      21000
13/8/17      15000
14/8/17      25600
15/8/17      23000
16/8/17      19750
17/8/17      25000
18/8/17      21250
19/8/17      14400
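As an illustration of how the rows above can be divided at the date the process change took effect and then paired, here is a small sketch. The helper names `parse_date` and `paired_differences` are assumptions; the dates are read as day/month/two-digit year.

```python
from datetime import datetime

def parse_date(text):
    """Parse a date written as day/month/two-digit year, e.g. 12/8/17."""
    return datetime.strptime(text, "%d/%m/%y")

def paired_differences(rows, change_start):
    """Split rows at change_start and return pre-minus-post differences, paired in date order."""
    cut = parse_date(change_start)
    ordered = sorted(rows, key=lambda row: parse_date(row[0]))
    pre = [value for date, value in ordered if parse_date(date) < cut]
    post = [value for date, value in ordered if parse_date(date) >= cut]
    if len(pre) != len(post):
        raise ValueError("pre and post samples must have equal data points")
    return [b - a for b, a in zip(pre, post)]

rows = [("12/8/17", 21000), ("13/8/17", 15000), ("14/8/17", 25600), ("15/8/17", 23000),
        ("16/8/17", 19750), ("17/8/17", 25000), ("18/8/17", 21250), ("19/8/17", 14400)]

diffs = paired_differences(rows, change_start="16/8/17")
print(diffs)  # [1250, -10000, 4350, 8600]
```

The paired t-test then works on these per-pair differences rather than on the two samples separately.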
16. Use case 1 : Output
Cycle time
“Pre” sample mean cycle time 19444.44
“Post” sample mean cycle time 18080.0
Mean Difference 1364.44
P-value 0.27
P-value: 0.27 (> 0.05) indicates that there is no statistically significant difference in cycle time between the two samples. Hence the process change has not had a significant impact.
Although the mean of the post sample is lower than the mean of the pre sample, the reduction in cycle time is not statistically significant.
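Reading such an output takes two separate checks: the direction of the change (which mean is larger) and its significance (p-value against the threshold). A minimal sketch, with the hypothetical helper `interpret` applied to the figures reported above:

```python
def interpret(pre_mean, post_mean, p_value, alpha=0.05):
    """Return the direction of the change and whether it is statistically significant."""
    if post_mean < pre_mean:
        direction = "decreased"
    elif post_mean > pre_mean:
        direction = "increased"
    else:
        direction = "unchanged"
    return direction, p_value < alpha

direction, significant = interpret(pre_mean=19444.44, post_mean=18080.0, p_value=0.27)
print(direction, significant)  # decreased False
```

A lower post mean alone is not evidence of improvement; the change counts as significant only when the p-value is also below the chosen threshold.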
17. Use case 2
Business problem:
• A grocery store sales manager wants to know whether daily sales have increased after an advertising campaign
• Here the dependent variable would be ‘Daily sales’
Business benefit:
• Once the test is completed, a p-value is generated which indicates whether there is a statistically significant difference between average daily sales before and after the advertising campaign
• Based on this value, the grocery store manager can tell whether the campaign has been effective
18. Use case 3
Business problem:
• Suppose a medical researcher decides to investigate whether a particular drug treatment is effective in lowering cholesterol levels
• There are two groups: the cholesterol levels of patients before taking the drug and after taking the drug
• Here the dependent variable would be ‘Cholesterol levels’
Business benefit:
• Once the test is completed, a p-value is generated which indicates whether there is a statistically significant difference between the cholesterol levels of the pre drug treatment and post drug treatment groups
• Based on which group mean is higher or lower, it can also be inferred whether the drug has lowered cholesterol levels
19. Sampling Methods
• There are two main types of sampling:
• Simple random sampling:
• Here, the selection is based purely on chance, and every item has an equal chance of being selected
• A lottery system is an example of simple random sampling
• Stratified sampling:
• Here, the population data is divided into subgroups known as strata
• The members of each subgroup have similar attributes and characteristics in terms of demographics, income, location, etc.
• A random sample is taken from each of these subgroups in proportion to the subgroup size relative to the population size
• These subsets of the subgroups are then combined to form the final stratified random sample
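Both sampling methods can be sketched with Python's standard `random` module. The function names and the customer population below are hypothetical, chosen only to show proportional allocation across strata.

```python
import random
from collections import defaultdict

def simple_random_sample(items, k, seed=None):
    """Every item has an equal chance of selection (drawn without replacement)."""
    return random.Random(seed).sample(items, k)

def stratified_sample(items, stratum_of, k, seed=None):
    """Sample each stratum in proportion to its share of the population."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[stratum_of(item)].append(item)
    sample = []
    for members in strata.values():
        # Proportional allocation; rounding can make the total differ slightly from k.
        share = round(k * len(members) / len(items))
        sample.extend(rng.sample(members, min(share, len(members))))
    return sample

# Hypothetical population of 1,000 customers tagged by location.
population = ([("urban", i) for i in range(600)]
              + [("suburban", i) for i in range(300)]
              + [("rural", i) for i in range(100)])

srs = simple_random_sample(population, 50, seed=7)
strat = stratified_sample(population, stratum_of=lambda c: c[0], k=50, seed=7)
counts = {name: sum(1 for c in strat if c[0] == name)
          for name in ("urban", "suburban", "rural")}
print(counts)  # {'urban': 30, 'suburban': 15, 'rural': 5}
```

The stratified sample reproduces the 60/30/10 split of the population exactly, whereas the simple random sample only approximates it by chance.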
20. Want to Learn More?
Get in touch with us at support@Smarten.com, and check out the Learning section on Smarten.com
June 2018