The document discusses several non-parametric tests that can be used as alternatives to parametric tests when the assumptions of parametric tests are violated. Specifically, it discusses:
1. The sign test and one sample median test, which can be used instead of t-tests when the data is skewed or not normally distributed.
2. Mood's median test, which compares the medians of two independent samples and is the nonparametric version of a one-way ANOVA.
3. The Kruskal-Wallis test, which determines if there are differences in medians across three or more groups and is the nonparametric version of a one-way ANOVA.
2. Mean / Median
• The mean is a good measure of center when the data is bell-shaped,
but it is sensitive to outliers and extreme values.
• When the data is skewed, however, a better measure of center would
be the median.
• The median is a resistant measure, i.e. it is largely unaffected by outliers.
• In other words, we may want to consider a test for the median and
not the mean.
• In a skewed distribution, the population median, typically denoted
as η, is a better typical value than the population mean, μ.
3. Sign test
• It is a non-parametric or “distribution free” test, which means the test
doesn’t assume the data comes from a particular distribution.
• The sign test compares the number of positive and negative differences
between paired observations, or between observations and a hypothesized median.
• The sign test is an alternative to a one sample t test or a paired t test.
• It can also be used for ordered (ranked) categorical data.
• The null hypothesis for the sign test is that the difference
between medians is zero.
• This test is used when we are interested in testing the population
median and not the mean.
4. One sample median test
• The one sample median test checks whether or not there is
a significant difference between a hypothesized median and the
median of the population from which the sample was drawn.
• We learned how to use a t-test for the difference between means of
dependent samples. That test required both populations to be
normally distributed.
• If the condition of normality cannot be satisfied, we can use the
paired-sample sign test to test the difference between two
population medians. The following conditions must be met:
• 1. A sample must be randomly selected from each population.
• 2. The samples must be dependent (paired).
5. • We find the difference between corresponding data entries by
subtracting the entry representing the second variable from the entry
representing the first variable, and record the sign of the difference.
• Then compare the number of + and – signs. (the 0s are ignored.)
6. Steps:-
• State the hypothesis
• Specify alpha
• Specify sample size
• Find critical value – from the sign-test (binomial) table when n ≤ 25, or from the z-table when n > 25
• Find test statistic
• Make decision
• Interpret
7. Test statistic
• When n ≤ 25, the test statistic is x, the smaller of the number of positive and the number of negative signs.
• When n > 25, the test statistic is calculated from the formula:
• $z = \dfrac{(x + 0.5) - 0.5n}{\sqrt{n}/2}$
• where x = the smaller number of signs and n = the total number of positive and negative
signs.
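The two cases above can be sketched in a few lines of Python. This is only an illustration: the function name `sign_test_statistic` is hypothetical, and the large-sample branch assumes the usual continuity-corrected form z = ((x + 0.5) - 0.5n) / (sqrt(n)/2).

```python
import math

def sign_test_statistic(n_plus, n_minus):
    """Return (kind, value) for the sign test.

    n_plus and n_minus are the counts of positive and negative signs;
    zero differences are assumed to have been dropped already.
    """
    n = n_plus + n_minus
    x = min(n_plus, n_minus)              # smaller number of signs
    if n <= 25:
        return ("count", x)               # compare with the sign-test table
    # Normal approximation with continuity correction for n > 25
    z = ((x + 0.5) - 0.5 * n) / (math.sqrt(n) / 2)
    return ("z", z)

print(sign_test_statistic(7, 2))          # small sample: the statistic is a count
print(sign_test_statistic(10, 20))        # large sample: the statistic is a z value
```

For small samples the returned count is compared with a sign-test table; for large samples the z value is compared with the standard normal critical value.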
8. Example :- S and C represent two tasks: S, the spelling of 25 words presented separately, and C, the
spelling of 25 words of equal difficulty presented as an integral part of a sentence (i.e., in context). A
teacher wants to know which condition is favorable to higher scores. Test the hypothesis that C is better
than S.
9. • Of the 10 differences, 7 are plus (C higher than S), 2 are minus (S
higher than C) and one is zero. Excluding the 0 as being neither + nor
-, we have 9 differences of which 7 are plus.
• Let alpha = 0.05 and n = 9. It is a one-tailed test. Critical value = 1
(from the sign-test table, since n ≤ 25)
• Test statistic = 2 (the smaller number of signs)
• Since the test statistic is greater than the critical value, we fail to reject the
null hypothesis.
10. Example
• A college statistics professor claims that the median test score for his
students is 58. The scores of 18 randomly selected tests are listed
below. At alpha = 0.01, can you reject the professor's claim?
• 58 62 55 55 53 52 52 59 55 55 60 56 57 61 58 63 63 55
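One possible way to work this example is sketched below. It counts the signs relative to the claimed median and, instead of looking up a table, computes an exact two-sided p-value from the binomial distribution with p = 0.5. The variable names are illustrative, and the p-value approach is an assumption here rather than the method shown on the slide.

```python
from math import comb

scores = [58, 62, 55, 55, 53, 52, 52, 59, 55,
          55, 60, 56, 57, 61, 58, 63, 63, 55]
claimed_median = 58
alpha = 0.01

n_plus = sum(s > claimed_median for s in scores)
n_minus = sum(s < claimed_median for s in scores)
n = n_plus + n_minus                  # scores equal to the claimed median are dropped
x = min(n_plus, n_minus)              # test statistic: smaller number of signs

# Exact two-sided p-value under H0: each sign is + or - with probability 0.5
tail = sum(comb(n, k) for k in range(x + 1)) / 2 ** n
p_value = min(1.0, 2 * tail)

print(n_plus, n_minus, n, x)
print(round(p_value, 4), "reject H0" if p_value <= alpha else "fail to reject H0")
```

With 6 plus signs and 10 minus signs out of n = 16, the p-value is far above 0.01, so the professor's claim is not rejected.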
11. Paired/Matched sample Sign test
• Assumptions for the test (your data should meet these requirements
before running the test) are:
• The data should be from two samples.
• The two dependent samples should be paired or matched. For example,
depression scores from before a medical procedure and after.
• Example:-
• This set of data represents test scores at the end of Spring and the
beginning of the Fall semesters.
• The hypothesis is that the summer break means a significant drop in test
scores.
12. • H0: No difference in median of the signed differences.
• H1: Median of the signed differences is less than zero.
13. • H0: No difference in median of the signed differences.
• H1: Median of the signed differences is less than zero.
• Count the number of positives and negatives.
• 4 positives.
• 12 negatives.
• Add up the number of items in your sample and subtract any you had
a difference of zero for (in column 3). The sample size in this question
was 17, with one zero, so n = 16.
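• Let alpha = 0.05 and n = 16. Critical value = 4 (from the sign-test table, one-tailed)
• Test statistic = 4 (the smaller number of signs)
• Since the test statistic is less than or equal to the critical value, we reject the
null hypothesis: the data support a drop in test scores over the summer break.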
14. Example:
A new chemotherapy treatment is proposed for patients with breast cancer. Investigators are
concerned with patient's ability to tolerate the treatment and assess their quality of life both before
and after receiving the new chemotherapy treatment. Quality of life (QOL) is measured on an
ordinal scale and for analysis purposes, numbers are assigned to each response category as follows:
1=Poor, 2= Fair, 3=Good, 4= Very Good, 5 = Excellent. The data are shown below.
Patient   QOL Before Chemotherapy   QOL After Chemotherapy   Difference   Sign
          Treatment                 Treatment
1         3                         2                         1           +
2         2                         3                        -1           -
3         3                         4                        -1           -
4         2                         4                        -2           -
5         1                         1                         0           NA
6         3                         4                        -1           -
7         2                         4                        -2           -
8         3                         3                         0           NA
9         2                         1                         1           +
10        1                         3                        -2           -
11        3                         4                        -1           -
12        2                         3                        -1           -
H0: there is no difference between the medians of the two sets of values
Ha: there is a difference between the medians of the two sets of
values
No. of +ves = 2
No. of -ves = 8
n = 10
Alpha = 0.05
Test statistic = 2
Critical value = 1 (from the sign-test table, two-tailed)
Conclusion: test statistic > critical value
We fail to reject the hypothesis that there is no difference in the
medians of the two sets of values.
There was no significant change in the quality of life between before
and after the chemotherapy treatment.
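The paired version of the test can be sketched as a small reusable function. The name `paired_sign_test` is hypothetical, the differences are taken as before minus after (as in the table), and the exact binomial p-value is used in place of a table lookup.

```python
from math import comb

def paired_sign_test(first, second):
    """Two-sided paired sign test; returns (n_plus, n_minus, p_value)."""
    diffs = [a - b for a, b in zip(first, second)]
    n_plus = sum(d > 0 for d in diffs)
    n_minus = sum(d < 0 for d in diffs)
    n = n_plus + n_minus                      # zero differences are ignored
    x = min(n_plus, n_minus)
    tail = sum(comb(n, k) for k in range(x + 1)) / 2 ** n
    return n_plus, n_minus, min(1.0, 2 * tail)

# QOL scores for the 12 patients in the table
before = [3, 2, 3, 2, 1, 3, 2, 3, 2, 1, 3, 2]
after = [2, 3, 4, 4, 1, 4, 4, 3, 1, 3, 4, 3]

n_plus, n_minus, p_value = paired_sign_test(before, after)
print(n_plus, n_minus, round(p_value, 4))
```

The p-value is above 0.05, which agrees with the decision not to reject H0 at that level.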
15. Mood’s Median Test
• Mood’s median test is used to compare the medians for two samples to
find out if they are different.
• For example, you might want to compare the median number
of positive calls to a hotline vs. the median number of negative comment
calls to find out if you’re getting significantly more negative comments than
positive comments (or vice versa).
• This test is the nonparametric alternative to a one way ANOVA.
Nonparametric means that you don’t have to know what distribution your
sample came from (e.g. a normal distribution) before running the test. That
said, your samples should have been drawn from distributions with the
same shape.
• Use this test instead of the sign test when you have two independent
samples. The test is a particular case of the chi-square test of independence.
• The null hypothesis for this test is that the medians are the same for both
groups.
• The alternate hypothesis for the test is that the medians of the two
groups are different.
16. • Step 1: Make a 2 x k contingency table, where k is the number of
samples.
• Step 2: Find M, the overall median for all the data in your samples. To
do this, list all of your data (from all samples) in a single set. Sort in
ascending order and then find the middle number.
• Step 3: List each individual sample’s data in ascending order. Count
how many data points are greater than M (from Step 2) and then
count how many data points are smaller than or equal to M. List
the counts greater than M in the first row of the contingency table
and the counts smaller than or equal to M in the second row.
• Step 4: Perform a chi-square test on the completed contingency table.
• Step 5: Compare the chi-square statistic to the table value
with: degrees of freedom = (number of rows – 1) * (number of
columns – 1).
17. Example
• Non parametric test - Mood's Median test for the following sets of
data :- (11,15,9,4,34,17,18,14,12,13,26,31)
(34,31,35,29,28,12,18,30,14,22,10,29 )
• Significance Level α=0.05 and One-tailed test
• Sol:- Step-1: Calculate the overall median of the two samples combined.
Combined samples, sorted:
4, 9, 10, 11, 12, 12, 13, 14, 14, 15, 17, 18, 18, 22, 26, 28, 29, 29, 30, 31, 31, 34, 34, 35
n = 24
Median = (12th term + 13th term)/2 = (18 + 18)/2 = 18
18. • Step-2: Create a 2×2 contingency table whose first row consists of the number of elements in each
sample that are greater than the median and whose second row consists of the number of elements in each
sample that are less than or equal to the median.
            Sample A   Sample B   Total
> Median    3          8          11
<= Median   9          4          13
Total       12         12         24
Step-3: Perform a chi-square test of independence.
State the hypotheses:
H0: the two categorical variables are independent.
H1: the two categorical variables are not independent.
Observed Frequencies
        B1   B2   Total
A1      3    8    11
A2      9    4    13
Total   12   12   24
19. • Expected Frequencies (row total × column total / grand total)
        B1    B2    Total
A1      5.5   5.5   11
A2      6.5   6.5   13
Total   12    12    24
• Compute Chi-square
• $\chi^2 = \sum (O_{ij} - E_{ij})^2 / E_{ij}$
$= (3-5.5)^2/5.5 + (8-5.5)^2/5.5 + (9-6.5)^2/6.5 + (4-6.5)^2/6.5$
$= 6.25/5.5 + 6.25/5.5 + 6.25/6.5 + 6.25/6.5$
$= 1.1364 + 1.1364 + 0.9615 + 0.9615$
$= 4.1958$
• Compute the degrees of freedom (df).
df = (2-1)⋅(2-1) = 1
• For 1 df, p(χ² ≥ 4.1958) = 0.0405. Test statistic = 4.1958. Critical value = 3.841 (chi-square table, α = 0.05)
Since the test statistic > critical value (equivalently, p < 0.05), we reject the null hypothesis H0.
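The whole example can be reproduced with a short standard-library sketch. The function name `moods_median_test` is hypothetical; it handles only two samples, applies no continuity correction, and counts values equal to the overall median in the "<= median" row, as the worked example does. It does not guard against an empty row (for instance when all values are identical).

```python
import math
from statistics import median

def moods_median_test(sample_a, sample_b):
    """Mood's median test for two samples; returns (M, table, chi2, p)."""
    grand_median = median(sample_a + sample_b)
    samples = (sample_a, sample_b)
    observed = [
        [sum(v > grand_median for v in s) for s in samples],    # row 1: > M
        [sum(v <= grand_median for v in s) for s in samples],   # row 2: <= M
    ]
    total = len(sample_a) + len(sample_b)
    row_totals = [sum(row) for row in observed]
    col_totals = [len(sample_a), len(sample_b)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi-square >= x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return grand_median, observed, chi2, p_value

a = [11, 15, 9, 4, 34, 17, 18, 14, 12, 13, 26, 31]
b = [34, 31, 35, 29, 28, 12, 18, 30, 14, 22, 10, 29]
m, table, chi2, p = moods_median_test(a, b)
print(m, table, round(chi2, 4), round(p, 4))
```

SciPy users may prefer `scipy.stats.median_test`; note that its default settings apply a continuity correction, so it may not match the hand calculation unless that option is turned off.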
20. Example
• A major wheat supplier from Texas is analyzing the yields of various
crop methods. He randomly assigned two different wheat crop
methods to a large number of acres of farm land and
recorded the production rate (yield per acre) for each plot. We need
to find out whether there is a difference between the two wheat crop methods.
21. Kruskal Wallis Test
• The Kruskal Wallis test is the non parametric alternative to the One Way ANOVA.
• The test determines whether the medians of two or more groups are different. Like most
statistical tests, you calculate a test statistic and compare it to a distribution cut-off point.
The test statistic used in this test is called the H statistic.
• The hypotheses for the test are:
• H0: population medians are equal.
• H1: population medians are not equal.
• The Kruskal Wallis test will tell you if there is a significant difference between groups.
However, it won’t tell you which groups are different.
• Example: you want to find out how test anxiety affects actual test scores. The independent
variable “test anxiety” has three levels: no anxiety, low-medium anxiety and high anxiety.
The dependent variable is the exam score, rated from 0 to 100%.
• Example: you want to find out how socioeconomic status affects attitude towards sales tax
increases. Your independent variable is “socioeconomic status” with three levels:
working class, middle class and wealthy. The dependent variable is measured on a 5-point
scale from strongly agree to strongly disagree.
22. • The H test is used when the assumptions for ANOVA aren’t met (like
the assumption of normality). It is sometimes called the one-way
ANOVA on ranks, as the ranks of the data values are used in the test
rather than the actual data points.
• Assumptions:-
• One independent variable with two or more levels (independent
groups). The test is more commonly used when you have three or
more levels.
• Ordinal scale, Ratio Scale or Interval scale dependent variables.
• Your observations should be independent. In other words, there
should be no relationship between the members in each group or
between groups.
• All groups should have the same shape distributions.
• It is used for comparing two or more independent samples of equal
or different sample sizes.
23. • The Kruskal-Wallis H Test is a nonparametric procedure that can be
used to compare more than two populations in a completely
randomized design.
• All n = n1+n2+…+nk measurements are jointly
ranked (i.e. treated as one large sample).
• We use the sums of the ranks of the k samples to compare the
distributions.
24. • Rank the total measurements in all k samples from 1 to n. Tied
observations are assigned average of the ranks they would have gotten if
not tied.
• Calculate
$T_i$ = rank sum for the ith sample, i = 1, 2, …, k
and the test statistic
$H = \dfrac{12}{n(n+1)} \sum_{i=1}^{k} \dfrac{T_i^2}{n_i} - 3(n+1)$
25. H0: the k distributions are identical versus
Ha: at least one distribution is different
Test statistic: Kruskal-Wallis H
When H0 is true, the test statistic H has an
approximate chi-square distribution with df
= k-1.
Use a right-tailed rejection region or p-value
based on the chi-square distribution.
26. Example
• A shoe company wants to know if three groups of workers have different salaries:
Women: 23K, 41K, 54K, 66K, 90K.
Men: 45K, 55K, 60K, 70K, 72K.
Minorities: 20K, 30K, 34K, 40K, 44K.
• Sol:- Null Hypothesis H0 : All groups are equal
Alternative Hypothesis H1 : At least one group is not equal
• Step 1: Sort the data for all groups/samples into ascending order in one combined set.
20K
23K
30K
34K
40K
41K
44K
45K
54K
55K
60K
66K
70K
72K
90K
27. • Step 2: Assign ranks to the sorted data points. Give tied values the average
rank.
20K 1
23K 2
30K 3
34K 4
40K 5
41K 6
44K 7
45K 8
54K 9
55K 10
60K 11
66K 12
70K 13
72K 14
90K 15
28. • Step 3: Add up the different ranks for each group/sample.
Women: 23K, 41K, 54K, 66K, 90K = 2 + 6 + 9 + 12 + 15 = 44.
Men: 45K, 55K, 60K, 70K, 72K = 8 + 10 + 11 + 13 + 14 = 56.
Minorities: 20K, 30K, 34K, 40K, 44K = 1 + 3 + 4 + 5 + 7 = 20.
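• Step 4: Calculate the H statistic:
$H = \left[ \dfrac{12}{n(n+1)} \sum_{j=1}^{c} \dfrac{T_j^2}{n_j} \right] - 3(n+1)$
Where:
• n = sum of sample sizes for all samples,
• c = number of samples,
• Tj = sum of ranks in the jth sample,
• nj = size of the jth sample.
29. $H = \dfrac{12}{15 \cdot 16} \left[ \dfrac{44^2}{5} + \dfrac{56^2}{5} + \dfrac{20^2}{5} \right] - 3(15+1) = 54.72 - 48 = 6.72$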
Step 5: Find the critical chi-square value, with c-1 degrees of freedom. For 3 – 1
degrees of freedom and an alpha level of .05, the critical chi square value is
5.9915.
Step 6: Compare the H value from Step 4 to the critical chi-square value from
Step 5.
If the critical chi-square value is less than the H statistic, reject the null
hypothesis that the medians are equal.
If the chi-square value is not less than the H statistic, there is not enough
evidence to suggest that the medians are unequal.
In this case, 5.9915 is less than 6.72, so we can reject the null hypothesis.
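Steps 1 to 4 can be checked with a short sketch. The helper names `average_ranks` and `kruskal_wallis_h` are hypothetical, the salaries are those listed in Step 3 (in thousands), and the statistic is computed as H = 12 / (n(n+1)) · Σ Tj²/nj − 3(n+1) without any correction for ties.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1              # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis_h(*groups):
    """Return (rank sums per group, H statistic), with no tie correction."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    rank_sums, total, start = [], 0.0, 0
    for g in groups:
        t = sum(ranks[start:start + len(g)])   # T_j for this group
        rank_sums.append(t)
        total += t * t / len(g)
        start += len(g)
    return rank_sums, 12 / (n * (n + 1)) * total - 3 * (n + 1)

women = [23, 41, 54, 66, 90]
men = [45, 55, 60, 70, 72]
minorities = [20, 30, 34, 40, 44]
rank_sums, h = kruskal_wallis_h(women, men, minorities)
print(rank_sums, round(h, 2))
```

The resulting H is then compared with the chi-square critical value for c − 1 degrees of freedom, exactly as in Steps 5 and 6.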
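30. • Perform the Kruskal-Wallis test for the following data (each group is followed by its rank sum):-
8, 5, 7, 11, 9, 6 – rank sum 25.5
10, 12, 11, 9, 13, 12 – rank sum 64
11, 14, 10, 16, 17, 12 – rank sum 87.5
18, 20, 16, 15, 14, 22 – rank sum 123
• Significance Level α=0.05 and One-tailed test.
• $H = \dfrac{12}{24 \cdot 25} \left[ \dfrac{25.5^2 + 64^2 + 87.5^2 + 123^2}{6} \right] - 3(24+1)$
• H ≈ 16.77
• Critical value = 7.815 (chi-square, df = 4 - 1 = 3)
• Since H exceeds the critical value, we reject the null hypothesis.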
31. Mann Whitney U Test
• The Mann-Whitney U test is the nonparametric equivalent of the two
sample t-test.
• The Mann Whitney U test is sometimes called the Mann Whitney Wilcoxon
Test or the Wilcoxon Rank Sum Test.
• While the t-test makes an assumption about the distribution of
a population, the Mann Whitney U Test makes no such assumption.
• The test compares two populations.
• The null hypothesis is that the two samples come from the same
population (i.e. that they both have the same median).
• This test is often performed as a two-sided test and, thus, the research
hypothesis indicates that the populations are not equal as opposed to
specifying directionality.
• A one-sided research hypothesis is used if interest lies in detecting a
positive or negative shift in one population as compared to the other.
32. • Assumptions for the Mann Whitney U Test
• The dependent variable should be measured on an ordinal scale or a
continuous scale.
• The independent variable should be two independent, categorical
groups.
• Observations should be independent. In other words, there should be
no relationship between the two groups or within each group.
• Observations need not be normally distributed. However, the two groups should
follow the same shape (e.g. both skewed left).
• The result of performing a Mann Whitney U Test is a U Statistic.
• For small samples, use the direct method (see below) to find the U
statistic;
• For larger samples, a formula is necessary.
33. Formula
Either of these two formulas is valid for the Mann
Whitney U Test:
$U_1 = R_1 - \dfrac{n_1(n_1+1)}{2}$, $U_2 = R_2 - \dfrac{n_2(n_2+1)}{2}$
R is the sum of ranks in the sample, and n is the number
of items in the sample.
34. Consider a Phase II clinical trial designed to investigate the effectiveness of a new drug to reduce symptoms
of asthma in children. A total of n=10 participants are randomized to receive either the new drug or a
placebo. Participants are asked to record the number of episodes of shortness of breath over a 1 week
period following receipt of the assigned treatment. The data are shown below.
Placebo    7  5  6  4  12
New Drug   3  6  4  2  1
Is there a difference in the number of episodes of shortness of breath over a 1 week period in
participants receiving the new drug as compared to those receiving the placebo?
SOL:- In this example, the outcome is a count and in this sample the data do not follow a normal
distribution. In addition, the sample size is small (n1=n2=5), so a nonparametric test is appropriate. The
hypothesis is given below, and we run the test at the 5% level of significance (i.e., α=0.05).
H0: The two populations are equal versus
H1: The two populations are not equal.
The first step is to assign ranks and to do so we order the data from smallest to largest. This is done on the
combined or total sample (i.e., pooling the data from the two treatment groups (n=10)), and assigning ranks
from 1 to 10, as follows.
36. • We produce a test statistic based on the ranks.
• First, we sum the ranks in each group. In the placebo group, the sum
of the ranks is 37; in the new drug group, the sum of the ranks is 18.
Recall that the sum of the ranks will always equal n(n+1)/2. As a check
on our assignment of ranks, we have n(n+1)/2 = 10(11)/2=55 which is
equal to 37+18 = 55.
• For the test, we call the placebo group 1 and the new drug group 2
• We let R1 denote the sum of the ranks in group 1 (i.e., R1=37), and
R2 denote the sum of the ranks in group 2 (i.e., R2=18).
• The test statistic for the Mann Whitney U Test is denoted U and is
the smaller of U1 and U2. Here U1 = R1 - n1(n1+1)/2 = 37 - 15 = 22 and
U2 = R2 - n2(n2+1)/2 = 18 - 15 = 3, so U = 3.
37. • In every test, we must determine whether the observed U supports
the null or research hypothesis.
• We determine a critical value of U such that if the observed value of U
is less than or equal to the critical value, we reject H0 in favor of
H1 and if the observed value of U exceeds the critical value we do not
reject H0.
• To determine the appropriate critical value we need the sample sizes (in this
example, n1=n2=5) and our two-sided level of significance (α=0.05).
• The critical value is 2, and the decision rule is to reject H0 if U ≤ 2. We
do not reject H0 because 3 > 2. We do not have statistically significant
evidence at α = 0.05 to show that the two populations of numbers of
episodes of shortness of breath are not equal.
• To be significant, our obtained U has to be equal to or LESS than this
critical value.
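The asthma example can be traced end to end with the sketch below. The helper names `midrank` and `mann_whitney_u` are hypothetical, and the code assumes the convention U_i = R_i − n_i(n_i+1)/2; the alternative convention U_i = n1·n2 + n_i(n_i+1)/2 − R_i simply swaps U1 and U2, so the smaller value is the same.

```python
def midrank(value, pooled):
    """Rank of `value` in the pooled sample; tied values share the average rank."""
    less = sum(v < value for v in pooled)
    equal = sum(v == value for v in pooled)
    return less + (equal + 1) / 2

def mann_whitney_u(group1, group2):
    """Return (R1, R2, U), where U is the smaller of U1 and U2."""
    pooled = group1 + group2
    n1, n2 = len(group1), len(group2)
    r1 = sum(midrank(v, pooled) for v in group1)
    r2 = sum(midrank(v, pooled) for v in group2)
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = r2 - n2 * (n2 + 1) / 2
    return r1, r2, min(u1, u2)

placebo = [7, 5, 6, 4, 12]
new_drug = [3, 6, 4, 2, 1]
r1, r2, u = mann_whitney_u(placebo, new_drug)

critical_value = 2        # two-sided, alpha = 0.05, n1 = n2 = 5, as quoted in the text
decision = "reject H0" if u <= critical_value else "do not reject H0"
print(r1, r2, u, decision)
```

The rank sums match the 37 and 18 reported above, and U = 3 exceeds the critical value of 2, so H0 is not rejected.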
38. • A new approach to prenatal care is proposed for pregnant women living in a rural
community. The new program involves in-home visits during the course of
pregnancy in addition to the usual or regularly scheduled visits. A pilot
randomized trial with 15 pregnant women is designed to evaluate whether
women who participate in the program deliver healthier babies than women
receiving usual care. The outcome is the APGAR score measured 5 minutes after
birth. Recall that APGAR scores range from 0 to 10 with scores of 7 or higher
considered normal (healthy), 4-6 low and 0-3 critically low. The data are shown
below.
Usual Care    8  7  6  2  5  8  7  3
New Program   9  9  7  8  10  9  6
Is there statistical evidence of a difference in APGAR scores in women receiving
the new and enhanced versus usual prenatal care?