This document provides an overview of analysis of variance (ANOVA). It begins by defining parametric tests and discussing the assumptions of ANOVA. The key ideas of ANOVA are introduced, including comparing the variance between groups to the variance within groups. Calculations for one-way ANOVA are demonstrated, including sums of squares, mean squares, and the F-statistic. Examples are provided to illustrate one-way ANOVA calculations and interpretations. Violations of assumptions and extensions to two-way ANOVA are also discussed.
1. ANALYSIS OF VARIANCE (ANOVA)
DR RAVI ROHILLA
COMMUNITY MEDICINE
PT. B.D. SHARMA PGIMS, ROHTAK
2. Contents
Parametric tests
Difference b/w parametric & non-parametric tests
Introduction of ANOVA
Defining the Hypothesis
Rationale for ANOVA
Basic ANOVA situation and assumptions
Methodology for Calculations
F-distribution
Example of one-way ANOVA
3. Contents
Violation of assumptions
Two way ANOVA
Null hypothesis and data layout
Methodology for calculations and example
Comparisons in ANOVA
ANOVA with repeated measures
Assumptions and example
MANOVA
Assumptions and example
Other tests of interest and References
4. PARAMETRIC TESTS
Parameter: a summary value that describes the population, such as the mean, variance, correlation coefficient, or proportion.
Parametric test: a test that uses population constants such as means and variances, where the data are assumed to follow an established distribution such as the normal, binomial, or Poisson.
5. Parametric vs. non-parametric tests

Design                            | Parametric test                    | Non-parametric test
Correlation                       | Pearson                            | Spearman
Independent measures, 2 groups    | Independent-measures t-test        | Mann-Whitney test
Independent measures, >2 groups   | One-way independent-measures ANOVA | Kruskal-Wallis test
Repeated measures, 2 conditions   | Matched-pair t-test                | Wilcoxon test
Repeated measures, >2 conditions  | One-way repeated-measures ANOVA    | Friedman's test
6. (ANalysis Of VAriance)
• Idea: for two or more groups, test the difference between means of quantitative, normally distributed variables.
• It is an extension of the t-test: an ANOVA with only two groups is mathematically equivalent to a t-test.
7. EXAMPLE
Question 1.
Marks of 8th class students of two schools are given. Find whether the scores differ significantly from each other.
A 45 35 45 46 48 41 42 39 49
B 49 47 36 48 42 38 41 42 45
ANSWER: The test applicable here is the t-test, since there are two groups involved.
8. EXAMPLE
Question 2.
Marks of 8th class students of four schools are given. Find whether the scores differ significantly from each other.
A 45 35 45 46 48 41 42 39 49
B 49 47 36 48 42 38 41 42 45
C 43 45 42 37 39 40 41 35 47
D 34 48 47 42 36 41 45 42 48
ANSWER: The test applicable here is ANOVA, since there are more than two groups (4) involved.
9. Why Not Use the t-test Repeatedly?
• The t-test can only be used to test differences between two means.
• Conducting multiple t-tests leads to severe inflation of the Type I error rate (false positives) and is NOT RECOMMENDED.
• ANOVA is used to test for differences among several means without increasing the Type I error rate.
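A quick way to see this inflation: with m independent tests, each at significance level α, the probability of at least one false positive is 1 − (1 − α)^m. A minimal sketch in Python (the group counts below are illustrative; comparing k groups pairwise needs k(k − 1)/2 t-tests):

```python
# Family-wise Type I error rate for m independent tests at level alpha.
def familywise_error(alpha: float, m: int) -> float:
    return 1 - (1 - alpha) ** m

# Comparing k groups pairwise requires k*(k-1)/2 t-tests.
for k in (2, 4, 6):
    m = k * (k - 1) // 2
    print(k, m, round(familywise_error(0.05, m), 3))
# With k = 4 groups (6 tests) the chance of a false positive is already
# about 26.5%; with k = 6 groups (15 tests) it rises to about 53.7%.
```

This is why a single ANOVA is preferred over many pairwise t-tests.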
10. Defining the hypothesis
The null & alternative hypotheses for one-way ANOVA are:
H0: μ1 = μ2 = ... = μk (all group means are equal)
Ha: not all of the μi are equal
11. Rationale of the test
• The null hypothesis states that all groups come from the same population; they have the same means!
• But do all groups have the same population mean? We need to compare the sample means.
• We compare the variation among (between) the means of the several groups with the variation within the groups.
12. [FIGURE 1 and FIGURE 2: dot plots of three treatment samples, each with sample means x̄1 = 10, x̄2 = 15 and x̄3 = 20.]
FIGURE 1: A small variability within the samples makes it easier to draw a conclusion about the population means.
FIGURE 2: The sample means are the same as before, but the larger within-sample variability makes it harder to draw a conclusion about the population means.
14. Rationale of the test
• Clearly, the groups appear most distinct in the first figure. Why?
• Because there is relatively little overlap between the bell-shaped curves. In the high-variability case, the group difference appears least striking because the bell-shaped distributions overlap so much.
15. Rationale of the test
• This leads us to a very important conclusion: when we are looking at the differences between scores for two or more groups, we have to judge the difference between their means relative to the spread or variability of their scores.
16. Types of variability
Two types of variability are employed when testing for the equality of the means:
Between-group variability
&
Within-group variability
17. To distinguish between the groups, the variability between (or among) the groups must be greater than the variability within the groups.
If the within-groups variability is large compared with the between-groups variability, any difference between the groups is difficult to detect.
To determine whether or not the group means are significantly different, the variability between groups and the variability within groups are compared.
18. The basic ANOVA situation
• Two variables: one categorical, one quantitative.
• Main question: are there any significant differences between the means of three or more independent (unrelated) groups?
19. Assumptions
• Normality: the values within each group are normally distributed.
• Homogeneity of variances: the variance within each group should be equal for all groups.
• Independence of error: the error (variation of each value around its own group mean) should be independent for each value.
20. ANOVA Calculations
Sums of squares represent the variation present in the data. They are calculated by summing squared deviations. The simple ANOVA design has 3 sums of squares:

SS_TOT = Σ (X_i − X̄_G)²   The total sum of squares comes from the distance of all the scores from the grand mean.

SS_W = Σ (X_i − X̄_A)²   The within-group or within-cell sum of squares comes from the distance of the observations to the cell means. This indicates error.

SS_B = Σ N_A (X̄_A − X̄_G)²   The between-cells or between-groups sum of squares tells of the distance of the cell means from the grand mean. This indicates IV effects.

SS_TOT = SS_B + SS_W
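The partition above can be checked numerically. A minimal sketch in Python (the toy data set is made up for illustration):

```python
# Verify SS_TOT = SS_B + SS_W on a small made-up data set of two groups.
groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]

all_scores = [x for g in groups for x in g]
grand_mean = sum(all_scores) / len(all_scores)
group_means = [sum(g) / len(g) for g in groups]

# Total: distance of every score from the grand mean.
ss_tot = sum((x - grand_mean) ** 2 for x in all_scores)
# Within: distance of each observation from its own group mean (error).
ss_w = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
# Between: distance of the group means from the grand mean, weighted by group size.
ss_b = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))

print(ss_tot, ss_b + ss_w)  # both print 5.5: the partition holds
```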
21. Calculating variance between groups
1. Calculate the mean of each sample.
2. Calculate the grand average:
   X̄ = (n1·x̄1 + n2·x̄2 + ... + nk·x̄k) / (n1 + n2 + ... + nk)
3. Take the difference between the means of the various samples and the grand average.
4. Square these deviations and obtain the total, which gives the sum of squares between the samples:
   SST = Σ n_i (x̄_i − X̄)²
5. Divide the total obtained in step 4 by the degrees of freedom to calculate the mean square for treatment (MST).
22. Calculating variance within groups
1. Calculate the mean value of each sample.
2. Take the deviations of the various items in a sample from the mean values of the respective samples.
3. Square these deviations and obtain the total, which gives the sum of squares within the samples:
   SSE = Σ Σ (x_ij − x̄_i)²
4. Divide the total obtained in step 3 by the degrees of freedom to calculate the mean square for error (MSE).
23. The mean sum of squares
• To perform the test we need to calculate the mean squares as follows:
MST (Mean Square for Treatments) = SST / (k − 1)
MSE (Mean Square for Error) = SSE / (n − k)
24. ANOVA: one-way classification model

Source of variation | SS (Sum of Squares)              | Degrees of freedom | MS (Mean Square)  | Variance ratio F
Between samples     | SSB/SST (sum of squares between) | k − 1              | MST = SST/(k − 1) | MST/MSE
Within samples      | SSW/SSE (sum of squares within)  | n − k              | MSE = SSE/(n − k) |
Total               | SS(Total), total sum of squares  | n − 1              |                   |

k = number of groups, n = total number of observations
25. The F-distribution
The F-distribution is the probability distribution associated with the F-statistic.
The F-test tests the hypothesis that two variances are equal.
27. Calculation of the test statistic
F = Variability between groups / Variability within groups
with the following degrees of freedom:
v1 = k − 1 and v2 = n − k
F-statistic = MST / MSE
We compare the F-statistic value with the F(critical) value, which is obtained by looking it up in F-distribution tables against the respective degrees of freedom.
29.    Group 1  Group 2  Group 3  Group 4
          60       50       48       47
          67       52       49       67
          42       43       50       54
          67       67       55       67
          56       67       56       68
          62       59       61       65
          64       67       61       65
          59       64       60       56
          72       63       59       60
          71       65       64       65
Mean    62.0     59.7     56.3     61.4

Step 1) calculate the sum of squares between groups:
Grand mean = 59.85
SSB = [(62 − 59.85)² + (59.7 − 59.85)² + (56.3 − 59.85)² + (61.4 − 59.85)²] × n per group = 19.65 × 10 = 196.5
DEGREES OF FREEDOM (df) are k − 1 = 4 − 1 = 3
30. Step 2) calculate the sum of squares within groups:
SSW = (60 − 62)² + (67 − 62)² + … + (50 − 59.7)² + (52 − 59.7)² + … + (48 − 56.3)² + (49 − 56.3)² + … (sum of 40 squared deviations) = 2060.6
DEGREES OF FREEDOM (df) are N − k = 40 − 4 = 36
31. RESULTS

Source of variation        | df | Sum of Squares | Mean Sum of Squares | F-statistic | P-value
Between (treatment effect) |  3 |  196.5         | 65.5                | 1.14        | .344
Within (error)             | 36 | 2060.6         | 57.2                | -           | -
Total                      | 39 | 2257.1         | -                   | -           | -

F(critical) = 2.866
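The whole worked example can be reproduced in a few lines. A minimal sketch in Python, using only the arithmetic shown on the slides (no statistics library):

```python
# One-way ANOVA by hand on the four-group data from the example.
groups = [
    [60, 67, 42, 67, 56, 62, 64, 59, 72, 71],  # Group 1 (mean 62.0)
    [50, 52, 43, 67, 67, 59, 67, 64, 63, 65],  # Group 2 (mean 59.7)
    [48, 49, 50, 55, 56, 61, 61, 60, 59, 64],  # Group 3 (mean 56.3)
    [47, 67, 54, 67, 68, 65, 65, 56, 60, 65],  # Group 4 (mean 61.4)
]

k = len(groups)                                # number of groups = 4
n = sum(len(g) for g in groups)                # total observations = 40
grand_mean = sum(sum(g) for g in groups) / n   # 59.85
means = [sum(g) / len(g) for g in groups]

# Between-groups and within-groups sums of squares.
ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

mst = ssb / (k - 1)   # mean square for treatments, df = 3
mse = ssw / (n - k)   # mean square for error, df = 36
f_stat = mst / mse

print(round(ssb, 1), round(ssw, 1), round(f_stat, 2))  # 196.5 2060.6 1.14
```

Since 1.14 is below the critical value F(3, 36) = 2.866 given on the slide, the null hypothesis of equal means is not rejected.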
32. Violations of Assumptions
Testing for normality
• Each of the populations being compared should follow a normal distribution. This can be tested using various normality tests, such as:
the Shapiro-Wilk test, or
the Kolmogorov-Smirnov test, or
assessed graphically using a normal quantile plot or histogram.
33. Remedies for non-normal data
• Transform your data using various algorithms so that the shape of your distributions becomes normally distributed. Common transformations used are:
Logarithm
Square root, and
Multiplicative inverse
• Choose the non-parametric Kruskal-Wallis H test, which does not require the assumption of normality.
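A minimal sketch of these three transformations in Python (it assumes strictly positive data, since the logarithm and the reciprocal are undefined at zero):

```python
import math

def log_transform(xs):         # compresses large values most
    return [math.log(x) for x in xs]

def sqrt_transform(xs):        # milder compression, common for counts
    return [math.sqrt(x) for x in xs]

def reciprocal_transform(xs):  # multiplicative inverse
    return [1.0 / x for x in xs]

right_skewed = [1.0, 4.0, 9.0, 100.0]
print(sqrt_transform(right_skewed))  # [1.0, 2.0, 3.0, 10.0]
```

After transforming, the normality checks from the previous slide are re-run on the transformed values.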
34. Testing for variances
• The populations being compared should have the same variance. This is tested using:
Levene's test,
the modified Levene's test, or
Bartlett's test.
35. Remedies for heterogeneous data
• Two tests that we can run when the assumption of homogeneity of variances has been violated are:
the Welch test, or
the Brown-Forsythe test.
• Alternatively, we can run a Kruskal-Wallis H test. But for most situations it has been shown that the Welch test is best.
36. Two-Way ANOVA
• One dependent variable (quantitative variable),
• Two independent variables (classifying variables = factors)
• Key advantages:
– Compare relative influences on the DV
– Examine interactions between the IVs
37. Example
IV#1             | IV#2              | DV
Drug level       | Age of patient    | Anxiety level
Type of therapy  | Length of therapy | Anxiety level
Type of exercise | Type of diet      | Weight change
38. Null hypothesis
The two-way ANOVA includes tests of three null hypotheses:
That the means of observations grouped by one factor are the same;
That the means of observations grouped by the other factor are the same; and
That there is no interaction between the two factors. The interaction test tells you whether the effects of one factor depend on the other factor.
39. Two-Way ANOVA Data Layout
Observation k in each cell is X_ijk, at level i of factor A and level j of factor B.

Factor A \ Factor B |     1      |     2      | ... |     b
1                   | X111..X11n | X121..X12n | ... | X1b1..X1bn
2                   | X211..X21n | X221..X22n | ... | X2b1..X2bn
:                   |     :      |     :      |     |     :
a                   | Xa11..Xa1n | Xa21..Xa2n | ... | Xab1..Xabn

i = 1,…,a   j = 1,…,b   k = 1,…,n
There are a × b treatment combinations.
40. Formula for calculations
• Just as we had sums of squares and mean squares in one-way ANOVA, we have the same in two-way ANOVA.
• In balanced two-way ANOVA, we measure the overall variability in the data by:
SS_T = Σ_{i=1..a} Σ_{j=1..b} Σ_{k=1..n} (X_ijk − X̄)²,   df = N − 1
41. Formula for calculations
Sum of squares for factor A:
SS_A = Σ_{i=1..a} Σ_{j=1..b} Σ_{k=1..n} (X̄_i − X̄)² = bn Σ_{i=1..a} (X̄_i − X̄)²,   df = a − 1
Sum of squares for factor B:
SS_B = Σ_{i=1..a} Σ_{j=1..b} Σ_{k=1..n} (X̄_j − X̄)² = an Σ_{j=1..b} (X̄_j − X̄)²,   df = b − 1
42. Formula for calculations
Interaction sum of squares:
SS_AB = n Σ_{i=1..a} Σ_{j=1..b} (X̄_ij − X̄_i − X̄_j + X̄)²,   df = (a − 1)(b − 1)
Measures the variation in the response due to the interaction between factors A and B.
Error or residual sum of squares:
SS_E = Σ_{i=1..a} Σ_{j=1..b} Σ_{k=1..n} (X_ijk − X̄_ij)²,   df = ab(n − 1)
Measures the variation in the response within the a × b factor combinations.
43. Formula for calculations
• So the two-way ANOVA identity is:
SS_T = SS_A + SS_B + SS_AB + SS_E
• This partitions the total sum of squares into four pieces of interest for our hypotheses to be tested.
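The identity can be checked numerically for a small balanced design. A minimal sketch in Python (a = b = n = 2; the data are made up for illustration):

```python
# Balanced two-way layout: cells[i][j] holds the n replicates at
# level i of factor A and level j of factor B.
cells = [[[1.0, 2.0], [3.0, 4.0]],
         [[5.0, 6.0], [7.0, 9.0]]]
a, b, n = len(cells), len(cells[0]), len(cells[0][0])
N = a * b * n

grand = sum(x for row in cells for cell in row for x in cell) / N
mean_A = [sum(x for cell in cells[i] for x in cell) / (b * n) for i in range(a)]
mean_B = [sum(x for i in range(a) for x in cells[i][j]) / (a * n) for j in range(b)]
mean_cell = [[sum(cells[i][j]) / n for j in range(b)] for i in range(a)]

ss_a = b * n * sum((m - grand) ** 2 for m in mean_A)       # factor A effect
ss_b = a * n * sum((m - grand) ** 2 for m in mean_B)       # factor B effect
ss_ab = n * sum((mean_cell[i][j] - mean_A[i] - mean_B[j] + grand) ** 2
                for i in range(a) for j in range(b))       # interaction
ss_e = sum((x - mean_cell[i][j]) ** 2
           for i in range(a) for j in range(b) for x in cells[i][j])  # error
ss_t = sum((x - grand) ** 2 for row in cells for cell in row for x in cell)

print(ss_t, ss_a + ss_b + ss_ab + ss_e)  # both 49.875: the identity holds
```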
44. Two-way ANOVA Table

Source of variation | Degrees of freedom | Sum of squares | Mean square | F-ratio          | P-value
Factor A            | a − 1              | SSA            | MSA         | FA = MSA / MSE   | Tail area
Factor B            | b − 1              | SSB            | MSB         | FB = MSB / MSE   | Tail area
Interaction         | (a − 1)(b − 1)     | SSAB           | MSAB        | FAB = MSAB / MSE | Tail area
Error               | ab(n − 1)          | SSE            | MSE         |                  |
Total               | abn − 1            | SST            |             |                  |
45. WHAT AFTER ANOVA RESULTS?
• The ANOVA test tells us whether we have an overall difference between our groups, but it does not tell us which specific groups differed.
• Two possibilities are then available:
For specific predictions, known as a priori tests (contrasts)
For predictions after the test, known as post-hoc comparisons.
46. CONTRASTS
• Known as a priori or planned comparisons
– Used when a researcher plans to compare specific group means prior to collecting the data, or
– decides to compare group means after the data have been collected, on noticing that some of the means appear to be different.
– This can be tested even when H0 cannot be rejected.
– The Bonferroni t procedure (referred to as Dunn's test) is used.
47. Post-hoc Tests
• Post-hoc tests provide a solution to this and therefore should only be run when we have an overall significant difference in group means.
• Post-hoc tests are termed a posteriori tests - that is, performed after the event.
– Tukey's HSD procedure
– Scheffé's procedure
– Newman-Keuls procedure
– Dunnett's procedure
48. Example in SPSS
• A clinical psychologist is interested in comparing the relative effectiveness of three forms of psychotherapy for alleviating depression. Fifteen individuals are randomly assigned to each of three treatment groups: cognitive-behavioral, Rogerian, and assertiveness training. The Depression Scale of the MMPI serves as the response. The psychologist also wished to incorporate information about the patient's severity of depression, so all subjects in the study were classified as having mild, moderate, or severe depression.
49. ANOVA with Repeated Measures
• An ANOVA with repeated measures is for comparing three or more group means where the participants are the same in each group.
• This usually occurs in two situations:
– when participants are measured multiple times to see changes to an intervention, or
– when participants are subjected to more than one condition/trial and the responses to these conditions are to be compared.
50. Assumptions
• The dependent variable is interval or ratio (continuous).
• The dependent variable is approximately normally distributed.
• Sphericity.
• One independent variable where participants are tested on the same dependent variable at least 2 times.
Sphericity is the condition where the variances of the differences between all combinations of related groups (levels) are equal.
51. Sphericity violation
• Sphericity can be likened to homogeneity of variances in a between-subjects ANOVA.
• The violation of sphericity is serious for the repeated-measures ANOVA, causing the test to have an increased Type I error rate.
• Mauchly's test of sphericity tests the assumption of sphericity.
52. Sphericity violation
• The corrections employed to combat the violation of the assumption of sphericity are:
the lower-bound estimate,
the Greenhouse-Geisser correction, and
the Huynh-Feldt correction.
• The corrections are applied to the degrees of freedom (df) such that a valid critical F-value can be obtained.
53. Problems
• In a 6-month exercise training program with 20 participants, a researcher measured CRP levels of the subjects before training, 2 weeks into training, and after the full 6 months of training.
• The researcher wished to know whether protection against heart disease might be afforded by exercise, and whether this protection might be gained over a short period of time or whether it took longer.
54. MANOVA
• MANOVA is a procedure used to test the significance of the effects of one or more IVs on two or more DVs.
• MANOVA can be viewed as an extension of ANOVA, with the key difference that we are dealing with many dependent variables (not a single DV as in the case of ANOVA).
55. Data requirements
• Dependent variables
– Interval (or ratio) level variables
– May be correlated
– Multivariate normality
– Homogeneity of variance
• Independent variable(s)
– Nominal level variable(s)
– At least two groups within each independent variable
– Each independent variable should be independent of the others
56. Various tests to use
Wilks' lambda
Widely used; good balance between power and assumptions.
Pillai's trace
Useful when sample sizes are small, cell sizes are unequal, or covariances are not homogeneous.
Hotelling's (Lawley-Hotelling) trace
Useful when examining differences between two groups.
57. Results
• The result of a MANOVA simply tells us that a difference exists (or not) across groups.
• It does not tell us which treatment(s) differ or what is contributing to the differences.
• For such information, we need to run ANOVAs with post-hoc tests.
58. Example
• A high school takes its intake from three different primary schools. A teacher was concerned that, even after a few years, there were academic differences between the pupils from the different schools. As such, she randomly selected 20 pupils from each school and measured their academic performance by end-of-year English and Maths exams.
INDEPENDENT VARIABLE is the primary school attended, with three categories
DEPENDENT VARIABLES are the English and Maths scores
59. Other tests of Interest
• ANCOVA
Analysis of covariance
This test is a blend of ANOVA and linear regression.
• MANCOVA
Multivariate analysis of covariance
One or more continuous covariates are present.
60. References
• Wikipedia: Encyclopedia. Available from URL: http://en.wikipedia.org/wiki/
• Methods in Biostatistics by BK Mahajan
• Statistical Methods by SP Gupta
• Basic & Clinical Biostatistics by Dawson and Beth
• Statistical Methods in Medical Research by Armitage, Berry, Matthews