A chi-squared test is a statistical hypothesis test that is valid when the test statistic is chi-squared distributed under the null hypothesis; the term usually refers to Pearson's chi-squared test and its variants.
2. Chi-Square (χ²) and Frequency Data
• It was proposed in 1900 by Karl Pearson.
• The data that we analyze consist of frequencies; that is, the number of individuals falling into categories. In other words, the variables are measured on a nominal scale.
• The test statistic for such frequency data is the Pearson chi-square statistic.
• The magnitude of the Pearson chi-square statistic reflects the amount of discrepancy between observed frequencies and expected frequencies.
3. Steps in Test of Hypothesis
1. Determine the appropriate test
2. Establish the level of significance: α
3. Formulate the statistical hypothesis
4. Calculate the test statistic
5. Determine the degrees of freedom
6. Compare the calculated test statistic against a table/critical value
4. Step 1: Determine Appropriate Test
• Chi-square is used when both variables are measured on a nominal scale.
• It can be applied to interval or ratio data that have been categorized into a small number of groups.
• It assumes that the observations are randomly sampled from the population.
• All observations are independent (an individual can appear only once in a table and there are no overlapping categories).
• It does not make any assumptions about the shape of the distribution nor about the homogeneity of variances.
5. Step 2: Establish Level of Significance
• α is a predetermined value.
• The conventional choices are:
  • α = .05
  • α = .01
  • α = .001
6. Step 3: Determine the Hypothesis: Whether There Is an Association or Not
• Ho : The two variables are independent
• Ha : The two variables are associated
7. Step 4: Calculating Test Statistics
• The test contrasts the observed frequencies in each cell of a contingency table with the expected frequencies.
• The expected frequencies represent the number of cases that would be found in each cell if the null hypothesis were true (i.e. the nominal variables are unrelated).
• The expected values specify what the values of each cell of the table would be if there were no association between the two variables.
• The expected frequency of a cell under independence is the product of its row total and column total divided by the number of cases:
Fe = (Fr × Fc) / N
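The rule Fe = (Fr × Fc) / N can be sketched in a few lines of plain Python. This is only an illustrative sketch: the function name `expected_frequencies` is hypothetical, and the counts are the income by job satisfaction table used in the deck's first example.

```python
# Observed counts from the deck's income x job-satisfaction example
# (rows: income bands, columns: satisfaction levels).
observed = [
    [2, 4, 13, 3],
    [2, 6, 22, 4],
    [0, 1, 15, 8],
    [0, 3, 13, 8],
]


def expected_frequencies(table):
    """Return Fe = Fr * Fc / N for every cell of a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[fr * fc / n for fc in col_totals] for fr in row_totals]


expected = expected_frequencies(observed)
for row in expected:
    print([round(fe, 1) for fe in row])
```

The expected counts keep the same row and column totals as the observed table, which is a quick sanity check on any hand calculation.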
11. Step 6: Compare Computed Test Statistic Against a Tabled/Critical Value
• The computed value of the Pearson chi-square statistic is compared with the critical value to determine whether the computed value is improbable under Ho.
• The critical tabled values are based on the sampling distribution of the Pearson chi-square statistic.
• If the calculated χ² is greater than the χ² table value, reject Ho.
12. Example
• General Social Survey, 1991.
Let X = Income
    Y = Job satisfaction

                            Job satisfaction
Income            Dissatisfied   Little satisfied   Mod. satisfied   Much satisfied   Total
< 5,000           2              4                  13               3                22
5,000 to 15,000   2              6                  22               4                34
15,000 to 25,000  0              1                  15               8                24
> 25,000          0              3                  13               8                24
Total             4              14                 63               23               104
13. Hypothesis:
Ho : X and Y are independent
H1 : X and Y are dependent
The Pearson chi-square statistic is
$\chi^2 = \sum \frac{(F_o - F_e)^2}{F_e}$
Degrees of freedom = (4 − 1) × (4 − 1) = 9
14. Observed Frequencies

                            Job satisfaction
Income            Dissatisfied   Little satisfied   Mod. satisfied   Much satisfied   Total
< 5,000           2              4                  13               3                22
5,000 to 15,000   2              6                  22               4                34
15,000 to 25,000  0              1                  15               8                24
> 25,000          0              3                  13               8                24
Total             4              14                 63               23               104
15. Expected Frequencies

                            Job satisfaction
Income            Dissatisfied   Little satisfied   Mod. satisfied   Much satisfied
< 5,000           0.8            3.0                13.3             4.9
5,000 to 15,000   1.3            4.6                20.6             7.5
15,000 to 25,000  0.9            3.2                14.5             5.3
> 25,000          0.9            3.2                14.5             5.3

For example, 0.8 = (22 × 4) / 104 and 5.3 = (24 × 23) / 104.
16. Cell Contributions (Fo − Fe)² / Fe

                            Job satisfaction
Income            Dissatisfied   Little satisfied   Mod. satisfied   Much satisfied
< 5,000           1.6            0.4                0.0              0.7
5,000 to 15,000   0.4            0.4                0.1              1.6
15,000 to 25,000  0.9            1.5                0.0              1.4
> 25,000          0.9            0.0                0.2              1.4
Total             3.8            2.4                0.3              5.1              11.5

For example, 1.4 = (8 − 5.3)² / 5.3.
χ² = 11.5
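The cell-by-cell calculation above can be reproduced with a short sketch. The helper name `pearson_chi_square` is hypothetical; the sketch only assumes the observed counts from the income by job satisfaction table and the usual df = (rows − 1) × (columns − 1).

```python
# Observed counts from the income x job-satisfaction table.
observed = [
    [2, 4, 13, 3],
    [2, 6, 22, 4],
    [0, 1, 15, 8],
    [0, 3, 13, 8],
]


def pearson_chi_square(table):
    """Return (chi-square statistic, degrees of freedom) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, fr in enumerate(row_totals):
        for j, fc in enumerate(col_totals):
            fe = fr * fc / n                      # expected frequency
            chi2 += (table[i][j] - fe) ** 2 / fe  # cell contribution
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


chi2, df = pearson_chi_square(observed)
print(round(chi2, 1), df)
```

Working from unrounded expected frequencies avoids the small rounding drift that appears when the one-decimal values in the slide tables are summed by hand.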
17. Table value =
Evidence against Ho is weak.
It is possible that job satisfaction and income are independent.
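One way to check this conclusion without a printed table is to compute the upper-tail probability of the chi-squared distribution directly. The sketch below is an assumption-laden illustration: `chi2_sf` is a hypothetical helper that uses the closed-form tail expressions for integer degrees of freedom, and α = 0.05 is assumed because the slide does not state the level used.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) times a finite sum of (x/2)^k / k!.
        term = math.exp(-half)
        total = term
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return total
    # Odd df: erfc term plus a finite sum over half-integer powers of x/2.
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(half) * math.exp(-half) / math.gamma(1.5)
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= half / (k + 0.5)
    return total


chi2_value = 11.5   # statistic reported on the slides
df = 9              # (4 - 1) * (4 - 1)
alpha = 0.05        # assumed significance level

p_value = chi2_sf(chi2_value, df)
reject_h0 = p_value < alpha
print(round(p_value, 2), reject_h0)
```

A p-value well above the assumed α agrees with the slide's reading that the evidence against Ho is weak.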
18. An alternative
• The likelihood ratio test: it compares observed values with the distribution of expected values based on the multinomial probability distribution.
$G^2 = 2 \sum_{\text{all cells}} \text{Observed} \times \ln\left(\frac{\text{Observed}}{\text{Expected}}\right)$
= 0.0866
19. Pearson Statistic and Likelihood Ratio Statistic
• Like the Pearson statistic, G² takes its minimum value of 0 when all observed = expected, and larger values provide stronger evidence against Ho.
• Although the Pearson χ² and the likelihood-ratio G² are separate test statistics, they share many properties and usually lead to the same conclusions.
• When Ho is true and the expected frequencies are large, the two statistics have the same chi-squared distribution, and their numerical values are similar.
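To see how close the two statistics are in practice, both can be computed on the same table. The sketch below uses the income by job satisfaction counts from the first example; the function name `pearson_and_g2` is hypothetical, and cells with an observed count of 0 are taken to contribute 0 to G² (the usual convention, since x ln x tends to 0).

```python
import math

# Observed counts from the income x job-satisfaction table.
observed = [
    [2, 4, 13, 3],
    [2, 6, 22, 4],
    [0, 1, 15, 8],
    [0, 3, 13, 8],
]


def pearson_and_g2(table):
    """Return (Pearson chi-square, likelihood-ratio G^2) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    x2 = 0.0
    g2 = 0.0
    for i, fr in enumerate(row_totals):
        for j, fc in enumerate(col_totals):
            o = table[i][j]
            e = fr * fc / n
            x2 += (o - e) ** 2 / e
            if o > 0:  # empty cells contribute nothing to G^2
                g2 += 2.0 * o * math.log(o / e)
    return x2, g2


x2, g2 = pearson_and_g2(observed)
print(round(x2, 2), round(g2, 2))
```

The two values differ somewhat here because several expected frequencies in this table are small, which is exactly the situation in which the large-sample agreement is weakest.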
20. Example 2
• Two sample polls of votes for two candidates A and B for a public office are taken, one from among the residents of rural areas and one from among the residents of urban areas. The results are given in the adjoining table.
• Examine whether the nature of the area is related to voting preference in this election.
21. Hypothesis
• Ho: The nature of the area is independent of the voting preference in the election
• H1: The nature of the area is related to the voting preference in the election

          Votes for
Area      A       B       Total
Rural     620     380     1000
Urban     550     450     1000
Total     1170    830     2000
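The test for this 2 × 2 table can be worked through as follows. This is a sketch rather than the deck's own computation: it uses the plain Pearson statistic without a continuity correction, and it relies on the fact that for 1 degree of freedom the upper-tail probability equals erfc(√(χ²/2)).

```python
import math

# Votes by area: rows are Rural, Urban; columns are candidates A, B.
observed = [
    [620, 380],
    [550, 450],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, fr in enumerate(row_totals):
    for j, fc in enumerate(col_totals):
        fe = fr * fc / n                         # 585 or 415 in this table
        chi2 += (observed[i][j] - fe) ** 2 / fe

df = (len(row_totals) - 1) * (len(col_totals) - 1)

# For df = 1, P(X > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2.0))

critical_value = 3.841                           # 5% table value for 1 d.f.
print(round(chi2, 2), df, round(p_value, 4), chi2 > critical_value)
```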
27. Interpretation
1) The table value for 1 d.f. at the 5% level of significance is 3.841. The calculated value is greater than the table value, so we conclude that the nature of the area is related to voting preference in the election.
2) The p-value (0.001 < 0.05) leads to the same decision, and hence the null hypothesis is rejected.
[Figure: chi-square density with the critical value 3.841 separating the acceptance region (1 – α) from the rejection region (α).]
28. Residuals
• The chi-square test of independence infers, based on the p-value, whether an association between two variables exists.
• However, it gives no information about the strength of the association.
• The strength of association is examined using
1) Residual analysis
2) Partitioning the chi-square statistic
29. Residual analysis
• Compares the Oij (observed) and Eij (expected) values.
• The difference between the observed value in a cell and its expected value is known as the residual.
eij = Oij − Eij
30. Pearson Residuals
• Pearson residuals attempt to adjust for the fact that larger values of Oij and Eij tend to have larger differences.
• One approach to adjusting for the variance is to divide the difference (Oij − Eij) by √Eij.
• Thus, define
eij = (Oij − Eij) / √Eij
as the Pearson residual.
• Note that Σ eij² = χ²; that is, the squared Pearson residuals sum to the Pearson chi-square statistic.
31. Standardised Pearson Residuals
• Under Ho, the eij are asymptotically normal with mean 0.
• However, the variance of eij is less than 1.
• To compensate for this, one can use the standardized Pearson residuals.
• Denote by rij the standardized residuals, in which
$r_{ij} = \frac{O_{ij} - E_{ij}}{\sqrt{E_{ij}\,(1 - \hat{p}_{i+})(1 - \hat{p}_{+j})}}$
where $\hat{p}_{i+}$ is the estimated row i marginal probability and $\hat{p}_{+j}$ is the estimated column j marginal probability.
• rij is asymptotically distributed as a standard normal.
32. Standardised Pearson Residuals
• As a "rule of thumb", an rij whose absolute value is greater than 2 or 3 indicates a lack of fit of H0 in that cell.
• However, as the number of cells increases, the likelihood that some cell exceeds 2 or 3 in absolute value increases. For example, if you have 20 cells, you could expect about 1 in the 20 to have an absolute value greater than 2 just by chance (i.e., α = 0.05).
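As an illustration of the residual definitions, the sketch below computes Pearson and standardized Pearson residuals for the rural/urban voting table. It is a hand-rolled sketch, not the software output the deck refers to, and the standardized form assumes the usual denominator √(Eij (1 − p̂i+)(1 − p̂+j)).

```python
import math

# Votes by area: rows are Rural, Urban; columns are candidates A, B.
observed = [
    [620, 380],
    [550, 450],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

pearson = []        # (O - E) / sqrt(E)
standardized = []   # (O - E) / sqrt(E * (1 - p_row) * (1 - p_col))
for i, fr in enumerate(row_totals):
    pearson_row, standardized_row = [], []
    for j, fc in enumerate(col_totals):
        e = fr * fc / n
        diff = observed[i][j] - e
        pearson_row.append(diff / math.sqrt(e))
        standardized_row.append(
            diff / math.sqrt(e * (1 - fr / n) * (1 - fc / n))
        )
    pearson.append(pearson_row)
    standardized.append(standardized_row)

# The squared Pearson residuals add up to the Pearson chi-square statistic.
chi2_from_residuals = sum(r ** 2 for row in pearson for r in row)

for label, row in zip(["Rural", "Urban"], standardized):
    print(label, [round(r, 2) for r in row])
```

In a 2 × 2 table all four standardized residuals share the same absolute value, so the sign pattern carries the interpretation: positive cells hold more cases than independence predicts, negative cells fewer.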
35. Output
We find large positive residuals for "Rural, Candidate A" and "Urban, Candidate B", and large negative residuals for "Rural, Candidate B" and "Urban, Candidate A". Thus, there were significantly more people in the "Rural, Candidate A" and "Urban, Candidate B" cells, and fewer people in the "Rural, Candidate B" and "Urban, Candidate A" cells, than the hypothesis of independence predicts.
36. Partitioning the Likelihood Ratio Test
• Motivation for this:
1) If you reject Ho and conclude that X and Y are dependent, the next question could be "Are some individual comparisons more significant than others?".
2) Partitioning (breaking a general I × J contingency table into smaller tables) may show that the association is largely dependent on certain categories or groupings of categories.
• Recall these basic principles about chi-square variables:
1) If X1 and X2 are independently distributed as χ² with df = 1 each, then X = X1 + X2 ∼ χ² (df = 1 + 1).
2) In general, the sum of independent χ² random variables is distributed as χ² (df = ∑ df(Xi)).
37. General Rules for Partitioning
In order to completely partition an I × J contingency table, you need to follow these three rules.
1. The df for the subtables must sum to the df for the full table.
2. Each cell count in the full table must be a cell count in one and only one subtable.
3. Each marginal total of the full table must be a marginal total for one and only one subtable.
38. Example
Independent random samples of 83, 60, 56, and 62 faculty members from four universities of a state university system were polled to determine which of three collective bargaining agents (i.e., unions) they prefer.
Interest centers on whether there is evidence of a difference in the distribution of preference across the 4 state universities.

Table 1          Bargaining agent
University   101     102     103     Total
1            42      29      12      83
2            31      23      6       60
3            26      28      2       56
4            8       17      37      62
Total        107     97      57      261
39.
• Therefore, we see that there is a significant association between University and Bargaining Agent.
• Just by looking at the data, we see that
  University 4 seems to prefer Agent 103
  Universities 1 and 2 seem to prefer Agent 101
  University 3 may be undecided, but leans towards Agent 102
• Partitioning will help examine these trends.
40. First subtable
The association for University 4 appears the strongest, so we could consider the following subtable.

Subtable 1       Bargaining agent
University   101–102   103     Total
1–3          179       20      199
4            25        37      62
Total        204       57      261

Note: This table was obtained by considering the {4, 3} cell in comparison to the rest of the table.
G^2 = 60.5440 on 1 df (p < 0.0001).
We see strong evidence for an association between universities (grouped accordingly) and agents.
41. Second Subtable
Now, we could consider just Agents 101 and 102 with Universities 1–3.

Subtable 2       Bargaining agent
University   101     102     Total
1            42      29      71
2            31      23      54
3            26      28      54
Total        99      80      179

G^2 = 1.6378 on 2 df (p = 0.4411).
For Universities 1–3 and Agents 101 and 102, preference is homogeneous (universities prefer agents in similar proportions from one university to another).
42. Third Subtable
We could also consider bargaining agents by dichotomized university.

Subtable 3       Bargaining agent
University   101     102     Total
1–3          99      80      179
4            8       17      25
Total        107     97      204

G^2 = 4.8441 on 1 df (p = 0.0277).
There is an indication that the preference for agents varies with the introduction of University 4.
43. Final table
A final table we can construct is

Subtable 4       Bargaining agent
University   101–102   103     Total
1            71        12      83
2            54        6       60
3            54        2       56
Total        179       20      199

G^2 = 4.966 on 2 df (p = 0.0835).
With agent 103 added back into the summary, we still see that Universities 1–3 have homogeneous preference.
44. What have we done?
General Notes:
1. We created 4 subtables with df of 1, 2, 1 and 2 (recall Rule 1: the df must sum to the total; 1 + 2 + 1 + 2 = 6. Rule 1 - Check!)
2. Rule 2 - Each cell count appears in only one subtable (42 was in subtable 2, 29 in subtable 2, ..., 37 in subtable 1). Rule 2 - Check!
3. Rule 3 - Each marginal total appears only once (83 was in subtable 4, 60 in subtable 4, 56 in subtable 4, 62 in subtable 1, 107 in subtable 3, 97 in subtable 3, 57 in subtable 1). Rule 3 - Check!
Since we have partitioned according to the rules, note the sum of G^2.
G^2 = 60.5440 + 1.6378 + 4.8441 + 4.9660 ≈ 71.99 on 6 df, which matches the value obtained from the original table.
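The additivity check can be reproduced numerically. The sketch below recomputes G² for the full university by bargaining-agent table and for the four subtables; `g_squared` is a hypothetical helper, and the subtables are typed in exactly as they appear on the slides.

```python
import math


def g_squared(table):
    """Return (likelihood-ratio G^2, degrees of freedom) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, fr in enumerate(row_totals):
        for j, fc in enumerate(col_totals):
            o = table[i][j]
            if o > 0:  # empty cells contribute nothing
                total += o * math.log(o * n / (fr * fc))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return 2.0 * total, df


# Full table: universities 1-4 (rows) by agents 101, 102, 103 (columns).
full_table = [[42, 29, 12], [31, 23, 6], [26, 28, 2], [8, 17, 37]]

subtables = {
    "subtable 1": [[179, 20], [25, 37]],          # {1-3, 4} x {101-102, 103}
    "subtable 2": [[42, 29], [31, 23], [26, 28]],  # 1-3 x {101, 102}
    "subtable 3": [[99, 80], [8, 17]],            # {1-3, 4} x {101, 102}
    "subtable 4": [[71, 12], [54, 6], [54, 2]],   # 1-3 x {101-102, 103}
}

full_g2, full_df = g_squared(full_table)
parts = {name: g_squared(tab) for name, tab in subtables.items()}
parts_g2 = sum(g2 for g2, _ in parts.values())
parts_df = sum(df for _, df in parts.values())

for name, (g2, df) in parts.items():
    print(name, round(g2, 4), df)
print("sum of parts:", round(parts_g2, 4), parts_df)
print("full table:  ", round(full_g2, 4), full_df)
```

If the sum of the parts and the full-table value disagree beyond floating-point noise, at least one of the three partitioning rules has been broken.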
45. Overall Summary of Example
Now that we have verified our partitioning, we can draw
inference on the subtables.
From the partitioning, we can observe
1. Preference distribution is homogeneous among
Universities 1 - 3.
2. That preference for a bargaining unit is independent
of the faculty’s university with the exception that if a
faculty member belongs to university 4, then he or
she is much more likely than would otherwise have
been expected to show preference for bargaining
agent 103 (and vice versa).
46. Final Comments on Partitioning
• For the likelihood ratio test (G^2), exact partitioning occurs (meaning you can sum the fully partitioned subtables' G^2 to arrive at the original G^2).
• Pearson's χ² does not have this property.
• Use the summation of G^2 to double check your partitioning.
• You can have as many subtables as you have df. However, as in our example, you may have tables with df > 1 (which yields fewer subtables).
• The selection of subtables is not unique. To initiate the process, you can use your residual analysis to identify the most extreme cell and begin there (this is why I isolated the {4, 3} cell initially).
• Partitioning is not easy and is an acquired knack. However, the reward is additional interpretation that is generally desired in the data summary.
47. Advantages and Limitations of Chi-squared tests
• The Pearson chi-square statistic and the likelihood ratio statistic do not change value when rows or columns are reordered; that is, both treat the variables as nominal. If at least one variable is ordinal, these tests ignore the ordering and are not a good choice.
• G² and χ² require large samples.
• When an expected frequency is small (< 5), the answer from G² and χ² will not be reliable. So, whenever at least one expected frequency is less than 5, you can instead use a small-sample procedure.
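The expected-frequency rule is easy to check before running either test. The sketch below flags cells with an expected frequency under 5 in the income by job satisfaction table from the first example; the threshold is the one stated on the slide, and the helper name `small_expected_cells` is hypothetical.

```python
# Observed counts from the income x job-satisfaction table.
observed = [
    [2, 4, 13, 3],
    [2, 6, 22, 4],
    [0, 1, 15, 8],
    [0, 3, 13, 8],
]


def small_expected_cells(table, threshold=5.0):
    """Return (row, column, expected) for every cell whose expected count is below threshold."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    flagged = []
    for i, fr in enumerate(row_totals):
        for j, fc in enumerate(col_totals):
            fe = fr * fc / n
            if fe < threshold:
                flagged.append((i, j, fe))
    return flagged


flagged = small_expected_cells(observed)
n_cells = len(observed) * len(observed[0])
print(f"{len(flagged)} of {n_cells} cells have expected frequency < 5")
```

A table with many flagged cells, like this one, is a case where the chi-squared approximation for χ² and G² deserves caution.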