This document provides an overview of inferential statistics concepts including hypothesis testing, types of hypotheses, errors in hypothesis testing, steps in hypothesis testing, z-tests, t-tests, and analysis of variance (ANOVA). Key points covered are: hypothesis testing determines if there is a statistically significant difference between groups or a relationship between variables; the two types of hypotheses are the null and alternative; z-tests and t-tests are used to test hypotheses comparing means and proportions; and ANOVA analyzes variance between and within groups. Examples of each statistical test are provided.
The document discusses testing for independence between two variables using a contingency table and chi-square test. It explains how to set up a contingency table with observed and expected frequencies, and how to calculate the chi-square test statistic to determine if the variables are independent or dependent. An example is provided that tests if blood pressure is independent of jogging status using a contingency table and chi-square test.
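The chi-square computation described above can be sketched in a few lines; the jogging/blood-pressure counts below are hypothetical placeholders, not the document's actual figures.

```python
# Chi-square test of independence for a 2x2 contingency table.
# The observed counts below are hypothetical placeholders.
observed = [[20, 30],   # e.g. joggers: normal BP, high BP
            [30, 20]]   # non-joggers: normal BP, high BP

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency for each cell: (row total * column total) / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (O - E)^2 / E over all cells
chi_square = sum((o - e) ** 2 / e
                 for o_row, e_row in zip(observed, expected)
                 for o, e in zip(o_row, e_row))

print(round(chi_square, 3))
```

The statistic is then compared to the chi-square critical value with (rows − 1)(columns − 1) degrees of freedom.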
The document describes a study that compared two methods of instruction. One group was taught a problem-solving method directly, while the other group was told to figure it out themselves (the "discovery method"). After 3 weeks, both groups were given a novel problem to solve. The discovery method group performed better. The document discusses using a t-test to determine if the difference in performance was statistically significant or due to chance. It provides the formula for an independent samples t-test when comparing means between two unrelated groups. The t-test calculates whether the difference between two sample means is larger than would be expected by chance, given the variability in the samples.
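The independent-samples t-test mentioned above can be sketched as follows; the scores are invented for illustration, not the study's actual data, and the pooled (equal-variance) form is assumed.

```python
import math

# Independent-samples t-test with pooled variance (equal-variance form).
# Scores below are hypothetical, not the study's actual data.
direct = [5, 6, 4, 7, 5, 6]       # taught the method directly
discovery = [8, 7, 9, 6, 8, 7]    # discovery-method group

def mean(xs):
    return sum(xs) / len(xs)

def sum_sq_dev(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)

n1, n2 = len(direct), len(discovery)
# Pooled variance: combined sum of squared deviations over pooled df
pooled_var = (sum_sq_dev(direct) + sum_sq_dev(discovery)) / (n1 + n2 - 2)
# Standard error of the difference between the two means
se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
t = (mean(discovery) - mean(direct)) / se
print(round(t, 3))
```

A large |t| relative to the critical value with n1 + n2 − 2 degrees of freedom suggests the difference is unlikely to be due to chance.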
Here are the class widths, marks and boundaries for the given class intervals:
a. Class interval (ci): 4 – 8
Class Width: 5
Class Mark: 6
Class Boundary: 3.5 – 8.5
b. Class interval (ci): 35 – 44
Class Width: 10
Class Mark: 39.5
Class Boundary: 34.5 – 44.5
c. Class interval (ci): 17 – 21
Class Width: 5
Class Mark: 19
Class Boundary: 16.5 – 21.5
d. Class interval (ci): 53 – 57
Class Width: 5
Class Mark: 55
Class Boundary: 52.5 – 57.5
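These quantities can be computed mechanically. The sketch below assumes the common convention that boundaries extend 0.5 below and above integer class limits, and that class width is the distance between the two boundaries; other textbooks use slightly different conventions.

```python
# Class mark and boundaries for integer class limits, assuming the common
# convention: boundaries sit 0.5 below/above the limits, and class width
# is the distance between the two boundaries.
def describe_class(lower, upper):
    lower_b, upper_b = lower - 0.5, upper + 0.5
    return {
        "width": upper_b - lower_b,
        "mark": (lower + upper) / 2,
        "boundary": (lower_b, upper_b),
    }

for lo, hi in [(4, 8), (35, 44), (17, 21), (53, 57)]:
    print((lo, hi), describe_class(lo, hi))
```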
I. The median test is used to determine if two independent groups have been drawn from populations with the same median. It requires at least ordinal scale data.
II. The combined median of both groups is calculated. Scores from each group are then split based on whether they are above or below the combined median. These frequencies are entered into a 2x2 contingency table.
III. The median test statistic (chi-square) is calculated and compared to a critical value based on the significance level and degrees of freedom to determine whether to reject or fail to reject the null hypothesis that the two groups have the same median.
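The three steps above can be sketched as follows; the group scores are hypothetical, and ties at the combined median are simply dropped (one of several conventions for handling them).

```python
# Median test: split each group's scores at the combined median and
# compute a 2x2 chi-square. Sample scores are hypothetical.
group_a = [12, 15, 11, 19, 14, 22, 13, 18]
group_b = [21, 25, 16, 24, 20, 23, 17, 26]

combined = sorted(group_a + group_b)
n = len(combined)
# Combined median (even n: average the two middle values)
median = (combined[n // 2 - 1] + combined[n // 2]) / 2

def split(scores):
    above = sum(1 for s in scores if s > median)
    below = sum(1 for s in scores if s < median)   # ties at the median dropped
    return above, below

a_above, a_below = split(group_a)
b_above, b_below = split(group_b)

# 2x2 chi-square on the above/below frequencies
observed = [[a_above, a_below], [b_above, b_below]]
row = [sum(r) for r in observed]
col = [sum(c) for c in zip(*observed)]
total = sum(row)
chi_square = sum((observed[i][j] - row[i] * col[j] / total) ** 2
                 / (row[i] * col[j] / total)
                 for i in range(2) for j in range(2))
print(a_above, a_below, b_above, b_below, round(chi_square, 2))
```

The resulting chi-square is compared to the critical value with 1 degree of freedom for a 2x2 table.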
Percentiles are positional measures used to indicate an individual's position within a group. They divide a data set into 100 equal parts, with percentiles (denoted Px) indicating what percent of values are less than a specified value. Common percentiles include the median (P50), quartiles (P25, P50, P75), and deciles. Percentiles are calculated using a formula that determines the position number based on the total number of data points and percentile value. This position is then used to find the corresponding value within ordered data.
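The position-based calculation can be sketched as below. Note there are several percentile conventions in use; this sketch assumes the common textbook rule L = (k/100)·n, rounding up when L is not a whole number and averaging adjacent values when it is.

```python
import math

# Percentile via the position formula L = (k/100) * n, rounding up to the
# next whole position when L is not an integer, averaging when it is.
# This is one common textbook convention; others exist.
def percentile(data, k):
    xs = sorted(data)
    n = len(xs)
    pos = k / 100 * n
    if pos == int(pos):                      # falls exactly on a position:
        i = int(pos)                         # average the value with the next
        return (xs[i - 1] + xs[i]) / 2
    return xs[math.ceil(pos) - 1]            # otherwise round up

data = [3, 7, 8, 5, 12, 14, 21, 13, 18, 9]
print(percentile(data, 50))   # median, P50
print(percentile(data, 25))   # lower quartile, P25
```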
This document discusses hypothesis testing, including:
1) The objectives are to formulate statistical hypotheses, discuss types of errors, establish decision rules, and choose appropriate tests.
2) Key symbols and concepts are defined, such as the null and alternative hypotheses, Type I and Type II errors, test statistics like z and t, means, variances, sample sizes, and significance levels.
3) The two types of errors in hypothesis testing are discussed. Hypothesis tests can result in correct decisions or two types of errors when the null hypothesis is true or false.
4) Steps in hypothesis testing are outlined, including formulating hypotheses, specifying a significance level, choosing a test statistic, establishing a decision rule, and drawing a conclusion about the null hypothesis.
The document discusses how to calculate standard deviation and variance for both ungrouped and grouped data. It provides step-by-step instructions for finding the mean, deviations from the mean, summing the squared deviations, and using these values to calculate standard deviation and variance through standard formulas. Standard deviation measures how spread out numbers are from the mean, while variance is the square of the standard deviation.
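For the ungrouped case, the steps described above can be cross-checked against Python's standard library (the scores are hypothetical):

```python
import statistics

# Ungrouped data (hypothetical scores)
scores = [4, 8, 6, 5, 3, 10]

mean = sum(scores) / len(scores)
# Sum of squared deviations from the mean
ss = sum((x - mean) ** 2 for x in scores)

# Population variance divides by n; sample variance divides by n - 1
pop_var = ss / len(scores)
samp_var = ss / (len(scores) - 1)

# Cross-check against the stdlib implementations
assert abs(pop_var - statistics.pvariance(scores)) < 1e-9
assert abs(samp_var - statistics.variance(scores)) < 1e-9
print(round(pop_var, 2), round(samp_var, 2))
```

The standard deviation is simply the square root of whichever variance is appropriate.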
The document discusses the paired samples t-test, which is used to compare two sets of measurements made on the same individuals. It notes that this test is appropriate when there are two correlated distributions, such as pre-test and post-test scores from the same people. The null hypothesis is that there is no difference between the pairs. The test calculates the differences between pairs, sums them, and divides this by the standard error of the differences to obtain a t-value, which can be compared to critical values to determine if the null hypothesis can be rejected.
This document defines and provides examples of key statistical concepts used to describe and analyze variability in data sets, including range, variance, standard deviation, coefficient of variation, quartiles, and percentiles. It explains that range is the difference between the highest and lowest values, variance is the average squared deviation from the mean, and standard deviation describes how distant scores are from the mean on average. Examples are provided to demonstrate calculating these measures from data sets and interpreting what they indicate about the spread of scores.
The document discusses the standard normal distribution. It defines the standard normal distribution as having a mean of 0, a standard deviation of 1, and a bell-shaped curve. It provides examples of how to find probabilities and z-scores using the standard normal distribution table or calculator. For example, it shows how to find the probability of an event being below or above a given z-score, or between two z-scores. It also shows how to find the z-score corresponding to a given cumulative probability.
This document provides guidelines for writing up results sections based on APA style. It discusses reporting statistical tests, including describing test statistics, significance levels, means, standard deviations, and directions of effects. Examples are provided for how to report results from t-tests, ANOVAs, post hoc tests, chi-square tests, correlations, and regressions. Tables and figures can help report complex results. The guidelines emphasize identifying analyses and their relation to hypotheses, and assuming reader knowledge of statistics.
The document discusses the normal curve and standard scores. It defines the normal curve as a continuous probability distribution that is bell-shaped and symmetric. It was developed by Gauss and Pearson. The normal curve can be divided into areas defined by standard deviations from the mean. Standard scores are raw scores converted to other scales, including z-scores, t-scores, and stanines. Z-scores indicate the distance from the mean in standard deviations. T-scores are on a scale of 50 plus or minus 10. Stanines use a nine-point scale with a mean of 5 and standard deviation of 2.
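The three conversions described above follow directly from the z-score; a minimal sketch, using a hypothetical test with mean 80 and SD 8:

```python
# Converting a raw score to z, T, and stanine scales as described above.
def z_score(raw, mean, sd):
    return (raw - mean) / sd

def t_score(raw, mean, sd):
    # T scale: mean 50, SD 10
    return 50 + 10 * z_score(raw, mean, sd)

def stanine(raw, mean, sd):
    # Stanine scale: mean 5, SD 2, clamped to the nine-point range 1..9
    s = round(5 + 2 * z_score(raw, mean, sd))
    return max(1, min(9, s))

# Hypothetical test: mean 80, SD 8, raw score 92
print(z_score(92, 80, 8), t_score(92, 80, 8), stanine(92, 80, 8))
```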
1. Illustrate point and interval estimations.
2. Distinguish between point and interval estimation.
This document discusses different methods for organizing data, including percentiles, quartiles, and deciles. It provides the definitions and formulas for calculating each. Percentiles indicate the value below which a given percentage of observations fall. Quartiles divide a data set into four equal parts, with the median (Q2) separating the lower and upper halves. Deciles divide a data set into ten equal parts. The document gives examples of calculating percentiles, quartiles, and deciles for sample data sets.
This document provides an overview of the Z test for two sample means. It defines the Z test, outlines when it is used, and provides the formula and steps to conduct a hypothesis test using the Z test. An example problem is included that tests if there is a significant difference in average monthly family incomes between two neighborhoods using census data from random samples of 100 families each.
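The two-sample Z test outlined above can be sketched as follows; the income figures and standard deviations are hypothetical, not the census data from the example problem.

```python
import math

# Z test for two sample means (large samples), as outlined above.
# The means and SDs below are hypothetical placeholders.
mean1, sd1, n1 = 21_500.0, 3_000.0, 100   # neighborhood A sample
mean2, sd2, n2 = 20_400.0, 2_500.0, 100   # neighborhood B sample

# Standard error of the difference between the two means
se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
z = (mean1 - mean2) / se
print(round(z, 2))

# Two-tailed decision at the 0.05 level: reject H0 if |z| > 1.96
print("reject H0" if abs(z) > 1.96 else "fail to reject H0")
```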
The document describes how to calculate a student's class average based on weighted grades in different components. It provides the component percentages, the student's grades for each component, and shows the calculation of multiplying each grade by its component percentage to get a weighted grade. These weighted grades are summed and divided by the total percentage to get the class average of 84.7, equivalent to a B.
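The weighted-average procedure can be sketched as below; the component weights and grades are hypothetical stand-ins, not the document's actual figures (which produce 84.7).

```python
# Weighted class average: multiply each grade by its component weight,
# sum, and divide by the total weight. The components and grades below
# are hypothetical, not the document's actual figures.
components = {
    "quizzes":     (0.20, 80),   # (weight, grade)
    "assignments": (0.10, 90),
    "midterm":     (0.30, 85),
    "final":       (0.40, 86),
}

total_weight = sum(w for w, _ in components.values())
weighted_sum = sum(w * g for w, g in components.values())
average = weighted_sum / total_weight
print(round(average, 1))
```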
Determining measures of central tendency for grouped data (Alona Hall)
This document discusses measures of central tendency (mean, median, mode) using grouped data from a sample of 40 students' heights. It provides an example to calculate each measure. The mean height is estimated as 151.25 cm using a frequency table and calculating the mid-point of each height range. The modal class is 155-159 cm as it has the highest frequency. A cumulative frequency table and ogive curve allow estimating the median height as 153.5 cm.
This document provides instructions for performing various statistical analyses and data management tasks in SPSS, including sorting data, selecting cases, splitting files, merging files, visual binning, frequencies analysis, descriptive statistics, cross tabulation and chi-square tests, independent samples t-tests, and one-way ANOVA. The document is authored by trainers from the Department of Applied Statistics at the University of Rwanda and dated December 6, 2014.
This document provides formulas and examples for calculating the mean and median of data sets. It defines the mean as the sum of all values divided by the total number of data points. For grouped data, the mean is calculated as the sum of the frequency multiplied by the midpoint of each class, divided by the total number of data points. Examples are provided to demonstrate calculating the mean of both ungrouped and grouped data sets. The median is defined for grouped data as the lower boundary of the class containing the median plus the class width times the amount the cumulative frequency is less than the halfway point, divided by the total number of data points. An example is given to demonstrate calculating the median of a grouped data set.
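The grouped-median rule described above can be sketched with a hypothetical frequency table:

```python
# Median for grouped data:
#   median = L + ((n/2 - cf) / f) * w
# where L is the lower boundary of the median class, cf the cumulative
# frequency before it, f its frequency, and w the class width.
# The frequency table below is hypothetical.
classes = [   # (lower boundary, upper boundary, frequency)
    (0.5, 10.5, 4),
    (10.5, 20.5, 9),
    (20.5, 30.5, 12),
    (30.5, 40.5, 5),
]

n = sum(f for _, _, f in classes)
half = n / 2

cf = 0
for lower, upper, f in classes:
    if cf + f >= half:                 # first class whose cumulative
        width = upper - lower          # frequency reaches n/2
        median = lower + (half - cf) / f * width
        break
    cf += f

print(round(median, 2))
```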
Distinguish between Parameter and Statistic.
Calculate sample variance and sample standard deviation.
Here are the modes for the three examples:
1. The mode is 3. This value occurs most frequently among the number of errors committed by the typists.
2. The mode is 82. This value occurs most frequently among the number of fruits yielded by the mango trees.
3. The mode is 12 and 15. These values occur most frequently among the students' quiz scores.
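Finding the mode(s) can be sketched in a few lines; returning a list handles the bimodal case in example 3.

```python
from collections import Counter

# Mode(s) of a data set: the value or values with the highest frequency.
# Returns a list because a set can be bimodal (like example 3 above).
def modes(data):
    counts = Counter(data)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

print(modes([3, 1, 3, 2, 3, 1]))          # unimodal
print(modes([12, 15, 12, 15, 10, 9]))     # bimodal
```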
This document discusses frequency distributions and how to construct them from raw data. It provides examples of creating stem-and-leaf displays, frequency tables, relative frequency tables, and cumulative frequency tables from various data sets. Key concepts covered include class width, class boundaries, tallying data, and calculating relative frequencies and percentages. Overall, the document serves as a tutorial on how to organize and summarize data using various types of frequency distributions.
Normal distribution and sampling distribution (Mridul Arora)
This document provides an overview of Chapter 5 from the textbook, which covers normal probability distributions. Section 5.1 introduces normal distributions and the standard normal distribution, including their key properties and how to interpret related graphs. It describes how any normal distribution can be transformed into a standard normal distribution for calculation purposes. Section 5.1 also shows how to find areas under the standard normal curve using the standard normal table. Section 5.2 discusses how to calculate probabilities for normally distributed variables by relating them to areas under the normal curve. It provides examples of finding probabilities and expected values.
Systematic sampling in probability sampling (Sachin H)
This is a systematic sample in probability sampling, which is considered to be one of the techniques of sampling. It is most useful in certain circumstances in random sampling.
Here are the steps to solve this problem:
1. Prepare the frequency distribution table with the class intervals and frequencies, and calculate fX.
2. Find the mean (x̄) using the formula x̄ = ΣfX / Σf.
3. Calculate the deviations (X − x̄).
4. Square the deviations to get (X − x̄)².
5. Multiply the frequencies and squared deviations to get f(X − x̄)².
6. Calculate the sample variance using the formula s² = Σf(X − x̄)² / (n − 1), where n = Σf.
7. Take the square root of the variance to get the standard deviation.
8. The range is the difference between the upper boundary of the highest class and the lower boundary of the lowest class.
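The steps above can be sketched as follows; the class midpoints and frequencies are hypothetical.

```python
import math

# Grouped-data variance and standard deviation following the steps above.
# X is the class midpoint; the table below is hypothetical.
table = [   # (class midpoint X, frequency f)
    (5, 3),
    (15, 7),
    (25, 6),
    (35, 4),
]

n = sum(f for _, f in table)
mean = sum(f * x for x, f in table) / n          # mean = sum(fX) / sum(f)

# sum of f(X - mean)^2 over the classes, then divide by n - 1 (sample variance)
ss = sum(f * (x - mean) ** 2 for x, f in table)
variance = ss / (n - 1)
std_dev = math.sqrt(variance)
print(round(mean, 2), round(variance, 2), round(std_dev, 2))
```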
Parametric test of difference: z-test, F-test, one-way/two-way ANOVA (Tess Anoza)
The document provides information about z-tests and F-tests. It defines what a z-test and F-test are, explains why and how they are used, and discusses the z-test for one sample and two sample groups as well as the F-test. For the z-test, it provides the formula, steps to use it for one and two sample groups, and an example problem. For the F-test, it defines what it is, when it is used, provides the formula and steps to compute the F-value including using an ANOVA table. It also provides an example problem to demonstrate solving the F-value.
1. Statistical analysis involves collecting, organizing, analyzing data, and drawing inferences about populations based on samples. It includes both descriptive and inferential statistics.
2. The document defines key terms used in statistical analysis like population, sample, statistical analysis, and discusses various statistical measures like mean, median, mode, interquartile range, and standard deviation.
3. The purposes of statistical analysis are outlined as measuring relationships, making predictions, testing hypotheses, and summarizing results. Both parametric and non-parametric statistical analyses are discussed.
This document discusses various types of analysis of variance (ANOVA) statistical tests. It begins with an introduction to one-way ANOVA for comparing the means of three or more independent groups. Requirements for one-way ANOVA include a nominal independent variable with three or more levels and a continuous dependent variable. Assumptions of one-way ANOVA include normality and homogeneity of variances. The document then briefly discusses two-way ANOVA, MANOVA, ANOVA with repeated measures, and related statistical tests. Examples of each type of ANOVA are provided.
This document provides an overview of inferential statistics. It defines inferential statistics as using data to make generalizations about a larger population beyond the available sample data. Key points include:
- Inferential statistics uses hypothesis testing and estimation to analyze data. It involves making inferences about a population from a sample.
- Hypothesis testing involves forming a null and alternative hypothesis, setting a significance level, choosing a statistical test, and making a decision to accept or reject the null hypothesis based on the p-value or critical value.
- Estimation provides point estimates and interval estimates like confidence intervals to describe population parameters based on sample data.
- Common inferential statistical tests covered are z-tests and t-tests.
The document provides an overview of inferential statistics. It defines inferential statistics as making generalizations about a larger population based on a sample. Key topics covered include hypothesis testing, types of hypotheses, significance tests, critical values, p-values, confidence intervals, z-tests, t-tests, ANOVA, chi-square tests, correlation, and linear regression. The document aims to explain these statistical concepts and techniques at a high level.
OBJECTIVES:
Run the test of hypothesis for mean difference using paired samples. Construct a confidence interval for the difference in population means using paired samples.
The observation of interest is the difference between the readings before and after the intervention, called the paired-difference observation.
Paired t test:
A paired t-test is used to compare two means where you have two samples in which observations in one sample can be paired with observations in the other sample.
Examples of where this might occur are:
Before-and-after observations on the same subjects (e.g. students' test results before and after a particular module or course).
A comparison of two different methods of measurement or two different treatments where the measurements/treatments are applied to the same subjects (e.g. blood pressure measurements using a sphygmomanometer and a dynamap).
When there is a relationship between the groups, such as identical twins.
This test is concerned with the pair-wise differences between sets of data. This means that each data point in one group has a related data point in the other group (the groups always have equal numbers).
ASSUMPTIONS:
The sample or samples are randomly selected
The sample data are dependent
The distribution of differences is approximately normally
distributed.
The paired t statistic (shortcut form) is t = Σd / √[(n·Σd² − (Σd)²)/(n − 1)],
where d is each paired difference, "t" has (n − 1) degrees of freedom, and "n" is
the total number of pairs.
Note: the square root covers the entire fraction (numerator and denominator), so
take the root only after evaluating the fraction completely.
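The shortcut formula above can be sketched in Python. The before/after scores below are the same ones that appear in Exhibit 14-3 later in this section; the variable names are illustrative.

```python
import math

# Before/after scores for the same five subjects
# (same data as Exhibit 14-3 later in this section).
before = [10, 12, 14, 16, 12]
after = [15, 14, 17, 17, 20]

# Paired differences: each "after" minus its matching "before".
d = [a - b for a, b in zip(after, before)]
n = len(d)
sum_d = sum(d)
sum_d2 = sum(x * x for x in d)

# t = sum(d) / sqrt((n*sum(d^2) - (sum d)^2) / (n - 1)), df = n - 1
t = sum_d / math.sqrt((n * sum_d2 - sum_d ** 2) / (n - 1))
print(round(t, 2))  # 3.06
```

With df = 4, the two-tailed critical value at the .05 level is 2.776, so a t of 3.06 would be significant.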
Chapter 7
Hypothesis Testing Procedures
Learning Objectives
• Define null and research hypothesis, test
statistic, level of significance and decision rule
• Distinguish between Type I and Type II errors
and discuss the implications of each
• Explain the difference between one- and two-
sided tests of hypothesis
Learning Objectives
• Estimate and interpret p-values
• Explain the relationship between confidence interval
estimates and p-values in drawing inferences
• Perform analysis of variance by hand
• Appropriately interpret the results of analysis of
variance tests
• Distinguish between one and two factor analysis of
variance tests
Learning Objectives
• Perform chi-square tests by hand
• Appropriately interpret the results of chi-square tests
• Identify the appropriate hypothesis testing procedures
based on type of outcome variable and number of
samples
Hypothesis Testing
• Research hypothesis is generated about
unknown population parameter
• Sample data are analyzed and determined to
support or refute the research hypothesis
Hypothesis Testing Procedures
Step 1
Null hypothesis (H0):
No difference, no change
Research hypothesis (H1):
What investigator
believes to be true
Hypothesis Testing Procedures
Step 2
Collect sample data and determine whether sample
data support research hypothesis or not.
For example, in a test for µ, evaluate the sample mean X̄ against the
hypothesized value µ0.
Hypothesis Testing Procedures
Step 3
• Set up decision rule to decide when to believe null
versus research hypothesis
• Depends on level of significance, α = P(Reject H0 | H0 is true)
Hypothesis Testing Procedures
Steps 4 and 5
• Summarize sample information in test statistic (e.g.,
Z value)
• Draw conclusion by comparing test statistic to
decision rule. Provide final assessment as to whether
H1 is likely true given the observed data.
P-values
• P-values represent the exact significance of the
data
• Estimate p-values when rejecting H0 to
summarize significance of the data (can
approximate with statistical tables, can get
exact value with statistical computing
package)
• P-value is the smallest a where we still reject
H0
Hypothesis Testing Procedures
1. Set up null and research hypotheses, select α
2. Select test statistic
3. Set up decision rule
4. Compute test statistic
5. Draw conclusion & summarize significance
Errors in Hypothesis Tests
Hypothesis Testing for µ
• Continuous outcome
• 1 Sample
H0: µ = µ0
H1: µ > µ0, µ < µ0, µ ≠ µ0
Test Statistic
n > 30: Z = (X̄ − µ0) / (s/√n)  (find critical value in Table 1C)
n < 30: t = (X̄ − µ0) / (s/√n)  (Table 2, df = n − 1)
Example 7.2.
Hypothesis Testing for µ
The National Center for Health Statistics (NCHS)
reports the mean total cholesterol for adults is 203. Is
the mean total cholesterol in Framingham Heart
Study participants significantly different?
In 3310 participants the mean is 200.3 with a standard
deviation of 36.8.
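The text breaks off before the worked solution; a sketch of the test statistic for this example:

```python
import math

mu0 = 203      # NCHS reported mean total cholesterol
xbar = 200.3   # Framingham sample mean
s = 36.8       # sample standard deviation
n = 3310

# n > 30, so use the Z statistic: Z = (xbar - mu0) / (s / sqrt(n))
z = (xbar - mu0) / (s / math.sqrt(n))
print(round(z, 2))  # -4.22
```

Since |−4.22| exceeds 1.96, H0 is rejected at α = 0.05: the Framingham mean differs significantly from the reported 203.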
1) Non-parametric tests make fewer assumptions than parametric tests about the population distribution. They do not require the assumptions of normality and equal variances.
2) Some common non-parametric tests described in the document include the Mann-Whitney U test for comparing two independent samples, the Wilcoxon Rank Sum test for comparing two independent samples, and the Wilcoxon Signed Rank test for comparing two related samples.
3) The Kruskal-Wallis H test is also described, which is the non-parametric equivalent of the one-way ANOVA and can be used to compare three or more independent samples.
The document defines various statistical measures and types of statistical analysis. It discusses descriptive statistical measures like mean, median, mode, and interquartile range. It also covers inferential statistical tests like the t-test, z-test, ANOVA, chi-square test, Wilcoxon signed rank test, Mann-Whitney U test, and Kruskal-Wallis test. It explains their purposes, assumptions, formulas, and examples of their applications in statistical analysis.
The document provides an overview of statistical hypothesis testing and various statistical tests used to analyze quantitative and qualitative data. It discusses types of data, key terms like null hypothesis and p-value. It then outlines the steps in hypothesis testing and describes different tests of significance including standard error of difference between proportions, chi-square test, student's t-test, paired t-test, and ANOVA. Examples are provided to demonstrate how to apply these statistical tests to determine if differences observed in sample data are statistically significant.
1. The sampling distribution of a statistic is the distribution of all possible values that statistic can take when calculating it from samples of the same size randomly drawn from a population. The sampling distribution will have the same mean as the population but lower variance equal to the population variance divided by the sample size.
2. For a sample mean, the sampling distribution will be approximately normal according to the central limit theorem. A 95% confidence interval for the population mean can be constructed as the sample mean plus or minus 1.96 times the standard error of the mean.
3. For a sample proportion, the sampling distribution will also be approximately normal. A 95% confidence interval can be constructed as the sample proportion plus or minus 1.96 times the standard error of the proportion.
This document discusses measures of central tendency and dispersion. It begins by defining measures of central tendency as statistical measures that describe the position of a distribution. The most commonly used measures of central tendency for a univariate context are the mean, median, and mode. The document then discusses the arithmetic mean in detail, including how to calculate the mean for individual, discrete, and continuous data series using direct and shortcut methods. It also covers the geometric mean and how to calculate it using logarithms for individual, discrete, and continuous data series. Various examples and practice problems are provided.
This document discusses parametric statistical tests. It defines parametric tests as those that make assumptions about the population distribution parameters. The key parametric tests covered are: t-tests (paired, unpaired, one sample), ANOVA (one way, two way), Pearson's correlation, and the z-test. Details are provided on the assumptions, calculations, and applications of each test. T-tests are used to compare means, ANOVA compares multiple group means, Pearson's r measures correlation between variables, and the z-test is for large samples when the population standard deviation is known.
This document describes the steps for conducting an independent samples t-test. The t-test is used to compare the means of two independent groups on a continuous dependent variable. It tests whether the means of the two groups are statistically significantly different from each other. The steps include: 1) stating the null and alternative hypotheses, 2) setting the significance level, 3) calculating the t-value, 4) finding the critical t-value, and 5) making a conclusion about whether to reject the null hypothesis based on the t-values. An example compares math test scores of male and female college students to determine if gender significantly impacts scores.
• Non-parametric tests are distribution-free methods, which do not rely on assumptions that the data are drawn from a given probability distribution. As such they are the opposite of parametric statistics.
• In non-parametric tests we do not assume that a particular distribution is applicable or that a certain value is attached to a parameter of the population.
When to use a non-parametric test?
1) The sample distribution is unknown.
2) The population distribution is non-normal.
Non-parametric tests focus on order or ranking
1) Data is changed from scores to ranks or signs
2) A parametric test focuses on the mean difference; the equivalent non-parametric test focuses on the difference between medians.
1) Chi-square test
• First formulated by Helmert and later developed by Karl Pearson
• It can be used as both a parametric and a non-parametric test, but is chiefly a non-parametric test.
• The test involves calculation of a quantity called chi-square.
• It follows a specific distribution known as the chi-square distribution.
• It is used to test the significance of the difference between two proportions, and can be used when there are more than two groups to be compared.
Applications
1) Test of proportion
2) Test of association
3) Test of goodness of fit
Criteria for applying Chi- square test
• Groups: More than 2 independent
• Data: Qualitative
• Sample size: Small or Large, random sample
• Distribution: Non-Normal (Distribution free)
• Lowest expected frequency in any cell should be greater than 5
• No group should contain less than 10 items
Example: Suppose there are two groups, one of which has received oral hygiene instructions and the other of which has not, and we wish to test whether the occurrence of new cavities is associated with the instructions.
2) Fisher Exact Test
• Used when one or more of the expected counts in a 2×2 table is small.
• Used to calculate the exact probability of finding the observed numbers, using the Fisher exact probability test.
3) McNemar Test
• Used to compare before-and-after findings in the same individual, or to compare findings in a matched analysis (for dichotomous variables).
Example: comparing the attitudes of medical students toward confidence in statistical analysis before and after an intensive statistics course.
4) Sign Test
• Sign test is used to find out the statistical significance of differences in matched pair comparisons.
• It is based on the + or − signs of observations in a sample, not on their numerical magnitudes.
• For each subject, subtract the 2nd score from the 1st, and write down the sign of the difference.
It can be used
a. in place of a one-sample t-test
b. in place of a paired t-test or
c. for ordered categorical data where a numerical scale is inappropriate but where it is possible to rank the observations.
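The sign-test procedure described above can be sketched as follows. The before/after scores are hypothetical, purely for illustration; under H0 the signs follow a Binomial(n, 0.5) distribution.

```python
from math import comb

# Hypothetical before/after scores for ten subjects (illustrative data).
before = [7, 6, 8, 5, 9, 6, 7, 8, 6, 7]
after = [5, 5, 6, 5, 7, 4, 6, 9, 4, 5]

# Keep only the sign of each difference; zero differences are dropped.
diffs = [b - a for b, a in zip(before, after) if b != a]
n = len(diffs)
n_pos = sum(1 for d in diffs if d > 0)

# Two-sided p-value: under H0 the number of + signs is Binomial(n, 0.5).
k = max(n_pos, n - n_pos)
p = min(1.0, 2 * sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n)
print(n_pos, n, round(p, 4))  # 8 9 0.0391
```

Here 8 of 9 non-tied differences are positive, giving a two-sided p of about 0.039, which would be significant at the .05 level.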
5) Wilcoxon signed rank test
• Analogous to paired ‘t’ test
6) Mann-Whitney Test
• Similar to the Student's t test
7) Spearman's rank correlation – similar to Pearson's correlation.
In a left-tailed test comparing two means with variances unknown but assumed to be equal, the sample sizes were n1 = 8 and n2 = 12. At α = .05, the critical value would be:
-1.645
-2.101
-1.734
-1.960
In the t test for independent groups, ____.
we estimate µ1 − µ2
we estimate σ²
we estimate X1 − X2
df = N − 1
Exhibit 14-1
A professor of women's studies is interested in determining if stress affects the menstrual cycle. Ten women are randomly sampled for an experiment and randomly divided into two groups. One of the groups is subjected to high stress for two months while the other lives in a relatively stress-free environment. The professor measures the menstrual cycle (in days) of each woman during the second month. The following data are obtained.
High stress
20
23
18
19
22
Relatively stress free
26
31
25
26
30
Refer to Exhibit 14-1. The obtained value of the appropriate statistic is ____.
tobt = 4.73
tobt = 4.71
tobt = 3.05
tobt = 0.47
Refer to Exhibit 14-1. The df for determining tcrit are ____.
4
9
8
3
Refer to Exhibit 14-1. Using α = .05, 2-tail, tcrit = ____.
+2.162
+2.506
±2.462
±2.306
Refer to Exhibit 14-1. Using α = .05, 2-tail, your conclusion is ____.
accept H0; stress does not affect the menstrual cycle
retain H0; we cannot conclude that stress affects the menstrual cycle
retain H0; stress affects the menstrual cycle
reject H0; stress affects the menstrual cycle
Refer to Exhibit 14-1. Estimate the size of the effect. = ____
0.8102
0.6810
0.4322
0.5776
A major advantage to using a two condition experiment (e.g. control and experimental groups) is ____.
the test has more power
the data are easier to analyze
the experiment does not need to know population parameters
the test has less power
Which of the following tests analyzes the difference between the means of two independent samples?
correlated t test
t test for independent groups
sign test
test of variance
If n1 = n2 and n is relatively large, then the t test is relatively robust against ____.
violations of the assumptions of homogeneity of variance and normality
violations of random samples
traffic violations
violations by the forces of evil
Exhibit 14-3
Five students were tested before and after taking a class to improve their study habits. They were given articles to read which contained a known number of facts in each story. After the story each student listed as many facts as he/she could recall. The following data was recorded.
Before
10
12
14
16
12
After
15
14
17
17
20
Refer to Exhibit 14-3. The obtained value of the appropriate statistic is ____.
3.92
3.06
4.12
2.58
Refer to Exhibit 14-3. What do you conclude using α = 0.05, 2-tail?
reject H0; the class appeared to improve study habits
retain H0; the class had no effect on study habits
retain H0; we cannot conclude that the class improved study habits
accept H0; the class appeared to improve study habits
Which of the following is (are) assumption(.
This document discusses various statistical hypothesis tests including z-tests, t-tests, F-tests, and chi-square tests. It provides examples and explanations of how to perform hypothesis tests to test for differences between means and variances. It discusses key concepts like type I and type II errors, level of significance, critical regions, and test statistics. Formulas and steps are provided for performing z-tests, t-tests, and F-tests on single and two sample data. Examples of applying these tests to real data sets are also included.
What is meant by hypothesis testing?
Hypothesis testing is a systematic procedure for deciding whether the results of a research study support a particular theory which applies to a population. Hypothesis testing uses sample data to evaluate a hypothesis about a population.
This document provides an overview of statistical tests of significance used to analyze data and determine whether observed differences could reasonably be due to chance. It defines key terms like population, sample, parameters, statistics, and hypotheses. It then describes several common tests including z-tests, t-tests, F-tests, chi-square tests, and ANOVA. For each test, it outlines the assumptions, calculation steps, and how to interpret the results to evaluate the null hypothesis. The goal of these tests is to determine if an observed difference is statistically significant or could reasonably be expected due to random chance alone.
2. INFERENTIAL STATISTICS
Inferential Statistics
• Refers to the statistical procedures used in drawing inferences about the
properties of a population from sample data.
Test of Hypothesis
• It is a statistical tool that determines whether there is a statistically significant
difference between two or more groups, or whether there is a statistically significant
relationship between two or more variables.
Hypothesis
• It is a statement or tentative theory which aims to explain facts about the real world.
• They are subjected to testing.
If they are found to be statistically true, they are accepted
If they are found to be statistically false, they are rejected
3. INFERENTIAL STATISTICS
Two Kinds of Hypothesis
1. Null Hypothesis (H0). A hypothesis that may either be rejected or accepted.
2. Alternative Hypothesis (Ha). It generally represents the hypothetical statement
that the researcher wants to prove.
• Summary:
REJECTION of H0 implies ACCEPTANCE of Ha
ACCEPTANCE of H0 implies REJECTION of Ha
• Possible Error when Making Decision about the Proposed Hypothesis- SUMMARY:
TYPE I and TYPE II ERRORS
DECISION | Actual: H0 TRUE | Actual: Ha TRUE
Reject H0 | Type I Error | Correct Decision
Accept H0 | Correct Decision | Type II Error
• The probability of making a Type I error (alpha error) in a test is called the
significance level of the test.
4. INFERENTIAL STATISTICS
Steps in Hypothesis Testing
1. Formulate the null hypothesis (H0) that there is no significant difference between
items being compared.
2. Set the level of significance.
3. Determine the test to be used.
4. Determine the tabular value for the test.
5. Compute for z-test or t-test as needed.
z – test
1. Sample mean compared with Population mean
FORMULA: z = (X − µ) / (σ/√n)
where: z = z-test
X = sample mean
µ = population mean
σ = population standard deviation
n = number of items within the sample
5. INFERENTIAL STATISTICS
2. Comparing two sample means
FORMULA: Z = (X1 − X2) / (σ·√(1/n1 + 1/n2))
where: z = z-test
X1 = mean of the first sample
X2 = mean of the second sample
n1 = number of items in the first sample
n2 = number of items in the second sample
σ = population standard deviation
3. Comparing two sample proportions
FORMULA:
where: P1 = proportion of the first sample
q1 = 1 - P1
P2 = proportion of second sample
q2 = 1 - P2
n1 = number of items in the first sample
n2 = number of items in the second sample
z = (P1 − P2) / √(P1q1/n1 + P2q2/n2)
6. INFERENTIAL STATISTICS
EXAMPLE 1
Data from a school census show that the mean weight of college students was 45 kilos,
with a standard deviation of 3 kilos. A sample of 100 college students were found to have
a mean weight of 47 kilos. Are the 100 college students really heavier than the rest, using
.05 significance level?
Step 1: H0 : The 100 college students are not really heavier than the rest. (X = 45 kilos)
Ha : The 100 college students are really heavier than the rest. (X > 45 kilos)
Step 2: Set 0.05 level of significance
Step 3: The standard deviation given is based on the population. Therefore the z-test is to be
used
Step 4: Based on the table (critical value of z), the tabular value of z for one tailed test at 0.05
level of significance is 1.645
Step 5: The given values in the problem are:
X = 47 kilos
µ = 45 kilos
σ = 3 kilos
n = 100
7. INFERENTIAL STATISTICS
FORMULA: z = (X − µ) / (σ/√n)
z = (47 − 45) / (3/√100)
= 2 / (3/10)
= 2 / 0.3
= 6.67
Step 6: The computed value of 6.67 is greater than the tabular value of 1.645.
Therefore, the null hypothesis is rejected.
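The same computation can be sketched in Python:

```python
import math

xbar, mu, sigma, n = 47, 45, 3, 100

# z = (sample mean - population mean) / (sigma / sqrt(n))
z = (xbar - mu) / (sigma / math.sqrt(n))
print(round(z, 2))  # 6.67
```

Since 6.67 > 1.645, the one-tailed test at the .05 level rejects H0.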
8. INFERENTIAL STATISTICS
EXAMPLE 2
A researcher wishes to find out whether or not there is a significant difference
between the monthly allowances of morning and afternoon students in his school. By
random sampling, he took a sample of 239 students in the morning session. These
students were found to have a mean monthly allowance of ₱142.00. The researcher
also took a sample of 209 students in the afternoon session. They were found to
have a mean monthly allowance of ₱148.00. The total population of students in that
school has a standard deviation of ₱40. Is there a significant difference between
the two samples at the 0.01 level of significance?
H0 : There is no significant difference between the samples
Ha : There is significant difference between the samples
FORMULA: Z = (X1 − X2) / (σ·√(1/n1 + 1/n2))
9. INFERENTIAL STATISTICS
Z = (142 − 148) / (40·√(1/239 + 1/209))
= −6 / (40·√(0.0042 + 0.0048))
= −6 / (40·√0.0090)
= −6 / (40 × 0.095)
= −6 / 3.8
= −1.579
The computed value of 1.579 (in absolute value) is less than the tabular value of
2.58 at the 0.01 level of significance. Accept the null hypothesis.
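As a check, the slide rounds 1/239 and 1/209 to 0.0042 and 0.0048 before taking the root; without that rounding the statistic comes out at about −1.58, and the conclusion is the same:

```python
import math

x1, x2, sigma = 142, 148, 40
n1, n2 = 239, 209

# Z = (x1 - x2) / (sigma * sqrt(1/n1 + 1/n2)), unrounded intermediates
z = (x1 - x2) / (sigma * math.sqrt(1 / n1 + 1 / n2))
print(round(z, 2))  # -1.58
```

Either way, |z| < 2.58, so H0 is accepted at the 0.01 level.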
10. INFERENTIAL STATISTICS
EXAMPLE 3
A sample survey of a television program in Metro Manila shows that 80 of 200 men
dislike the program, as do 75 of 250 respondents in a second sample. We want to
decide whether the difference between the two sample proportions, 80/200 = 0.40
and 75/250 = 0.30, is significant or not at the 0.05 level of
H0 : There is no significant difference between the two sample proportions
Ha : There is significant difference between the two sample proportions
The given values in the problem are:
P1 = 0.40 q1 = 1 - P1 = 1 – 0.40 = 0.60
P2 = 0.30 q2 = 1 - P2 = 1 – 0.30 = 0.70
n1 = 200 n2 = 250
z = (P1 − P2) / √(P1q1/n1 + P2q2/n2)
z = (0.40 − 0.30) / √((0.40)(0.60)/200 + (0.30)(0.70)/250)
= 0.10 / √(0.24/200 + 0.21/250)
= 0.10 / √(0.0012 + 0.00084)
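The source skips from this point straight to the t-test section; completing the arithmetic:

```python
import math

p1, q1, n1 = 0.40, 0.60, 200
p2, q2, n2 = 0.30, 0.70, 250

# z = (p1 - p2) / sqrt(p1*q1/n1 + p2*q2/n2)
z = (p1 - p2) / math.sqrt(p1 * q1 / n1 + p2 * q2 / n2)
print(round(z, 2))  # 2.21
```

Since 2.21 > 1.96, the difference between the two sample proportions is significant at the 0.05 level, and H0 is rejected.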
12. INFERENTIAL STATISTICS
t – test
1. Sample mean compared with Population mean
FORMULA: t = (X − µ) / (s/√(n − 1))   or   t = (X − µ)·√(n − 1) / s
where: t = t-test
X = sample mean
µ = population mean
s = sample standard deviation
n = number of items in the sample
EXAMPLE: A researcher knows that the average height of Filipino women is 1.525
meters. A random sample of 26 women was taken and found to have a mean height of
1.56 meters, with a standard deviation of .10 meters. Is there reason to believe
that the 26 women in the sample are significantly taller than the others at the
.05 significance level?
H0 : The sample is not significantly taller than the other Filipino women
Ha : The sample is significantly taller than the others
13. INFERENTIAL STATISTICS
The given values in the problem are:
X = 1.56 meters
µ = 1.525 meters
s = .10 meters
n = 26
degrees of freedom = n – 1
= 26 – 1
= 25
FORMULA: t = (X − µ) / (s/√(n − 1))
t = (1.56 − 1.525) / (.10/√(26 − 1))
t = 0.035 / (.10/√25)
t = 0.035 / 0.02
t = 1.75
The computed value of 1.75 is greater than the tabular value of 1.708, so the null
hypothesis is rejected.
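The same computation in Python (note the slide's formula divides s by √(n − 1) rather than √n):

```python
import math

xbar, mu, s, n = 1.56, 1.525, 0.10, 26

# t = (xbar - mu) / (s / sqrt(n - 1)), per the formula used in the text
t = (xbar - mu) / (s / math.sqrt(n - 1))
print(round(t, 2))  # 1.75
```

With df = 25, the one-tailed .05 critical value is 1.708, so 1.75 leads to rejecting H0.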
14. INFERENTIAL STATISTICS
2. Comparing two sample means
FORMULA: t = (X1 − X2) / √[ ((n1−1)s1² + (n2−1)s2²)/(n1 + n2 − 2) · (1/n1 + 1/n2) ]
where: t = t-test
X1 = mean of the first sample
X2 = mean of the second sample
𝑠1 = standard deviation of the first sample
𝑠2 = standard deviation of the second sample
𝑛1 = number of items in the first sample
𝑛2 = number of items in the second sample
15. INFERENTIAL STATISTICS
EXAMPLE: A teacher wishes to test whether or not the Case Method of teaching is more effective than
the Traditional Method. She picks two classes of approximately equal intelligence (verified through an
administered IQ test). She gathers a sample of 18 students to whom she uses the Case Method and
another sample of 14 students to whom she uses the Traditional Method. After the experiment, an
objective test revealed that the first sample got a mean score of 28.6 with a standard deviation of 5.9,
while the second group got a mean score of 21.7 with a standard deviation of 4.6. Based on the result
of the administered test, can we say that the Case Method is more effective than the Traditional
Method?
H0 : The Case Method is as effective as the Traditional Method
Ha : The Case Method is more effective than the Traditional Method
Given:X1 = 28.6 X2 = 21.7
𝑠1 = 5.9 𝑠2 = 4.6
𝑛1 = 18 𝑛2 = 14
degrees of freedom = 𝑛1 + 𝑛2 - 2
= 18 + 14 – 2
= 30
16. INFERENTIAL STATISTICS
FORMULA: t = (X1 − X2) / √[ ((n1−1)s1² + (n2−1)s2²)/(n1 + n2 − 2) · (1/n1 + 1/n2) ]
t = (28.6 − 21.7) / √[ ((18−1)(5.9)² + (14−1)(4.6)²)/(18 + 14 − 2) · (1/18 + 1/14) ]
t = 6.9 / √[ (17(34.81) + 13(21.16))/30 · (0.06 + 0.07) ]
t = 6.9 / √[ ((591.77 + 275.08)/30) · 0.13 ]
t = 6.9 / √(28.895 × 0.13)
t = 6.9 / √3.756
t = 6.9 / 1.94
t = 3.56
The computed t-value of 3.56 is greater than the tabular value of 1.697, so the null
hypothesis is rejected.
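The slide rounds 1/18 and 1/14 to 0.06 and 0.07, which yields 3.56; without that rounding the statistic is about 3.60, and the conclusion is unchanged:

```python
import math

x1, x2 = 28.6, 21.7
s1, s2 = 5.9, 4.6
n1, n2 = 18, 14

# Pooled variance: ((n1-1)s1^2 + (n2-1)s2^2) / (n1 + n2 - 2)
pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
t = (x1 - x2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
print(round(t, 2))  # 3.6
```

Either value comfortably exceeds the tabular 1.697 at df = 30, so H0 is rejected.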
17. INFERENTIAL STATISTICS
Analysis of Variance (ANOVA)
FORMULA: F = MSSb / MSSw
• ANOVA is based upon two sources of variation: (1) the between-column variance
and (2) the within-column variance.
• These two variances are sometimes called the between-column sum of squares (SSb)
and the within-column sum of squares (SSw). The sum of the two makes up the total
sum of squares (TSS).
FORMULA: TSS = Σx² − (Σx)²/N
where: x = the value of each entry
N = the total number of items
EXAMPLE: Let us take three groups of 6 students each, where each group is subjected to
one of three types of teaching method. The grades of the students are taken at the end
of the semester and enumerated according to grouping. The one way classification
model will look like this:
19. INFERENTIAL STATISTICS
• The total sum of squares is computed as follows:
TSS = 136,484 − (1,560)²/18
= 136,484 − 2,433,600/18
= 136,484 − 135,200
= 1,284
• The between-column variance or between-column sum of squares is 1/r times the sum
of the squares of the column sums, minus the correction term, where r refers to the
number of rows.
SSb = (1/No. of rows)·Σ(sum of each column)² − (Σx)²/N
SSb = (1/6)(534² + 465² + 561²) − (1,560)²/18
SSb = (1/6)(285,156 + 216,255 + 314,721) − 2,433,600/18
SSb = 816,132/6 − 135,200
SSb = 136,022 − 135,200
SSb = 822
20. INFERENTIAL STATISTICS
• The within-column variance or within-column sum of squares is the difference
between the total sum of squares and the between-column sum of squares.
SSw = TSS − SSb = 1,284 − 822 = 462
• We can make use of any of the following in getting the degrees of freedom:
Total degrees of freedom (df) = N − 1 = 18 − 1 = 17
Total degrees of freedom (df) = rk − 1 = (3 × 6) − 1 = 17
Between-column df = number of columns − 1 = 3 − 1 = 2
Within-column df = total df − between-column df = 17 − 2 = 15
21. INFERENTIAL STATISTICS
• To compute for the Mean Sum of Squares:
MSSb = SSb/dfb = 822/2 = 411
MSSw = SSw/dfw = 462/15 = 30.8
• To compute for the F-test:
F = MSSb/MSSw = 411/30.8 = 13.34
22. INFERENTIAL STATISTICS
• ANOVA Table on the Three Samples Subjected to Different Teaching Method
• The tabular value: 3.68 at 5% level of significance
• DECISION: The null hypothesis is rejected considering that the computed value of 13.34 is greater
than the tabular value of 3.68
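As a sketch, the whole one-way ANOVA sequence above (TSS, SSb, SSw, degrees of freedom, F) can be coded directly. The grades below are illustrative placeholders, since the slide's data table is not reproduced here, so the resulting values differ from the worked example:

```python
from statistics import fsum

# One-way ANOVA via the sums-of-squares formulas used in the slides.
# The grades are hypothetical stand-ins, not the slide's actual data.
groups = [
    [85, 88, 90, 84, 86, 87],   # teaching method 1
    [70, 72, 68, 74, 71, 69],   # teaching method 2
    [92, 95, 91, 94, 90, 93],   # teaching method 3
]

all_scores = [x for g in groups for x in g]
N = len(all_scores)             # 18 students in total
k = len(groups)                 # 3 columns (groups)
r = N // k                      # 6 rows per column (equal group sizes)

correction = fsum(all_scores) ** 2 / N                      # (Σx)² / N
tss = fsum(x * x for x in all_scores) - correction          # total SS
ssb = fsum(fsum(g) ** 2 for g in groups) / r - correction   # between-column SS
ssw = tss - ssb                                             # within-column SS

df_b, df_w = k - 1, N - k
f_stat = (ssb / df_b) / (ssw / df_w)                        # F = MSSb / MSSw
print(f"TSS={tss:.1f} SSb={ssb:.1f} SSw={ssw:.1f} F={f_stat:.2f}")
```

The computed F is then compared against the tabular F value at the chosen significance level, exactly as in the decision step above.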
23. INFERENTIAL STATISTICS
Chi-square Test (X²)
• Uses of the Chi-square Test
1. For estimating how closely an observed distribution matches an expected distribution, also known as the goodness-of-fit test.
2. For estimating whether two random variables are independent, also called the test of independence.
FORMULA: For the Goodness-of-Fit Test
X² = Σ (OF − EF)² / EF
FORMULA: For the Test of Independence
X² = Σ (OF − EF)² / EF
EF = (Row Total × Column Total) / n
24. INFERENTIAL STATISTICS
EXAMPLE
• Chi-square for a Goodness-of-Fit Test
Two six-sided dice (dice A and B) were each rolled 60 times; for a fair die, the chance of any particular number coming up is the same, 1 in 6, so each face is expected to appear 10 times. If a die is loaded, certain numbers will have a greater chance of appearing, while others will have a lower chance. The researcher observed the following for one die (A).
X² = (18 − 10)²/10 + (5 − 10)²/10 + (9 − 10)²/10 + (7 − 10)²/10 + (5 − 10)²/10 + (16 − 10)²/10
X² = 6.4 + 2.5 + 0.1 + 0.9 + 2.5 + 3.6
X² = 16
CONCLUSION: There is a very low chance that these rolls came from a fair die, since the calculated value of 16 is greater than the tabular value of 11.07. This means that the observed frequencies differ significantly from those expected of a fair die.
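This computation can be sketched in a few lines; the observed counts come straight from the slide:

```python
# Chi-square goodness-of-fit statistic for die A, reproducing the
# slide's computation.
observed = [18, 5, 9, 7, 5, 16]        # observed counts for faces 1..6
expected = [sum(observed) / 6] * 6      # fair die: 60 rolls / 6 = 10 per face

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_sq, 2))  # 16.0
```

With 6 − 1 = 5 degrees of freedom, the statistic of 16 exceeds the 5% critical value of 11.07, matching the conclusion above.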
25. INFERENTIAL STATISTICS
• Chi-square Test for Independence
• Test the hypothesis that academic performance does not depend on IQ at 1% significance level.
• Degrees of Freedom (df) = (r − 1)(k − 1) = (2 − 1)(3 − 1) = 2
• COMPUTATION: Getting the fe
Where fo = 31: fe = (32 × 80)/100 = 25.6
Where fo = 1: fe = (32 × 20)/100 = 6.4
Where fo = 45: fe = (49 × 80)/100 = 39.20
Where fo = 4: fe = (49 × 20)/100 = 9.80
Where fo = 4: fe = (19 × 80)/100 = 15.2
Where fo = 15: fe = (19 × 20)/100 = 3.80
26. INFERENTIAL STATISTICS
• Replacing the above values into the chi-square formula, we shall have:
X² = Σ (OF − EF)² / EF
X² = (31 − 25.6)²/25.6 + (1 − 6.4)²/6.4 + (45 − 39.2)²/39.2 + (4 − 9.80)²/9.80 + (4 − 15.2)²/15.2 + (15 − 3.80)²/3.80
X² = 29.16/25.6 + 29.16/6.4 + 33.64/39.2 + 33.64/9.80 + 125.44/15.2 + 125.44/3.80
X² = 1.139 + 4.556 + 0.858 + 3.433 + 8.253 + 33.011
X² = 51.25
• Since the computed chi-square value of 51.25 is greater than the tabular value of 9.21, the null hypothesis is rejected. For the 100 students, academic performance depends on IQ.
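This test can be reproduced from the contingency table itself; the cell layout below is reconstructed from the fo values and the row and column totals implied by the expected-frequency calculations:

```python
# Chi-square test of independence, reproducing the slide's computation.
# Rows and columns are inferred from the fo values and marginal totals
# (row totals 80 and 20; column totals 32, 49, 19; n = 100).
observed = [
    [31, 45, 4],    # row total 80
    [1, 4, 15],     # row total 20
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / n   # EF = row total × column total / n
        chi_sq += (fo - fe) ** 2 / fe

df = (len(observed) - 1) * (len(observed[0]) - 1)
print(round(chi_sq, 2), df)  # 51.25 2
```

With df = 2 the 1% critical value is 9.21, so the statistic of 51.25 again rejects independence.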
27. INFERENTIAL STATISTICS
Simple Regression Analysis
• Regression analysis is concerned with the problem of estimation and forecasting.
TYPES OF RELATIONSHIP
1. Direct Relationship. The slope of the line is positive because Y increases as X increases.
2. Inverse Relationship. The slope of the line is negative because Y decreases as X increases.
• Least Square Regression Line or LSRL is a statistical technique that analyses the
relationship between the independent and dependent variables.
EQUATION:
Y = a + bX
NORMAL EQUATIONS:
1. ΣY = aN + bΣX
2. ΣXY = aΣX + bΣ𝑿𝟐
28. INFERENTIAL STATISTICS
WHERE: ΣY = sum of the values of Y, the dependent variable
N = the number of pairs of X and Y
ΣX = sum of the values of X, the independent variable
ΣXY = the sum of the column XY, derived by multiplying the paired values of X and Y
ΣX² = the sum of the column X², derived by squaring the values of X
• Based on the given data for X and Y, we can determine all of the above, which means that the two normal equations now form a system of two linear equations in two unknowns, a and b.
• FORMULAS:
a = (ΣY·ΣX² − ΣX·ΣXY) / (N·ΣX² − (ΣX)²)
b = (N·ΣXY − ΣX·ΣY) / (N·ΣX² − (ΣX)²)
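As a sketch of how the closed-form formulas for a and b are applied, the following computes both for a small set of illustrative (X, Y) pairs (the data are hypothetical, not from the slides):

```python
# Least-squares regression line Y = a + bX via the closed-form formulas.
# The (x, y) pairs are illustrative placeholders.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.0, 9.8]

n = len(xs)
sx, sy = sum(xs), sum(ys)                   # ΣX, ΣY
sxy = sum(x * y for x, y in zip(xs, ys))    # ΣXY
sxx = sum(x * x for x in xs)                # ΣX²

denom = n * sxx - sx ** 2                   # NΣX² − (ΣX)²
a = (sy * sxx - sx * sxy) / denom           # intercept
b = (n * sxy - sx * sy) / denom             # slope
print(f"Y = {a:.3f} + {b:.3f}X")
```

With these points the fitted line is Y = 0.150 + 1.950X; any value of X can then be plugged in to forecast Y.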
30. INFERENTIAL STATISTICS
Simple Correlation Analysis
• Correlation analysis is concerned with the relationship between the changes in such variables.
• Degrees of correlation or relationship between two variables:
1. Perfect correlation (negative and positive)
2. Some degree of correlation (negative and positive)
3. No correlation
• The computed value expressing correlation is called the correlation coefficient. The value of the correlation coefficient ranges from −1 to +1.
• Pearson r test. The Pearson Product-Moment Coefficient of Correlation, otherwise known as Pearson r, is the most commonly used correlation coefficient.
• Pearson r, as the most widely used measure of correlation, has two basic assumptions, to wit:
1. The existence of a linear relationship; and
2. The level of measurement of the data for the two variables is either interval or ratio scale.
31. INFERENTIAL STATISTICS
• The value of r (degree of linear relationship) can be interpreted using the range of values for the Pearson Product-Moment Correlation Coefficient, as follows:
• Notably, Pearson r is not a measure of causality. The significance of the obtained correlation coefficient can be determined through the t-test for testing the significance of r.
FORMULA:
t = r √((n − 2) / (1 − r²))
WHERE: t = t-test value
r = obtained Pearson r value
n = paired sample size
Degrees of freedom = n − 2
32. INFERENTIAL STATISTICS
FORMULA for Pearson r:
r = (NΣXY − (ΣX)(ΣY)) / √[(NΣX² − (ΣX)²)(NΣY² − (ΣY)²)]
Where: r = correlation coefficient
N = total number of paired variables
X = the first variable under study
Y = the second variable under study
EXAMPLE:
A researcher wants to find out about the relationship between the performance of a sample
of five Peace and Security students in Political Science and Peace Security subjects:
34. INFERENTIAL STATISTICS
• Thus, there is a moderate negative relationship between the performance of the sample of five Peace and Security students in the Political Science and Peace Security subjects.
• The significance of the t-value determines whether to reject H0 and accept Ha or otherwise; from this, the researcher can generalize whether there is a direct, an indirect, or no correlation between the variables.
Computed t-value = -1.30
Critical value of t at the 0.05 level of significance = 2.353
If |computed t-value| > critical value of t: REJECT H0
If |computed t-value| < critical value of t: ACCEPT H0
CONCLUSION: Since the absolute value of the computed t-value is less than the critical value of t, the null hypothesis is ACCEPTED.
• Hence, we can say that the performance of the five Peace and Security students in the Political Science and Peace Security subjects had a moderate negative correlation, with no significant relationship existing between the said variables.
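The Pearson r formula and the t-test for its significance can be sketched together. The five score pairs below are hypothetical stand-ins (the slide's data table is not reproduced), so the resulting r and t differ from the slide's values:

```python
from math import sqrt

# Pearson r and its significance t-test, using the formulas above.
# The five (x, y) score pairs are illustrative, not the slide's data.
xs = [85, 90, 78, 92, 88]   # scores in the first subject (hypothetical)
ys = [80, 75, 88, 72, 79]   # scores in the second subject (hypothetical)

n = len(xs)
sx, sy = sum(xs), sum(ys)
sxy = sum(x * y for x, y in zip(xs, ys))
sxx = sum(x * x for x in xs)
syy = sum(y * y for y in ys)

r = (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
t = r * sqrt((n - 2) / (1 - r ** 2))   # compare |t| with t(0.05, df = n - 2)
print(f"r = {r:.3f}, t = {t:.3f}, df = {n - 2}")
```

As in the slide's decision rule, |t| is compared with the critical t at df = n − 2 to decide whether the correlation is significant.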
Editor's Notes
NOTE 1: (Definition of Inferential). Inferential statistics demands a higher order of critical judgment and mathematical methods. It aims to give information about large groups of data without dealing with each and every element of these groups. It uses only a small portion of the total set of data in order to draw conclusions or judgments regarding the entire set.
NOTE 2: (Test of Hypothesis). It is a procedure used to substantiate or invalidate a claim which is stated as a null hypothesis.
NOTE 1 (NULL HYPOTHESIS). The hypothesis is ACCEPTED when the difference or relationship found are due to chance variations. This means that the independent variable had no effect on dependent variable or that the two means are not statistically different.
NOTE 2 (NULL HYPOTHESIS). The hypothesis is REJECTED when the difference or relationship is too large to have occurred due to chance. It means that there exists a real relationship or difference between two variables in the populations.
NOTE (TYPE I AND TYPE II ERROR). Type I error (alpha error) – when we reject the null hypothesis (action) when in fact the null hypothesis or H0 is true (actual condition) and therefore the alternative hypothesis or Ha is false. Type II error (beta error) – when we accept the null hypothesis (action) when in fact the null hypothesis is false (actual condition) and therefore the alternative hypothesis or Ha is true.
NOTE ON ITEM NO. 3. Use z-test if population standard deviation is given, and t-test if the standard deviation given is from the samples
NOTE ON ITEM NO. 4. For the z-test, use the table of critical values of z based on the area of the normal curve. For the t-test, one must first compute the degrees of freedom, then look up the tabular value in the table of the t-distribution. In getting the degrees of freedom: for a single sample, df = number of items – 1 (df = n – 1); for two samples, df = n1 + n2 – 2.
NOTE 1 (ANOVA): ANOVA is a technique in inferential statistics designed to test whether or not more than two samples are significantly different from each other.
NOTE 1: (CHI-SQUARE TEST): Chi-square is a versatile statistical test named after the chi-square distribution which is derived under the assumption of normality of the population. It is used to compare the observed proportion of observations falling into different categories (observed frequencies) with the proportion that would occur by chance (expected frequencies)
NOTE: With the distribution, it appears that 1's and 6's came out more often than they were expected to, while the other numbers came out fewer times than expected. The question is whether these differences occurred by chance. Using the chi-square test, the researcher can estimate the likelihood that the values observed for die A occurred by chance. The idea of the chi-square goodness-of-fit test is to compare the observed and expected values. There were six terms in the above table, so the number of degrees of freedom is five (number of terms minus one).
NOTE1: To make a forecast, one must rely on the relationship between what is already known and what is to be estimated.
NOTE2: Regression analysis determines both the nature and strength of a relationship between two variables. The known variable is called the independent variable (denoted as X), and the variable being estimated is the dependent variable (denoted as Y).
NOTE3: The LSRL is a statistical tool that analyses the relationship between the independent and dependent variables.
NOTE4: LSRL: The term "least squares" means that the most accurate trend line that may be drawn is one where the sum of the squares of the vertical distances of the points from the line is least, or minimum. All other lines will yield a higher result. This is the same as saying that the sum of the vertical distances of the points above the line should be equal to the sum of the vertical distances of the points below the line. When these sums (above and below) are not equal, the sum of the squares of the vertical distances of all points from the line is not minimum.
NOTE5: (EQUATION): Therefore, if we know a and b in the equation, we can solve for Y for any given value of X. The method using the LSRL is reduced to finding the equation of the trend line, which in turn is found by solving for a and b. The formulas for a and b are derived from what are referred to as "NORMAL EQUATIONS".
NOTE1: (BASED ON..): From algebra, we know that under such a system we can solve for the values of the two unknowns (a and b) by employing any of the following methods: 1. Substitution; 2. Elimination; and 3. Determinants.
NOTE2: In both formulas, we need to know ΣY, N, ΣX, ΣXY, and ΣX².
NOTE1: Positive Correlation relates two variables whose values are both increasing while Negative Correlation describes a situation where as one variable increases, the other variable decreases.
NOTE2: -1 signifies perfect negative correlation while +1 indicates perfect positive correlation. These in-between values, except zero, indicate some degree of correlation, whether positive or negative. A correlation coefficient of 0 indicates no correlation at all.
NOTE3: PEARSON r. It is used to describe or measure the closeness of the relationship between the two variables.