The document discusses correlation, regression, and hypothesis testing involving two variables. It defines correlation and the correlation coefficient r, which measures the strength of a linear relationship between two variables. Regression analyzes the relationship between variables to determine if it is positive/negative and linear/nonlinear. Hypothesis tests using r evaluate whether a linear correlation exists between two variables in a population. Confidence intervals and predictions can be made from significant relationships.
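The correlation coefficient r and its hypothesis test can be sketched in pure Python. This is a minimal illustration, not code from the document; the paired data values are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient r for paired observations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def r_test_statistic(r, n):
    """t statistic for H0: rho = 0, compared against t with n - 2 df."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Hypothetical paired sample
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)              # about 0.775, a fairly strong positive linear relationship
t = r_test_statistic(r, len(x))  # compare to the t critical value with n - 2 = 3 df
```

With a two-tailed critical value of 3.182 (df = 3, alpha = 0.05), a t of about 2.12 would not be significant, so no prediction from the regression line would be justified for this toy sample.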
The document provides an overview of the chi-squared test and examples of its applications. It introduces the chi-squared test as a method to assess how well observed data fits expected theoretical results. Several examples are given demonstrating chi-squared tests of goodness of fit for binomial, Poisson, normal and contingency table distributions. Practice questions are also provided involving a range of chi-squared test applications.
This document discusses statistical methods for comparing means, including t-tests and analysis of variance (ANOVA). It explains how t-tests can be used to compare two means or paired samples, and how ANOVA can compare two or more means. Key assumptions and procedures are outlined for one-sample t-tests, paired t-tests, independent t-tests with equal and unequal variances, and one-way between-subjects ANOVAs.
Chapter 10: Correlation and Regression
10.1: Correlation
The document provides an overview of the chi-square test, including its formula, steps to calculate it, degrees of freedom, and uses. The chi-square test is a statistical test used to compare observed data to expected data. Its formula adds up the squared differences between observed and expected frequencies, divided by the expected frequencies. Degrees of freedom depend on whether the data is in a row/column or contingency table. The chi-square test can test goodness of fit, independence of attributes, and homogeneity.
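The formula described above, summing squared differences between observed and expected frequencies divided by the expected frequencies, is short enough to write directly. A minimal sketch with hypothetical die-roll counts:

```python
def chi_square_statistic(observed, expected):
    """Chi-square statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Goodness of fit for a fair die: 60 hypothetical rolls, 10 expected per face
observed = [8, 9, 12, 11, 6, 14]
expected = [10] * 6
chi2 = chi_square_statistic(observed, expected)  # 4.2, with df = 6 - 1 = 5
```

Since 4.2 is below the critical value of 11.070 (df = 5, alpha = 0.05), we would fail to reject the hypothesis that the die is fair.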
This document provides information about statistical tests that can be used to make inferences when comparing two samples or populations. Specifically, it discusses:
- Tests for comparing two proportions, means, variances or standard deviations from independent and dependent samples using z-tests, t-tests and F-tests.
- The assumptions and procedures for each test, including how to determine critical values and calculate test statistics.
- Examples of how to perform hypothesis tests and construct confidence intervals for various statistical comparisons between two samples or populations using a TI calculator.
The chi-square test is used to determine if there is a relationship between two categorical variables in two or more independent groups. It can be used when data is arranged in a contingency table with observed and expected frequencies. A sample problem demonstrates how to calculate chi-square by finding the difference between observed and expected counts, squaring these differences, dividing by the expected counts, and summing across all cells. Degrees of freedom and critical values from tables determine whether to reject or fail to reject the null hypothesis of independence. Larger tables can be partitioned into subtables to identify where differences lie. Guidelines are provided for when chi-square or Fisher's exact test should be used based on sample size and expected cell counts.
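The contingency-table procedure described above (expected count = row total × column total / grand total, then sum (O − E)²/E over all cells) can be sketched in a few lines; the 2×2 counts here are hypothetical.

```python
def chi_square_independence(table):
    """Chi-square test of independence for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # row total x column total / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical 2x2 table of observed counts
chi2, df = chi_square_independence([[30, 20], [10, 40]])
```

Here chi-square is about 16.67 with df = 1; since that exceeds the critical value 3.841 (alpha = 0.05), the null hypothesis of independence would be rejected.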
Class 24: Chi-Square Test of Independence and Post Hoc Tests — Betynatha Kb
This document provides an overview of the chi-square test of independence through 18 slides. It defines independence, demonstrates it, discusses expected frequencies, and outlines the 5 steps for conducting a chi-square test of independence: 1) checking assumptions, 2) stating hypotheses and significance level, 3) identifying the sampling distribution and test statistic, 4) computing the test statistic, and 5) making a decision and interpreting results. It also covers examining standardized residuals to identify which cells are contributing most to a significant result.
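The post hoc step in slide decks like this one is usually the simple standardized residual, (O − E)/√E, computed per cell; cells with absolute residuals above roughly 2 contribute most to a significant result. A minimal sketch with hypothetical counts (note some texts use the adjusted residual, which also divides by row and column leverage factors):

```python
import math

def standardized_residuals(table):
    """Simple standardized residual (O - E) / sqrt(E) for each cell."""
    row_t = [sum(r) for r in table]
    col_t = [sum(c) for c in zip(*table)]
    n = sum(row_t)
    return [[(table[i][j] - row_t[i] * col_t[j] / n)
             / math.sqrt(row_t[i] * col_t[j] / n)
             for j in range(len(table[0]))]
            for i in range(len(table))]

# Hypothetical 2x2 table; residuals near +/-2.24 flag the contributing cells
res = standardized_residuals([[30, 20], [10, 40]])
```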
The document provides instructions for conducting an independent samples t-test in SPSS. It explains how to specify the grouping and test variables, define the groups being compared, and set options. It also demonstrates running a t-test to compare mile times between athletes and non-athletes, checking assumptions, and interpreting the output, including Levene's test for equal variances and the t-test results.
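Outside SPSS, the equal-variances branch of that output can be reproduced by hand. The sketch below assumes equal variances and uses the pooled variance estimate (the case Levene's test is meant to justify); the mile times, in minutes, are hypothetical.

```python
import math

def independent_t(sample1, sample2):
    """Pooled-variance independent-samples t statistic (equal variances assumed)."""
    n1, n2 = len(sample1), len(sample2)
    m1, m2 = sum(sample1) / n1, sum(sample2) / n2
    v1 = sum((x - m1) ** 2 for x in sample1) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in sample2) / (n2 - 1)
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)  # pooled variance
    t = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

# Hypothetical mile times: athletes vs. non-athletes
t, df = independent_t([6, 7, 8], [9, 10, 11])
```

If Levene's test rejected equal variances, SPSS's "equal variances not assumed" (Welch) row would be the one to read instead.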
Small sample theory deals with statistical inference when sample sizes are small (n ≤ 30). It involves t and F distributions which are defined in terms of degrees of freedom. The t-distribution was developed by William Gosset and is used when sample sizes are small. It has a bell shape but is more spread out than the normal distribution. The F-distribution is used to test if two variances are equal and is defined as the ratio of two chi-square variables. Both distributions depend on degrees of freedom.
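The claim that the t-distribution is bell-shaped but more spread out than the normal can be checked numerically from the density formulas (this comparison is my illustration, not part of the summarized document):

```python
import math

def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def t_pdf(x, df):
    """Student's t density with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

# "More spread out": lower peak at 0, heavier tails at x = 3 (df = 5)
peak_lower   = t_pdf(0, 5) < normal_pdf(0)
tails_heavier = t_pdf(3, 5) > normal_pdf(3)
```

As df grows, the t density converges to the normal, which is why the distinction matters mainly for small samples.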
This document discusses Chi Square and related procedures for analyzing categorical data. It explains that Chi Square can be used for goodness of fit tests to check if a sample follows a particular distribution, and for tests of association to check if two categorical variables are related. It provides examples of how to conduct and interpret Chi Square goodness of fit and association tests using SPSS. Other related procedures discussed include Fisher's Exact Test for small sample sizes and McNemar's Test for analyzing changes in paired categorical data.
The chi-square test is used to compare observed data to expected data. It determines if differences between the observed and expected numbers are due to chance or something more significant. The chi-square test has several key steps: stating the null and alternative hypotheses, choosing a significance level, finding the critical value, calculating the test statistic by summing the squared differences between observed and expected values divided by the expected value, and making a conclusion by comparing the test statistic to the critical value. The chi-square test has assumptions of adequate sample sizes and independence of data. It is useful for testing goodness of fit, independence of attributes, and homogeneity.
The document discusses the Chi-square (χ2) test, which is a non-parametric test used to test hypotheses about distributions of frequencies across categories of data. It can be used to test for comparing variance and to test for independence between two variables. The summary provides steps for applying the Chi-square test, including calculating expected frequencies, observed vs expected values, the Chi-square statistic, and comparing it to critical values. An example application to test the effectiveness of vaccination in preventing smallpox is shown.
The document discusses Chi-Square tests, which are used when assumptions of normality are violated. It provides requirements for Chi-Square tests, including that variables must be independent and samples sufficiently large. The key steps are outlined: determine appropriate test, establish significance level, formulate hypotheses, calculate test statistic using frequencies, determine degrees of freedom, and compare to critical value. An example compares party membership to opinions on gun control to demonstrate a Chi-Square test of independence.
The chi-square test is used to determine if an observed distribution of data differs from the theoretical distribution. It compares observed frequencies to expected frequencies based on a hypothesis. The chi-square value is calculated by summing the squared differences between observed and expected frequencies divided by the expected frequency. The chi-square value is then compared to a critical value from the chi-square distribution table based on the degrees of freedom. If the chi-square value is greater than the critical value, the null hypothesis that the distributions are the same can be rejected.
Binary logistic regression analysis is used to predict a dichotomous dependent variable from continuous and/or categorical independent variables. SPSS is used to conduct binary logistic regression by entering the dependent variable as 1/0 and independent variables as predictors, and the output provides coefficients, odds ratios, classification tables, and goodness of fit tests. Factors like multicollinearity between predictors and sample size need to be considered to develop the best fitting and most predictive logistic regression model.
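Away from SPSS, the maximum-likelihood fit behind that output can be sketched with a bare-bones gradient ascent on the log-likelihood. Everything here is a hypothetical single-predictor example (hours studied vs. pass/fail), not the document's procedure; real work would use a statistics package.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit intercept and one coefficient by stochastic gradient ascent
    on the log-likelihood (a toy maximum-likelihood estimator)."""
    w0, w1 = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(w0 + w1 * x)
            w0 += lr * (y - p)          # gradient of log-likelihood wrt intercept
            w1 += lr * (y - p) * x      # gradient wrt slope
    return w0, w1

# Hypothetical data: hours studied (x) vs. pass = 1 / fail = 0 (y)
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [0, 0, 1, 0, 1, 1]
w0, w1 = fit_logistic(xs, ys)
odds_ratio = math.exp(w1)  # multiplicative change in odds per extra hour
```

The exponentiated coefficient is the odds ratio SPSS reports as Exp(B); predicted probabilities above 0.5 would populate one column of the classification table.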
This document covers biostatistics and the various tests performed from the laboratory to the field, such as the F-test and the chi-square test. On the basis of these tests, probability and variability can be assessed. It provides complete information about the chi-square test.
Generalized Linear Models for Between-Subjects Designs — smackinnon
This document provides an overview of generalized linear models (GLiM) for analyzing between-subjects designs. It discusses key assumptions of between-subjects ANOVA such as normality and homogeneity of variance. It then explains how GLiM in SPSS can be used as an alternative approach that describes the distribution of the outcome variable, specifies a link function, and uses maximum likelihood estimation rather than ordinary least squares. The document walks through an example comparing models with different distributions and link functions, and demonstrates interpreting output including parameter estimates, tests of effects, and estimated marginal means.
This document discusses Chi-Square tests. It begins with an overview of Chi-Square, noting that it is a non-parametric test where the test statistic follows a Chi-Square distribution. It then discusses the characteristics of Chi-Square tests, including that they are distribution free and easy to calculate. Several common uses of Chi-Square tests are provided, such as testing goodness of fit and independence. The document then separates into two phases - the first discussing theory and the second providing examples. Phase one delves further into the equation, level of significance, and degrees of freedom. Phase two demonstrates steps for performing a Chi-Square test using observed and expected values.
Statistical Inference Part II: Types of Sampling Distribution — Dexlab Analytics
This is an in-depth analysis of how different types of sampling distributions work, focusing on their specific functions and interrelations, as part of a discussion of the theory of sampling.
The Chi-Square test of independence is used to determine if two categorical variables are independent or dependent. It examines if understanding one variable depends on the other. The test calculates an observed versus expected frequency for each cell. If the Chi-Square value exceeds the critical value, the null hypothesis of independence is rejected, indicating a dependent relationship. The document provides an example comparing education level and news source, finding the variables are dependent based on a significant Chi-Square value.
1) There are two main types of data in NLP: continuous data (like temperatures) analyzed with regression and t-tests, and categorical data (like part-of-speech tags) analyzed with tests like Wilcoxon signed-rank, Fisher's exact, Pearson's chi-square, and McNemar's.
2) Common significance tests for NLP include one sample t-test to compare a sample mean to a known population mean, paired two sample t-test to compare two samples tested twice, and Wilcoxon signed-rank test as a non-parametric alternative to the paired t-test.
3) Other tests mentioned are Fisher's exact for comparing binary classifications and Pearson's chi-square.
The chi-square test is used to determine if there is a significant relationship between two nominal variables by comparing observed and expected frequencies. It can be used to examine relationships between categorical variables such as education and income level or brand preference and gender. The null hypothesis states there is no relationship, while the alternative or research hypothesis states there is a relationship. Expected frequencies are calculated and chi-square is obtained using a formula. This value is then compared to a critical value based on the level of significance and degrees of freedom to determine whether to reject or fail to reject the null hypothesis.
The General Linear Model is an ANOVA procedure in which the calculations are performed using the least-squares regression approach to describe the statistical relationship between one or more predictors and a continuous response variable. Predictors can be factors and covariates. More information on the General Linear Model is available at: http://www.transtutors.com/homework-help/statistics/general-linear-model.aspx
Overview of Advance Marketing Research — Enamul Islam
This document provides information on frequency distributions, cross-tabulation, hypothesis testing, and analysis of variance. It defines key terms like frequency distribution, measures of location and variability, cross-tabulation, chi-square test, and one-way ANOVA. It also outlines the general procedures for hypothesis testing and conducting one-way ANOVA, including decomposing total variation, measuring effects, and interpreting results.
The document discusses the one-sample t-test and how it addresses limitations of the z-test. The t-test can be used to compare a sample mean to a population mean or hypothetical mean when the population standard deviation is unknown. It uses the sample standard deviation and degrees of freedom to calculate the t-statistic, which follows a t-distribution. The t-test procedure involves stating hypotheses, determining critical values, calculating statistics, and making conclusions, similar to the z-test. Effect sizes can also be measured.
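The one-sample t statistic and its effect size can be sketched directly from the definitions above; the sample values and the hypothesized mean of 5.0 are hypothetical.

```python
import math

def one_sample_t(sample, mu0):
    """One-sample t-test: t = (xbar - mu0) / (s / sqrt(n)), df = n - 1.
    Also returns Cohen's d as an effect size."""
    n = len(sample)
    xbar = sum(sample) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in sample) / (n - 1))  # sample sd
    t = (xbar - mu0) / (s / math.sqrt(n))
    d = (xbar - mu0) / s
    return t, n - 1, d

# Hypothetical measurements tested against mu0 = 5.0
t, df, d = one_sample_t([5.1, 4.9, 5.6, 5.2, 5.0, 5.4], 5.0)
```

With t about 1.88 and a two-tailed critical value of 2.571 (df = 5, alpha = 0.05), this toy sample would not reject the null, even though Cohen's d (about 0.77) suggests a moderately large effect in a small sample.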
The Chi Square Test is used to determine if observed data fits a hypothesized distribution. It involves calculating the Chi Square statistic by comparing observed and expected values and interpreting the result using a Chi Square table. The document provides an example using Drosophila genetics to test if two traits are independently assorting. The null hypothesis is that the traits are independently assorting. Expected values are calculated based on this. The Chi Square value is found to be not statistically significant, so the null hypothesis that the traits are independently assorting is not rejected.
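A goodness-of-fit calculation of the kind described (independent assortment predicting a 9:3:3:1 phenotype ratio in a dihybrid cross) looks like this; the counts are illustrative, not the document's Drosophila data.

```python
# Dihybrid cross: 9:3:3:1 expected ratio under independent assortment
observed = [315, 108, 101, 32]      # illustrative phenotype counts
n = sum(observed)
expected = [n * 9 / 16, n * 3 / 16, n * 3 / 16, n * 1 / 16]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1              # 3; critical value at alpha = 0.05 is 7.815
```

Here chi-square is about 0.47, far below 7.815, so the null hypothesis of independent assortment would not be rejected, matching the pattern of the document's example.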
Quantitative Data Analysis: How to Do a T-Test on MS-Excel and SPSS — ICFAI Business School
This document provides instructions for performing a t-test in Microsoft Excel and SPSS. It explains that a t-test is used to test the null hypothesis that the means of two populations are equal. It then outlines the 7 step process to run a t-test in Excel, including selecting the data ranges, hypothesized mean difference, and output range. For SPSS, it lists the 4 step process of selecting the grouping and test variables, defining the groups, and running the independent samples t-test.
This document discusses testing differences between two dependent samples using matched pairs. It provides examples of how to:
1) Calculate the differences between matched pairs and find the mean and standard deviation of the differences.
2) Use a t-test to determine if the mean difference is statistically significant and construct a 90% confidence interval for the true mean difference between two dependent samples.
3) Apply these methods to an example comparing cholesterol levels before and after a mineral supplement, testing the claim that the supplement changes cholesterol levels.
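The three steps above can be sketched in pure Python. The before/after values are hypothetical, and the critical value 2.015 (two-tailed, df = 5, 90% confidence) is passed in rather than computed, since the t table lookup is outside the stdlib.

```python
import math

def paired_t_and_ci(before, after, t_crit):
    """Matched pairs: mean difference, t statistic, and confidence interval.
    t_crit is the two-tailed t critical value for the chosen confidence level."""
    d = [b - a for b, a in zip(before, after)]      # step 1: pair differences
    n = len(d)
    dbar = sum(d) / n
    sd = math.sqrt(sum((x - dbar) ** 2 for x in d) / (n - 1))
    se = sd / math.sqrt(n)
    t = dbar / se                                   # step 2: t statistic
    ci = (dbar - t_crit * se, dbar + t_crit * se)   # confidence interval
    return dbar, t, ci

# Hypothetical cholesterol levels before and after a supplement
before = [210, 235, 208, 190, 172, 244]
after  = [190, 170, 210, 188, 173, 228]
dbar, t, ci = paired_t_and_ci(before, after, 2.015)  # 90% CI, df = 5
```

Because the interval here straddles 0, these toy data would not support the claim that the supplement changes cholesterol levels.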
This document discusses hypothesis testing and constructing confidence intervals for comparing two means from independent populations. It provides:
1. Requirements for using a z-test or t-test to compare two means, including that the samples must be independent and randomly selected, and meet certain size or normality criteria.
2. Formulas and steps for conducting a z-test when population variances are known, and a t-test when they are unknown, to test claims about differences in population means.
3. Instructions for using a calculator to perform two-sample z-tests, t-tests, and to construct confidence intervals for the difference between two means.
4. An example comparing hotel room rates using
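A confidence interval for the difference of two independent means (item 3 above) can be sketched without a calculator. This version does not pool the variances and uses the conservative textbook choice df = min(n1, n2) − 1; the room rates and the critical value 2.776 (df = 4, 95%) are hypothetical, not the document's example.

```python
import math

def two_sample_t_ci(x1, x2, t_crit):
    """CI for mu1 - mu2 with unknown, unequal population variances (no pooling).
    t_crit is the two-tailed critical value for df = min(n1, n2) - 1."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    v1 = sum((x - m1) ** 2 for x in x1) / (n1 - 1)
    v2 = sum((x - m2) ** 2 for x in x2) / (n2 - 1)
    se = math.sqrt(v1 / n1 + v2 / n2)
    return (m1 - m2) - t_crit * se, (m1 - m2) + t_crit * se

# Hypothetical nightly room rates at two hotels
ci = two_sample_t_ci([120, 135, 150, 140, 125], [100, 110, 105, 120, 95], 2.776)
```

Since the interval here lies entirely above 0, these toy samples would support a difference between the two mean rates.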
The document discusses various statistical methods for analyzing relationships between variables, including chi-square tests, measures of association like lambda and gamma, and rank correlation. Chi-square tests can be used to test for independence and goodness of fit between nominal or ordinal variables. Measures like lambda and gamma range from 0 to 1 and indicate the strength of association while controlling for errors. Rank correlation assesses relationships between variables when only ordinal data is available by analyzing the agreement between ranks. Cross tabulation allows investigating patterns of bivariate association through distribution analysis.
The document discusses chi-square test and its properties. It defines chi-square as a non-parametric statistical test used for discrete data to test for independence and goodness of fit between observed and expected frequencies. The chi-square test has some key assumptions including independent random samples, nominal or ordinal level data, and no expected cell counts below 5. It is calculated by subtracting expected from observed frequencies, squaring the differences, and dividing by expected counts. The chi-square test can identify if there is a significant association between variables but does not measure the strength of the association.
Marketing research & applications on SPSS (ANSHU TIWARI)
The document discusses various statistical techniques used in marketing research to analyze survey data, including frequency distributions, measures of central tendency and variability, hypothesis testing, and cross-tabulation. Frequency distributions are used to determine the mean, mode, median and answer questions about single variables. Hypothesis testing involves forming hypotheses, selecting a test, determining significance levels, collecting data, and making statistical decisions. Cross-tabulation examines relationships between two or more variables using techniques like chi-square tests. Both parametric and non-parametric tests are used depending on variable scales.
An independent t-test is used to compare the means of two independent groups on a continuous dependent variable. It tests if there is a statistically significant difference between the population means of the two groups. The test assumes the groups are independent, the dependent variable is normally distributed for each group, and the groups have equal variances. To perform the test, the researcher states the hypotheses, sets an alpha level, calculates the t-statistic and degrees of freedom, and determines whether to reject or fail to reject the null hypothesis by comparing the t-statistic to the critical value.
- A sample is a small group selected from a population to represent that population. Sampling provides benefits like being less time-consuming, less expensive, and allowing results to be repeated.
- There are two main types of samples: probability and non-probability. Probability samples include simple random, systematic, stratified, and cluster samples. Sample size is determined based on factors like the type of study, expected results, costs, and available resources.
- Inferential statistics allow generalization from a sample to a population through hypothesis testing and significance tests. Tests include t-tests, F-tests, chi-squared tests, and correlation/regression to analyze relationships between variables. Significant results suggest differences are likely not due to chance.
The document discusses various statistical concepts including:
- The functions of statistics such as expressing facts numerically and establishing relationships between facts.
- The importance of statistics to fields like administration, economics, research, and education.
- Common measures of central tendency including the mean, median, and mode.
- The difference between theoretical and empirical probabilities.
- Types of correlation like positive, negative, simple, and multiple correlation.
- Key statistical tests including t-tests, chi-square, F-tests, and measures of accuracy, precision, and confidence intervals.
Basic Statistical Descriptions of Data.pptx (Anusuya123)
This document provides an overview of 7 basic statistical concepts for data science: 1) descriptive statistics such as mean, mode, median, and standard deviation, 2) measures of variability like variance and range, 3) correlation, 4) probability distributions, 5) regression, 6) normal distribution, and 7) types of bias. Descriptive statistics are used to summarize data, variability measures dispersion, correlation measures relationships between variables, and probability distributions specify likelihoods of events. Regression models relationships, normal distribution is often assumed, and biases can influence analyses.
This document discusses correlation coefficient and path coefficient analysis. It defines correlation as a statistical method to analyze the relationship between two or more variables. Correlation determines the degree of relationship but not causation. The document then discusses different types of correlation including positive, negative, linear, non-linear, simple, multiple and partial correlation. It also discusses methods to measure correlation including scatter diagrams, Karl Pearson's coefficient, Spearman's coefficient and concurrent deviation method. Finally, it explains path analysis which can be used to partition correlations into direct and indirect effects when studying causal relationships between variables.
This document provides an overview of key concepts in descriptive statistics including measures of central tendency (mode, median, mean), measures of dispersion (range, variance, standard deviation), the normal distribution, z-scores, hypothesis testing, and the t-distribution. It defines each concept and provides examples of calculating and interpreting common statistics.
Basic of Statistical Inference Part-III: The Theory of Estimation from Dexlab... (Dexlab Analytics)
In this 3rd segment of the basic of statistical inference series, the estimation theory, its elements, methods and characteristics have been discussed.
1. The document discusses parameter estimation, effect size, bivariate statistics including correlation and regression, and chi-square analysis.
2. Parameter estimation refers to using sample data to estimate population parameters, and sample statistics are estimations of population parameters.
3. Effect size measures the strength of the relationship between two variables and can be measured by eta square, partial eta square, and omega square, among others.
4. Correlation measures the association between variables, while regression predicts one variable from another. Chi-square analysis examines relationships between discrete variables.
1. The document discusses linear correlation and regression between plasma amphetamine levels and amphetamine-induced psychosis scores using data from 10 patients.
2. A positive correlation was found between the two variables, and a linear regression equation was established to predict psychosis scores from amphetamine levels.
3. However, further statistical tests were needed to determine if the correlation and regression model could be generalized to the overall patient population.
For more classes visit
www.snaptutorial.com
1
To make tests of hypotheses about more than two population means, we use the:
t distribution
normal distribution
chi-square distribution
analysis of variance distribution
This document provides an overview of simple linear regression and correlation analysis. It defines regression as estimating the relationship between two variables and correlation as measuring the strength and direction of that relationship. The key points covered include:
- Regression finds an estimating equation to relate known and unknown variables. Correlation determines how well that equation fits the data.
- Pearson's correlation coefficient r measures the linear relationship between two variables on a scale from -1 to 1.
- The coefficient of determination r2 indicates what percentage of variation in the dependent variable is explained by the independent variable.
- Statistical tests can evaluate whether a correlation is statistically significant or could be due to chance.
This document summarizes statistical tests for comparing two samples, including paired and independent samples t-tests, confidence intervals, and effect sizes. For paired samples from within-subject designs, a paired t-test is used to test for differences between means. For independent samples from between-subject designs, an independent samples t-test is used. Both tests calculate a t-statistic based on the mean difference and standard error. Confidence intervals and effect sizes can also be calculated for paired and independent sample designs. Examples are provided to demonstrate how to perform the statistical tests and calculations.
Data Processing and Statistical Treatment: Spreads and Correlation (Janet Penilla)
A hyperlinked presentation. The objectives of the topic were written. The presentation starts with the variance and then the standard deviation, provided with examples. It also explains when to use the sample standard deviation versus the population standard deviation when calculating a standard deviation. The presentation also includes correlations and other correlation techniques (Pearson product-moment correlation; Spearman rank-order correlation coefficient; t-test for correlation).
This document provides an overview of linear regression analysis. It discusses (1) why regression is used, including for description, adjustment for covariates, identifying predictors, and prediction; (2) the basics of linear regression in predicting an interval outcome variable based on predictor variables; and (3) how to conduct univariate linear regression in SPSS, including interpreting results and ensuring assumptions are met. Key assumptions include no outliers, independent data points, normally distributed residuals with constant variance.
Please Subscribe to this Channel for more solutions and lectures
http://www.youtube.com/onlineteaching
Elementary Statistics Practice Test 4
Chapter 9: Inferences about Two Samples
Please Subscribe to this Channel for more solutions and lectures
http://www.youtube.com/onlineteaching
Elementary Statistics Practice Test 4
Chapter 8: Hypothesis Testing
Solution to the practice test ch 10 correlation reg ch 11 gof ch12 anova (Long Beach City College)
Please Subscribe to this Channel for more solutions and lectures
http://www.youtube.com/onlineteaching
Elementary Statistics Practice Test 5
Module 5
Chapter 10: Correlation and Regression
Chapter 11: Goodness of Fit and Contingency Tables
Chapter 12: Analysis of Variance
Please Subscribe to this Channel for more solutions and lectures
http://www.youtube.com/onlineteaching
Elementary Statistics Practice Test 4
Module 4:
Chapter 8, Hypothesis Testing
Chapter 9: Two Populations
Solution to the practice test ch 8 hypothesis testing ch 9 two populations (Long Beach City College)
Solution to the Practice Test 3A, Chapter 6 Normal Probability Distribution (Long Beach City College)
Please Subscribe to this Channel for more solutions and lectures
http://www.youtube.com/onlineteaching
Elementary Statistics Practice Test 3
Practice Test Chapter 6 (Normal Probability Distributions)
Chapter 6: Normal Probability Distributions
Please Subscribe to this Channel for more solutions and lectures
http://www.youtube.com/onlineteaching
Elementary Statistics Practice Test 2 Solutions
Chapter 4: Probability
Please Subscribe to this Channel for more solutions and lectures
http://www.youtube.com/onlineteaching
Elementary Statistics Practice Test 2
Chapter 4: Probability
Please Subscribe to this Channel for more solutions and lectures
http://www.youtube.com/onlineteaching
Elementary Statistics Practice Test 1
Module 1: Chapters 1-3
Chapter 1: Introduction to Statistics.
Chapter 2: Exploring Data with Tables and Graphs.
Chapter 3: Describing, Exploring, and Comparing Data.
This document summarizes the solutions to three one-way ANOVA problems testing claims about population means.
The first problem analyzes readability scores of three books and finds sufficient evidence to reject the claim that the means are all the same.
The second problem examines tree weights under different treatments and fails to support the claim that all treatment means are equal.
The third problem also looks at tree weights but in a different region, and finds sufficient evidence to fail to reject the claim that all treatment means are the same.
1. Analysis of variance (ANOVA) is a statistical technique used to test whether the means of three or more groups are equal. It analyzes the variations between and within groups.
2. ANOVA requires assumptions of normality, equal variances, independence, and random sampling. It uses sum of squares, mean squares and the F-test statistic to determine if group means are significantly different.
3. If the p-value is less than the significance level (often 0.05), the null hypothesis of equal group means is rejected, indicating at least one group mean is significantly different from the others.
The document provides an overview of goodness-of-fit tests for multinomial experiments and contingency tables, which are used to test if observed frequency distributions fit expected distributions. It defines multinomial experiments, goodness-of-fit tests, and contingency tables, and explains how to perform tests of independence and homogeneity using chi-square tests on contingency tables. Sample problems are provided to test claims about categories of outcomes and the independence of variables in contingency tables.
1. The document discusses correlation and regression analysis. It defines the linear correlation coefficient r and how it measures the strength of a linear relationship between two variables.
2. It presents the formula for calculating r and describes how to test for a linear correlation between two variables.
3. It also defines the regression equation y=mx+b, where m is the slope and b is the y-intercept. It describes how to use a regression equation to predict values of the dependent variable y given values of the independent variable x.
This document provides an overview of two-way analysis of variance (ANOVA). It explains that two-way ANOVA involves two categorical independent variables and one continuous dependent variable. The document outlines the objectives of two-way ANOVA, which are to analyze interactions between the two factors, and evaluate the effects of each factor. It then provides examples of how to set up and perform two-way ANOVA calculations and interpretations.
Please Subscribe to this Channel for more solutions and lectures
http://www.youtube.com/onlineteaching
Chapter 12: Analysis of Variance
12.1: One-Way ANOVA
Please Subscribe to this Channel for more solutions and lectures
http://www.youtube.com/onlineteaching
Chapter 11: Goodness-of-Fit and Contingency Tables
11.2: Contingency Tables
The document provides information about goodness-of-fit tests and contingency tables. It defines a goodness-of-fit test as testing whether an observed frequency distribution fits a claimed distribution. It also provides the notation, requirements, and steps to conduct a goodness-of-fit test including: defining the null and alternative hypotheses, calculating the test statistic as a chi-square value, finding the critical value, and making a decision to reject or fail to reject the null hypothesis. Several examples demonstrate how to perform goodness-of-fit tests to determine if sample data fits a claimed distribution.
2. Correlation and Regression
Correlation
Objectives:
• Draw a scatter plot for a set of ordered pairs.
• Compute the correlation coefficient.
• Test the hypothesis H₀: ρ = 0.
• Compute the equation of the regression line & the coefficient of determination.
• Compute the standard error of the estimate & a prediction interval.
3. Recall: Inferences about Two Proportions
For populations 1 & 2:
The pooled sample proportion combines the two sample proportions into one:
p̄ = (x₁ + x₂) / (n₁ + n₂),  q̄ = 1 − p̄
Test Statistic:
z = [(p̂₁ − p̂₂) − (p₁ − p₂)] / √(p̄q̄/n₁ + p̄q̄/n₂)
Confidence Interval Estimate of p1 − p2
(p̂₁ − p̂₂) − E < p₁ − p₂ < (p̂₁ − p̂₂) + E,  where E = z_α/2 · √(p̂₁q̂₁/n₁ + p̂₂q̂₂/n₂)
The P-value method and the critical value method are equivalent, but the confidence
interval method is not equivalent to the P-value method or the critical value method.
Here p̂₁ = x₁/n₁ with q̂₁ = 1 − p̂₁, and p̂₂ = x₂/n₂ with q̂₂ = 1 − p̂₂.
TI Calculator:
Confidence Interval: 2 proportions
1. Stat
2. Tests
3. 2-PropZInt
4. Enter: n₁, n₂, x₁, x₂ & CL
TI Calculator:
2-Proportion Z-Test
1. Stat
2. Tests
3. 2-PropZTest
4. Enter Data or Stats: n₁, n₂, x₁, x₂
5. Choose RTT, LTT, or 2TT
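As a rough illustration of the formulas above, here is a minimal Python sketch of the two-proportion z test. The function name, the sample counts, and the erf-based normal CDF are my own; they are not from the slides.

```python
# Hypothetical sketch of the two-proportion z test (two-tailed, H0: p1 = p2).
from math import sqrt, erf

def two_prop_z(x1, n1, x2, n2):
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)          # pooled sample proportion
    q_bar = 1 - p_bar
    se = sqrt(p_bar * q_bar / n1 + p_bar * q_bar / n2)
    z = (p1_hat - p2_hat) / se             # (p1 - p2) = 0 under H0
    # Normal CDF via erf; p-value for a two-tailed test
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

z, p = two_prop_z(60, 100, 45, 100)
```

A calculator's 2-PropZTest with the same inputs should agree with z and the p-value to rounding.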
4. Recall: Two Means, Independent Samples. Requirements:
1. The two samples are independent.
2. Both samples are simple random samples.
3. Either or both of these conditions are satisfied: the two sample sizes are both large (n₁ > 30 and n₂ > 30), or both samples come from populations having normal distributions.
σ1 and σ2 are known: Use the z test for comparing two means from independent populations
Test Statistic:
z = [(x̄₁ − x̄₂) − (μ₁ − μ₂)] / √(σ₁²/n₁ + σ₂²/n₂)
Confidence Interval: (x̄₁ − x̄₂) ± E,  where E = z_α/2 · √(σ₁²/n₁ + σ₂²/n₂)
Unequal Variances (σ₁ ≠ σ₂):
E = t_α/2 · √(s₁²/n₁ + s₂²/n₂),  df = smaller of n₁ − 1 or n₂ − 1
Equal Variances (σ₁ = σ₂): pool the sample variances;  df = (n₁ − 1) + (n₂ − 1)
Test Statistic (unequal variances):
t = [(x̄₁ − x̄₂) − (μ₁ − μ₂)] / √(s₁²/n₁ + s₂²/n₂)
Test Statistic (pooled variances):
t = [(x̄₁ − x̄₂) − (μ₁ − μ₂)] / √(s_p²/n₁ + s_p²/n₂)
σ1 and σ2 are unknown: Use the t test for comparing two means from independent populations
s_p² = [(n₁ − 1)s₁² + (n₂ − 1)s₂²] / [(n₁ − 1) + (n₂ − 1)]
E = t_α/2 · √(s_p²/n₁ + s_p²/n₂)
TI Calculator:
2-Sample Z-Test
1. Stat
2. Tests
3. 2-SampZTest
4. Enter Data or Stats: σ₁, σ₂, x̄₁, n₁, x̄₂, n₂
5. Choose RTT, LTT, or 2TT
6. Calculate
TI Calculator:
2-Sample Z-Interval
1. Stat
2. Tests
3. 2-SampZInt
4. Enter Data or Stats: σ₁, σ₂, x̄₁, n₁, x̄₂, n₂
5. Enter C-Level
6. Calculate
TI Calculator:
2-Sample T-Test
1. Stat
2. Tests
3. 2-SampTTest
4. Enter Data or Stats: x̄₁, s₁, n₁, x̄₂, s₂, n₂
5. Choose RTT, LTT, or 2TT
6. Pooled: No / Yes
7. Calculate
TI Calculator:
2-Sample T-Interval
1. Stat
2. Tests
3. 2-SampTInt
4. Enter Data or Stats: x̄₁, s₁, n₁, x̄₂, s₂, n₂
5. Enter C-Level
6. Pooled: No / Yes
7. Calculate
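The two-sample t formulas above can be sketched in Python from summary statistics. This hypothetical helper shows both the unequal-variance form (with the conservative "smaller of n₁ − 1 or n₂ − 1" degrees of freedom used on the slide) and the pooled form; the function name and example numbers are my own.

```python
# Minimal sketch of the two-sample t test from summary statistics.
from math import sqrt

def two_sample_t(x1, s1, n1, x2, s2, n2, pooled=False):
    if pooled:
        # Pooled sample variance, as on the slide
        sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / ((n1 - 1) + (n2 - 1))
        se = sqrt(sp2 / n1 + sp2 / n2)
        df = (n1 - 1) + (n2 - 1)
    else:
        se = sqrt(s1**2 / n1 + s2**2 / n2)
        df = min(n1, n2) - 1        # conservative df from the slide
    t = (x1 - x2) / se              # assumes (mu1 - mu2) = 0 under H0
    return t, df

t, df = two_sample_t(98.1, 0.7, 11, 98.4, 0.6, 59)
```

Software such as the TI 2-SampTTest typically uses a more exact (Welch) df formula for the unequal-variance case, so its df will differ from this conservative choice.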
5. Key Concept: Testing hypotheses and constructing confidence intervals involving the
mean of the differences of the values from two populations that are dependent in the
sense that the data consist of matched pairs. The pairs must be matched according to
some relationship, such as before/after measurements from the same subjects .
Good Experimental Design: When designing an experiment or planning an
observational study, using dependent samples with matched pairs is generally better than
using two independent samples.
1. Hypothesis Test: Use the differences from two dependent samples (matched pairs)
to test a claim about the mean of the population of all such differences.
2. Confidence Interval: Use the differences from two dependent samples (matched
pairs) to construct a confidence interval estimate of the mean of the population of
all such differences.
• d = individual difference between the two values in a single matched pair
• µd = mean value of the differences d for the population of all matched pairs of data
• d̄ = mean value of the differences d for the paired sample data
• sd = standard deviation of the differences d for the paired sample data
• n = number of pairs of sample data
When the values are dependent, do a t test on the differences.
Denote the differences with the symbol d or D, the mean of the population differences
with μd or μD, and the sample standard deviation of the differences with sd or sD.
Recall: Two Means: Two Dependent Samples (Matched Pairs)
Use the differences d (or D):
d̄ = Σd / n,  df = n − 1
Test Statistic:
t = (d̄ − μ_d) / (s_d / √n)
Confidence Interval: d̄ − E < μ_d < d̄ + E,  where E = t_α/2 · s_d / √n
s_d = √[ Σ(d − d̄)² / (n − 1) ] = √[ (nΣd² − (Σd)²) / (n(n − 1)) ]
TI Calculator:
How to enter data:
1. Stat
2. Edit
3. ClrList L₁ & L₂
4. Type in your data in L₁ & L₂
5. L₁ − L₂
6. Store in L₃
7. Enter
Mean, SD, 5-number summary:
1. Stat
2. Calc
3. Select 1 for 1 variable
4. Type: L₃ (second 3)
5. Calculate
TI Calculator:
T-Interval
1. Tests
2. T-Interval
3. Data
4. Enter List: L₃, Freq: 1, C-Level
5. Calculate
TI Calculator:
Matched pair: T-Test
1. Tests
2. T-Test
3. Data
4. Enter μ₀ = 0, List: L₃, Freq: 1
5. Choose RTT, LTT, or 2TT
6. Calculate
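The matched-pairs steps above (difference each pair, find d̄ and s_d, form the t statistic) can be sketched in a few lines of Python. The before/after numbers here are hypothetical, and the function name is my own.

```python
# Sketch of the matched-pairs t test on the differences d = before - after.
from math import sqrt

def paired_t(before, after):
    d = [b - a for b, a in zip(before, after)]
    n = len(d)
    d_bar = sum(d) / n                                   # mean difference
    s_d = sqrt(sum((x - d_bar) ** 2 for x in d) / (n - 1))  # SD of differences
    t = d_bar / (s_d / sqrt(n))                          # tests H0: mu_d = 0
    return t, n - 1                                      # (t, df)

t, df = paired_t([210, 235, 208, 190, 172], [190, 170, 210, 188, 173])
```

Compare t against the critical t value with n − 1 degrees of freedom, exactly as the T-Test calculator steps above do with L₃ holding the differences.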
6. For the comparison of two variances or standard deviations, an F test is used.
• The F test should not be confused with the chi-square test, which compares a single
sample variance to a specific population variance.
Characteristics:
1. The values of F cannot be negative, because variances are always positive or zero.
2. The distribution is positively skewed.
3. The mean value of F is approximately equal to 1.
4. The F distribution is a family of curves based on the degrees of freedom of the
variance of the numerator and the degrees of freedom of the variance of the
denominator.
• The larger of the two variances is placed in the numerator regardless of the
subscripts.
• The F test has two terms for the degrees of freedom: that of the numerator, n1 – 1, and
that of the denominator, n2 – 1, where n1 is the sample size from which the larger
variance was obtained.
Recall: Two Variances or Standard Deviations
F = s₁² / s₂²  (where s₁² is the larger of the two sample variances)
TI Calculator:
2-Sample F-Test
1. Stat
2. Tests
3. 2-SampFTest
4. Enter Data or Stats: s₁, n₁, s₂, n₂
5. Choose RTT, LTT, or 2TT
6. Calculate
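The F statistic above takes a few lines to compute; this hypothetical helper places the larger sample variance in the numerator, as the slide requires, and returns the corresponding numerator and denominator degrees of freedom.

```python
# Sketch of the F statistic for comparing two variances (larger variance on top).
def f_statistic(s1, n1, s2, n2):
    v1, v2 = s1**2, s2**2
    if v1 >= v2:
        return v1 / v2, n1 - 1, n2 - 1   # (F, numerator df, denominator df)
    return v2 / v1, n2 - 1, n1 - 1

F, df_num, df_den = f_statistic(2.0, 10, 4.0, 16)
```

Because the larger variance is always on top, F ≥ 1, matching the slide's note that F cannot be negative.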
7. Key Concept: In addition to hypothesis testing and confidence intervals, inferential statistics determines if a
relationship between 2 or more quantitative variables exists.
A correlation exists between two variables when the values of one variable are somehow associated with the
values of the other variable.
A linear correlation exists between two variables when there is a correlation and the plotted points look like a
straight line.
This section considers only linear relationships, meaning that when graphed in a scatterplot the points approximate a straight-line pattern. It also presents a formal hypothesis test for deciding whether there is a linear correlation between all population values for the two variables.
The linear correlation coefficient r is a number that measures how well paired sample data fit a straight-line pattern when graphed (it measures the strength of the linear association between the two variables). Use the sample of paired data (sometimes called bivariate data) to find the value of r, and then use r to decide whether there is a linear correlation between the two variables.
Correlation
Regression is a statistical method used to describe the nature of the relationship between variables—that is,
positive or negative, linear or nonlinear.
Questions: 1. Are two or more variables related?
2. If so, what is the strength of the relationship?
3. What type of relationship exists?
4. What kind of predictions can be made from the relationship?
8. Scatterplots & the Strength of the Linear Correlation: r
A scatter plot is a graph of the ordered pairs (x, y), where x is the independent variable and y is the dependent variable.
a. Distinct straight-line, or linear, pattern. We say that there is a
positive linear correlation between x and y, since as the x values
increase, the corresponding y values also increase.
b. Distinct straight-line, or linear pattern. We say that there is a
negative linear correlation between x and y, since as the x values
increase, the corresponding y values decrease.
c. No distinct pattern, which suggests that there is no correlation
between x and y.
d. Distinct pattern suggesting a correlation between x and y, but the
pattern is not that of a straight line.
9. Linear Correlation Coefficient r
1. Are two or more variables related?
2. If so, what is the strength of the relationship?
To answer these two questions, statisticians use the correlation coefficient, a numerical measure to determine
whether two or more variables are related and to determine the strength of the relationship between or among the
variables.
• Linear Correlation Coefficient r
The linear correlation coefficient r measures the strength of the linear correlation between the paired quantitative x values and y
values in a sample. It is used to determine whether there is a linear correlation between two variables.
3. What type of relationship exists?
There are two types of relationships: simple and multiple.
In a simple relationship, there are two variables: an independent variable (predictor variable) and a dependent
variable (response variable).
In a multiple relationship, there are two or more independent variables that are used to predict one dependent
variable.
4. What kind of predictions can be made from the relationship?
Predictions are made daily in all areas. Examples include weather forecasting, stock market analyses, sales
predictions, crop predictions, gasoline price predictions, and sports predictions. Some predictions are more accurate
than others, due to the strength of the relationship. That is, the stronger the relationship is between variables, the
more accurate the prediction is.
10. Example 1
Construct a scatter plot for the data shown for car rental companies in the
United States for a recent year.
Step 1: Draw and label the x and y axes.
Step 2: Plot each point on the graph.
TI Calculator:
How to enter data:
1. Stat
2. Edit
3. ClrList L1 & L2
4. Type in your data in L1 & L2
TI Calculator:
Scatter Plot:
1. Press Y= & clear
2. 2nd y, Enter
3. On, Enter
4. Select X1-list: L1
5. Select Y1-list: L2
6. Mark: Select Character
7. Press Zoom & 9 to get ZoomStat
11. Calculating and Interpreting the Linear Correlation Coefficient denoted by r
The correlation coefficient computed from the sample data measures the strength and direction of a linear relationship between two
variables.
There are several types of correlation coefficients. The one explained in this section is called the Pearson product moment correlation
coefficient (PPMC).
The symbol for the sample correlation coefficient is r. The symbol for the population correlation coefficient is ρ.
The range of the correlation coefficient is from −1 to +1.
If there is a strong positive linear relationship between the variables, the value of r will be close to +1.
If there is a strong negative linear relationship between the variables, the value of r will be close to −1.
n: number of pairs of sample data
∑x: sum of all x values
∑x²: sum of the squared x values
(∑x)²: sum the x values, then square the total. Avoid confusing ∑x² and (∑x)².
∑xy: multiply each x value by its corresponding y value, then find the sum of the products
r: linear correlation coefficient for sample data
ρ (rho): linear correlation coefficient for a population of paired data
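Using the notation above, the shortcut formula for r can be sketched in a few lines of Python. This is a hypothetical helper (not part of the text); the chocolate/Nobel data from the later examples are used to check it:

```python
from math import sqrt

def pearson_r(xs, ys):
    # r = [n(Σxy) − (Σx)(Σy)] / (√[n(Σx²) − (Σx)²] · √[n(Σy²) − (Σy)²])
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / (sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2))

# Chocolate/Nobel data used in Examples 2 and 3
print(round(pearson_r([5, 6, 4, 4, 5], [6, 9, 3, 2, 11]), 3))  # 0.795
```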
12. Given any collection of sample paired quantitative data, the linear correlation coefficient r can
always be computed, but the following requirements should be satisfied when using the sample
paired data to make a conclusion about linear correlation in the corresponding population of
paired data.
1. The sample of paired (x, y) data is a simple random sample of quantitative data.
2. Visual examination of the scatterplot must confirm that the points approximate a straight-line
pattern.
3. Because results can be strongly affected by the presence of outliers, any outliers must be removed if
they are known to be errors. The effects of any other outliers should be considered by calculating r
with and without the outliers included.
Note: Requirements 2 and 3 are simplified attempts at checking that the pairs of (x, y) data
have a bivariate normal distribution.
r = [n(∑xy) − (∑x)(∑y)] / (√[n(∑x²) − (∑x)²] · √[n(∑y²) − (∑y)²])
Or, equivalently:
r = ∑(zx · zy) / (n − 1)
zx denotes the z score for an individual sample value x
zy denotes the z score for the corresponding sample value y.
Calculating and Interpreting the Linear Correlation Coefficient denoted by r
13. Example 2: Finding r Using Technology
The table lists five paired data values. Use technology to find the value of the
correlation coefficient r for the data.
Chocolate 5 6 4 4 5
Nobel 6 9 3 2 11
Solution:
The value of r will be automatically calculated with
software or a calculator: r = 0.795
14. Example 3 a: Finding r Using the following Formula
Use this Formula to find the value of the linear correlation coefficient r for the five
pairs of chocolate/Nobel data listed in the table.
Chocolate 5 6 4 4 5
Nobel 6 9 3 2 11
x (Chocolate) y (Nobel) x² y² xy
5 6 25 36 30
6 9 36 81 54
4 3 16 9 12
4 2 16 4 8
5 11 25 121 55
∑x = 24 ∑y = 31 ∑x² = 118 ∑y² = 251 ∑xy = 159
r = [5(159) − (24)(31)] / (√[5(118) − (24)²] · √[5(251) − (31)²])
= 51 / (√14 · √294)
= 0.795
TI Calculator:
Linear Regression - test
1. Stat
2. Tests
3. LinRegTTest
4. Enter L1 & L2
5. Freq = 1
6. Choose ≠
7. Calculate
15. Use Formula to find the value of the linear correlation coefficient r for the five pairs of
chocolate/Nobel data listed in the table.
Solution: The z scores for all of the chocolate values (see the third column) and the z scores
for all of the Nobel values (see the fourth column) are below. The last column lists the
products zx · zy.
x (Chocolate) y (Nobel)
5 6
6 9
4 3
4 2
5 11
Example 3b: Finding r Using the following Formula
r = ∑(zx · zy) / (n − 1) = 3.179746 / (5 − 1) = 0.795
This formula has the advantage of making it easier to understand how r works. The variable x is used for the chocolate values, and the variable y is used for the Nobel values. Each sample value is replaced by its corresponding z score.
zx zy zx · zy
0.239046 −0.052164 −0.012470
1.434274 0.730297 1.047446
−0.956183 −0.834625 0.798054
−0.956183 −1.095445 1.047446
0.239046 1.251937 0.299270
∑(zx · zy) = 3.179746
For example, for the chocolate values:
x̄ = ∑x / n = 4.8
sx = √[∑(x − x̄)² / (n − 1)] = 0.836660
x = 5 → zx = (x − x̄) / sx = (5 − 4.8) / 0.83666 = 0.23905
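The z-score form of the formula can be sketched directly (a hypothetical helper, not part of the text), using the sample mean and sample standard deviation as above:

```python
from statistics import mean, stdev

def pearson_r_z(xs, ys):
    # z-score form: r = Σ(zx · zy) / (n − 1), with sample means and standard deviations
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    zz = sum(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))
    return zz / (len(xs) - 1)

# Chocolate/Nobel data from Example 3b
print(round(pearson_r_z([5, 6, 4, 4, 5], [6, 9, 3, 2, 11]), 3))  # 0.795
```

Both forms give the same value of r; the z-score form makes it visible that r is an average of products of standardized values.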
16. Example 4: Finding r Using the Formula (Skip)
Find the correlation coefficient for the given data.
Company | Cars x (in 10,000s) | Income y (in billions) | xy | x² | y²
A | 63.0 | 7.0 | 441.00 | 3969.00 | 49.00
B | 29.0 | 3.9 | 113.10 | 841.00 | 15.21
C | 20.8 | 2.1 | 43.68 | 432.64 | 4.41
D | 19.1 | 2.8 | 53.48 | 364.81 | 7.84
E | 13.4 | 1.4 | 18.76 | 179.56 | 1.96
F | 8.5 | 1.5 | 12.75 | 72.25 | 2.25
Σx = 153.8, Σy = 18.7, Σxy = 682.77, Σx² = 5859.26, Σy² = 80.67, n = 6
r = [6(682.77) − (153.8)(18.7)] / (√[6(5859.26) − (153.8)²] · √[6(80.67) − (18.7)²])
r = 0.982 (strong positive relationship)
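The same computation for the car rental data can be checked with a short script (an illustration, not part of the text):

```python
from math import sqrt

# Example 4 data: x = cars (in 10,000s), y = income (in billions)
cars = [63.0, 29.0, 20.8, 19.1, 13.4, 8.5]
income = [7.0, 3.9, 2.1, 2.8, 1.4, 1.5]

n = len(cars)
sx, sy = sum(cars), sum(income)
sxy = sum(x * y for x, y in zip(cars, income))
sxx = sum(x * x for x in cars)
syy = sum(y * y for y in income)

r = (n * sxy - sx * sy) / (sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2))
print(round(r, 3))  # 0.982, a strong positive relationship
```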
17. Null Hypothesis: H0: ρ = 0 (No correlation) Alternative Hypothesis: H1: ρ ≠ 0 (Correlation)
Using P-Value from Technology to Interpret r:
P-value ≤ α: Reject 𝐻0 → Supports the claim of a linear correlation.
P-value > α: Fail to reject 𝐻0 → Does not support the claim of a linear correlation.
Using Pearson Correlation coefficient table to Interpret r: Consider critical values from
this Table or technology as being both positive and negative:
• Correlation If |r| ≥ critical value ⇾ There is sufficient evidence to support the claim of a linear
correlation.
• No Correlation If |r| < critical value ⇾ There is not sufficient evidence to support the claim of a
linear correlation.
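The critical-value rule can be written as a small predicate. The table value 0.878 for n = 5 at α = 0.05 is an assumed illustration from a standard Pearson table; the text itself gives 0.811 for n = 6:

```python
def supports_linear_correlation(r, critical_value):
    # Reject H0 (support the claim of a linear correlation) when |r| ≥ critical value
    return abs(r) >= critical_value

# Chocolate/Nobel sample: r = 0.795 with n = 5; assumed table critical value 0.878
print(supports_linear_correlation(0.795, 0.878))  # False: not significant
# Car rental sample: r = 0.982 with n = 6; the text's critical value 0.811
print(supports_linear_correlation(0.982, 0.811))  # True: significant
```

Note that a fairly large |r| (0.795) can still be non-significant when the sample is very small.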
Properties of the Linear Correlation Coefficient r
1. −1 ≤ r ≤ 1.
2. If all values of either variable are converted to a different scale, the value of r does not change.
3. The value of r is not affected by the choice of x or y. Interchange all x values and y values, and
the value of r will not change.
4. r measures the strength of a linear relationship. It is not designed to measure the strength of a
relationship that is not linear.
5. r is very sensitive to outliers in the sense that a single outlier could dramatically affect its value.
Calculating and Interpreting the Linear Correlation Coefficient denoted by r
18. Correlation
A correlation exists between two variables when the values of one variable are
somehow associated with the values of the other variable. A linear correlation exists
between two variables when there is a correlation and the plotted points look like a
straight line. The linear correlation coefficient r, is a number that measures how well
paired sample data fit a straight-line pattern when graphed (measures the strength of the
linear association between paired data called bivariate data). The value of r² is the
proportion of the variation in y that is explained by the linear relationship between x
and y.
Properties of the Linear Correlation Coefficient r: −1 ≤ r ≤ 1
zx denotes the z score for an individual sample value x; zy is the z score for the corresponding sample value y.
TS: t = r√[(n − 2)/(1 − r²)], df = n − 2; Or: r
r = [n(∑xy) − (∑x)(∑y)] / (√[n(∑x²) − (∑x)²] · √[n(∑y²) − (∑y)²]), Or: r = ∑(zx · zy)/(n − 1)
Step 1: H0: ρ = 0, H1: ρ ≠ 0, claim & Tails
Step 2: TS: t = r√[(n − 2)/(1 − r²)], OR: r
Step 3: CV using α from the t-table or r-table
Step 4: Make the decision to
a. Reject or not H0
b. The claim is true or false
c. Restate this decision: There is / is not sufficient evidence to support the claim that…
There is a linear correlation if |r| ≥ critical value.
There is no correlation if |r| < critical value.
19. Example 5
Test the significance of the given correlation coefficient
using α = 0.05, n = 6 and r = 0.982.
Decision:
a. Reject H0
b. The claim is True
c. There is a significant relationship between
the 2 variables.
Step 1: H0 , H1, claim & Tails
Step 2: TS Calculate (TS)
Step 3: CV using α
Step 4: Make the decision to
a. Reject or not H0
b. The claim is true or false
c. Restate this decision: There
is / is not sufficient evidence to
support the claim that…
H0: ρ = 0, H1: ρ ≠ 0, claim, 2TT
TS (t-distribution method):
t = r√[(n − 2)/(1 − r²)] = 0.982√[(6 − 2)/(1 − 0.982²)] = 10.3981
CV: α = 0.05, df = n − 2 = 6 − 2 = 4 → t = ±2.776
2nd method (Pearson correlation table): TS: r = 0.982
CV: from the Pearson correlation coefficient table with n = 6, α = 0.05 → r = ±0.811
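The test statistic for Example 5 can be checked with a short helper (an illustration, not part of the text):

```python
from math import sqrt

def correlation_t(r, n):
    # Test statistic for H0: ρ = 0 — t = r · √[(n − 2) / (1 − r²)], with df = n − 2
    return r * sqrt((n - 2) / (1 - r ** 2))

t = correlation_t(0.982, 6)  # Example 5: n = 6, r = 0.982
print(round(t, 4))           # ≈ 10.3981; |t| exceeds the CV of 2.776, so H0 is rejected
```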
20. Example 6
Given the value of r = 0.801 for 23 pairs of data regarding chocolate consumption and numbers of Nobel
Laureates, and using a significance level of 0.05, is there sufficient evidence to support a claim that there
is a linear correlation between chocolate consumption and numbers of Nobel Laureates?
Decision:
a. Reject H0
b. The claim is True
c. There is sufficient evidence to
support the conclusion that for
countries, there is a linear
correlation between chocolate
consumption and numbers of
Nobel Laureates.
TS: t = r√[(n − 2)/(1 − r²)] = 0.801√[(23 − 2)/(1 − 0.801²)] = 6.1314
CV: α = 0.05, df = n − 2 = 23 − 2 = 21
2nd method: TS: r = 0.801
CV: from the Pearson correlation coefficient table with n = 23, α = 0.05
Interpretation: Although we have found a linear
correlation, it would be absurd to think that eating more
chocolate would help win a Nobel Prize.
Step 1: H0 , H1, claim & Tails
Step 2: TS Calculate (TS)
Step 3: CV using α
Step 4: Make the decision to
a. Reject or not H0
b. The claim is true or false
c. Restate this decision: There
is / is not sufficient evidence to
support the claim that…
→ t = ±2.080; Table: r = 0.396 < CV < r = 0.444; Technology: r = 0.413 → r = ±0.413
H0: 𝜌 = 0, H1: 𝜌 ≠ 0, claim, 2TT
21. Interpreting r: Explained Variation
The value of r² is the proportion of the variation in y that is explained by the linear
relationship between x and y.
Using the 23 pairs of chocolate/Nobel data, we get r = 0.801. What proportion of the
variation in numbers of Nobel Laureates can be explained by the variation in the
consumption of chocolate?
Solution
With r = 0.801 we get r² = 0.642.
Interpretation
We conclude that 0.642 (or about 64%) of the variation in numbers of Nobel
Laureates can be explained by the linear relationship between chocolate consumption
and numbers of Nobel Laureates.
This implies that about 36% of the variation in numbers of Nobel Laureates cannot be
explained by rates of chocolate consumption.
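The explained-variation arithmetic is a one-liner:

```python
r = 0.801                     # correlation for the 23 chocolate/Nobel pairs
r_squared = r ** 2            # proportion of variation in y explained by the linear relationship
print(round(r_squared, 3))    # 0.642: about 64% explained, about 36% unexplained
```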
22. When the null hypothesis has been rejected for a specific value, any of the following five possibilities can exist.
1. There is a direct cause-and-effect relationship between the variables. That is, x causes y.
water causes plants to grow; poison causes death; heat causes ice to melt
2. There is a reverse cause-and-effect relationship between the variables. That is, y causes x.
Suppose a researcher believes excessive coffee consumption causes nervousness, but the researcher fails to consider that the
reverse situation may occur. That is, it may be that an extremely nervous person craves coffee to calm his or her nerves.
3. The relationship between the variables may be caused by a third variable.
If a statistician correlated the number of deaths due to drowning and the number of cans of soft drink consumed daily during
the summer, he or she would probably find a significant relationship. However, the soft drink is not necessarily responsible
for the deaths, since both variables may be related to heat and humidity.
4. There may be a complexity of interrelationships among many variables.
A researcher may find a significant relationship between students’ high school grades and college grades. But there probably
are many other variables involved, such as IQ, hours of study, influence of parents, motivation, age, and instructors.
5. The relationship may be coincidental.
A researcher may be able to find a significant relationship between the increase in the number of people who are exercising
and the increase in the number of people who are committing crimes. But common sense dictates that any relationship
between these two values must be due to coincidence.
Correlation, Possible Relationships Between Variables
23. Interpreting r with Causation:
Correlation does not imply causality!
We noted previously that we should use common sense when interpreting results.
Clearly, it would be absurd to think that eating more chocolate would help win a
Nobel Prize.
Common Errors Involving Correlation:
1. Assuming that correlation implies causality
2. Using data based on averages
3. Ignoring the possibility of a nonlinear relationship
Hypotheses If conducting a formal hypothesis test to determine whether there is a significant
linear correlation between two variables, use the following null and alternative hypotheses that
use ρ to represent the linear correlation coefficient of the population:
Null Hypothesis H0: ρ = 0 (No correlation)
Alternative Hypothesis H1: ρ ≠ 0 (Correlation)