This document provides examples of statistical concepts and calculations related to business statistics exercises covering weeks 36-50. It includes 3 examples:
1) Calculating probabilities using the normal distribution for scenarios involving a company's vending machine electricity consumption.
2) Hypothesis testing examples comparing a sample mean to a hypothesized population mean using z-tests and t-tests.
3) An example calculating the chi-squared test statistic to test if a sample variance matches a hypothesized population variance.
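The chi-squared statistic for a single variance mentioned in the third example can be sketched as follows. This is a minimal illustration, assuming the usual form (n - 1)s²/σ₀²; the function name and the data are hypothetical and not taken from the exercises.

```python
from statistics import variance

def chi2_variance_statistic(sample, sigma0_sq):
    """Return (n - 1) * s^2 / sigma0^2 for H0: population variance == sigma0_sq."""
    n = len(sample)
    s_sq = variance(sample)  # sample variance, divides by n - 1
    return (n - 1) * s_sq / sigma0_sq

# Hypothetical measurements (assumption, for illustration only).
sample = [4.0, 6.0, 8.0, 10.0, 12.0]
stat = chi2_variance_statistic(sample, sigma0_sq=4.0)
print(stat)
```

The resulting value would then be compared with a chi-squared critical value on n - 1 degrees of freedom, which this sketch does not look up.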
This document provides an overview of lectures for a business statistics course covering weeks 43-50. It discusses key concepts in estimation including:
- Point and interval estimators for population parameters based on sample statistics
- Properties of unbiased and consistent estimators
- How to construct confidence intervals for estimating the population mean when the standard deviation is known, including interpreting the results.
- How sample size affects the width of confidence intervals and precision of estimates.
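The effect of sample size on interval width can be illustrated with a short sketch. It assumes a known population standard deviation and the z-based interval; `margin_of_error` and the value of `sigma` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(sigma, n, confidence=0.95):
    """Half-width of a z-based confidence interval for the mean (sigma known)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / sqrt(n)

sigma = 10.0  # hypothetical known population standard deviation
widths = {n: 2 * margin_of_error(sigma, n) for n in (25, 100, 400)}
for n, width in widths.items():
    print(n, round(width, 3))
```

Because the width is proportional to 1/√n, quadrupling the sample size halves the interval width.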
This document discusses key concepts in descriptive statistics including:
- Measures of central tendency like mean, median, and mode.
- Measures of variability such as range, interquartile range, variance, and standard deviation.
- Frequency distributions, percentages, and probability distributions.
- Population and sample distributions as well as the sampling distribution of the mean and the central limit theorem.
The document discusses various statistical concepts related to hypothesis testing including:
- Hypothesis, null hypothesis, and alternative hypothesis
- Types of statistical analyses for testing hypotheses (univariate, bivariate, multivariate)
- Common statistical tests like z-test, t-test, chi-square test, and tests of proportions
- Key steps in hypothesis testing like defining the hypotheses, determining significance levels, calculating test statistics, and making conclusions
- Types I and II errors that can occur in hypothesis testing
Examples are provided to demonstrate how to set up and conduct hypothesis tests using z-test, t-test, chi-square test, and test of proportions.
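The hypothesis-testing steps listed above can be sketched for the simplest case, a two-sided one-sample z-test with known population standard deviation. The function name and all numbers are hypothetical assumptions, not values from the document.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided z-test of H0: mu == mu0 against H1: mu != mu0."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))   # test statistic
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
    return z, p_value, p_value < alpha            # True means reject H0

# Hypothetical inputs (assumption, for illustration only).
z, p_value, reject = one_sample_z_test(sample_mean=52.0, mu0=50.0, sigma=8.0, n=64)
print(z, round(p_value, 4), reject)
```

Rejecting H0 when it is actually true would be a Type I error, whose probability is controlled by `alpha`.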
This document discusses various measures of dispersion used in statistics. It defines dispersion as measuring how varied or spread out data values are from the average. The document outlines both absolute measures like range and interquartile range, and relative measures like coefficient of range and coefficient of variation. It provides examples of calculating each measure and discusses their properties, merits, limitations, and uses. The key measures covered are range, interquartile range, mean deviation, and how to calculate them from both raw data and frequency distributions.
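A few of the absolute and relative measures named above can be computed from raw data as in the sketch below. The data set is hypothetical, and the coefficient of variation here uses the population standard deviation, which is one of several possible conventions.

```python
from statistics import mean, pstdev

def dispersion_summary(data):
    """Absolute and relative dispersion measures for raw data."""
    lo, hi = min(data), max(data)
    m = mean(data)
    return {
        "range": hi - lo,
        "coefficient_of_range": (hi - lo) / (hi + lo),
        "mean_deviation": sum(abs(x - m) for x in data) / len(data),
        "coefficient_of_variation": pstdev(data) / m,
    }

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
summary = dispersion_summary(data)
print(summary)
```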
The document discusses various methods for describing data distributions numerically, including measures of center (mean, median), measures of spread (standard deviation, interquartile range), and graphical representations (boxplots). It explains how to calculate and interpret the mean, median, quartiles, five-number summary, and standard deviation, and how to identify outliers. The appropriate choice of measures of center and spread depends on the symmetry of the distribution and the presence of outliers. Changing the measurement units affects the calculated values but not the underlying shape of the distribution.
This document discusses interval estimation for proportions. It defines point estimates and interval estimates. A point estimate is a single value of a statistic used to estimate a population parameter, like the sample proportion p estimating the population proportion P. An interval estimate provides a range of values between which the population parameter is expected to lie with a certain confidence level, like a 95% confidence interval for a proportion. Two examples are provided to demonstrate how to calculate a confidence interval for a sample proportion and interpret whether it supports or contradicts a claimed population proportion.
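As an illustration of checking a claimed proportion against an interval estimate, the sketch below uses the normal-approximation (Wald) interval. That choice, the function name, and the counts are assumptions; the summarized document may use a different formula.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n                              # point estimate
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

# Hypothetical survey: 120 successes out of 200, claimed proportion 0.50.
low, high = proportion_ci(successes=120, n=200)
claimed = 0.50
supports_claim = low <= claimed <= high
print(round(low, 4), round(high, 4), supports_claim)
```

If the claimed value falls outside the interval, the sample contradicts the claim at the chosen confidence level.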
- Point estimation involves using sample data to calculate a single number (point estimate) that estimates an unknown population parameter.
- A point estimator is a statistic used to calculate the point estimate. For example, when estimating an unknown population mean μ, the sample mean x̅ is a point estimator for μ.
- An unbiased estimator has an expected value equal to the true population parameter value. A biased estimator has an expected value that is not equal to the true parameter value.
- Common methods for finding estimators include maximum likelihood estimation and the method of moments. Maximum likelihood estimation identifies the value of the parameter that maximizes the likelihood function based on the sample data. The method of moments equates sample moments
Chapter 3: Describing, Exploring, and Comparing Data
3.2: Measures of Variation
The document provides an overview of descriptive statistics and statistical graphs, including measures of center such as mean, median, and mode, measures of variation such as range and standard deviation, and different types of statistical graphs like histograms, boxplots, and normal distributions. It discusses key concepts like outliers, percentiles, quartiles, sampling distributions, and the central limit theorem. The document is intended to describe important statistical tools and concepts for summarizing and describing the characteristics of data sets.
This document discusses various measures of variability or variation used in statistics. It defines variability as the extent to which observations vary from each other or from the average. The key measures discussed are range, interquartile range, average deviation, and standard deviation. Range is the simplest but ignores the distribution within its limits. Interquartile range excludes outliers but also ignores half the data. Average deviation measures average distance from the mean/median and indicates how compact 50% of data is. Examples are provided to demonstrate calculating and comparing these measures.
This document provides an overview of confidence intervals. It discusses calculating confidence intervals for a population mean using the normal and student's t distributions. The key steps are presented: determining the point estimate, finding the appropriate z-score for the confidence level, calculating the margin of error using the z-score and standard error, and stating the confidence interval. An example is also provided to demonstrate calculating a 90% confidence interval for a population mean when the standard deviation is known.
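The four steps described above can be followed line by line in a short sketch for a 90% interval with known standard deviation. The sample mean, standard deviation, and sample size are hypothetical and are not the figures from the document's own example.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical inputs (assumption, for illustration only).
x_bar, sigma, n = 50.0, 6.0, 36

# Step 1: the point estimate is the sample mean x_bar.
# Step 2: z-score for 90% confidence (5% in each tail -> 0.95 quantile).
z = NormalDist().inv_cdf(0.95)
# Step 3: margin of error = z * standard error.
standard_error = sigma / sqrt(n)
margin = z * standard_error
# Step 4: state the interval.
interval = (x_bar - margin, x_bar + margin)
print(round(z, 3), interval)
```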
This document provides an overview of key concepts from chapters 5 and 6 of an introductory statistics textbook. It discusses continuous probability distributions and their properties, including the uniform and exponential distributions. It then focuses on the normal distribution and standard normal distribution, explaining how to calculate z-scores and use the empirical rule. Examples are provided for calculating probabilities using the normal distribution. The summary aims to introduce students to important concepts involving continuous random variables and the normal distribution.
This document provides an overview of sampling theory and statistical analysis. It discusses different sampling methods, important sampling terms, and statistical tests. The key points are:
1) There are two ways to collect statistical data - a complete enumeration (census) or a sample survey. A sample is a portion of a population that is examined to estimate population characteristics.
2) Common sampling methods include simple random sampling, systematic sampling, stratified sampling, cluster sampling, quota sampling, and purposive sampling.
3) Important terms include parameters, statistics, sampling distributions, and statistical inferences about populations based on sample data.
4) Statistical tests covered include hypothesis testing, types of errors, test statistics, critical values,
1) The document discusses density curves and normal distributions, which are important mathematical models for describing the overall pattern of data. A density curve describes the distribution of a large number of observations.
2) It specifically covers the normal distribution and some of its key properties, including that about 68%, 95%, and 99.7% of observations fall within 1, 2, and 3 standard deviations of the mean, respectively.
3) The document shows how to work with normal distributions using techniques like standardizing data, finding areas under the normal curve using the standard normal table, and assessing normality with a normal quantile plot.
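The 68-95-99.7 figures quoted above can be reproduced numerically. This is a small sketch using Python's standard-library normal distribution in place of a printed standard normal table; the variable names are hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Area within k standard deviations of the mean, for k = 1, 2, 3.
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
for k, area in within.items():
    print(k, round(area, 4))
```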
Chapter 6 part1- Introduction to Inference-Estimating with Confidence (Introd... by nszakir
Introduction to Inference, Estimating with Confidence, Inference, Statistical Confidence, Confidence Intervals, Confidence Interval for a Population Mean, Choosing the Sample Size
This document provides an overview of statistical inference. It discusses descriptive statistics, which summarize data, and inferential statistics, which are used to generalize from samples to populations. Key concepts covered include estimation, hypothesis testing, parameters, statistics, confidence intervals, significance levels, and types of errors. Examples are given of how to calculate confidence intervals for means and proportions and how to perform hypothesis tests using z-tests and t-tests. Steps for conducting hypothesis tests are outlined.
This chapter discusses point estimates and confidence intervals. A point estimate is a statistic used to estimate a population parameter, while a confidence interval provides a range of values that is likely to include the true population parameter. The width of a confidence interval depends on the sample size, population variability, and desired confidence level. Confidence intervals for a mean can be constructed using the t or z distributions depending on whether the population standard deviation is known. Confidence intervals can also be constructed for a population proportion. Sample sizes needed for estimating means and proportions are also addressed.
This document provides an overview of key statistical concepts taught in a statistics lab lesson, including point estimation, confidence intervals, and hypothesis testing. It defines point estimators like the sample mean that summarize a population using a sample. Confidence intervals give a range of values that the population parameter is expected to lie within. Hypothesis testing involves setting up null and alternative hypotheses and using a test statistic and critical value to reject or fail to reject the null hypothesis. Formulas for confidence intervals and hypothesis tests are presented for situations involving normal, t, and binomial distributions.
Inferential vs descriptive tutorial of when to use, by Ken Plummer
The document discusses the differences between descriptive and inferential statistics. Descriptive statistics are used to describe characteristics of a whole population, while inferential statistics are used when the whole population cannot be measured and conclusions are drawn from a sample to generalize to the larger population. Examples are provided to illustrate when each type of statistic would be used. Key differences include descriptive statistics examining entire populations while inferential statistics examine samples that aim to infer conclusions about populations.
This document provides an overview of statistical estimation and inference. It discusses point estimation, which provides a single value to estimate an unknown population parameter, and interval estimation, which gives a range of plausible values for the parameter. The key aspects of interval estimation are confidence intervals, which provide a probability statement about where the true population parameter lies. The document also covers important concepts like sampling distributions, the central limit theorem, and factors that influence the width of a confidence interval like sample size. Examples are provided to demonstrate calculating point estimates, confidence intervals, and dealing with independent samples.
This document provides an outline and summaries of topics related to error analysis:
- It outlines topics including binomial distribution, Poisson distribution, normal distribution, confidence interval, and least squares analysis.
- The binomial distribution section provides an example of calculating the probability of getting 2 and 3 heads out of 6 coin tosses.
- The normal distribution section explains how to calculate the probability of scoring between 90-110 on an IQ test with a mean of 100 and standard deviation of 10.
- The confidence interval section provides an example of calculating the 95% confidence interval for the population mean boiling temperature based on 6 sample measurements.
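The binomial and normal examples mentioned above can be checked with a short sketch. It assumes a fair coin (p = 0.5), which the summary does not state explicitly; the helper name `binomial_pmf` is hypothetical.

```python
from math import comb
from statistics import NormalDist

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Assumes a fair coin; 6 tosses as in the summary.
p_two_heads = binomial_pmf(2, 6, 0.5)
p_three_heads = binomial_pmf(3, 6, 0.5)

# IQ scores with mean 100 and standard deviation 10, as stated in the summary.
iq = NormalDist(mu=100, sigma=10)
p_between = iq.cdf(110) - iq.cdf(90)
print(p_two_heads, p_three_heads, round(p_between, 4))
```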
Point estimate for a population proportion p, by Muel Clamor
This document provides information about point estimates for population proportions:
1) A point estimate predicts a parameter with a single number, while an interval estimate provides a range of numbers that could be the true parameter value.
2) The point estimator for a population proportion p is the sample proportion p̂, which is calculated as the number of successes x divided by the sample size n (p̂ = x/n).
3) Two examples are given to demonstrate calculating the point estimate of a population proportion p from sample data on the number of successes.
This document discusses statistical analysis techniques including measures of central tendency, variance, standard deviation, t-tests, and levels of significance. It provides an example of using these techniques to analyze plant height data from a fertilizer experiment and determine if differences in heights between treated and untreated plants are statistically significant. The document introduces the concepts and calculations involved in describing and analyzing quantitative data using common statistical methods.
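A two-sample t statistic of the kind used in such a comparison can be sketched as follows. The summary does not say which t-test variant the document uses, so the unequal-variance (Welch) form here is an assumption, and the plant heights are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance

def welch_t_statistic(a, b):
    """t statistic for the difference in means, not assuming equal variances."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error

# Hypothetical plant heights (assumption, not the document's data).
treated = [12.0, 14.0, 16.0, 18.0, 20.0]
untreated = [10.0, 11.0, 12.0, 13.0, 14.0]
t = welch_t_statistic(treated, untreated)
print(round(t, 4))
```

The statistic would then be compared with a critical t value at the chosen level of significance, which this sketch leaves out.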
This document discusses parameter estimation and interval estimation. It defines point estimates as single values that estimate population parameters and interval estimates as ranges of values within which population parameters are expected to fall. It provides examples of using the sample mean and variance as point estimators for the population mean and variance. It also discusses how to construct confidence intervals for population parameters based on sample statistics, sample size, and the desired confidence level.
This chapter discusses confidence interval estimation. It defines point estimates and confidence intervals, and explains how to construct confidence intervals for a population mean when the population standard deviation is known or unknown, as well as for a population proportion. When the population standard deviation is unknown, a t-distribution rather than normal distribution is used. Formulas and examples are provided. The chapter also addresses determining the required sample size to estimate a mean or proportion within a specified margin of error.
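The sample-size determination mentioned at the end of this summary is commonly done with the formulas n = (zσ/E)² for a mean and n = z²p(1 - p)/E² for a proportion. The sketch below assumes those standard forms; the function names and inputs are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def n_for_mean(sigma, margin, confidence=0.95):
    """Smallest n giving a z-interval half-width of at most `margin`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / margin) ** 2)

def n_for_proportion(margin, confidence=0.95, p=0.5):
    """Smallest n for a proportion; p = 0.5 is the most conservative guess."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z**2 * p * (1 - p) / margin**2)

print(n_for_mean(sigma=15.0, margin=3.0))
print(n_for_proportion(margin=0.05))
```

Rounding up with `ceil` ensures the margin of error does not exceed the target.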
This document discusses various measures of variability and dispersion in statistics, including the range, quartiles, interquartile range, percentiles, and five number summary. It provides definitions and examples of each measure. The range is defined as the difference between the highest and lowest values in a data set. Quartiles split a data set into four equal parts, with the first (Q1) and third (Q3) quartiles used to calculate the interquartile range. Percentiles indicate the percentage of values below a given score. The five number summary encapsulates the minimum, first quartile, median, third quartile, and maximum.
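A five-number summary and interquartile range can be computed as in the sketch below. Quartile conventions differ between textbooks; the "inclusive" interpolation method used here is one choice among several, and the data are hypothetical.

```python
from statistics import quantiles

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)

data = [1, 3, 5, 7, 9, 11, 13]  # hypothetical observations
summary = five_number_summary(data)
iqr = summary[3] - summary[1]   # interquartile range = Q3 - Q1
data_range = summary[4] - summary[0]
print(summary, iqr, data_range)
```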
This document discusses sampling variability and sampling distributions. It defines key terms like statistic, sampling distribution, and population distribution. It presents examples of how sampling distributions are affected by sample size and population characteristics. The central limit theorem is introduced, stating that the sampling distribution of the sample mean becomes approximately normal as sample size increases, even if the population is not normal. Properties of sampling distributions for the sample mean and sample proportion are provided. Examples demonstrate how to calculate probabilities using these sampling distributions.
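A probability calculation with the sampling distribution of the sample mean might look like the sketch below. It assumes the sample is large enough (or the population normal) for the normal model with standard error σ/√n to apply; the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_mean_above(threshold, mu, sigma, n):
    """P(sample mean > threshold) under a normal sampling distribution."""
    sampling_dist = NormalDist(mu, sigma / sqrt(n))  # standard error = sigma / sqrt(n)
    return 1 - sampling_dist.cdf(threshold)

# Hypothetical population: mean 100, standard deviation 10, samples of size 25.
p = prob_sample_mean_above(threshold=102, mu=100, sigma=10, n=25)
print(round(p, 4))
```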
Chapter 5 part2- Sampling Distributions for Counts and Proportions (Binomial ... by nszakir
Mathematics, Statistics, Sampling Distributions for Counts and Proportions, Binomial Distributions for Sample Counts,
Binomial Distributions in Statistical Sampling, Binomial Mean and Standard Deviation, Sample Proportions, Normal Approximation for Counts and Proportions, Binomial Formula
The steps to convert each score to a z-score are:
For history test:
Z = (X - Mean) / Standard Deviation
Z = (78 - 79) / 6
Z = -0.167
For math test:
Z = (X - Mean) / Standard Deviation
Z = (82 - 84) / 5
Z = -0.4
So the z-score for the history test is -0.167 and the z-score for the math test is -0.4.
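The same standardization can be written as a small helper, which makes it easy to re-check the arithmetic for any score. The function name is hypothetical; the inputs are the scores, means, and standard deviations given in the steps.

```python
def z_score(x, mean, standard_deviation):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / standard_deviation

history_z = z_score(78, 79, 6)
math_z = z_score(82, 84, 5)
print(round(history_z, 3), round(math_z, 3))
```

A negative z-score means the score is below the mean of its test, so z-scores from different tests can be compared directly.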
This document contains 5 questions and their answers. Question 1 analyzes survey data to determine if more than 61% of people sleep 7 or more hours per night on weekends. Question 2 calculates a p-value for a hypothesis test comparing the means of two employment tests. Question 3 performs a hypothesis test to examine if a sample's mean score differs from the expected population mean. Question 4 uses a chi-squared test to determine if there is a preference for certain class times. Question 5 provides commute data and asks to calculate the line of best fit, confidence intervals, and determine if distance can indicate travel time.
The document provides an overview of descriptive statistics and statistical graphs, including measures of center such as mean, median, and mode, measures of variation such as range and standard deviation, and different types of statistical graphs like histograms, boxplots, and normal distributions. It discusses key concepts like outliers, percentiles, quartiles, sampling distributions, and the central limit theorem. The document is intended to describe important statistical tools and concepts for summarizing and describing the characteristics of data sets.
This document discusses various measures of variability or variation used in statistics. It defines variability as the extent to which observations vary from each other or from the average. The key measures discussed are range, interquartile range, average deviation, and standard deviation. Range is the simplest but ignores the distribution within its limits. Interquartile range excludes outliers but also ignores half the data. Average deviation measures average distance from the mean/median and indicates how compact 50% of data is. Examples are provided to demonstrate calculating and comparing these measures.
This document provides an overview of confidence intervals. It discusses calculating confidence intervals for a population mean using the normal and student's t distributions. The key steps are presented: determining the point estimate, finding the appropriate z-score for the confidence level, calculating the margin of error using the z-score and standard error, and stating the confidence interval. An example is also provided to demonstrate calculating a 90% confidence interval for a population mean when the standard deviation is known.
This document provides an overview of key concepts from chapters 5 and 6 of an introductory statistics textbook. It discusses continuous probability distributions and their properties, including the uniform and exponential distributions. It then focuses on the normal distribution and standard normal distribution, explaining how to calculate z-scores and use the empirical rule. Examples are provided for calculating probabilities using the normal distribution. The summary aims to introduce students to important concepts involving continuous random variables and the normal distribution.
This document provides an overview of sampling theory and statistical analysis. It discusses different sampling methods, important sampling terms, and statistical tests. The key points are:
1) There are two ways to collect statistical data - a complete enumeration (census) or a sample survey. A sample is a portion of a population that is examined to estimate population characteristics.
2) Common sampling methods include simple random sampling, systematic sampling, stratified sampling, cluster sampling, quota sampling, and purposive sampling.
3) Important terms include parameters, statistics, sampling distributions, and statistical inferences about populations based on sample data.
4) Statistical tests covered include hypothesis testing, types of errors, test statistics, critical values,
1) The document discusses density curves and normal distributions, which are important mathematical models for describing the overall pattern of data. A density curve describes the distribution of a large number of observations.
2) It specifically covers the normal distribution and some of its key properties, including that about 68%, 95%, and 99.7% of observations fall within 1, 2, and 3 standard deviations of the mean, respectively.
3) The document shows how to work with normal distributions using techniques like standardizing data, finding areas under the normal curve using the standard normal table, and assessing normality with a normal quantile plot.
Chapter 6 part1- Introduction to Inference-Estimating with Confidence (Introd...nszakir
Introduction to Inference, Estimating with Confidence, Inference, Statistical Confidence, Confidence Intervals, Confidence Interval for a Population Mean, Choosing the Sample Size
This document provides an overview of statistical inference. It discusses descriptive statistics, which summarize data, and inferential statistics, which are used to generalize from samples to populations. Key concepts covered include estimation, hypothesis testing, parameters, statistics, confidence intervals, significance levels, types of errors. Examples are given of how to calculate confidence intervals for means and proportions and how to perform hypothesis tests using z-tests and t-tests. Steps for conducting hypothesis tests are outlined.
This chapter discusses point estimates and confidence intervals. A point estimate is a statistic used to estimate a population parameter, while a confidence interval provides a range of values that is likely to include the true population parameter. The width of a confidence interval depends on the sample size, population variability, and desired confidence level. Confidence intervals for a mean can be constructed using the t or z distributions depending on whether the population standard deviation is known. Confidence intervals can also be constructed for a population proportion. Sample sizes needed for estimating means and proportions are also addressed.
This document provides an overview of key statistical concepts taught in a statistics lab lesson, including point estimation, confidence intervals, and hypothesis testing. It defines point estimators like the sample mean that summarize a population using a sample. Confidence intervals give a range of values that the population parameter is expected to lie within. Hypothesis testing involves setting up null and alternative hypotheses and using a test statistic and critical value to reject or fail to reject the null hypothesis. Formulas for confidence intervals and hypothesis tests are presented for situations involving normal, t, and binomial distributions.
Inferential vs descriptive tutorial of when to useKen Plummer
The document discusses the differences between descriptive and inferential statistics. Descriptive statistics are used to describe characteristics of a whole population, while inferential statistics are used when the whole population cannot be measured and conclusions are drawn from a sample to generalize to the larger population. Examples are provided to illustrate when each type of statistic would be used. Key differences include descriptive statistics examining entire populations while inferential statistics examine samples that aim to infer conclusions about populations.
This document provides an overview of statistical estimation and inference. It discusses point estimation, which provides a single value to estimate an unknown population parameter, and interval estimation, which gives a range of plausible values for the parameter. The key aspects of interval estimation are confidence intervals, which provide a probability statement about where the true population parameter lies. The document also covers important concepts like sampling distributions, the central limit theorem, and factors that influence the width of a confidence interval like sample size. Examples are provided to demonstrate calculating point estimates, confidence intervals, and dealing with independent samples.
This document provides an outline and summaries of topics related to error analysis:
- It outlines topics including binomial distribution, Poisson distribution, normal distribution, confidence interval, and least squares analysis.
- The binomial distribution section provides an example of calculating the probability of getting 2 and 3 heads out of 6 coin tosses.
- The normal distribution section explains how to calculate the probability of scoring between 90-110 on an IQ test with a mean of 100 and standard deviation of 10.
- The confidence interval section provides an example of calculating the 95% confidence interval for the population mean boiling temperature based on 6 sample measurements.
Point estimate for a population proportion pMuel Clamor
This document provides information about point estimates for population proportions:
1) A point estimate predicts a parameter with a single number, while an interval estimate provides a range of numbers that could be the true parameter value.
2) The point estimator for a population proportion p is the sample proportion p, which is calculated as the number of successes divided by the sample size n.
3) Two examples are given to demonstrate calculating the point estimate of a population proportion p from sample data on the number of successes.
This document discusses statistical analysis techniques including measures of central tendency, variance, standard deviation, t-tests, and levels of significance. It provides an example of using these techniques to analyze plant height data from a fertilizer experiment and determine if differences in heights between treated and untreated plants are statistically significant. The document introduces the concepts and calculations involved in describing and analyzing quantitative data using common statistical methods.
This document discusses parameter estimation and interval estimation. It defines point estimates as single values that estimate population parameters and interval estimates as ranges of values within which population parameters are expected to fall. It provides examples of using the sample mean and variance as point estimators for the population mean and variance. It also discusses how to construct confidence intervals for population parameters based on sample statistics, sample size, and the desired confidence level.
This chapter discusses confidence interval estimation. It defines point estimates and confidence intervals, and explains how to construct confidence intervals for a population mean when the population standard deviation is known or unknown, as well as for a population proportion. When the population standard deviation is unknown, a t-distribution rather than normal distribution is used. Formulas and examples are provided. The chapter also addresses determining the required sample size to estimate a mean or proportion within a specified margin of error.
This document discusses various measures of variability and dispersion in statistics, including the range, quartiles, interquartile range, percentiles, and five number summary. It provides definitions and examples of each measure. The range is defined as the difference between the highest and lowest values in a data set. Quartiles split a data set into four equal parts, with the first (Q1) and third (Q3) quartiles used to calculate the interquartile range. Percentiles indicate the percentage of values below a given score. The five number summary encapsulates the minimum, first quartile, median, third quartile, and maximum.
This document discusses sampling variability and sampling distributions. It defines key terms like statistic, sampling distribution, and population distribution. It presents examples of how sampling distributions are impacted by sample size and population characteristics. The central limit theorem is introduced, stating that sampling distributions become normally distributed as sample size increases, even if the population is not normal. Properties of sampling distributions for the sample mean and sample proportion are provided. Examples demonstrate how to calculate probabilities using these sampling distributions.
Chapter 5 part2- Sampling Distributions for Counts and Proportions (Binomial ...nszakir
Mathematics, Statistics, Sampling Distributions for Counts and Proportions, Binomial Distributions for Sample Counts,
Binomial Distributions in Statistical Sampling, Binomial Mean and Standard Deviation, Sample Proportions, Normal Approximation for Counts and Proportions, Binomial Formula
Okay, here are the steps to convert each score to a z-score:
For history test:
Z = (X - Mean) / Standard Deviation
Z = (78 - 79) / 6
Z = -0.167
For math test:
Z = (X - Mean) / Standard Deviation
Z = (82 - 84) / 5
Z = -0.4
So the z-score for the history test is -0.167 and the z-score for the math test is -0.4.
This document contains 5 questions and their answers. Question 1 analyzes survey data to determine if more than 61% of people sleep 7 or more hours per night on weekends. Question 2 calculates a p-value for a hypothesis test comparing the means of two employment tests. Question 3 performs a hypothesis test to examine if a sample's mean score differs from the expected population mean. Question 4 uses a chi-squared test to determine if there is a preference for certain class times. Question 5 provides commute data and asks to calculate the line of best fit, confidence intervals, and determine if distance can indicate travel time.
This document provides an introduction to business intelligence and data analytics. It discusses key concepts such as data sources, data warehouses, data marts, data mining, and data analytics. It also covers topics like univariate analysis, measures of dispersion, heterogeneity measures, confidence intervals, cross validation, and ROC curves. The document aims to introduce fundamental techniques and metrics used in business intelligence and data mining.
The learning outcomes of this topic are:
- Recognize the terms sample statistic and population parameter
- Use confidence intervals to indicate the reliability of estimates
- Know when approximate large sample or exact confidence intervals are appropriate
This topic will cover:
- Sampling distributions
- Point estimates and confidence intervals
- Introduction to hypothesis testing
This document provides information about binomial and Poisson distributions. It includes examples of calculating probabilities for binomial distributions using the binomial probability formula and binomial tables. It also provides the key characteristics and formula for the Poisson distribution. The mean, variance and standard deviation are defined for binomial distributions. Examples are provided to demonstrate calculating these values.
This document presents hypotheses concerning proportions. It introduces proportions and the binomial distribution. It discusses hypotheses for one proportion using a z-test and provides an example. It also discusses hypotheses for two proportions using a z-test and provides an example. Finally, it discusses analyzing contingency tables using a chi-squared test and provides an example. References for further information are also included.
The document provides examples and solutions for calculating confidence intervals for various statistical parameters. It discusses how to find the 95% confidence interval for the difference of two means based on paired data. It also demonstrates how to calculate the 95% confidence interval for a single population proportion and the 90% confidence interval for the difference of two proportions. Finally, it shows an example of constructing a 99% confidence interval for a population variance based on a sample.
Quality is defined as customers' perception of how well a product or service meets their expectations. There are three types of quality: quality of design, quality of performance, and quality of conformance. Statistical quality control uses statistical techniques to control, improve, and maintain quality. Control charts are used to determine if a process is in or out of control by monitoring for random or assignable variation. Process capability indices like Cp and Cpk compare process variability to specification limits to determine if a process is capable of meeting specifications.
Binomial and Poission Probablity distributionPrateek Singla
The document discusses binomial and Poisson distributions. Binomial distribution describes random events with two possible outcomes, like success/failure. Poisson distribution models rare, independent events occurring randomly over an interval of time/space. An example calculates the probability of defective thermometers using binomial distribution. It also fits a Poisson distribution to automobile accident data from a 50-day period.
This document discusses measures of central tendency, including the mode, median, quartiles, and percentiles. It provides definitions and formulas for calculating each measure. The mode is defined as the value that occurs most frequently. The median divides the data set into two equal parts. Quartiles divide the data set into four equal parts, with the second quartile being the median. Percentiles divide the data set into 100 equal parts. Several examples are provided to demonstrate calculating the mode, median, quartiles, deciles and percentiles from data sets.
This document discusses measures of central tendency, specifically the arithmetic mean. It provides formulas and examples for calculating the arithmetic mean using direct, short-cut, and step-deviation methods for both ungrouped and grouped data. It also discusses calculating the weighted mean and combined mean of two or more related groups. Key characteristics of the arithmetic mean are that the sum of deviations from the mean is zero and the sum of squared deviations is minimum.
Process Capability for certificate course for marketing engineers onlineDevendraLokhande
The document discusses fundamentals of process capability including:
1) Achieving process control is key to reducing variation and non-conforming parts to achieve quality and cost targets.
2) Process capability indices like Cp, Cpk measure how well a process performs versus specifications and a value of 1.33 or higher indicates a stable and capable process.
3) Many factors can impact a process's capability including the machine, environment, operator, material and more; conducting process capability studies helps evaluate a process and identify sources of variation.
The document provides a review for a statistics final exam. It includes definitions students should know, examples of hypothesis tests and probability questions, instructions on how to find probabilities using normal distributions, and other statistical concepts. It emphasizes knowing key terms, being able to perform hypothesis tests and find probabilities using normal distributions, and understanding the differences between binomial and Poisson distributions.
The document provides examples and solutions for confidence intervals from a statistics textbook. It includes confidence intervals for means from sample data on reading scores, worker distractions, actuary exam salaries, television viewing hours, hospital noise levels, time spent on homework, chocolate chips per cookie, number of women representatives, and dance company student membership. For each problem, it provides the sample size, mean, standard deviation, and confidence level to calculate the confidence interval for the true population mean. Several examples compare the sample mean or sample estimates to actual population values. Unusual outlier data is noted to affect the sample mean in one example.
The document discusses properties of the normal distribution and how to calculate probabilities using the standard normal distribution. It provides examples of finding areas under the normal curve to calculate probabilities for different cases, such as finding the probability that a value falls within a certain range of scores or above/below a cutoff score. It also shows how to find z-scores and their corresponding x-values given a probability or area under the curve. Key aspects covered include converting between z-scores and raw scores, using z-score tables to find probabilities and areas, and drawing diagrams to illustrate the areas being calculated.
This document discusses experimental design and different types of designs used in statistics. It begins by introducing the basic principles of experimental design such as randomization, replication, and blocking to control extraneous variables. It then describes the three basic designs: completely randomized design, randomized block design, and Latin square design. For each design, it provides an example to illustrate how treatments are assigned randomly or systematically. Finally, it introduces analysis of variance (ANOVA) which is used to analyze the effects of factors in experimental designs.
The document discusses binomial and Poisson distributions. It provides the probability mass functions (PMF) for binomial and Poisson distributions. For binomial, the PMF shows the probability of getting x successes in n trials with probability of success p. For Poisson, the PMF gives the probability of getting x events occurring with average rate λ. Examples are given where variables follow a Poisson distribution, such as number of deaths in a city daily.
Chapter 7: Estimating Parameters and Determining Sample Sizes
7.2: Estimating a Population Mean
R = R0(1 + α(t - 20))
- The resistance (R) of a copper wire is calculated using a formula that relates it to the resistance at 20°C (R0), the coefficient of resistance (α), and the temperature (t).
- R0 is given as 6Ω with an uncertainty of ±0.3%.
- To determine the uncertainty in R, the uncertainties in R0, α, and t must be determined and propagated through the equation using partial derivatives.
- The overall uncertainty in R combines the individual uncertainties from each variable according to the propagation of uncertainty formula.
This document discusses confidence intervals for the difference between two population proportions and provides an example. It also discusses sampling distributions of sample variances and provides examples for confidence intervals of variances and determining sample size. Finally, it discusses determining sample size for confidence intervals of population means and proportions.
1. BUSINESS STATISTICS I
Exercises: Weeks 36 – 50
Antonio Rivero Ostoic
School of Business and Social Sciences
September − December
AARHUS UNIVERSITY
2. Example
Using X
A company’s vending machines consume on average 460 kwh of electricity with a standard deviation of 5 kwh
• What is the probability that a vending machine in a given location consumes less than 470 kwh?
$$P(X < 470) = P\left(\frac{X-\mu}{\sigma} < \frac{470-460}{5}\right) = P(Z < 2) = 0.9772 \approx 98\%$$
• ...and the probability of using more than 470 kwh?
$$P(X > 470) = P\left(\frac{X-\mu}{\sigma} > \frac{470-460}{5}\right) = P(Z > 2) = 1 - P(Z < 2) = 1 - 0.9772 = 0.0228 \approx 2\%$$
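The two table lookups in this example can be checked numerically. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist` as a stand-in for the Z table; the variable names are illustrative and not part of the slides.

```python
from statistics import NormalDist

# Consumption model from the example: mean 460 kwh, standard deviation 5 kwh.
consumption = NormalDist(mu=460, sigma=5)

p_less = consumption.cdf(470)   # P(X < 470), about 0.977
p_more = 1 - p_less             # P(X > 470), about 0.023

print(f"P(X < 470) = {p_less:.4f}")
print(f"P(X > 470) = {p_more:.4f}")
```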
3. Example II
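Using $\bar{X}$
• What is the probability that 3 vending machines consume on average less than 465 kwh, i.e. $P(\bar{X} < 465)$?
Since we assume that X is normally distributed, the standard error of the mean must take the sample size into account:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{5}{\sqrt{3}} = 2.89$$
$$P(\bar{X} < 465) = P\left(\frac{\bar{X}-\mu_{\bar{x}}}{\sigma_{\bar{x}}} < \frac{465-460}{2.89}\right) = P(Z < 1.73) = .9582 \approx 96\%$$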
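A short sketch of the same calculation for the mean of three machines, again assuming the standard-library `statistics.NormalDist`. The only change from the single-machine case is that the spread is the standard error σ/√n rather than σ; the names below are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 460, 5, 3

# Standard error of the mean: sigma / sqrt(n), about 2.89 kwh.
standard_error = sigma / sqrt(n)

# Sampling distribution of the mean of n machines.
sample_mean_dist = NormalDist(mu=mu, sigma=standard_error)
p_mean_below_465 = sample_mean_dist.cdf(465)   # about 0.958

print(f"standard error = {standard_error:.2f}")
print(f"P(mean < 465)  = {p_mean_below_465:.4f}")
```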
4. Inference of the sample mean
Example
A sample of n = 3 tells us that a vending machine consumes on average 470 kwh, with σ = 5 kwh
Then we can compute the range around this value in which the sample mean is located with 95% probability
Since $z_{.025} = 1.96$, then $P(-1.96 < Z < 1.96) = .95$
Multiplying all terms in the probability statement by $\sigma/\sqrt{n}$ and then adding µ, we get:
$$P\left(\mu - 1.96\frac{\sigma}{\sqrt{n}} < \bar{X} < \mu + 1.96\frac{\sigma}{\sqrt{n}}\right) = .95$$
$$P\left(470 - 1.96\frac{5}{\sqrt{3}} < \bar{X} < 470 + 1.96\frac{5}{\sqrt{3}}\right) = .95$$
$$P(464.3 < \bar{X} < 475.7) = .95$$
⇒ hence the sample mean will fall between 464.3 and 475.7 with 95% probability, which means that the computed sample mean is supported by the sample statistic
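The bounds of this probability statement can be reproduced as follows. This is a sketch under the example's own inputs (centre 470, σ = 5, n = 3); it assumes `NormalDist().inv_cdf` to obtain the 1.96 quantile instead of reading it from a table.

```python
from math import sqrt
from statistics import NormalDist

center, sigma, n = 470, 5, 3

# z_{.025}: the value leaving 2.5% in the upper tail, about 1.96.
z = NormalDist().inv_cdf(0.975)

half_width = z * sigma / sqrt(n)
lower, upper = center - half_width, center + half_width

print(f"95% range: ({lower:.1f}, {upper:.1f})")
```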
5. Example
finding $\hat{P}$
Last year 30% of the schools in town installed our vending machine cooler, and we want to see what proportion of schools will continue using our machine next year
• If we take a random sample of 25 schools, what is the probability that more than 35% of the sampled schools will choose our machine?
Since each outcome is either a success or a failure, we have a binomial experiment with p = .30 and n = 25
We want to find $P(\hat{P} > .35)$
$$\sigma_{\hat{p}} = \sqrt{p(1-p)/n} = \sqrt{(.30)(.70)/25} = .0917$$
$$P(\hat{P} > .35) = P\left(\frac{\hat{P}-p}{\sqrt{p(1-p)/n}} > \frac{.35-.30}{.0917}\right) = P(Z > .545) = 1 - P(Z < .545) \approx 1 - .705 = .295 \approx 30\%$$
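A minimal sketch of this normal approximation for a sample proportion, using only the standard library. The exact normal CDF gives a tail probability of about 0.29, slightly below the table-based .295, because the table lookup rounds z. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.30, 25
threshold = 0.35

# Standard error of the sample proportion under p = 0.30.
se_phat = sqrt(p * (1 - p) / n)

# Standardize the threshold and take the upper tail.
z = (threshold - p) / se_phat
p_above = 1 - NormalDist().cdf(z)

print(f"se = {se_phat:.4f}, z = {z:.3f}, P(p_hat > 0.35) = {p_above:.3f}")
```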
6. Example $\bar{X}_1 - \bar{X}_2$
Our company’s vending machines’ electricity consumption is normally distributed with a mean of 460 kwh and a standard deviation of 5 kwh.
A rival company produces vending machine coolers with normally distributed electricity consumption, with 455 kwh on average and a standard deviation of 10 kwh.
• What is the probability that the average electricity consumption of our company’s machines exceeds that of the rival machines if we take random samples of size 30 and 10 respectively?
i.e. $P(\bar{X}_1 - \bar{X}_2 > 0)$ with $\mu_1 - \mu_2 = 460 - 455 = 5$ and
$$\sigma_{\bar{x}_1-\bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{5^2}{30} + \frac{10^2}{10}} = \sqrt{10.833} = 3.29$$
$$P(\bar{X}_1 - \bar{X}_2 > 0) = P\left(\frac{(\bar{X}_1-\bar{X}_2)-(\mu_1-\mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} > \frac{0-5}{3.29}\right) = P(Z > -1.52) = 1 - P(Z < -1.52) = 1 - .0643 = .9357$$
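The difference-of-means calculation can be sketched in a few lines. This assumes independent samples, as the variance-sum formula in the example does, and uses the standard-library normal CDF in place of the table.

```python
from math import sqrt
from statistics import NormalDist

mu1, sigma1, n1 = 460, 5, 30    # our machines
mu2, sigma2, n2 = 455, 10, 10   # rival machines

# Standard error of the difference between two independent sample means.
se_diff = sqrt(sigma1**2 / n1 + sigma2**2 / n2)

# P(mean1 - mean2 > 0): standardize 0 against the true difference of 5.
z = (0 - (mu1 - mu2)) / se_diff
p_ours_higher = 1 - NormalDist().cdf(z)

print(f"se = {se_diff:.2f}, z = {z:.2f}, P = {p_ours_higher:.4f}")
```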
7. Example finding µ with known σ
Our vending machines deliver a soft drink can a few seconds after the customer presses the button, but the competition is about to launch a new vending machine model that we suspect is faster than our product
• We need a 95% confidence interval estimate of the mean from a sample of 15 machines, and we know from the technical specifications that the standard deviation in our machines is .38 seconds.
Response time in seconds to deliver the product:
2.37 1.95 1.49 2.05 2.27 2.53 1.87 2.42 1.43 2.24 2.69 2.16 1.88 1.71 2.82
CI estimator for µ with known σ:
$\bar{x} = 2.13$ (2.125 before rounding). A 95% confidence level means α = .05; $z_{\alpha/2} = z_{.025} = 1.96$
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \quad\Rightarrow\quad 2.13 \pm 1.96\,\frac{.38}{\sqrt{15}}$$
‘error’: ± 0.19
Thus LCL = 1.93 and UCL = 2.32, i.e. the interval (1.93; 2.32), computed from the unrounded mean
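The interval can be recomputed directly from the 15 response times. This is a sketch assuming the known σ = .38 from the example and the standard-library `statistics` module; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist, fmean

# Response times in seconds, as listed in the example.
times = [2.37, 1.95, 1.49, 2.05, 2.27, 2.53, 1.87, 2.42,
         1.43, 2.24, 2.69, 2.16, 1.88, 1.71, 2.82]
sigma = 0.38        # known population standard deviation
confidence = 0.95

x_bar = fmean(times)
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{alpha/2}
margin = z * sigma / sqrt(len(times))                 # the 'error' term

lcl, ucl = x_bar - margin, x_bar + margin
print(f"mean = {x_bar:.3f}, margin = {margin:.2f}, CI = ({lcl:.2f}; {ucl:.2f})")
```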
8. Hypothesis Testing about Parameters
Examples
We have seen in the examples that the mean of our company’s vending machines’ electricity consumption is 460 kwh, or
$H_0 : \mu = 460$
• Is there enough evidence that the mean parameter is not equal to this value?
$H_1 : \mu \neq 460$
• What if we want to test whether there is evidence of an increase or decrease in the average electricity consumption of the machines:
$H_1 : \mu > 460$
$H_1 : \mu < 460$
Similar statements are made for µ less/more than or equal to a certain value
9. Example
Testing µ when σ is known
We have µ = 460 as the average electricity consumption in kwh for our machines with σ = 5
• Compute the rejection region for a sample mean of 465 with random sample size 3 and a 5% significance level
$$\frac{\bar{x}_L-\mu}{\sigma/\sqrt{n}} = z_\alpha \quad\Rightarrow\quad \frac{\bar{x}_L-460}{5/\sqrt{3}} = 1.645 \quad\Rightarrow\quad \bar{x}_L = 464.7$$
Which means that the rejection region is:
$\bar{x} > 464.7$
Since the sample mean of 465 is in the rejection region we reject the null hypothesis
We conclude that there is sufficient evidence that the average electricity consumption for our machines exceeds 460 kwh
10. Example
Standardized test
With a standardized test statistic we check that
$z > z_\alpha$
$$z = \frac{\bar{x}-\mu}{\sigma/\sqrt{n}} = \frac{465-460}{5/\sqrt{3}} = 1.73$$
Because the value of z (= 1.73) is greater than the z-score of the chosen significance level ($z_{.05} = 1.65$), we reject the null hypothesis
...and conclude once more that there is sufficient evidence that the average electricity consumption for our machines exceeds 460 kwh
⇒ The conclusions from the test statistic $\bar{x}$ and the standardized test statistic z are identical, and hence the standardized test statistic is typically used and is simply called the test statistic
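Both versions of the upper-tail z-test, the rejection region on the scale of the sample mean and the standardized statistic, can be sketched side by side. This assumes the example's inputs (µ0 = 460, σ = 5, n = 3, α = .05); names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 460, 5, 3
x_bar = 465
alpha = 0.05

se = sigma / sqrt(n)
z_crit = NormalDist().inv_cdf(1 - alpha)   # upper-tail critical value, about 1.645

# Approach 1: critical sample mean (boundary of the rejection region).
x_crit = mu0 + z_crit * se
reject_by_mean = x_bar > x_crit

# Approach 2: standardized test statistic.
z = (x_bar - mu0) / se
reject_by_z = z > z_crit

print(f"x_crit = {x_crit:.2f}, z = {z:.2f}, z_crit = {z_crit:.3f}")
print("reject H0:", reject_by_mean, reject_by_z)
```

The two decisions always agree, since the second comparison is just the first one rescaled by the standard error.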
11. Computing the p-value
In order to compute the p-value in the example, we calculate the probability of observing a sample mean at least as large as 465 given that µ = 460
That is,
$$p\text{-value} = P(\bar{X} > 465) = P\left(\frac{\bar{x}-\mu}{\sigma/\sqrt{n}} > \frac{465-460}{5/\sqrt{3}}\right) = P(Z > 1.73) = 1 - P(Z < 1.73) = 1 - .9582 = 0.0418$$
As a result, the probability of observing a sample mean at least as large as 465 given that µ = 460 is about 4%
⇒ which is relatively small (below the 5% significance level), and we reject H0 in favor of the alternative hypothesis
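The p-value is the upper-tail area beyond the observed sample mean under the null hypothesis. A minimal sketch, assuming µ0 = 460 as the null value and the standard-library normal distribution:

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 460, 5, 3
x_bar = 465

# Sampling distribution of the mean when H0 (mu = 460) is true.
null_dist = NormalDist(mu=mu0, sigma=sigma / sqrt(n))

# Probability of a sample mean at least as large as the one observed.
p_value = 1 - null_dist.cdf(x_bar)

print(f"p-value = {p_value:.4f}")   # roughly 0.04, below alpha = 0.05
```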
12. Example
computing β
In case the population mean is in fact 470 kwh (while H0 states 460), with σ = 5 and a sample size of 3, the value of β for a 5% significance level is:
$$\beta = P\left(\frac{\bar{X}-\mu}{\sigma/\sqrt{n}} < \frac{464.7-470}{5/\sqrt{3}}\right) = P(Z < -1.84) = 0.03$$
Thus when the mean is 470, the probability of incorrectly not rejecting a false H0 is 3%
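The type II error probability is the chance that the sample mean lands outside the rejection region even though the mean is really 470. The sketch below recomputes the critical value without rounding, so it gives roughly 0.034 rather than exactly the table-based 0.03; the names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, mu_true = 460, 470   # null value and the assumed true mean
sigma, n, alpha = 5, 3, 0.05

se = sigma / sqrt(n)

# Boundary of the upper-tail rejection region under H0.
x_crit = mu0 + NormalDist().inv_cdf(1 - alpha) * se

# beta: probability of NOT rejecting H0 when the true mean is mu_true.
beta = NormalDist(mu=mu_true, sigma=se).cdf(x_crit)
power = 1 - beta

print(f"beta = {beta:.3f}, power = {power:.3f}")
```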
13. Example t-statistic
Vending machines
A random sample of 15 of our vending machines shows that the response time to deliver a soft drink is 2.125 seconds on average. Data cf. lecture week 41(43)
But the competition is about to launch a new vending machine model that might deliver the product faster than ours, and according to their specifications it is going to take about 2 seconds.
Do we have enough evidence to conclude that our vending machines are still competitive?
• In this case the research hypothesis is in relation to the competition
$H_1 : \mu > 2$
whereas the null hypothesis is
$H_0 : \mu = 2$
14. Example t-statistic II
Since we do not know the population standard deviation, we apply the t test statistic with the usual 5% α level and check that
$t > t_{\alpha,\nu}$
⇒ In this case we need the value of s, which is .411
• The test statistic is computed next
$$t = \frac{2.125-2}{.411/\sqrt{15}} = 1.18$$
and the critical value that bounds the rejection region for n − 1 = 14 df is
$t_{.05,14} = 1.761$
⇒ Because the test statistic is smaller than the critical value we do not reject the null hypothesis in favor of H1
There is not sufficient statistical evidence that the response time for our machines exceeds 2 seconds
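The t statistic can be recomputed from the raw response times given in the confidence-interval example. This sketch uses only the standard library, which has no t distribution, so the critical value 1.761 is taken from the slide as a constant rather than computed.

```python
from math import sqrt
from statistics import fmean, stdev

# Response times in seconds (data from the confidence-interval example).
times = [2.37, 1.95, 1.49, 2.05, 2.27, 2.53, 1.87, 2.42,
         1.43, 2.24, 2.69, 2.16, 1.88, 1.71, 2.82]
mu0 = 2.0            # value under H0
t_crit = 1.761       # t_{.05,14}, copied from the slide (assumption: table value)

x_bar = fmean(times)
s = stdev(times)                         # sample standard deviation (n - 1)
t = (x_bar - mu0) / (s / sqrt(len(times)))

reject = t > t_crit
print(f"s = {s:.3f}, t = {t:.2f}, reject H0: {reject}")
```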
15. Example t-statistic revisited
Vending machines
In order to be ‘competitive’ our machines should be faster than the machines of the competition, and the research hypothesis can be formulated as
$H_1 : \mu > 1.9$
In this case the outcome of the test statistic becomes
$$t = \frac{2.125-1.9}{.411/\sqrt{15}} = 2.12$$
⇒ Now the test statistic is greater than the critical value at the .05 α level, which means that we reject the null hypothesis in favor of the alternative
As a result there is enough statistical evidence that the response time for our machines exceeds 1.9 seconds
16. Example t-statistic revisited II
Vending machines
For a t score of 2.12 the p-value is bracketed with the t table values for 14 df as follows
$t_{.05,14} = 1.761 < 2.12 < 2.145 = t_{.025,14}$
which means that
$.05 > P(t > 2.12) > .025$
Thus the p-value lies between 2.5% and 5% and it is under the significance level, which means that H0 is rejected in favor of the alternative
17. Example $\chi^2$ statistic
Vending machines
We have seen previously from a random sample of n = 15 that the response time of our vending machines to deliver the product is on average 2.125 seconds.
From the technical specifications the standard deviation parameter is .38 seconds. cf. lecture week 41(43)
• Thus $\bar{x} = 2.125$, whereas $\sigma^2 = (\sigma)^2 = .38^2 = .1444$
Is there sufficient statistical evidence from the sample data to claim that the variability in response time is not larger than this theoretical value?
• Now the research hypothesis is related to the population variance
$H_1 : \sigma^2 < .1444$
and the null hypothesis is (or should be)
$H_0 : \sigma^2 = .1444$ ($H_0 : \sigma^2 \geq .1444$)
18. Example $\chi^2$ statistic II
test statistic
For the variance parameter we apply the $\chi^2$ statistic with the standard significance level of 5%
$$\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$$
⇒ And the computation requires the sample variance
$s^2 = (s)^2 = .411^2 = .169$
So the test statistic is
$$\chi^2 = \frac{14 \times .169}{.1444} = 16.39$$
With the lower-tail critical value of $\chi^2_{.95,14} = 6.57$
• Because the test statistic (16.39) is larger than the critical value (6.57) we are unable to reject the null hypothesis
There is insufficient evidence to claim that the variability in response time is not greater than the value given in the technical specifications
19. Example $\chi^2$ statistic III
upper test
However, most of the time we want to test whether or not the lack of precision in a certain process or product exceeds a particular value specified in the standards.
In such a case the claim we want to test is whether or not the standard deviation of the vending machines exceeds .38 seconds, which means that the research hypothesis becomes
$H_1 : \sigma^2 > .1444$
• Here the rejection rule compares the test statistic with the critical value:
$\chi^2 > \chi^2_{\alpha,\nu}$
• And since the test statistic value 16.39 is smaller than $\chi^2_{.05,14} = 23.7$, we are unable to reject the null hypothesis
We do not have sufficient evidence either to claim that the variability in response time is greater than the value given in the specifications
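A sketch of the variance test for both tails, computed from the raw response times. The critical values 6.57 and 23.7 are copied from the slides as constants, since the standard library has no chi-squared distribution. Without intermediate rounding the statistic comes out near 16.40 rather than 16.39, which does not change either decision.

```python
from statistics import variance

# Response times in seconds (data from the confidence-interval example).
times = [2.37, 1.95, 1.49, 2.05, 2.27, 2.53, 1.87, 2.42,
         1.43, 2.24, 2.69, 2.16, 1.88, 1.71, 2.82]
sigma0_sq = 0.38 ** 2        # hypothesized variance, 0.1444

# Table values for 14 df, copied from the slides (assumption: not computed here).
chi2_lower_crit = 6.57       # chi^2_{.95,14}
chi2_upper_crit = 23.7       # chi^2_{.05,14}

df = len(times) - 1
s_sq = variance(times)                   # sample variance (n - 1 denominator)
chi2 = df * s_sq / sigma0_sq

reject_lower = chi2 < chi2_lower_crit    # H1: sigma^2 < 0.1444
reject_upper = chi2 > chi2_upper_crit    # H1: sigma^2 > 0.1444
print(f"s^2 = {s_sq:.3f}, chi2 = {chi2:.2f}")
print("reject (lower tail):", reject_lower, "| reject (upper tail):", reject_upper)
```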
20. Example testing proportions
Vending machines
• The defective vending machines in our sample are the units having a response time greater than the upper 95% confidence limit
⇒ The value of the UCL was calculated as 2.32 seconds -cf. lecture week 41 (43)- so the defective units in the sample are 2.37, 2.53, 2.42, 2.69 and 2.82
2.37 1.95 1.49 2.05 2.27 2.53 1.87 2.42 1.43 2.24 2.69 2.16 1.88 1.71 2.82
This means that, out of 15 machines, 5 have shown to be defective.
Hence $\hat{p} = \frac{5}{15} = \frac{1}{3}$, or equivalently 0.333
21. Example testing proportions
p0
• The specifications for the vending machines, however, establish that a defective machine is a unit having a response time greater than 2.50 seconds
The defective machines in the sample according to the prescribed limit are 2.53, 2.69 and 2.82
2.37 1.95 1.49 2.05 2.27 2.53 1.87 2.42 1.43 2.24 2.69 2.16 1.88 1.71 2.82
And the proportion in this case is $\frac{3}{15} = 0.2$, which will represent the parameter $p_0$
⇒ but be aware that the proportion parameter can come from another source
22. Example testing proportions
hypothesis formulation
The hypotheses are formulated in order to answer a question like:
• Should we invest in a new product delivery system in the vending machines to fulfill the technical specifications?
Hence we want to see whether there is enough statistical evidence to claim that the proportion of defective units found in the sample data is larger than the population proportion.
And in this case we perform a one-sided test where the null hypothesis is that $p = p_0$, whereas the alternative hypothesis is that $p > p_0$
23. Example testing proportions
test statistic
The test statistic takes the departure of the sample proportion from the prescribed limit, divided by the standard error of the proportion:
$$z = \frac{\hat{P}-p_0}{\sqrt{p_0(1-p_0)/n}} = \frac{0.333-0.2}{\sqrt{0.2(0.8)/15}} = 1.29$$
⇒ This outcome has to be compared with the critical value $z_{.05} = 1.645$ (the .95 entry of the cumulative Z probabilities table)
• Since the value of this test statistic, 1.29, is less than the critical value (1.645) we are not able to reject H0
There is not sufficient evidence at the 5% significance level to claim that the proportion of defective machines exceeds the population proportion, and hence no new product delivery system is needed.
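The one-sided proportion test can be sketched as follows. With p̂ = 5/15, p0 = 0.2 and n = 15 the standardized statistic evaluates to about 1.29, which stays below the 5% critical value. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

defective, n = 5, 15
p0 = 0.2                 # proportion under H0
alpha = 0.05

p_hat = defective / n
se0 = sqrt(p0 * (1 - p0) / n)          # standard error under H0
z = (p_hat - p0) / se0

z_crit = NormalDist().inv_cdf(1 - alpha)   # about 1.645
reject = z > z_crit

print(f"p_hat = {p_hat:.3f}, z = {z:.2f}, z_crit = {z_crit:.3f}, reject H0: {reject}")
```

With only 15 units the normal approximation is rough, so the result is best read as indicative.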
24. Example CI for Proportions
Vending machines
• We wish to find the proportion of defective vending machines in our sample, where a defective unit is one with a response time greater than the upper 95% confidence limit (UCL).
• Since the UCL equals 2.32 seconds (cf. lecture week 41 (43)), the defective units in the sample are marked with an asterisk
2.37* 1.95 1.49 2.05 2.27 2.53* 1.87 2.42* 1.43 2.24 2.69* 2.16 1.88 1.71 2.82*
This means that, out of 15 machines, 5 have been shown to be defective.
• As a result, $\hat{p} = 5/15 = 0.333$, and the maximum error of the estimate is
$$ 1.96 \times \sqrt{0.333(1 - 0.333)/15} = 0.239 $$
$$ 0.333 \pm 0.239 \qquad \text{LCL: } 0.094; \ \text{UCL: } 0.572 $$
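As an illustration, the count of defective units and the interval can be checked directly from the sample. The sketch below is not from the slides; `proportion_ci` is a hypothetical helper, and the multiplier 1.96 is the usual normal value for a 95% level.

```python
import math

# Response times (seconds) of the 15 sampled machines, as listed on the slide.
times = [2.37, 1.95, 1.49, 2.05, 2.27, 2.53, 1.87, 2.42,
         1.43, 2.24, 2.69, 2.16, 1.88, 1.71, 2.82]

ucl_time = 2.32  # upper confidence limit for the response time (from the slide)
defective = sum(t > ucl_time for t in times)
p_hat = defective / len(times)

def proportion_ci(p_hat, n, z=1.96):
    """Normal-approximation confidence interval for a proportion."""
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin, margin

lcl, ucl, margin = proportion_ci(p_hat, len(times))
print(defective, round(p_hat, 3), round(margin, 3), round(lcl, 3), round(ucl, 3))
```

With only 15 observations the normal approximation is rough, which is one reason the interval is so wide.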
25. Example Sample size
method 2
• The sample size for proportions within a 3% error margin at the 95% confidence level is
$$ n = \left( \frac{1.96 \cdot \sqrt{\hat{p}(1-\hat{p})}}{.03} \right)^2 $$
• In the case of the vending machines the point estimate is $\hat{p} = 0.333$, which means that
$$ n = \left( \frac{1.96 \times \sqrt{0.333(0.666)}}{.03} \right)^2 = (30.79)^2 \approx 948 $$
Thus we need a sample size of at least 948 units to get a 3% error estimate with the desired confidence level
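The sample-size formula can be sketched as follows. The helper name `sample_size_for_proportion` is an assumption for illustration. Note that with the unrounded estimate 5/15 the raw value comes out slightly above the rounded figure on the slide, so rounding up to a whole unit may give one more machine.

```python
import math

def sample_size_for_proportion(p_hat, margin, z=1.96):
    """Raw (unrounded) sample size for a given error margin of a proportion."""
    return (z * math.sqrt(p_hat * (1 - p_hat)) / margin) ** 2

n_raw = sample_size_for_proportion(5 / 15, 0.03)
n_needed = math.ceil(n_raw)  # round up so the margin is not exceeded

print(round(n_raw, 1), n_needed)
```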
26. Example F-Test
The response time in seconds for the vending machines has been
measured in a random sample of size 15 with the following
statistics: mean = 2.125, SD = 0.411.
A random sample of 20 machines from the competition presumably gives a sample mean of 2.089 and an SD of .5
• The question is whether or not our machines are more accurate than the machines from the competition, i.e. have smaller variability?
• The parameter of interest is the ratio of two variances and we perform an F-test of equality of variances.
Since the SD (and hence the variance) is smaller in our machines than in the competition, $\sigma_2^2$ corresponds to our machines and $\sigma_1^2$ to the competition
27. Stem-and-Leaf display and Box Plot of the data
OWN               COMPETITION
                  0 | 9
1 | 4             1 | 4
1 | 5799          1 | 56899
2 | 0122344       2 | 00112244
2 | 578           2 | 55889
[Box plot of response time (seconds), axis from 1 to 3, by group: comp, own]
28. Example F-Test
test statistic
• The research hypothesis is $H_1: \sigma_1^2/\sigma_2^2 > 1$, i.e. that the true ratio of the variances is greater than one
• The test statistic is computed next
$$ F = s_1^2/s_2^2 = \left(\frac{0.5}{0.411}\right)^2 = 1.48 $$
• And for an upper one-sided test with a standard significance level the critical value is $F_{.05,\,19,\,14} = 2.4$
• Because the test statistic (1.48) is not greater than 2.4 we fail to reject the null hypothesis
There is not enough statistical evidence at a 5% significance
level to claim that our machines are more accurate than the
machines from the competition
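The F statistic and its degrees of freedom follow directly from the two sample standard deviations. A minimal sketch, with variable names chosen for illustration and the critical value hard-coded from the F table quoted on the slide:

```python
# Sample statistics from the slides.
s_comp, n_comp = 0.5, 20    # competition (population 1)
s_own, n_own = 0.411, 15    # our machines (population 2)

# Larger sample variance in the numerator for an upper-tail test.
F = s_comp**2 / s_own**2
df_num, df_den = n_comp - 1, n_own - 1

f_crit = 2.4               # F_{.05, 19, 14}, as read from the F table
reject_h0 = F > f_crit

print(f"F = {F:.2f} with ({df_num}, {df_den}) df, reject H0: {reject_h0}")
```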
29. Example µ1 − µ2
vending machines
In the F-test performed before we were unable to reject the
null hypothesis that the ratio of the two variances equals 1.
• This implies that to test the difference between two means in
our machines and in the competition we assume equal variances
in the populations
• Recall that the sample statistics are $s_1 = .5$, $\bar{x}_1 = 2.089$, $n_1 = 20$; $s_2 = .411$, $\bar{x}_2 = 2.125$, $n_2 = 15$
• Now the question is whether the mean $\mu_2$ for our machines is greater than the mean $\mu_1$ from the competition?
This is a one-sided (lower-tail) test for the difference between two means where $H_1: (\mu_1 - \mu_2) < 0$.
30. Example test µ1 − µ2
• For equal variances we first compute the pooled variance estimator
$$ s_p^2 = \frac{(20-1)(.5^2) + (15-1)(.411^2)}{20 + 15 - 2} = \frac{7.13}{33} = .216 $$
• And the test statistic is calculated using this estimate
$$ t = \frac{(2.089 - 2.125) - 0}{\sqrt{.216 \times \left(\frac{1}{20} + \frac{1}{15}\right)}} = -.227 $$
• To determine the critical value we compute the number of degrees of freedom $\nu = n_1 + n_2 - 2 = 33$
• The rejection region at the standard alpha value is
$$ t < t_{1-\alpha,\,\nu} = t_{1-.05,\,33} = -t_{.05,\,33} \approx -t_{.05,\,35} = -1.69 $$
• Because the t score is greater than the critical value we are unable to reject H0.
There is not sufficient evidence at the 5% α level to claim that the average response time of our machines is greater than that of the competition.
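The pooled-variance calculation can be sketched in Python as below. The function `pooled_t` is a hypothetical helper, and the critical value is the table approximation used on the slide, not a computed quantile.

```python
import math

def pooled_t(x1, s1, n1, x2, s2, n2):
    """t statistic for H0: mu1 - mu2 = 0, assuming equal population variances."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df   # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (x1 - x2) / se, sp2, df

# Population 1 = competition, population 2 = our machines (coding from the slides).
t, sp2, df = pooled_t(2.089, 0.5, 20, 2.125, 0.411, 15)

t_crit = -1.69          # -t_{.05, 35}, used as an approximation for 33 df
reject_h0 = t < t_crit  # lower-tail test, H1: mu1 - mu2 < 0

print(f"sp2 = {sp2:.3f}, t = {t:.3f}, df = {df}, reject H0: {reject_h0}")
```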
31. Example estimation µ1 − µ2
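The estimation of the difference between two means with equal variance parameters is based on the two-sided 95% confidence interval
$$ (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,\nu} \cdot \sqrt{s_p^2 \left(\frac{1}{n_1} + \frac{1}{n_2}\right)} $$
$$ (2.089 - 2.125) \pm 2.03 \cdot \sqrt{.216 \times \left(\frac{1}{20} + \frac{1}{15}\right)} $$
$$ -.036 \pm .322 $$
$$ (\mu_1 - \mu_2) \in [-0.36;\ 0.29] $$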
As the 0 is within the interval, we cannot conclude that
there is a significant difference between the mean response
time in our machines and the competition.
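For illustration, the interval can be recomputed from the rounded quantities shown on the slide. The multiplier 2.03 is the table value quoted there, and the variable names are chosen only for this sketch.

```python
import math

sp2 = 0.216            # pooled variance from the previous slide
n1, n2 = 20, 15
diff = 2.089 - 2.125   # x1_bar - x2_bar
t_mult = 2.03          # t_{.025, 33}, approximated from the t table

margin = t_mult * math.sqrt(sp2 * (1 / n1 + 1 / n2))
ci = (diff - margin, diff + margin)
contains_zero = ci[0] < 0 < ci[1]

print(f"{diff:.3f} +/- {margin:.3f} -> [{ci[0]:.2f}; {ci[1]:.2f}]")
```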
32. p-value
Although we did not reject the null hypotheses in the examples, we can bound the p-values using the test statistics.
• For $\mu_1 - \mu_2$, we find from the t table for $\nu \approx 35$ that the closest critical value to the t score is
$$ -1.306 < -.227 $$
which means that
$$ .10 < P(t < -.227) $$
and hence we can reasonably say that the p-value > 10%
• For $\sigma_1^2/\sigma_2^2$ the F score is 1.48, which is smaller than the (approximate) critical value $F_{.100,\,20,\,15}$
$$ 1.92 > 1.48 $$
So the p-value in this case is more than 10% as well because $.100 < P(F > 1.48)$ (or more than 5% with $F_{.050,\,20,\,15} = 2.33 > 1.48$).
33. Example Test of µD
• Recall that the response time in our vending machines was measured
in a random sample with n = 15, and we obtained the response time
from the competition from a random sample with n = 20.
Suppose that the first 15 measurements from the competitor's sample can be matched with the observations in our sample, e.g. because the paired machines use the same version of the main control board.
• The paired data by CPU version, and the differences between the samples (own minus competition), are
Pair             1     2     3     4     5     6     7     8     9    10    11    12    13    14    15
Own           2.37  1.95  1.49  2.05  2.27  2.53  1.87  2.42  1.43  2.24  2.69  2.16  1.88  1.71  2.82
Competition   1.55  2.09  2.79  1.43  1.96  2.07  2.35  1.88  2.75  1.93  2.21  2.49  1.80  1.48  2.89
Difference    0.82 -0.14 -1.30  0.62  0.31  0.46 -0.48  0.54 -1.32  0.31  0.48 -0.33  0.08  0.23 -0.07
• The sample statistics for our machines are $\bar{x}_2 = 2.125$, $s_2 = .411$; and for the paired competitors are $\bar{x}_1 = 2.111$, $s_1 = .468$
• Due to the experimental design the parameter of interest in this case is $\mu_D$, the mean of the population differences for dependent samples.
34. Example Test of µD II
• The question is whether there is a difference between the true mean for our machines and the mean from the competition?
Because the sample mean of our machines is still higher than that of the competition, and the differences are taken as own minus competition, the alternative hypothesis is formulated as $H_1: \mu_D > 0$
• Next we compute the statistics of the paired differences $\bar{x}_D = .014$ and $s_D = .646$, and these outcomes are used in the t test statistic
$$ t = \frac{0.014 - 0}{0.646/\sqrt{15}} = .084 $$
• The rejection region at the standard α value is the upper tail
$$ t > t_{\alpha,\,\nu} = t_{.05,\,14} = 1.76 $$
• Because the t score (.084) is less than the critical value we are unable to reject the null hypothesis.
There is not sufficient evidence at the 5% α level to infer that there is a difference between the response time of our machines and that of the competition using the same version of the control board.
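The paired statistics can be recomputed from the raw data on the previous slide. The sketch below uses only the Python standard library; the variable names are illustrative, and 1.76 is the table value of t with 14 degrees of freedom at the 5% level.

```python
import math
import statistics

own = [2.37, 1.95, 1.49, 2.05, 2.27, 2.53, 1.87, 2.42,
       1.43, 2.24, 2.69, 2.16, 1.88, 1.71, 2.82]
comp = [1.55, 2.09, 2.79, 1.43, 1.96, 2.07, 2.35, 1.88,
        2.75, 1.93, 2.21, 2.49, 1.80, 1.48, 2.89]

# Differences taken as own minus competition, as in the data table.
diffs = [o - c for o, c in zip(own, comp)]
n = len(diffs)

d_bar = statistics.mean(diffs)
s_d = statistics.stdev(diffs)        # sample SD (n - 1 in the denominator)
t = d_bar / (s_d / math.sqrt(n))

t_table = 1.76                       # t_{.05, 14} from the t table
# |t| is far below the table value, so neither one-sided 5% test rejects H0.
inside_critical = abs(t) < t_table

print(f"d_bar = {d_bar:.3f}, s_d = {s_d:.3f}, t = {t:.3f}")
```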
35. Example Estimation of µD
• The estimation of the mean difference for paired data is based on a two-sided confidence interval at the standard 95% level
$$ \bar{x}_D \pm t_{\alpha/2,\,\nu} \frac{s_D}{\sqrt{n_D}} $$
$$ 0.014 \pm 2.145 \times \frac{0.646}{\sqrt{15}} $$
• Hence the maximum error of the estimate is .358
LCL = −.34 and UCL = .37
Since 0 lies within the interval, here too we cannot conclude that there is a significant difference between the mean response time of our machines and that of the competition using the same version of the control board.
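A short check of the interval, using the rounded summary statistics from the slide. The multiplier 2.145 is the table value for 14 degrees of freedom, and the names are chosen for this sketch only.

```python
import math

d_bar, s_d, n_d = 0.014, 0.646, 15
t_mult = 2.145  # t_{.025, 14} from the t table

margin = t_mult * s_d / math.sqrt(n_d)
lcl, ucl = d_bar - margin, d_bar + margin

print(f"margin = {margin:.3f}, LCL = {lcl:.2f}, UCL = {ucl:.2f}")
```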
36. Example test of p1 − p2
vending machines
By testing the difference between two population proportions we can compare e.g. the introductory-year market for vending machines in a given town belonging either to our company or to the competition.
Assuming that 12 machines from each sample are placed in this market, the question is whether or not there is enough evidence to conclude that our machines are more popular than the competition in this town?
• To be consistent with the coding of the populations the research hypothesis is that $(p_1 - p_2) < 0$, and hence the null hypothesis assumes equal proportions in the testing.
• For the test statistics we require both sample proportions
and also the pooled estimate
• $\hat{p}_1 = \frac{12}{20} = .6$, $\hat{p}_2 = \frac{12}{15} = .8$ and $\hat{p} = \frac{12 + 12}{20 + 15} = .686$
37. Example test of p1 − p2
test statistic
• The test statistic is computed next
$$ z = \frac{(.6 - .8)}{\sqrt{(.686)(1-.686) \times \left(\frac{1}{20} + \frac{1}{15}\right)}} = -1.26 $$
• And the standard rejection region for a cumulative Z statistic is
$$ z < z_{\alpha} = z_{.05} = -1.645 $$
• Since the outcome of the z test statistic is greater than the critical value we do not reject H0
At the 5% significance level there is not sufficient evidence to conclude that our vending machines were more popular than the competition in town during the introductory year.
Here $P(Z < z) = .1038$ constitutes the associated p-value.
38. Example test of p1 − p2 Revisited
• However, if we consider only the paired sample data and only 7 machines from the competition were placed in town during the introductory year, then $\hat{p}_1 = 7/15 = .467$ and $\hat{p} = 19/30 = .633$, and for the same hypothesis testing design the test statistic becomes
$$ z = \frac{(.467 - .8)}{\sqrt{(.633)(1-.633) \times \left(\frac{1}{15} + \frac{1}{15}\right)}} = -1.89 $$
• This means that $z_{obs} < z_{\alpha}$ and hence we reject the null hypothesis in favor of the alternative.
There is enough statistical evidence at the standard significance level to conclude that our machines were more popular in town during the introductory year than the competition if we consider that they were using the same main control board (cf. test of $\mu_D$)
The associated p-value is $P(Z < z) = .0294$, and since it is less than α the result is statistically significant.
It is also possible to test proportion parameters with a hypothesized difference different from zero, e.g. $H_0: (p_1 - p_2) = .05$
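Both versions of the two-proportion test can be sketched with the standard library only. The helpers `normal_cdf` and `two_proportion_z` are hypothetical names introduced here; the counts (12 of 20 and 12 of 15, then 7 of 15 and 12 of 15) are the ones stated in the example. Small differences from the slide values come from rounding z before the table lookup.

```python
import math

def normal_cdf(z):
    """Cumulative probability of the standard normal distribution."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for H0: p1 - p2 = 0, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

z_crit = -1.645  # lower 5% point of the standard normal

# All observations: competition 12 of 20, our machines 12 of 15.
z_all = two_proportion_z(12, 20, 12, 15)
p_all = normal_cdf(z_all)

# Paired data only: competition 7 of 15, our machines 12 of 15.
z_paired = two_proportion_z(7, 15, 12, 15)
p_paired = normal_cdf(z_paired)

for label, z, p in [("all", z_all, p_all), ("paired", z_paired, p_paired)]:
    print(f"{label}: z = {z:.2f}, p-value = {p:.4f}, reject H0: {z < z_crit}")
```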
39. Example estimation of p1 − p2
To estimate the difference between the two proportions we use the standard confidence intervals at the 95% level.
$$ (\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} $$
• For the example with all observations:
$$ (.6 - .8) \pm 1.96 \times \sqrt{\frac{.6(1-.6)}{20} + \frac{.8(1-.8)}{15}} $$
$$ -.2 \pm .3 \qquad [-.5;\ .1] $$
• And for the revisited version with paired data:
$$ (.467 - .8) \pm 1.96 \times \sqrt{\frac{.467(1-.467)}{15} + \frac{.8(1-.8)}{15}} $$
$$ -.333 \pm .32 \qquad [-.66;\ -.01] $$
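The two intervals can be reproduced as follows. This is an illustrative sketch; `diff_proportion_ci` is a hypothetical helper, and unlike the test statistic it uses the unpooled standard error, as in the formula above.

```python
import math

def diff_proportion_ci(x1, n1, x2, n2, z=1.96):
    """Normal-approximation CI for p1 - p2 with the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

ci_all = diff_proportion_ci(12, 20, 12, 15)     # all observations
ci_paired = diff_proportion_ci(7, 15, 12, 15)   # paired data only

print([round(v, 2) for v in ci_all], [round(v, 2) for v in ci_paired])
```

The first interval contains 0 while the second lies entirely below it, which matches the outcomes of the two hypothesis tests.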