Presentation delivered by Brian Peck of AP42 at the Bishop Ranch B2B Seminar on July 17, 2012. For a copy of the slides, please contact info@ap42.com or visit http://www.ap42.com/
Statistics is the science of dealing with numbers.
It is used for collection, summarization, presentation and analysis of data.
Statistics provides a way of organizing data to get information on a wider and more formal (objective) basis than relying on personal experience (subjective).
Quantitative MethodsChoosing a Sample.pptxChoosing a Samp.docxamrit47
Quantitative Methods/Choosing a Sample.pptx
Choosing a Sample
Leedy, P., and Ormrod, J., Practical Research. (8th ed.)
Fink, A. 1995. From the Survey Toolkit published by Sage.
Choosing a Sample to Survey
Population – the group to be covered by your research plan
Sample – a subset of your population
Generalize results – only if the sample is representative of the population
Probability sampling
Non-probability sampling
Probability sampling –
Random Sampling
Each member of the population has an equal chance of being selected.
Probability sampling –
Stratified Random Sampling
Take equal samples from each group (layers, strata).
Probability sampling –
Proportional Stratified Sampling
Take equal proportions of samples from each group (layers, strata).
Probability sampling –
Cluster Sampling
Take equal proportions of samples from certain regions only.
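The three selection rules above (random, stratified, proportional stratified) can be sketched in a few lines. This is only an illustration under stated assumptions: the strata names, member IDs, and sample sizes are hypothetical and do not come from the slides.

```python
import random


def simple_random_sample(population, n, seed=0):
    """Each member has an equal chance of being selected."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def stratified_sample(strata, n_per_stratum, seed=0):
    """Take the same number of members from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}


def proportional_stratified_sample(strata, fraction, seed=0):
    """Take the same proportion of members from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, round(len(members) * fraction))
            for name, members in strata.items()}


# Hypothetical population split into two strata of unequal size.
strata = {
    "managers": [f"M{i}" for i in range(20)],
    "staff": [f"S{i}" for i in range(80)],
}
population = [person for members in strata.values() for person in members]

random_pick = simple_random_sample(population, 10)
equal = stratified_sample(strata, 5)                      # 5 from each group
proportional = proportional_stratified_sample(strata, 0.10)  # 10% of each group
```

With equal samples the small group is over-represented relative to its share of the population; with equal proportions each group keeps its share.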
Non-probability Sampling –
Convenience Sampling
No attempt to have a representative sample
Examples:
Survey people in your neighborhood.
Customer satisfaction cards in a restaurant.
Survey all companies who have had projects done by NWMOC.
Survey all Human Resources Managers at Stout Career Fair.
Sample size?
Entire population, if N<100
20-50% of population, if 100 < N < 2000
About 400, if N > 2000
Affects the time and cost of the study, the precision of statistical results
Be sure to consider the response rate
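The rule of thumb above can be written as a small helper. This is a sketch, not a standard formula: the slide does not say what to do at exactly N = 100 or N = 2000, so placing those in the middle band is an assumption, and the function names and the default fraction are hypothetical.

```python
import math


def suggested_sample_size(n_population, fraction=0.2):
    """Rule-of-thumb sample size from the slide.

    fraction is the share used in the middle band (the slide says 20-50%).
    Treating N == 100 and N == 2000 as part of the middle band is an assumption.
    """
    if not 0.2 <= fraction <= 0.5:
        raise ValueError("fraction should be between 0.2 and 0.5")
    if n_population < 100:
        return n_population              # survey the entire population
    if n_population <= 2000:
        return round(n_population * fraction)
    return 400                           # "about 400" for large populations


def surveys_to_send(target_responses, expected_response_rate):
    """Inflate the sample to allow for non-response."""
    return math.ceil(target_responses / expected_response_rate)


small = suggested_sample_size(80)        # 80
medium = suggested_sample_size(500)      # 100 (20% of 500)
large = suggested_sample_size(5000)      # 400
mailed = surveys_to_send(large, 0.30)    # 1334 surveys at a 30% response rate
```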
Sampling Bias
Bias – an influence, condition, or set of conditions which distort the data
Sampling bias – is the sample random?
Examples:
Political polls by phone interview
A mail survey of alumni satisfaction, with 30% response rate
Quantitative Methods/Confidence Intervals.pptx
Confidence Intervals
Some adapted from http://stattrek.com/estimation/confidence-interval.aspx and http://www.stat.yale.edu/Courses/1997-98/101/confint.htm
Confidence Interval
Statisticians use a confidence interval to describe the amount of uncertainty associated with a sample estimate of a population parameter.
It gives an estimated range of values that is likely to include an unknown population parameter.
The estimated range is calculated from a set of sample data.
Confidence Interval
Gives the probability that the interval produced by the sample method includes the true value of the parameter
You must assume a normal distribution
Confidence Interval Selection
Common choices for the confidence level are 0.90, 0.95, and 0.99. These levels correspond to percentages of the area of the normal density curve. For example, a 95% confidence interval covers 95% of the normal curve --
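As a rough illustration of how a confidence level maps to an area under the normal curve, the sketch below looks up the multiplier z that bounds the central area for each common level. It uses Python's standard-library `statistics.NormalDist`; the helper name `z_for_level` is hypothetical.

```python
from statistics import NormalDist


def z_for_level(level):
    """z such that the central `level` share of the standard normal
    curve lies between -z and +z."""
    tail = (1 - level) / 2          # area left in each tail
    return NormalDist().inv_cdf(1 - tail)


# Approximately 1.645, 1.960 and 2.576 respectively.
z_values = {level: z_for_level(level) for level in (0.90, 0.95, 0.99)}
```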
Normal Distribution
Confidence Intervals
Supp ...
Confidence Interval ModuleOne of the key concepts of statist.docxmaxinesmith73660
Confidence Interval Module
One of the key concepts of statistics enabling statisticians to make incredibly accurate predictions is called the Central Limit Theorem. The Central Limit Theorem is defined in this way:
· For samples of a sufficiently large size, the sampling distribution of means is almost always approximately normal.
· The distribution of means gets closer and closer to normal as the sample size gets larger and larger, regardless of what the original variable looks like (positively or negatively skewed).
· In other words, the original variable does not have to be normally distributed.
· This is because, if we as eccentric researchers, drew an almost infinite number of random samples from a single population (such as the student body of NMSU), the means calculated from the many samples of that population will be normally distributed and the mean calculated from all of those samples would be a very close approximation to the true population mean. It is this very characteristic that makes it possible for us, using sound probability based sampling techniques, to make highly accurate statements about characteristics of a population based upon the statistics calculated on a sample drawn from that population.
· Furthermore, we can calculate a statistic known as the standard error of the mean (abbreviated s.e.) that describes the variability of the distribution of all possible sample means in the same way that we used the standard deviation to describe the variability of a single sample. We will use the standard error of the mean (s.e.) to calculate the statistic that is the topic of this module, the confidence interval.
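The points above can be checked with a small simulation. This is a sketch under assumptions: the exponential distribution stands in for a positively skewed variable, and the sample size (50) and number of samples (2000) are arbitrary choices, not values from the module.

```python
import random
import statistics


def sample_means(draw, sample_size, n_samples, seed=0):
    """Draw n_samples random samples and return the mean of each one."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(sample_size))
            for _ in range(n_samples)]


def draw_skewed(rng):
    """A positively skewed variable with true mean 1 and standard deviation 1."""
    return rng.expovariate(1.0)


means = sample_means(draw_skewed, sample_size=50, n_samples=2000)

center = statistics.fmean(means)   # close to the true population mean, 1.0
spread = statistics.stdev(means)   # close to 1 / sqrt(50), about 0.14

# Share of sample means within 1.96 spreads of the center; roughly 0.95
# if the means are approximately normal.
within = sum(abs(m - center) <= 1.96 * spread for m in means) / len(means)
```

Even though the original variable is strongly skewed, the sample means cluster symmetrically around the population mean, and their spread is what the standard error of the mean describes.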
The formula that we use to calculate the standard error of the mean is:
s.e. = s / √(N – 1)
where s = the standard deviation calculated from the sample; and
N = the sample size.
So the formula tells us that the standard error of the mean is equal to the
standard deviation divided by the square root of the sample size minus 1.
This is the preferred formula for practicing professionals as it accounts for errors that may be a function of the particular sample we have selected.
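A minimal sketch of this calculation follows. The function takes s and N as given, as in the module's definition; the function name and the example values are hypothetical.

```python
import math


def standard_error(s, n):
    """Standard error of the mean as defined in this module: s / sqrt(N - 1)."""
    if n < 2:
        raise ValueError("need at least two observations")
    return s / math.sqrt(n - 1)


# Hypothetical values: s = 10 from a sample of N = 101 gives 10 / sqrt(100) = 1.0
se = standard_error(10, 101)
```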
THE CONFIDENCE INTERVAL (CI)
The formula for the CI is a function of the sample size (N).
For sample sizes ≥ 100, the formula for the CI is:
CI = (the sample mean) ± Z(s.e.).
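The formula can be sketched as a short function. The assumptions here are that Z is the standard normal multiplier for the chosen confidence level (looked up with Python's `statistics.NormalDist`) and that s.e. = s / √(N – 1) as defined in this module; the function name and the numbers in the example are hypothetical.

```python
import math
from statistics import NormalDist


def confidence_interval(sample_mean, s, n, level=0.95):
    """CI = sample mean +/- Z * s.e., for sample sizes of 100 or more."""
    if n < 100:
        raise ValueError("this form of the CI is meant for N >= 100")
    se = s / math.sqrt(n - 1)                       # standard error of the mean
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # about 1.96 for 95%
    return sample_mean - z * se, sample_mean + z * se


# Hypothetical sample: mean 50, s = 10, N = 101, so s.e. = 1
low, high = confidence_interval(50, 10, 101)        # roughly (48.04, 51.96)
```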
Let’s look at an example to see how this formula works.
* Please use a pdf doc. “how to solve the problem”, I have provided for you under the “notes” link.
Example 1
Suppose that we conducted interviews with 140 randomly selected individuals (N = 140) in a large metropolitan area. We assured these individuals that their answers would remain confidential, and we asked them about their law-breaking behavior. Among other questions the individuals were asked to self-report the number of times per month they exceeded the speed limit. One of the objectives of the study was to estimate (make an inference about) the average nu.
Assessing Model Performance - Beginner's GuideMegan Verbakel
Introduction on how to assess the performance of a classifier model. Covers theories (bias-variance trade-off, over/under-fitting), data preparation (train/test split, cross-validation), common performance plots (e.g. ROC curve and confusion matrix), and common metrics (e.g. accuracy, precision, recall, f1-score).
As mentioned earlier, the mid-term will have conceptual and quanti.docxfredharris32
As mentioned earlier, the mid-term will have conceptual and quantitative multiple-choice questions. You need to read all 4 chapters and you need to be able to solve problems in all 4 chapters in order to do well in this test.
The following are for review and learning purposes only. I am not indicating that identical or similar problems will be in the test. As I have indicated in the class syllabus, all the exams in this course will have multiple-choice questions and problems.
Suggestion: treat this review set as you would an actual test. Sit down with your one page of notes and your calculator, and give it a try. That way you will know what areas you still need to study.
ADMN 210
Answers to Review for Midterm #1
1) Classify each of the following as nominal, ordinal, interval, or ratio data.
a. The time required to produce each tire on an assembly line – ratio since it is numeric with a valid 0 point meaning “lack of”
b. The number of quarts of milk a family drinks in a month - ratio since it is numeric with a valid 0 point meaning “lack of”
c. The ranking of four machines in your plant after they have been designated as excellent, good, satisfactory, and poor – ordinal since it is ranking data only
d. The telephone area code of clients in the United States – nominal since it is a label
e. The age of each of your employees - ratio since it is numeric with a valid 0 point meaning “lack of”
f. The dollar sales at the local pizza house each month - ratio since it is numeric with a valid 0 point meaning “lack of”
g. An employee’s identification number – nominal since it is a label
h. The response time of an emergency unit - ratio since it is numeric with a valid 0 point meaning “lack of”
2) True or False: The highest level of data measurement is the ratio-level measurement.
True (you can do the most powerful analysis with this kind of data)
3) True or False: Interval- and ratio-level data are also referred to as categorical data.
False (Interval and ratio level data are numeric and therefore quantitative, NOT qualitative….Nominal is qualitative)
4) A small portion or a subset of the population on which data is collected for conducting statistical analysis is called __________.
A sample! A population is the total group, a census IS the population, and a data set can be either a sample or a population.
5) One of the advantages for taking a sample instead of conducting a census is this:
a sample is more accurate than census
a sample is difficult to take
a sample cannot be trusted
a sample can save money when data collection process is destructive
6) Selection of the winning numbers in a lottery is an example of __________.
convenience sampling
random sampling
nonrandom sampling
regulatory sampling
7) A type of random sampling in which the population is divided into non-overlapping subpopulations is called __________.
stratified random sampling
cluster sampling
systematic random sampling
regulatory sampling
8) A ...
Findings, Conclusions, & Recommendations
Report Writing
Findings
Data
Conclusions
What the data means
Recommendations
What should we do?
Types of Reports
Proposal
Feasibility
Analysis
Annual/Quarterly
Sales/Revenue
Investment
Marketing
Research
Consumer
Research
Report Sections
1. Title page
2. Table of contents
3. Executive summary
4. Body sections
a. Purpose
b. Scope
c. Factors
d. Conclusions
5. References (endnotes)
New Page
Title Page
1. Title
2. Author
3. Date (use due date)
4. Audience*
5. No page number
Findings: 65% of employees use Facebook during company time.
Conclusions: Employees are wasting time at work.
Recommendations: We should establish a social media policy.
Findings: SHA applications are down 15%.
Conclusions
Recommendations
Exploring Report Myths
Myth: Reports are entirely different from memos and letters.
Truth: Reports may be formatted as memos or letters.
Myth: Reports are strictly “objective” presentations of factual data.
Truth: Report writers use their best judgement to select data to provide in reports.
Myth: Reports are mere collections of data: they should not incorporate the writer’s opinion.
Truth: Reports should be adapted to the needs of the readers.
-If readers merely need numerical or factual data, then mere numerical or factual data should be sufficient.
-If readers rely on the report writer to interpret the data, then the report should incorporate the writer’s best attempt to draw conclusions and, if appropriate, recommendations.
Myth: A report should be structured as a sequence of steps in which the writer engaged in the “discovery process” to collect the data.
Truth: A report should be structured according to the needs of the readers: to learn conclusions or to act on recommendations.
Google Report
Hilton Annual Report
Hilton Annual Report
Aramark
Report Examples
https://storage.googleapis.com/gfw-touched-accounts-pdfs/google-cloud-security-and-compliance-whitepaper.pdf
http://ir.hilton.com/~/media/Files/H/Hilton-Worldwide-IR-V3/annual-report/Hilton_2013_AR.pdf
http://ir.hilton.com/~/media/Files/H/Hilton-Worldwide-IR-V3/annual-report/1948-Annual-Report.pdf
http://www.elon.edu/docs/e-web/bft/sustainability/ARAMARK%20Trayless%20Dining%20July ...
Module 7 Interval estimatorsMaster for Business Statistics.docxgilpinleeanna
Module 7
Interval estimators
Master for Business Statistics
Dane McGuckian
Topics
7.1 Interval Estimate of the Population Mean with a Known Population Standard Deviation
7.2 Sample Size Requirements for Estimating the Population Mean
7.3 Interval Estimate of the Population Mean with an Unknown Population Standard Deviation
7.4 Interval Estimate of the Population Proportion
7.5 Sample Size Requirements for Estimating the Population Proportion
7.1
Interval Estimate of the Population Mean with a Known Population Standard Deviation
Interval Estimators
Quantities like the sample mean and the sample standard deviation are called point estimators because they are single values derived from sample data that are used to estimate the value of an unknown population parameter.
The point estimators used in Statistics have some very desirable traits; however, they do not come with a measure of certainty.
In other words, there is no way to determine how close the population parameter is to the value of our point estimate. For this reason, the interval estimator was developed.
An interval estimator is a range of values derived from sample data that has a certain probability of containing the population parameter.
This probability is usually referred to as confidence, and it is the main advantage that interval estimators have over point estimators.
The confidence level for a confidence interval tells us the likelihood that a given interval will contain the target parameter we are trying to estimate.
The Meaning of “Confidence Level”
Interval estimates come with a level of confidence.
The level of confidence is specified by its confidence coefficient – it is the probability (relative frequency) that an interval estimator will enclose the target parameter when the estimator is used repeatedly a very large number of times.
The most common confidence levels are 99%, 98%, 95%, and 90%.
Example: A manufacturer takes a random sample of 40 computer chips from its production line to construct a 95% confidence interval to estimate the true average lifetime of the chip. If the manufacturer formed confidence intervals for every possible sample of 40 chips, 95% of those intervals would contain the population average.
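The repeated-sampling idea in this example can be illustrated with a simulation. This is a sketch under assumptions: it draws 2000 random samples rather than every possible sample, it uses the known-standard-deviation interval from section 7.1, and the true mean (1000 hours) and standard deviation (50 hours) are made-up values, not figures from the module.

```python
import math
import random
import statistics


def coverage(true_mean, sigma, n, level=0.95, trials=2000, seed=0):
    """Share of simulated confidence intervals that contain the true mean."""
    rng = random.Random(seed)
    z = statistics.NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sigma / math.sqrt(n)      # half-width when sigma is known
    hits = 0
    for _ in range(trials):
        xbar = statistics.fmean(rng.gauss(true_mean, sigma) for _ in range(n))
        # Any single interval either contains the true mean or it does not.
        if xbar - margin <= true_mean <= xbar + margin:
            hits += 1
    return hits / trials


# Hypothetical chip lifetimes: true mean 1000 hours, sigma 50 hours, samples of 40.
share = coverage(true_mean=1000.0, sigma=50.0, n=40)   # close to 0.95
```

Each pass through the loop produces one interval that either does or does not contain the true mean; the 95% figure describes the procedure over many repetitions.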
The Meaning of “Confidence Level”
In the previous example, it is important to note that once the manufacturer has constructed a 95% confidence interval, it is no longer acceptable to state that there is a 95% chance that the interval contains the true average lifetime of the computer chip.
Prior to constructing the interval, there was a 95% chance that the random interval limits would contain the true average, but once the process of collecting the sample and constructing the interval is complete, the resulting interval either does or does not contain the true average.
Thus there is a probability of 1 or 0 that the true average is contained within the interval, not a 0.95 probability.
The interval limits are random variables because the ...
6. Sampling used to measure Parent Population. Parent Population: group under study. A sample is taken from the parent population. Measurements are taken on the sample. If sampling was done correctly, measurements are representative of the Parent Population.
17. Confidence Intervals: 1.96 std deviations = 95% of all observations. Range: 14,000 to 20,000. Est. Mean = 17,000. 95% chance the true mean falls within this range.
19. Confidence Intervals: 1 std deviation = 68.2% of all observations. Range: 15,500 to 18,500. Est. Mean = 17,000. 68% chance the true mean falls within this range.
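The two ranges on these slides can be reproduced with simple arithmetic. The sketch assumes one standard deviation equals 1,500, which is inferred from the 68% range (15,500 to 18,500) and is not stated on the slides.

```python
def interval(mean, std_dev, z):
    """Range covering mean +/- z standard deviations."""
    return mean - z * std_dev, mean + z * std_dev


est_mean = 17_000
std_dev = 1_500   # assumption: half the width of the 68% range on the slide

range_68 = interval(est_mean, std_dev, 1.0)    # 15,500 to 18,500
range_95 = interval(est_mean, std_dev, 1.96)   # about 14,060 to 19,940
```

Under this assumption the 95% range comes out slightly narrower than the 14,000 to 20,000 shown, which suggests the slide rounded the limits.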
31. Proof that CAGR is right: We can check our answer by compounding 11.4186% growth every year, which arrives at the correct final-year revenue of 3.04B, exactly right!
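The check described here can be sketched as follows. The slide's starting revenue and number of years are not included in this text, so the figures in the example are placeholders and the function names are hypothetical.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate between two values."""
    return (end_value / start_value) ** (1 / years) - 1


def compound(start_value, rate, years):
    """Apply the same growth rate every year."""
    value = start_value
    for _ in range(years):
        value *= 1 + rate
    return value


# Placeholder figures (not from the slide): revenue grows from 100 to 200 in 5 years.
rate = cagr(100.0, 200.0, 5)
check = compound(100.0, rate, 5)   # compounding the CAGR recovers the final value
```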
Parameter: a measurement of the Parent Population (e.g. average income of all the Volvo in the world).
Non-probability samples: convenience samples; judgment samples (sample elements are hand-picked, which is what we do); quota samples.
Often we don’t know how the “real world” population is distributed, so we have to estimate (). There is no problem, for two reasons: 1. Variation usually changes for most variables of interest in marketing. So, if the study is a repeat, we can use old values we found for . 2. We can calculate sample variance. Need to add blurb on “standard error of estimate”.
Revenues above are from the Wholesale Carrier B&C Forecast, from the Frost & Sullivan report, 2006.