Data Collection Methods, Sampling Techniques & Statistics Guide
1. Unit 2 & 3
Data Collection
Sampling
Measurement Concept
Questionnaire Designing-Types
2. Goals
After completing this chapter, you should be
able to:
īŽ Describe key data collection methods
īŽ Know key definitions:
ī¨Population vs. Sample ī¨Primary vs. Secondary data types
ī¨Qualitative vs. Qualitative data ī¨Time Series vs. Cross-Sectional data
īŽ Explain the difference between descriptive and
inferential statistics
īŽ Describe different sampling methods & Experiments
3. īŽ Descriptive statistics
īŽ Collecting, presenting, and describing data
īŽ Inferential statistics
īŽ Drawing conclusions and/or making decisions
concerning a population based only on
sample data
Tools of Business Statistics
4. Descriptive Statistics
īŽ Collect data
īŽ e.g. Survey, Observation,
Experiments
īŽ Present data
īŽ e.g. Charts and graphs
īŽ Characterize data
īŽ e.g. Sample mean =
n
xiīĨ
7. Survey Design Steps
īŽ Define the issue
īŽ what are the purpose and objectives of the survey?
īŽ Define the population of interest
īŽ Formulate survey questions
īŽ make questions clear and unambiguous
īŽ use universally-accepted definitions
īŽ limit the number of questions
8. Survey Design Steps
īŽ Pre-test the survey
īŽ pilot test with a small group of participants
īŽ assess clarity and length
īŽ Determine the sample size and sampling
method
īŽ Select Sample and administer the survey
(continued)
9. Types of Questions
īŽ Closed-end Questions
īŽ Select from a short list of defined choices
Example: Major: __business __liberal arts
__science __other
īŽ Open-end Questions
īŽ Respondents are free to respond with any value, words, or
statement
Example: What did you like best about this course?
īŽ Demographic Questions
īŽ Questions about the respondentsâ personal characteristics
Example: Gender: __Female __ Male
10. īŽ A Population is the set of all items or individuals
of interest
īŽ Examples: All likely voters in the next election
All parts produced today
All sales receipts for November
īŽ A Sample is a subset of the population
īŽ Examples: 1000 voters selected at random for interview
A few parts selected for destructive testing
Every 100th receipt selected for audit
Populations and Samples
11. Population vs. Sample
a b c d
ef gh i jk l m n
o p q rs t u v w
x y z
Population Sample
b c
g i n
o r u
y
12. Why Sample?
īŽ Less time consuming than a census
īŽ Less costly to administer than a census
īŽ It is possible to obtain statistical results of a
sufficiently high precision based on samples.
14. Statistical Sampling
īŽ Items of the sample are chosen based on
known or calculable probabilities
Probability Samples
Simple
Random
SystematicStratified Cluster
15. Simple Random Samples
īŽ Every individual or item from the population has
an equal chance of being selected
īŽ Selection may be with replacement or without
replacement
īŽ Samples can be obtained from a table of
random numbers or computer random number
generators
16. Stratified Samples
īŽ Population divided into subgroups (called strata)
according to some common characteristic
īŽ Simple random sample selected from each
subgroup
īŽ Samples from subgroups are combined into one
Population
Divided
into 4
strata
Sample
17. īŽ Decide on sample size: n
īŽ Divide frame of N individuals into groups of k
individuals: k=N/n
īŽ Randomly select one individual from the 1st
group
īŽ Select every kth individual thereafter
Systematic Samples
N = 64
n = 8
k = 8
First Group
18. Cluster Samples
īŽ Population is divided into several âclusters,â
each representative of the population
īŽ A simple random sample of clusters is selected
īŽ All items in the selected clusters can be used, or items can be
chosen from a cluster using another probability sampling
technique
Population
divided into
16 clusters. Randomly selected
clusters for sample
21. Data Types
īŽ Time Series Data
īŽ Ordered data values observed over time
īŽ Cross Section Data
īŽ Data values observed at a fixed point in time
22. Data Types
Sales (in $1000âs)
2003 2004 2005 2006
Atlanta 435 460 475 490
Boston 320 345 375 395
Cleveland 405 390 410 395
Denver 260 270 285 280
Time
Series
Data
Cross Section
Data
23. Data Measurement Levels
Ratio/Interval Data
Ordinal Data
Nominal Data
Highest Level
Complete Analysis
Higher Level
Mid-level Analysis
Lowest Level
Basic Analysis
Categorical Codes
ID Numbers
Category Names
Rankings
Ordered Categories
Measurements
25. Experiment Vocabulary
īŽ Experimental units
īŽ Individuals on which the experiment is done
īŽ Subjects
īŽ Experimental units that are human
īŽ Treatment
īŽ Specific experimental condition applied to the units
īŽ Factors
īŽ Explanatory variables in an experiment
īŽ Level
īŽ Specific value of a factor
26. Example of an Experiment
īŽ Does regularly taking aspirin help protect people
against heart attacks?
īŽ Subjects: 21,996 male physicians
īŽ Factors
īŽ Aspirin (2 levels: yes and no)
īŽ Beta carotene (2 levels: yes and no)
īŽ Treatments
īŽ Combination of the 2 factor levels (4 total)
īŽ Conclusion
īŽ Aspirin does reduce heart attacks, but beta carotene has no
effect.
27.
28. Block designs
īŽ Random assignment of units to treatments is carried out separately
within each block (Group of experimental units or subjects that are
known before the experiment to be similar in some way that is
expected to affect the response to the treatments)
29. īŽ Making statements about a population by
examining sample results
Sample statistics Population parameters
(known) Inference (unknown, but can
be estimated from
sample evidence)
Sample
Population
Inferential Statistics
30. Key Definitions
īŽ A population is the entire collection of things
under consideration
īŽ A parameter is a summary measure computed to
describe a characteristic of the population
īŽ A sample is a portion of the population
selected for analysis
īŽ A statistic is a summary measure computed to
describe a characteristic of the sample
31. Statistical Inference Terms
īŽ A parameter is a number that describes the
population.
īŽ Fixed number which we donât know in practice
īŽ A statistic is a number that describes a sample.
īŽ Value is known when we have taken a sample
īŽ It can change from sample to sample
īŽ Often used to estimate an unknown parameter
32. Statistical Significance
īŽ An observed effect (i.e., a statistic) so large that
it would rarely occur by chance is called
statistically significant.
īŽ The difference in the responses (another
statistic) is so large that it is unlikely to happen
just because of chance variation.
33. Inferential Statistics
īŽ Estimation
īŽ e.g.: Estimate the population mean
weight using the sample mean
weight
īŽ Hypothesis Testing
īŽ e.g.: Use sample evidence to test
the claim that the population mean
weight is 120 pounds
Drawing conclusions and/or making decisions
concerning a population based on sample results.
34. Sampling variability
īŽ Sampling variability
īŽ Value of a statistic varies in repeated random
sampling
īŽ If the variation when we take repeat samples from
the same population is too great, we canât trust the
results of any one sample.
35. Chapter Summary
īŽ Reviewed key data collection methods
īŽ Introduced key definitions:
ī¨Population vs. Sample ī¨Primary vs. Secondary data types
ī¨Qualitative vs. Qualitative data ī¨Time Series vs. Cross-Sectional data
īŽ Examined descriptive vs. inferential statistics
īŽ Described different sampling techniques
īŽ Reviewed data types and measurement levels