# Introduction 1

Published in: Technology, Education
• The target population and the sampled population may differ, and under certain circumstances this is problematic. Selection bias in trade analysis is one example. Another is telephone polling in the US: the implications differ depending on whether having a phone is a systematic process or a random one.
• Random ≠ haphazard. We typically do not enumerate all possible samples and then derive inclusion probabilities, because populations are usually too big. It is more common to assign selection probabilities to the elements and proceed from there.
• We start on the most difficult part of this class. It is essential to understand the material in Ch. 2, especially the early sections and this week’s lectures. It is possible to select an unrepresentative sample by chance even when using a statistical design (e.g., an SRS of the class).
• SAMPLING FRAME EXAMPLE. Target population = Ames households. OU (observation unit) = household. There is no list of households, but we can list telephone numbers. Frame = list of all possible (landline) telephone numbers in the Ames area. SU (sampling unit) = telephone number. The frame includes non-working and business numbers that do not correspond to households. The frame excludes households that have no landline phone.
• What distinguishes SS from the rest of your statistics classes: we usually start with a regression model or AOV model that assumes errors are normal (write these models and the error assumptions on the board). OLD NOTES, contrasting survey sampling with other settings: finite vs. infinite populations; land in the US vs. yield of a corn variety (infinite number of conditions); people in CA vs. impact of a chemical on pests; strata vs. blocks; clusters vs. split-plots; means, percentiles; randomization-based vs. model-based inference, e.g., residuals ~ normal (model-assisted).
• Sampling counties directly, or sampling states first and then counties.
### Introduction 1

1. 1. Sampling Techniques and Methods (2 credits, Mingde Main Building 0307) <ul><li>Jiang Yan (蒋妍) </li></ul><ul><li>[email_address] </li></ul><ul><li>Mingde Main Building 1019 </li></ul>
2. 2. Pre-class discussion <ul><li>Have you taken a sampling course? 1. Yes 2. No </li></ul><ul><li>Have you taken a survey course? 1. Yes 2. No </li></ul><ul><li>Practical experience? 1. Yes 2. No </li></ul><ul><li>For course content, see the teaching schedule </li></ul>
3. 3. 1.1 What is a sample survey? <ul><ul><li>A process for collecting data on a sample of observations selected from the population of interest. </li></ul></ul><ul><ul><ul><li>Population: the complete set of elements (finite) </li></ul></ul></ul><ul><ul><ul><li>Sample: a subset of the population </li></ul></ul></ul><ul><ul><ul><li>Survey sampling: a process for selecting a sample from a population </li></ul></ul></ul>
4. 4. Comparing data sources <ul><li>Experimental design </li></ul><ul><li>Observational study </li></ul><ul><li>Survey sampling </li></ul>
5. 5. 1.2 Sample Survey Definitions <ul><li>element : an object on which a measurement is taken (e.g. individuals). </li></ul><ul><li>population : a collection of elements about which we wish to make an inference. </li></ul><ul><li>frame : the list of sampling units (e.g. the Council’s list of households); it might not be equal to the population because it might not be fully up to date. </li></ul><ul><li>sampling units : non-overlapping collections of elements from the population that cover the entire population (e.g. households). </li></ul><ul><li>sample : a collection of sampling units drawn from the frame. </li></ul><ul><li>Variable </li></ul><ul><li>Estimates </li></ul>
6. 6. Comments <ul><li>Surveys vary in complexity (sampling, measurement, population, etc.) </li></ul><ul><li>Measurements are taken on individual elements, but the goal is to estimate population parameters </li></ul><ul><li>There may be hundreds or thousands of target population parameters </li></ul><ul><li>A census does not mean there is no error </li></ul><ul><li>Measurement error and nonresponse error may be present </li></ul><ul><li>Administrative data can serve as auxiliary variables to improve estimates </li></ul>
7. 7. Example: labor force survey? <ul><li>element : </li></ul><ul><li>population : </li></ul><ul><li>frame : </li></ul><ul><li>sampling units : </li></ul><ul><li>Domain: </li></ul><ul><li>Variable: </li></ul><ul><li>Parameter: </li></ul><ul><li>Estimates </li></ul>
8. 8. 1.3 Types of Surveys & Sampling Methods <ul><li>Non-probabilistic: inferences about the underlying population cannot be made </li></ul><ul><ul><li>Quota sample : elements are chosen in the field to meet a predetermined number of cases in different categories (e.g. 40% men, 60% women) </li></ul></ul><ul><ul><li>Expert sample : elements are chosen on the basis of informed opinion that they are representative </li></ul></ul><ul><li>Probabilistic: the probability of obtaining each sample can be computed, so confidence intervals, bounds on sampling errors, etc. can be developed </li></ul><ul><ul><li>Simple Random Sampling </li></ul></ul><ul><ul><li>Stratified Random Sampling </li></ul></ul><ul><ul><li>Cluster Sampling </li></ul></ul><ul><ul><li>Systematic Sampling </li></ul></ul>
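
The four probability designs named on this slide can be sketched in a few lines each. This is a minimal illustration using only Python's standard library, not the course's own code: the frame of 20 unit labels, the strata `A`/`B`, the five clusters of four units, and all function names are hypothetical.

```python
import random

def simple_random_sample(frame, n, rng):
    """SRS without replacement: every subset of size n is equally likely."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Random start in the first interval, then every k-th unit.

    Assumes len(frame) is an exact multiple of n.
    """
    k = len(frame) // n              # sampling interval
    start = rng.randrange(k)         # random start in 0..k-1
    return [frame[start + i * k] for i in range(n)]

def stratified_sample(strata, n_per_stratum, rng):
    """Independent SRS inside each stratum."""
    return {name: rng.sample(units, n_per_stratum)
            for name, units in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """SRS of whole clusters; every element of a chosen cluster is observed."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

rng = random.Random(2024)                      # seeded for reproducibility
frame = list(range(1, 21))                     # 20 hypothetical unit labels
strata = {"A": frame[:10], "B": frame[10:]}    # two hypothetical strata
clusters = {c: frame[4 * c: 4 * c + 4] for c in range(5)}

srs = simple_random_sample(frame, 4, rng)
sys_s = systematic_sample(frame, 4, rng)
strat = stratified_sample(strata, 2, rng)
clus = cluster_sample(clusters, 2, rng)
```

In every case the selection rule is fixed in advance and only the random draws vary, which is what lets the probability of each possible sample be computed.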
9. 9. Probability sampling <ul><li>Probability sampling is used when you need to obtain scientifically defensible and credible results </li></ul><ul><li>Other methods are subject to a large amount of selection bias (more later) </li></ul><ul><li>The method used to select the sample can have a large impact on the scientific credibility of the survey </li></ul>
10. 10. The nature of a probability sample <ul><li>A probability sample is the outcome of a random selection process </li></ul><ul><ul><li>Outcome = set of units in the sample </li></ul></ul><ul><ul><li>Random does not mean haphazard </li></ul></ul><ul><li>Specific rules define the sampling process </li></ul><ul><ul><li>Rules are dictated by the sample design </li></ul></ul><ul><ul><li>Rules specify the probability that a unit is included in the sample </li></ul></ul><ul><li>The outcome of the probability sampling process, the entire sample that is selected, is the random event associated with the sampling distribution (we’ll return to this key idea) </li></ul>
11. 11. Probability sample <ul><li>A probability sample is a sample in which each unit in the population has a known, nonzero probability of being included in the sample </li></ul>
12. 12. Probability sample <ul><li>Nonzero (positive) probability </li></ul><ul><ul><li>Every unit has a chance of being included in the sample </li></ul></ul><ul><ul><li>No portion of the population is omitted </li></ul></ul><ul><ul><li>Later, we’ll consider how the “sampling frame” affects these statements </li></ul></ul>
13. 13. Probability sample <ul><li>Known probability </li></ul><ul><ul><li>We can quantify the probability of a unit being included in the sample (unlike a convenience sample) </li></ul></ul><ul><li>Inclusion probability = probability a unit is included in the sample </li></ul><ul><ul><li>Specified by the sample design </li></ul></ul><ul><ul><li>Part of a sample weight (survey weight) </li></ul></ul><ul><ul><li>Critical piece of information in generating valid estimates from the survey data </li></ul></ul>
14. 14. Choosing a sample design <ul><li>Statistical factors </li></ul><ul><ul><li>Most precise estimate OR </li></ul></ul><ul><ul><li>Likelihood of generating a representative sample </li></ul></ul><ul><ul><ul><li>EX: a stratified design uses strata to legitimately exclude some samples that are unlikely to be representative </li></ul></ul></ul><ul><li>Practical factors </li></ul><ul><ul><li>A list (or sampling frame) may not exist for elements of the population </li></ul></ul><ul><ul><ul><li>EX: a cluster design is needed when we have a list of households, not adults </li></ul></ul></ul><ul><ul><li>Need to have different data collection methods for sectors of the population </li></ul></ul><ul><ul><ul><li>EX: stratified design, with a different method for each stratum </li></ul></ul></ul>
15. 15. Selecting a probability sample <ul><li>Use a probability sample design to select “units” from a “list” </li></ul><ul><ul><li>“Unit” = sampling unit = SU </li></ul></ul><ul><ul><li>“List” = sampling frame = frame </li></ul></ul><ul><li>A probability sample is the collection of SUs selected from the frame using a probability sampling design </li></ul><ul><ul><li>This is the random outcome of the sampling process discussed earlier </li></ul></ul>
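
To make "inclusion probability" concrete, the sketch below lists every possible sample of a tiny design together with its selection probability, then sums those probabilities over the samples that contain each unit. The three-unit population and the design probabilities are invented for illustration; the weight shown is the usual inverse of the inclusion probability.

```python
from fractions import Fraction

# Hypothetical design: each possible sample and its selection probability.
design = {
    ("a", "b"): Fraction(1, 2),
    ("a", "c"): Fraction(1, 4),
    ("b", "c"): Fraction(1, 4),
}

def inclusion_probabilities(design):
    """pi_i = sum of p(s) over all samples s that contain unit i."""
    probs = {}
    for sample, p in design.items():
        for unit in sample:
            probs[unit] = probs.get(unit, Fraction(0)) + p
    return probs

pi = inclusion_probabilities(design)

# Sample (survey) weight of a unit = 1 / inclusion probability.
weights = {unit: 1 / p for unit, p in pi.items()}
```

Here units `a` and `b` each have inclusion probability 3/4 and unit `c` has 1/2, so every unit has a known, nonzero chance of selection, and `c` carries the largest weight because it is the least likely to be selected.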
16. 16. 1.4 Sampling frame <ul><li>A sampling frame is a list of sampling units used to select a sample </li></ul><ul><li>Ideally, the frame contains the entire target population (and nothing else) </li></ul>
17. 17. Sampling frame example <ul><li>What is the average income for the 27,380 students enrolled at RUC? </li></ul><ul><ul><li>Frame = </li></ul></ul><ul><ul><li>Sampling unit = SU = </li></ul></ul>
18. 18. Sampling frame variations <ul><li>Sometimes the frame is not a list of elements, but a list of “clusters” or groups of elements </li></ul><ul><li>An area frame is a geographic area divided up into parcels or tracts of land </li></ul><ul><ul><li>Census Bureau has divided the US into tracts, block groups, and blocks </li></ul></ul><ul><ul><li>Blocks are clusters of households </li></ul></ul><ul><ul><li>Block groups are clusters of blocks </li></ul></ul><ul><ul><li>Tracts are clusters of block groups (and blocks) </li></ul></ul>
19. 19. Sampling frame example <ul><li>How many total acres of county parks are there in the US’s 3078 counties? </li></ul><ul><li>Alternatives? </li></ul>
20. 20. Sampling frame problems <ul><li>In practice, sampling frames that cover the entire population can be difficult to construct </li></ul><ul><ul><li>Overcoverage: some SUs in the sampling frame may not belong to the target population (ineligible for the survey) </li></ul></ul><ul><ul><li>Undercoverage: some elements in the population may not be included in the frame (eligible, but not given an opportunity to be sampled) </li></ul></ul><ul><ul><li>Duplicate listing </li></ul></ul>
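
The three frame problems on this slide can be checked mechanically when both the frame and the target population can be listed, which is rarely possible in practice. The sketch below is a toy illustration only: the household labels, `biz1`, and `dead_line` are made up, loosely echoing the telephone-frame example in the notes.

```python
from collections import Counter

def frame_coverage(frame, target_population):
    """Compare a frame with the target population it is meant to list."""
    counts = Counter(frame)
    frame_units = set(counts)
    target = set(target_population)
    return {
        # In the frame but not in the target population (ineligible).
        "overcoverage": sorted(frame_units - target),
        # In the target population but missing from the frame.
        "undercoverage": sorted(target - frame_units),
        # Listed more than once in the frame.
        "duplicates": sorted(u for u, c in counts.items() if c > 1),
    }

target_population = ["h1", "h2", "h3", "h4", "h5"]          # hypothetical households
frame = ["h1", "h2", "h2", "h4", "biz1", "dead_line"]       # hypothetical frame
report = frame_coverage(frame, target_population)
```

Units flagged as undercoverage have zero inclusion probability under any design drawn from this frame, which is how the frame qualifies the "nonzero probability" statement made earlier.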
21. 21. 1.5 Comparison <ul><li>Standard statistical assumptions: observations are drawn independently, with the same probability, from an infinite population with a given distribution; the sampling mechanism is unknown </li></ul><ul><li>Sample designs that depart from these assumptions introduce a complexity to the analysis, which must be accounted for in order to produce unbiased estimates and their associated levels of precision. </li></ul><ul><li>Adjustments to sampling weights (the inverse of the probability of selection) to account for nonresponse, as well as other weighting adjustments (such as poststratification to known population totals), further exacerbate the disparity in the weights among sample members. </li></ul>
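
The effect of a nonresponse adjustment on the weights can be seen in a small sketch. It assumes one common approach, a weighting-class adjustment, in which respondents in each class absorb the base weight of the nonrespondents in that class; the slide does not say which adjustment the course uses. The classes, inclusion probabilities, and response pattern are hypothetical.

```python
from fractions import Fraction

def nonresponse_adjusted_weights(sample):
    """Return adjusted weights for respondents, in sample order.

    Each unit is a dict with 'pi' (inclusion probability), 'cls'
    (adjustment class), and 'responded' (bool). Assumes every class
    has at least one respondent.
    """
    sampled_total, respondent_total = {}, {}
    for unit in sample:
        base = 1 / unit["pi"]                      # base weight = 1 / pi
        sampled_total[unit["cls"]] = sampled_total.get(unit["cls"], 0) + base
        if unit["responded"]:
            respondent_total[unit["cls"]] = (
                respondent_total.get(unit["cls"], 0) + base
            )
    adjusted = []
    for unit in sample:
        if unit["responded"]:
            # Inflate the base weight by (sampled total / respondent total).
            factor = sampled_total[unit["cls"]] / respondent_total[unit["cls"]]
            adjusted.append(factor / unit["pi"])
    return adjusted

sample = (
    [{"pi": Fraction(1, 10), "cls": "urban", "responded": r}
     for r in (True, True, False, False)]
    + [{"pi": Fraction(1, 5), "cls": "rural", "responded": True}
       for _ in range(2)]
)
adjusted = nonresponse_adjusted_weights(sample)
base_total = sum(1 / u["pi"] for u in sample)
```

The base weights were 10 (urban) and 5 (rural); after adjustment the respondents carry 20 and 5, so the total weight is preserved while the spread between sample members widens, as the slide describes.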
22. 22. 1.6 Survey process <ul><li>SURVEY DESIGN </li></ul><ul><ul><li>Define objectives, target population, & desired estimates/analyses </li></ul></ul><ul><ul><li>Choose sampling design </li></ul></ul><ul><ul><li>Choose data collection method </li></ul></ul><ul><ul><li>Choose analysis approach </li></ul></ul><ul><li>PREPARATION </li></ul><ul><ul><li>Create sampling frame </li></ul></ul><ul><ul><li>Select sample </li></ul></ul><ul><ul><li>Develop questions or measurements </li></ul></ul><ul><ul><li>Construct questionnaire or other data collection form </li></ul></ul><ul><ul><li>Pre-test & revise questionnaire/form </li></ul></ul><ul><ul><li>[Train interviewers, data collectors] </li></ul></ul><ul><li>COLLECT & PREPARE DATA </li></ul><ul><ul><li>Collect data (interview, observe, self-administer) </li></ul></ul><ul><ul><li>Edit and code data </li></ul></ul><ul><ul><li>Enter data (if paper) </li></ul></ul><ul><ul><li>Edit data file </li></ul></ul><ul><li>DATA ANALYSIS </li></ul><ul><ul><li>Exploratory data analysis </li></ul></ul><ul><ul><li>Calculate estimates of population characteristics </li></ul></ul><ul><ul><li>Make inferences about the population </li></ul></ul>
23. 23. Survey design often requires trade-offs to be made between different sources of error. <ul><li>Non-sampling error: </li></ul><ul><ul><li>Errors of nonobservation: nonresponse, undercoverage </li></ul></ul><ul><ul><li>Errors of observation: measurement error, processing error </li></ul></ul><ul><li>Sampling error: </li></ul><ul><ul><li>Occurs because a sample is surveyed rather than the entire population </li></ul></ul><ul><li>It cannot be avoided, but should be minimised </li></ul>
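24. 24. 1.7 Total survey design. Case study: surveying the housing conditions of elderly people in Beijing
25. 25. Total survey design <ul><li>Determining the survey objectives </li></ul><ul><li>Translating subject-matter questions into survey questions </li></ul><ul><li>Defining the target population </li></ul><ul><li>Known variables, study variables, parameters to be estimated </li></ul><ul><li>Selecting and constructing the sampling frame </li></ul><ul><li>Inventory of available resources </li></ul><ul><li>Schedule and allowable estimation error </li></ul><ul><li>Data collection method </li></ul><ul><li>Sampling design, sample selection mechanism, sample size </li></ul><ul><li>Data processing method </li></ul><ul><li>Estimator formulas, variance estimators </li></ul><ul><li>Staff training, organization of fieldwork </li></ul><ul><li>Allocation of resources </li></ul><ul><li>Quality control and evaluation </li></ul>
26. 26. Designing the sampling plan <ul><li>First, determine the purpose, tasks, and requirements of the sample survey; </li></ul><ul><li>Second, determine the scope of the survey subjects and the sampling units; </li></ul><ul><li>Third, determine the sample selection method; </li></ul><ul><li>Fourth, determine the necessary sample size; </li></ul><ul><li>Fifth, set precision requirements for the main sampling indicators; </li></ul><ul><li>Sixth, determine the method for estimating the population target quantities; </li></ul><ul><li>Seventh, formulate the methods and steps for implementing the overall plan. </li></ul>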
27. 27. 1.8 Example: your sample design? <ul><li>N=4, n=2 </li></ul><ul><li>Label | Farm acreage | Corn acreage </li></ul><ul><li>1 | 4 | 1 </li></ul><ul><li>2 | 6 | 3 </li></ul><ul><li>3 | 6 | 5 </li></ul><ul><li>4 | 20 | 15 </li></ul><ul><li>Total | 36 | 24 </li></ul>
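
One possible answer to the question on this last slide is simple random sampling without replacement. The slide leaves the design open, so SRS is an assumption here, as is the choice of the expansion estimator N × (sample mean) for total corn acreage. Because the population is so small, all six possible samples can be enumerated to show the full sampling distribution of the estimator.

```python
from fractions import Fraction
from itertools import combinations

# Corn acreage by farm label, taken from the slide.
corn = {1: 1, 2: 3, 3: 5, 4: 15}
N, n = len(corn), 2

# Under SRS each of the C(4, 2) = 6 samples is equally likely.
estimates = {}
for s in combinations(sorted(corn), n):
    sample_mean = Fraction(sum(corn[i] for i in s), n)
    estimates[s] = N * sample_mean        # expansion estimator of the total

# Mean of the estimator over all equally likely samples.
expected = sum(estimates.values()) / len(estimates)
true_total = sum(corn.values())
```

The six estimates are 8, 12, 16, 32, 36, and 40. Their average equals the true total of 24, so the estimator is unbiased under this design, yet no single sample gives an estimate close to 24. The spread is driven by farm 4, which suggests looking at designs that use farm acreage as auxiliary information.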