# B.lect1

Lecture 1, Chapter 1: Basic Statistical Concepts

M. George Akritas

Outline:

- Why Statistics?
- Populations, Samples, and Census
- Some Sampling Concepts
  - Representative Samples
  - Simple Random and Stratified Sampling
  - Sampling With and Without Replacement
  - Non-representative Sampling

Examples of engineering/scientific studies:

- Comparing the compressive strength of two or more cement mixtures.
- Comparing the effectiveness of three cleaning products in removing four different types of stains.
- Predicting failure time on the basis of stress applied.
- Assessing the effectiveness of a new traffic regulatory measure in reducing the weekly rate of accidents.
- Testing a manufacturer's claim regarding a product's quality.
- Studying the relation between salary increases and employee productivity in a large corporation.

What makes these studies challenging (and thus in need of Statistics) is inherent, or intrinsic, variability:

- The compressive strength of different preparations of the same cement mixture will differ. The figure at http://sites.stat.psu.edu/~mga/401/fig/HistComprStrCement.pdf shows 32 compressive strength measurements, in MPa (megapascals), of test cylinders 6 in. in diameter by 12 in. high, made with a water/cement ratio of 0.4 and measured on the 28th day after they were made.
- Under the same stress, two beams will fail at different times.
- The proportion of defective items of a certain product will differ from batch to batch.

Intrinsic variability renders the objectives of the case studies, as stated, ambiguous.

The objectives of the case studies can be made precise if stated in terms of averages, or means:

- Comparing the average hardness of two different cement mixtures.
- Predicting the average failure time on the basis of stress applied.
- Estimating the average coefficient of thermal expansion.
- Estimating the average proportion of defective items.

Moreover, because of variability, the words "average" and "mean" have a technical meaning, which can be made clear through the concepts of population and sample.

Definition: A population is a well-defined collection of objects or subjects, of relevance to a particular study, which are exposed to the same treatment or method. Population members are called units.

Examples of populations:

- All water samples that can be taken from a lake.
- All items of a certain manufactured product.
- All students enrolled in Big Ten universities during the 2007-08 academic year.
- Two types of cleaning products. (Each type corresponds to a population.)

The objective of a study is to investigate certain characteristic(s) of the units of the population(s) of interest. Examples of characteristics:

- All water samples taken from a lake. Characteristics: mercury concentration; concentration of other pollutants.
- All items of a certain manufactured product (that have been, or will be, produced). Characteristic: proportion of defective items.
- All students enrolled in Big Ten universities during the 2007-08 academic year. Characteristics: favorite type of music; political affiliation.
- Two types of cleaning products. Characteristic: cleaning effectiveness.

In the example where different beams (of the same type) are exposed to different stress levels:

- the characteristic of interest is the time to failure of a beam under each stress level, and
- each stress level used in the study corresponds to a separate population, which consists of all beams that will be exposed to that stress level.

This emphasizes that populations are defined not only by the units they consist of, but also by the method or treatment applied to these units.

Full (i.e., population-level) understanding of a characteristic requires the examination of all population units, i.e., a census. For example, full understanding of the relation between salary and productivity of a corporation's employees requires obtaining these two characteristics from all employees.

However, taking a census can be time-consuming and expensive: the 2000 U.S. Census cost \$6.5 billion, while the 2010 Census cost \$13 billion. Moreover, a census is not feasible if the population is hypothetical or conceptual, i.e., not all members are available for examination.

Because of the above, we typically settle for examining all units in a sample, which is a subset of the population.

Due to intrinsic variability, the sample properties (attributes) of the characteristic of interest will differ from those of the population. For example:

- The average mercury concentration in 25 water samples will differ from the overall mercury concentration in the lake.
- The proportion of students favoring the use of solar energy in a sample of 100 PSU students will differ from the corresponding proportion among all PSU students.
- The relation between bears' chest girth and weight in a sample of 10 bears will differ from the corresponding relation in the entire population of 50 bears in a forested region.

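The gap between a sample statistic and the population value it approximates can be seen in a small simulation. The sketch below is a minimal Python illustration, assuming a hypothetical lake population of 1,000 simulated mercury readings (normal, mean 0.5, standard deviation 0.1, units unspecified); these numbers are invented for illustration and are not data from the lecture.

```python
import random
import statistics

# Hypothetical population: 1,000 simulated mercury readings (NOT real data).
rng = random.Random(1)
population = [rng.gauss(0.5, 0.1) for _ in range(1000)]

# Population-level average: computing it requires every unit (a census).
population_mean = statistics.mean(population)

# Sample-level average: mean of 25 units drawn at random without replacement.
sample = rng.sample(population, 25)
sample_mean = statistics.mean(sample)

print(f"population mean: {population_mean:.4f}")
print(f"sample mean:     {sample_mean:.4f}")
print(f"difference:      {sample_mean - population_mean:+.4f}")
```

Changing the seed or the sample size changes the sample mean, while the population mean stays fixed.
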
The GOOD NEWS is that, if the sample is suitably drawn, then sample properties approximate the population properties.

[Figure: Population and sample relationships between chest girth and weight (x-axis: Chest Girth; y-axis: Weight).]

Sampling variability: sample properties of the characteristic of interest also differ from sample to sample. For example:

1. The number of US citizens, in a sample of size 20, who favor expanding solar energy will (most likely) differ from the corresponding number in a different sample of 20 US citizens.
2. The average mercury concentrations in two sets of 25 water samples drawn from a lake will differ.

The term sampling variability describes such differences in the characteristic of interest from sample to sample.

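Sampling variability can be made concrete by drawing several samples of the same size from one fixed population. This is a minimal Python sketch, assuming a hypothetical population of 10,000 citizens of whom 60% favor expanding solar energy; the 60% figure is invented for illustration.

```python
import random

# Hypothetical population of 10,000 citizens: True = favors expanding solar
# energy. The 60% share is an illustrative assumption, not survey data.
population = [True] * 6000 + [False] * 4000
rng = random.Random(2024)

# Ten independent simple random samples of size 20; count supporters in each.
# sum() over booleans counts the True values.
counts = [sum(rng.sample(population, 20)) for _ in range(10)]
print(counts)  # ten counts, each between 0 and 20
```

Every sample comes from the same population, so the spread among the ten counts is pure sampling variability.
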
[Figure: Illustration of sampling variability (x-axis: Chest Girth; y-axis: Weight).]

Population-level properties (attributes) of the characteristic(s) of interest are called (population) parameters. Examples of parameters include averages, proportions, percentiles, and correlation coefficients.

The corresponding sample properties (attributes) are called statistics. The term sports statistics comes from this terminology.

Sample statistics approximate the corresponding population parameters but are not equal to them.

Statistical inference deals with the uncertainty that arises in approximating parameters by statistics. The tools of statistical inference include point and interval estimation, hypothesis testing, and prediction.

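A parameter, the statistic that estimates it, and a simple interval estimate can be placed side by side in code. This is a hedged Python sketch, assuming a hypothetical population of 5,000 students of whom 30% favor solar energy (an invented figure). The interval uses the common large-sample normal approximation p̂ ± 1.96·sqrt(p̂(1 − p̂)/n), which is only one possible choice and not necessarily the method developed later in the course.

```python
import math
import random

# Hypothetical population of 5,000 students: True = favors solar energy.
# The 30% share is an illustrative assumption, not data from the lecture.
rng = random.Random(7)
population = [True] * 1500 + [False] * 3500
p = sum(population) / len(population)        # population parameter

n = 100
sample = rng.sample(population, n)
p_hat = sum(sample) / n                      # sample statistic (point estimate)

# Approximate 95% interval estimate (normal approximation; ignores the
# finite-population correction).
half_width = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
interval = (p_hat - half_width, p_hat + half_width)

print(f"parameter p = {p:.3f}, statistic p_hat = {p_hat:.3f}")
print(f"approx. 95% interval: ({interval[0]:.3f}, {interval[1]:.3f})")
```

In practice only `p_hat` and the interval are available; `p` is computed here only because the population is simulated.
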
Examples of estimation, hypothesis testing, and prediction:

- Estimation (point and interval) would be used in estimating the coefficient of thermal expansion of a metal, or the air pollution level.
- Hypothesis testing would be used in deciding whether to take corrective action to bring the air pollution level down, or whether a manufacturer's claim regarding the quality of a product is false.
- Prediction arises in cases where we would like to predict the failure time on the basis of the stress applied, or the age of a tree on the basis of its trunk diameter.

For valid statistical inference, the sample must be representative of the population. For example, a sample of PSU basketball players is not representative of PSU students if the characteristic of interest is height.

Typically, it is hard to tell whether a sample is representative of the population. So we define a sample to be representative if... (a circular definition!) it allows for valid statistical inference. The only guarantee of that comes from the method used to select the sample (the sampling method).

The good news is that there are several sampling methods that guarantee representativeness.

Definition: A sample of size n is a simple random sample (s.r.s.) if the selection process ensures that every sample of size n has an equal chance of being selected.

- To select an s.r.s. of size 10 from a population of 100 units, each of the 100!/(10!90!) possible samples of size 10 must be equally likely.
- In simple random sampling, every member of the population has the same chance of being included in the sample. The reverse, however, is not true.

Example: To select a sample of 2 students from a population of 20 male and 20 female students, one selects at random one male and one female student. Is this an s.r.s.? (Does every student have the same chance of being included in the sample?)

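Both points on this slide can be checked by exact enumeration. The Python sketch below counts the possible samples of size 10, then enumerates every possible sample of size 2 for the male/female example (student labels such as `M0` and `F0` are hypothetical). Under the "one male, one female" plan, each student is included with probability 1/20, the same as under an s.r.s. of size 2 from 40 (2/40). However, the 380 all-male or all-female pairs can never be selected, so not all samples are equally likely and the plan is not an s.r.s.

```python
import math
from fractions import Fraction
from itertools import combinations

# Number of distinct samples of size 10 from 100 units: 100!/(10! 90!).
n_samples = math.comb(100, 10)
print(n_samples)

# The slide's example: 20 male (M) and 20 female (F) students; the plan picks
# one male and one female at random. Labels are hypothetical.
males = [f"M{i}" for i in range(20)]
females = [f"F{i}" for i in range(20)]
students = males + females

all_pairs = list(combinations(students, 2))           # every sample of size 2
reachable = {(m, f) for m in males for f in females}  # samples the plan can produce

# Probability the plan assigns to each possible sample of size 2.
prob = {pair: (Fraction(1, len(reachable)) if pair in reachable else Fraction(0))
        for pair in all_pairs}

# Chance that each individual student ends up in the sample.
inclusion = {s: sum(pr for pair, pr in prob.items() if s in pair) for s in students}

print(len(all_pairs), len(reachable))   # 780 400
print(set(inclusion.values()))          # {Fraction(1, 20)}
print(sorted(set(prob.values())))       # [Fraction(0, 1), Fraction(1, 400)]
```
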
Another sampling method for obtaining a representative sample is stratified sampling.

Definition: A stratified sample consists of simple random samples from each of a number of groups, called strata, which are non-overlapping and together make up the entire population.

- Examples of strata include ethnic groups, age groups, and production facilities.
- If the units in different strata differ in terms of the characteristic under study, stratified sampling is preferable to s.r.s. For example, if different production facilities differ in terms of the proportion of defective products, a stratified sample is preferable.

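Following the definition, a stratified sample is one s.r.s. per stratum. The Python sketch below assumes three hypothetical production facilities with invented item IDs, and allocates 5% of each stratum to the sample (proportional allocation, which is one common choice but only an assumption here). `stratified_sample` is an illustrative helper, not a library function.

```python
import random

def stratified_sample(strata, sizes, seed=None):
    """Draw a simple random sample (without replacement) of sizes[name]
    units from each stratum and return the samples grouped by stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, sizes[name]) for name, units in strata.items()}

# Hypothetical strata: item IDs from three production facilities.
strata = {
    "facility_A": [f"A{i}" for i in range(500)],
    "facility_B": [f"B{i}" for i in range(300)],
    "facility_C": [f"C{i}" for i in range(200)],
}

# Proportional allocation (an assumption): 5% of each stratum.
sizes = {name: len(units) // 20 for name, units in strata.items()}

sample = stratified_sample(strata, sizes, seed=3)
print({name: len(units) for name, units in sample.items()})
# {'facility_A': 25, 'facility_B': 15, 'facility_C': 10}
```

Because every facility contributes units, no facility can be missed by chance, which is what makes the design attractive when defect rates differ between facilities.
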
How do we select an s.r.s. of size n from a population of N units?

- STEP 1: Assign to each unit a number from 1 to N.
- STEP 2: Write each number on a slip of paper, place the N slips of paper in an urn, and shuffle them.
- STEP 3: Select n slips of paper at random, one at a time.

Alternatively, the entire process can be performed in software such as R. We will see this in the next lab session.

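The three urn steps translate almost line for line into code. This is a minimal Python sketch; `simple_random_sample` is a hypothetical helper written for illustration. The standard library's `random.sample` does the same job in one call, and in R the analogous call is `sample(1:N, n)`.

```python
import random

def simple_random_sample(N, n, seed=None):
    """Urn-style s.r.s.: label units 1..N, shuffle the slips, and draw n of
    them one at a time without replacement."""
    if not 0 <= n <= N:
        raise ValueError("sample size n must satisfy 0 <= n <= N")
    rng = random.Random(seed)
    urn = list(range(1, N + 1))              # STEP 1: number the units 1..N
    rng.shuffle(urn)                         # STEP 2: shuffle the slips in the urn
    return [urn.pop() for _ in range(n)]     # STEP 3: draw n slips, one at a time

print(simple_random_sample(N=100, n=10, seed=1))

# The standard library equivalent (a different random draw):
print(random.Random(1).sample(range(1, 101), 10))
```

Because a full shuffle makes every ordering of the slips equally likely, every set of n slips is equally likely to end up drawn, which is exactly the s.r.s. requirement.
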
Sampling without replacement means that a population unit can be included in a sample at most once. For example, a simple random sample is obtained by sampling without replacement: once a unit's slip of paper is drawn, it is not placed back into the urn.

Sampling with replacement means that after a unit's slip of paper is chosen, it is put back in the urn. Thus a population unit could be included in the sample anywhere between 0 and n times. Rolling a die can be thought of as sampling with replacement from the numbers 1, 2, ..., 6.

Though conceptually undesirable, sampling with replacement is easier to work with from a mathematical point of view. When a population is very large, sampling with and without replacement are practically equivalent.

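The contrast between the two schemes, and the reason they coincide for large populations, can be sketched in Python. `random.sample` draws without replacement and `random.choices` draws with replacement. The helper `prob_no_repeats` (hypothetical, for illustration) computes the chance that n draws with replacement from N units contain no repeated unit, i.e. (N/N)·((N−1)/N)·…·((N−n+1)/N). For a die with 4 draws this is 360/1296 ≈ 0.28, but for N = 1,000,000 and n = 100 it is close to 1, so the two schemes almost always give the same kind of sample.

```python
import math
import random

rng = random.Random(42)
faces = [1, 2, 3, 4, 5, 6]

# Without replacement: each face appears at most once, so n <= 6.
without_repl = rng.sample(faces, 4)

# With replacement: like rolling a die 10 times; faces may repeat.
with_repl = rng.choices(faces, k=10)

def prob_no_repeats(N, n):
    """Chance that n draws WITH replacement from N units never pick the same
    unit twice, i.e. look like a sample drawn WITHOUT replacement."""
    return math.prod((N - i) / N for i in range(n))

print(without_repl, with_repl)
print(round(prob_no_repeats(6, 4), 4))             # 0.2778 (small population)
print(round(prob_no_repeats(1_000_000, 100), 4))   # close to 1 (large population)
```
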
Non-representative samples arise whenever the sampling plan is such that a part, or parts, of the population of interest are either excluded from, or systematically under-represented in, the sample. This is called selection bias.

Two examples of non-representative samples are self-selected and convenience samples.

A self-selected sample often occurs when people are asked to send in their opinions in surveys or questionnaires. For example, in a political survey, those who feel that things are running smoothly or who support an incumbent often (apathetically) do not respond, whereas activists who strongly desire change voice their opinions.

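Self-selection bias can be illustrated with a small simulation in which the chance of responding depends on the opinion being measured. This Python sketch rests entirely on invented assumptions: a hypothetical electorate of 100,000 with 55% supporting the incumbent, and response rates of 10% for supporters and 40% for those wanting change.

```python
import random

rng = random.Random(11)

# Hypothetical electorate of 100,000: True = supports the incumbent.
# The 55% support level is an illustrative assumption.
voters = [True] * 55_000 + [False] * 45_000

# Assumed response rates: satisfied supporters rarely send in an opinion,
# while those who want change respond more often (invented numbers).
response_rate = {True: 0.10, False: 0.40}
responses = [v for v in voters if rng.random() < response_rate[v]]

true_share = sum(voters) / len(voters)
survey_share = sum(responses) / len(responses)
print(f"true support: {true_share:.3f}, self-selected survey: {survey_share:.3f}")
print(f"number of responses: {len(responses)}")
```

Under these assumptions about 23,500 responses are expected, with a supporter share near 5,500/23,500 ≈ 0.23 instead of 0.55. A large number of responses does not remove the bias, because the bias comes from who chooses to respond, not from the sample size.
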
A convenience sample is a sample made up of the units that are most easily reached. For example, randomly selecting students from your classes will not result in a sample that is representative of all PSU students, because your classes consist mostly of students with the same major as you.

A famous example of selection bias is the following.

Example (The Literary Digest poll of 1936): The magazine had been extremely successful in predicting the results of US presidential elections, but in 1936 it predicted a 3-to-2 victory for Republican Alf Landon over the Democratic incumbent Franklin Delano Roosevelt. Notably, this prediction was based on 2.3 million responses (out of 10 million questionnaires sent). Gallup, on the other hand, correctly predicted the outcome of that election by surveying only 50,000 people.

Go to the next lesson: http://www.stat.psu.edu/~mga/401/course.info/b.lect2.pdf

Go to the Stat 401 home page: http://www.stat.psu.edu/~mga/401/course.info/