B.lect1

  1. Lecture 1, Chapter 1: Basic Statistical Concepts. M. George Akritas. Outline: Why Statistics?; Populations, Samples, and Census; Some Sampling Concepts.
  2. Outline
      • Why Statistics?
      • Populations, Samples, and Census
      • Some Sampling Concepts: Representative Samples; Simple Random and Stratified Sampling; Sampling With and Without Replacement; Non-representative Sampling
  3. Example (Examples of Engineering/Scientific Studies)
      • Comparing the compressive strength of two or more cement mixtures.
      • Comparing the effectiveness of three cleaning products in removing four different types of stains.
      • Predicting failure time on the basis of stress applied.
      • Assessing the effectiveness of a new traffic regulatory measure in reducing the weekly rate of accidents.
      • Testing a manufacturer’s claim regarding a product’s quality.
      • Studying the relation between salary increases and employee productivity in a large corporation.
      What makes these studies challenging (and thus makes them require Statistics) is the inherent or intrinsic variability:
  4. Examples of intrinsic variability:
      • The compressive strength of different preparations of the same cement mixture will differ. The figure at http://sites.stat.psu.edu/~mga/401/fig/HistComprStrCement.pdf shows 32 compressive strength measurements, in MPa (megapascals), of test cylinders 6 in. in diameter by 12 in. high, made with a water/cement ratio of 0.4 and measured on the 28th day after they were made.
      • Under the same stress, two beams will fail at different times.
      • The proportion of defective items of a certain product will differ from batch to batch.
      Intrinsic variability renders the objectives of the case studies, as stated, ambiguous.
  5. The objectives of the case studies can be made precise if stated in terms of averages or means.
      • Comparing the average hardness of two different cement mixtures.
      • Predicting the average failure time on the basis of stress applied.
      • Estimation of the average coefficient of thermal expansion.
      • Estimation of the average proportion of defective items.
      Moreover, because of variability, the words "average" and "mean" have a technical meaning which can be made clear through the concepts of population and sample.
  6. Definition: A population is a well-defined collection of objects or subjects, of relevance to a particular study, which are exposed to the same treatment or method. Population members are called units.
      Example (Examples of populations)
      • All water samples that can be taken from a lake.
      • All items of a certain manufactured product.
      • All students enrolled in Big Ten universities during the 2007-08 academic year.
      • Two types of cleaning products. (Each type corresponds to a population.)
  7. The objective of a study is to investigate certain characteristic(s) of the units of the population(s) of interest.
      Example (Examples of characteristics)
      • All water samples taken from a lake. Characteristics: mercury concentration; concentration of other pollutants.
      • All items of a certain manufactured product (that have been, or will be, produced). Characteristic: proportion of defective items.
      • All students enrolled in Big Ten universities during the 2007-08 academic year. Characteristics: favorite type of music; political affiliation.
      • Two types of cleaning products. Characteristic: cleaning effectiveness.
  8. In the example where different beams (of the same type) are exposed to different stress levels:
      • the characteristic of interest is the time to failure of a beam under each stress level, and
      • each stress level used in the study corresponds to a separate population, which consists of all beams that will be exposed to that stress level.
      This emphasizes that populations are defined not only by the units they consist of, but also by the method or treatment applied to these units.
  9. Full (i.e. population-level) understanding of a characteristic requires the examination of all population units, i.e. a census.
      • For example, full understanding of the relation between salary and productivity of a corporation’s employees requires obtaining these two characteristics from all employees.
      However, taking a census can be time consuming and expensive:
      • The 2000 U.S. Census cost $6.5 billion, while the 2010 Census cost $13 billion.
      • Moreover, a census is not feasible if the population is hypothetical or conceptual, i.e. not all members are available for examination.
      Because of the above, we typically settle for examining all units in a sample, which is a subset of the population.
  10. Due to the intrinsic variability, the sample properties/attributes of the characteristic of interest will differ from those of the population. For example:
      • The average mercury concentration in 25 water samples will differ from the overall average mercury concentration in the lake.
      • The proportion in a sample of 100 PSU students who favor the use of solar energy will differ from the corresponding proportion of all PSU students.
      • The relation between a bear’s chest girth and weight in a sample of 10 bears will differ from the corresponding relation in the entire population of 50 bears in a forested region.
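The first bullet above can be illustrated with a small simulation. This is only a sketch in Python (the lecture itself uses R in its labs); the population of 10,000 possible water samples, its mean of 0.5 and its spread of 0.1 are made-up values chosen for illustration, not data from the lecture.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: mercury concentrations of 10,000 possible water samples.
# The centre (0.5) and spread (0.1) are invented for illustration.
population = [rng.gauss(0.5, 0.1) for _ in range(10_000)]
population_mean = statistics.mean(population)

# One simple random sample of 25 water samples, drawn without replacement.
sample = rng.sample(population, 25)
sample_mean = statistics.mean(sample)

print(f"population mean: {population_mean:.4f}")
print(f"sample mean:     {sample_mean:.4f}")
print(f"difference:      {sample_mean - population_mean:+.4f}")
```

The sample mean lands near the population mean but does not coincide with it, which is the point of the slide.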
  11. The GOOD NEWS is that, if the sample is suitably drawn, then sample properties approximate the population properties.
      [Figure: Population and sample relationships between chest girth and … (axes: Chest Girth, Weight)]
  12. Sampling Variability
      Sample properties of the characteristic of interest also differ from sample to sample. For example:
      1. The number of US citizens, in a sample of size 20, who favor expanding solar energy will (most likely) be different from the corresponding number in a different sample of 20 US citizens.
      2. The average mercury concentration in two sets of 25 water samples drawn from a lake will differ.
      The term sampling variability is used to describe such differences in the characteristic of interest from sample to sample.
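Sampling variability can be seen by drawing several samples from the same population and comparing them. The sketch below is in Python and rests on assumptions: the population of 100,000 citizens and the 60% share in favor are invented, and `count_in_favor` is a hypothetical helper name.

```python
import random

rng = random.Random(2)  # fixed seed so the sketch is repeatable

# Hypothetical population of 100,000 citizens; True means "favors expanding solar energy".
# The 60% share is made up for illustration.
population = [i < 60_000 for i in range(100_000)]

def count_in_favor(sample_size):
    """Draw one simple random sample and count the citizens in favor."""
    return sum(rng.sample(population, sample_size))

# Ten different samples of size 20 from the same population.
counts = [count_in_favor(20) for _ in range(10)]
print(counts)
```

The population never changes between draws, so any differences among the ten counts come from the sampling alone.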
  13. [Figure: Illustration of Sampling Variability (axes: Chest Girth, Weight)]
  14. Population-level properties/attributes of the characteristic(s) of interest are called (population) parameters.
      • Examples of parameters include averages, proportions, percentiles, and correlation coefficients.
      The corresponding sample properties/attributes of characteristics are called statistics. The term sports statistics comes from this terminology.
      Sample statistics approximate the corresponding population parameters but are not equal to them.
      Statistical inference deals with the uncertainty issues which arise in approximating parameters by statistics.
      The tools of statistical inference include point and interval estimation, hypothesis testing and prediction.
  15. Example (Examples of Estimation, Hypothesis Testing and Prediction)
      • Estimation (point and interval) would be used in the task of estimating the coefficient of thermal expansion of a metal, or the air pollution level.
      • Hypothesis testing would be used for deciding whether to take corrective action to bring the air pollution level down, or whether a manufacturer’s claim regarding the quality of a product is false.
      • Prediction arises in cases where we would like to predict the failure time on the basis of the stress applied, or the age of a tree on the basis of its trunk diameter.
  16. Representative Samples
      For valid statistical inference the sample must be representative of the population. For example, a sample of PSU basketball players is not representative of PSU students if the characteristic of interest is height.
      Typically it is hard to tell whether a sample is representative of the population. So, we define a sample to be representative if . . . (circular definition!) it allows for valid statistical inference.
      The only guarantee for that comes from the method used to select the sample (the sampling method).
      The good news is that there are several sampling methods that guarantee representativeness.
  17. Definition: A sample of size n is a simple random sample (s.r.s.) if the selection process ensures that every sample of size n has an equal chance of being selected.
      • To select a s.r.s. of size 10 from a population of 100 units, any of the 100!/(10!90!) samples of size 10 must be equally likely.
      • In simple random sampling every member of the population has the same chance of being included in the sample. The reverse, however, is not true.
      Example: To select a sample of 2 students from a population of 20 male and 20 female students, one selects at random one male and one female student. Is this a s.r.s.? (Does every student have the same chance of being included in the sample?)
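The count 100!/(10!90!) and the male/female example can both be checked by direct enumeration. The following Python sketch is an illustration only; the student labels `M1`…`M20` and `F1`…`F20` are made up.

```python
from itertools import combinations
from math import comb, factorial

# Number of distinct samples of size 10 from 100 units: 100!/(10! 90!).
n_samples = factorial(100) // (factorial(10) * factorial(90))
print(n_samples, n_samples == comb(100, 10))

# The example from the slide: 20 male and 20 female students (labels are hypothetical).
males = [f"M{i}" for i in range(1, 21)]
females = [f"F{i}" for i in range(1, 21)]

# Every pair that a simple random sample of size 2 must be able to produce.
all_pairs = list(combinations(males + females, 2))
# The pairs the "one male, one female" scheme can actually produce, all equally likely.
reachable_pairs = [(m, f) for m in males for f in females]
print(len(all_pairs), len(reachable_pairs))

# Chance that each student is included under the "one male, one female" scheme.
inclusion = {
    s: sum(s in pair for pair in reachable_pairs) / len(reachable_pairs)
    for s in males + females
}
print(set(inclusion.values()))
```

Under the scheme every student is included with the same probability, 1/20, which is also the inclusion probability 2/40 under a s.r.s. of size 2. Yet only 400 of the 780 possible pairs can occur (two males or two females are never selected together), so not every sample of size 2 is equally likely and the scheme is not a s.r.s. This is the sense in which the reverse statement fails.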
  18. Another sampling method for obtaining a representative sample is called stratified sampling.
      Definition: A stratified sample consists of simple random samples from each of a number of groups (which are non-overlapping and make up the entire population) called strata.
      • Examples of strata include: ethnic groups, age groups, and production facilities.
      • If the units in the different strata differ in terms of the characteristic under study, stratified sampling is preferable to s.r.s. For example, if different production facilities differ in terms of the proportion of defective products, a stratified sample is preferable.
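A stratified sample can be sketched as one simple random sample per stratum. In the Python sketch below, the facility names, the stratum sizes and the choice of proportional allocation (sample sizes proportional to stratum sizes) are all assumptions added for illustration; the slide does not prescribe how many units to take from each stratum.

```python
import random

rng = random.Random(3)  # fixed seed so the sketch is repeatable

# Hypothetical strata: item IDs grouped by production facility (names and sizes are made up).
strata = {
    "facility_A": [f"A{i}" for i in range(500)],
    "facility_B": [f"B{i}" for i in range(300)],
    "facility_C": [f"C{i}" for i in range(200)],
}

def stratified_sample(strata, total_size, rng):
    """Draw a simple random sample from each stratum, sized in proportion to the stratum."""
    population_size = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        k = round(total_size * len(units) / population_size)  # assumed allocation rule
        sample[name] = rng.sample(units, k)  # s.r.s. within the stratum
    return sample

sample = stratified_sample(strata, 50, rng)
for name, units in sample.items():
    print(name, len(units))
```

Because each stratum is sampled separately, every facility is guaranteed to appear in the sample, which a single s.r.s. of the whole population cannot promise.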
  19. How do we select a s.r.s. of size n from a population of N units?
      STEP 1: Assign to each unit a number from 1 to N.
      STEP 2: Write each number on a slip of paper, place the N slips of paper in an urn, and shuffle them.
      STEP 3: Select n slips of paper at random, one at a time.
      Alternatively, the entire process can be performed in software like R. We will see this in the next lab session.
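The three steps translate almost line by line into code. The lecture's labs use R; the sketch below uses Python's standard library instead, and `simple_random_sample` is a hypothetical helper name.

```python
import random

def simple_random_sample(N, n, rng):
    """Mimic the urn procedure: number the units 1..N, shuffle, draw n slips."""
    slips = list(range(1, N + 1))  # STEP 1: one numbered slip per unit
    rng.shuffle(slips)             # STEP 2: shuffle the slips in the urn
    return slips[:n]               # STEP 3: draw n slips, none put back

rng = random.Random(4)  # fixed seed so the sketch is repeatable
chosen = simple_random_sample(100, 10, rng)
print(sorted(chosen))
```

In practice the same result is obtained in one call with `random.sample(range(1, N + 1), n)`, which also samples without replacement.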
  20. Sampling without replacement simply means that a population unit can be included in a sample at most once. For example, a simple random sample is obtained by sampling without replacement: once a unit’s slip of paper is drawn, it is not placed back into the urn.
      Sampling with replacement means that after a unit’s slip of paper is chosen, it is put back in the urn. Thus a population unit could be included in the sample anywhere between 0 and n times. Rolling a die can be thought of as sampling with replacement from the numbers 1, 2, . . . , 6.
      Though conceptually undesirable, sampling with replacement is easier to work with from a mathematical point of view.
      When a population is very large, sampling with and without replacement are practically equivalent.
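The two schemes, and the claim that they nearly coincide for large populations, can be sketched as follows. This is an illustrative Python sketch; `prob_no_repeats` is a hypothetical helper that computes the chance that n draws with replacement from N units contain no unit twice, i.e. (N/N)·((N−1)/N)···((N−n+1)/N).

```python
import random

rng = random.Random(5)  # fixed seed so the sketch is repeatable
units = list(range(1, 7))  # the numbers 1, 2, ..., 6, as on a die

without_replacement = rng.sample(units, 4)   # each unit appears at most once
with_replacement = rng.choices(units, k=10)  # units may repeat, like 10 rolls of a die
print(without_replacement)
print(with_replacement)

def prob_no_repeats(N, n):
    """Probability that n draws with replacement from N units are all distinct."""
    p = 1.0
    for i in range(n):
        p *= (N - i) / N
    return p

print(prob_no_repeats(6, 4))           # small population: repeats are likely
print(prob_no_repeats(1_000_000, 10))  # very large population: repeats are rare
```

When the probability of no repeats is close to 1, a sample drawn with replacement almost always looks like one drawn without replacement, which is why the two schemes are practically equivalent for very large populations.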
  21. Non-representative samples arise whenever the sampling plan is such that a part, or parts, of the population of interest are either excluded from, or systematically under-represented in, the sample. This is called selection bias.
      Two examples of non-representative samples are self-selected and convenience samples.
      A self-selected sample often occurs when people are asked to send in their opinions in surveys or questionnaires. For example, in a political survey, often those who feel that things are running smoothly or who support an incumbent will (apathetically) not respond, whereas those activists who strongly desire change will voice their opinions.
  22. A convenience sample is a sample made up of units that are most easily reached. For example, randomly selecting students from your classes will not result in a sample that is representative of all PSU students, because your classes consist mostly of students with the same major as you.
      A famous example of selection bias is the following.
      Example (The Literary Digest poll of 1936): The magazine had been extremely successful in predicting the results of US presidential elections, but in 1936 it predicted a 3-to-2 victory for Republican Alf Landon over the Democratic incumbent Franklin Delano Roosevelt. Worth noting is that this prediction was based on 2.3 million responses (out of 10 million questionnaires sent). On the other hand, Gallup correctly predicted the outcome of that election by surveying only 50,000 people.
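The lesson that a large self-selected sample can be worse than a small random one can be reproduced with a toy simulation. The Python sketch below does not model the 1936 poll: the electorate, the 55% support level and the response rates are all invented for illustration, and `self_selected_poll` and `share` are hypothetical helper names.

```python
import random

rng = random.Random(6)  # fixed seed so the sketch is repeatable

# Hypothetical electorate: 55% support the incumbent (True), 45% want change (False).
population = [i < 55_000 for i in range(100_000)]

# Assumed response rates, made up for illustration: supporters reply less often.
RESPONSE_RATE = {True: 0.10, False: 0.40}

def self_selected_poll(mailed):
    """Mail questionnaires to `mailed` people; keep only those who choose to reply."""
    recipients = rng.sample(population, mailed)
    return [p for p in recipients if rng.random() < RESPONSE_RATE[p]]

def share(sample):
    """Proportion of the sample that supports the incumbent."""
    return sum(sample) / len(sample)

big_self_selected = self_selected_poll(50_000)
small_random = rng.sample(population, 1_000)

print(f"true support:               {share(population):.3f}")
print(f"self-selected, n={len(big_self_selected)}: {share(big_self_selected):.3f}")
print(f"simple random, n={len(small_random)}:  {share(small_random):.3f}")
```

Under these assumptions the self-selected poll has many more respondents than the random sample, yet it badly understates support for the incumbent, while the small random sample stays close to the true value. Adding more self-selected responses does not remove the bias.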
  23. Go to next lesson: http://www.stat.psu.edu/~mga/401/course.info/b.lect2.pdf
      Go to the Stat 401 home page: http://www.stat.psu.edu/~mga/401/course.info/
      http://www.stat.psu.edu/~mga
      http://www.google.com
