Lesson2

Transcript

  • 1. Lesson 2. Chapter 1: Basic Statistical Concepts. Michael Akritas, Department of Statistics, The Pennsylvania State University.
  • 2. Outline
      1. Comparative Studies
          • Terminology and Comparative Graphics
          • Randomization, Confounding and Simpson’s Paradox
          • Causation: Experiments and Observational Studies
          • Factorial Experiments
      2. The Role of Probability
  • 3. Comparative studies aim at discerning and explaining differences between two or more populations. Examples include:
      • the comparison of two methods of cloud seeding for hail and fog suppression at international airports,
      • the comparison of two or more cement mixtures in terms of compressive strength,
      • the comparison of the survival times of a type of root system under different watering regimens,
      • the comparison of the effectiveness of three cleaning products in removing four different types of stains.
  • 5. Jargon used in comparative studies
      • One-factor studies: factor levels; treatments; populations.
      • Response variable.
      Example: In the comparison of the survival times of a type of root system under different watering regimens, watering is the factor. The different watering regimens are called factor levels or treatments. Treatments correspond to populations. The survival time is the response variable.
  • 6. More jargon
      • Experimental units: the subjects or objects on which measurements are made. In the previous example, the roots are the experimental units.
      • Multi-factor studies: factor-level combinations; treatments; populations.

                          Factor B
      Factor A      1       2       3       4
         1        Tr11    Tr12    Tr13    Tr14
         2        Tr21    Tr22    Tr23    Tr24
  • 7. Example: In a study of how five different temperature levels and five different humidity levels affect the yield of a chemical reaction:
      • The factors are temperature and humidity, with 5 levels each.
      • The treatments are the different factor-level combinations, which, again, correspond to the different populations.
      • The response is the yield of the chemical reaction.
      • The experimental units are the sets of materials used for the chemical reaction.
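As a small illustration of the terminology in the temperature and humidity example, the treatments of a two-factor study can be enumerated as all factor-level combinations. This is only a sketch: the level labels `T1`..`T5` and `H1`..`H5` are hypothetical placeholders, since the slide does not give the actual temperature or humidity values.

```python
from itertools import product

# Hypothetical level labels; the slide only says each factor has 5 levels.
temperature_levels = [f"T{i}" for i in range(1, 6)]
humidity_levels = [f"H{j}" for j in range(1, 6)]

# Each treatment (population) is one factor-level combination.
treatments = list(product(temperature_levels, humidity_levels))

print(len(treatments))  # number of treatments = 5 levels x 5 levels
print(treatments[:3])   # the first few combinations
```

The same enumeration applies to any multi-factor study: the number of treatments is the product of the numbers of levels of the factors.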
  • 8. Example: A study will compare the level of radiation emitted by five kinds of cell phones at each of three volume settings. State the factors involved in this study, the number of levels for each factor, the total number of populations or treatments, the response variable and the experimental units.
  • 9. Comparisons typically focus on differences (e.g. of means, proportions, or medians) or on ratios (e.g. ratios of variances). Differences are also called contrasts.
      • The comparison of two different cloud seeding methods may focus on the contrast $\mu_1 - \mu_2$.
      In studies involving more than two populations, a number of different contrasts may be of interest. For example, in a study aimed at comparing the mean tread life of four types of high performance tires designed for use at higher speeds, possible sets of contrasts of interest are:
      1. $\mu_1 - \mu_2,\ \mu_1 - \mu_3,\ \mu_1 - \mu_4$ (control vs treatment)
      2. $\frac{\mu_1 + \mu_2}{2} - \frac{\mu_3 + \mu_4}{2}$ (brand A vs brand B)
      3. $\mu_1 - \bar{\mu},\ \mu_2 - \bar{\mu},\ \mu_3 - \bar{\mu},\ \mu_4 - \bar{\mu}$ (tire effects), with $\bar{\mu}$ denoting the average of the four means
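The three sets of tire contrasts can be sketched numerically. The four means below are invented purely for illustration (the slides give no tire data), and the sketch assumes that the unsubscripted µ in the "tire effects" contrasts denotes the average of the four population means.

```python
# Hypothetical mean tread lives (thousands of miles) for four tire types.
# These values are made up; they are not from the slides.
mu = [42.0, 40.0, 38.0, 36.0]

# Assumption: the overall mean is the plain average of the four means.
mu_bar = sum(mu) / len(mu)

# Set 1: control (type 1) vs each of the other types.
control_vs_treatment = [mu[0] - m for m in mu[1:]]

# Set 2: brand A (types 1 and 2) vs brand B (types 3 and 4).
brand_a_vs_b = (mu[0] + mu[1]) / 2 - (mu[2] + mu[3]) / 2

# Set 3: tire effects, i.e. each mean minus the overall mean.
tire_effects = [m - mu_bar for m in mu]

print(control_vs_treatment, brand_a_vs_b, tire_effects)
```

Note that the tire effects always sum to zero, because each is a deviation from the average of the same four means.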
  • 10. The Comparative Boxplot
      The comparative boxplot consists of side-by-side individual boxplots for the data sets from each population. It is useful for providing a visual impression of differences in the median and percentiles.
      Example: Iron concentration measurements from four different iron ore formations are given in http://www.stat.psu.edu/~mga/401/Data/anova.fe.data.txt. The comparative boxplot can be seen in http://www.stat.psu.edu/~mga/401/fig/BoxplotComp_Fe.pdf
  • 11. The Comparative Bar Graph
      The comparative bar graph generalizes the bar graph in that, for each category, it plots several bars, each representing the category’s proportion in one of the populations being compared; different colors are used to distinguish bars that correspond to different populations.
      Example: The light vehicle market share of car companies for the month of November in 2010 and 2011 is given in http://www.stat.psu.edu/~mga/401/Data/MarketShareLightVehComp.txt. The comparative bar graph can be seen in http://stat.psu.edu/~mga/401/fig/LvMsBarComp.pdf
  • 12. The Stacked Bar Graph
      Example: The file http://stat.psu.edu/~mga/401/Data/QsalesIphone.txt shows worldwide iPhone sales data, in thousands of units, categorized by year and quarter. The stacked (or segmented) bar graph can be seen in http://sites.stat.psu.edu/~mga/401/fig/QsIphones.pdf.
  • 14. To avoid comparing apples with oranges, the experimental units for the different treatments must be homogeneous.
      • If fabric age affects the effectiveness of cleaning products then, unless the fabrics used in different treatments are age-homogeneous, the comparison of treatments will be distorted.
      • If the meditation group in the diet study consists mainly of subjects who had practiced meditation before, the comparison will be distorted.
      To mitigate the distorting effects (called confounding) of other possible factors (called lurking variables), it is recommended that the allocation of units to treatments be randomized.
  • 15. • Randomizing the allocation of fabric pieces to the different treatments (cleaning product and stain) avoids confounding with the factor age of fabric.
      • Randomizing the allocation of subjects to the control (diet alone) and treatment (diet plus meditation) groups avoids confounding with the experience factor.
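One simple way to randomize the allocation of units to treatments is to shuffle the units and deal them out in turn. The sketch below is an illustration only: the helper name `randomize`, the subject labels, and the group size of 20 are assumptions, not part of the slides.

```python
import random

def randomize(units, treatments, seed=None):
    """Randomly allocate units to treatments in (nearly) equal group sizes."""
    rng = random.Random(seed)          # seeded only to make the demo repeatable
    shuffled = list(units)
    rng.shuffle(shuffled)              # random order removes systematic patterns
    allocation = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        # Deal the shuffled units out to the treatments in turn.
        allocation[treatments[i % len(treatments)]].append(unit)
    return allocation

# Hypothetical subjects for the diet study.
subjects = [f"subject_{i}" for i in range(1, 21)]
groups = randomize(subjects, ["diet alone", "diet plus meditation"], seed=0)

for treatment, members in groups.items():
    print(treatment, len(members), members[:3])
```

Because the assignment ignores every characteristic of the subjects, prior meditation experience should, on average, be balanced between the two groups.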
  • 16. The distortion caused by lurking variables in the comparison of proportions is called Simpson’s Paradox.
      Example: The success rates of two treatments, Treatments A and B, for kidney stones are:

      Treatment A        Treatment B
      78% (273/350)      83% (289/350)

      The obvious conclusion is that Treatment B is more effective. The lurking variable here is the size of the kidney stone.
  • 17. Example (Kidney Stone Example Continued): When the size of the treated kidney stone is taken into consideration, the success rates are as follows:

                Small               Large               Combined
      Tr. A     81/87 or .93        192/263 or .73      273/350 or .78
      Tr. B     234/270 or .87      55/80 or .69        289/350 or .83

      Now we see that Treatment A has a higher success rate for both small and large stones.
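The reversal in the kidney stone example can be checked directly from the counts in the two tables. The sketch below uses only those counts; the variable names are illustrative.

```python
# (successes, cases) for each treatment and stone size, from the tables above.
counts = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

# Success rate within each stone size.
stratum_rates = {
    tr: {size: s / n for size, (s, n) in sizes.items()}
    for tr, sizes in counts.items()
}

# Success rate after pooling the two stone sizes.
combined_rates = {
    tr: sum(s for s, _ in sizes.values()) / sum(n for _, n in sizes.values())
    for tr, sizes in counts.items()
}

for tr in counts:
    by_size = {size: round(r, 2) for size, r in stratum_rates[tr].items()}
    print(tr, by_size, "combined:", round(combined_rates[tr], 2))
```

Treatment A has the higher rate within each stone size, yet Treatment B has the higher pooled rate, which is exactly the pattern the slides call Simpson’s Paradox.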
  • 18. Batting Averages
      Example: The overall batting averages of baseball players Derek Jeter and David Justice during the years 1995 and 1996 were 0.310 and 0.270, respectively. But looking at each year separately we get a different picture:

                  1995                 1996                 Combined
      Jeter       12/48 or .250        183/582 or .314      195/630 or .310
      Justice     104/411 or .253      45/140 or .321       149/551 or .270

      Justice had a higher batting average than Jeter in both 1995 and 1996.
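One way to see why the batting averages reverse is that a combined average is a weighted average of the yearly averages, with weights equal to each year's share of at-bats. The sketch below uses the counts from the table; the function names are illustrative.

```python
# (hits, at-bats) per year, from the table above.
jeter = {1995: (12, 48), 1996: (183, 582)}
justice = {1995: (104, 411), 1996: (45, 140)}

def at_bat_weights(record):
    """Share of the player's total at-bats that fall in each year."""
    total = sum(ab for _, ab in record.values())
    return {year: ab / total for year, (_, ab) in record.items()}

def combined_average(record):
    """Combined average written as a weighted average of yearly averages."""
    weights = at_bat_weights(record)
    return sum(weights[year] * (h / ab) for year, (h, ab) in record.items())

for name, record in [("Jeter", jeter), ("Justice", justice)]:
    weights = {year: round(w, 3) for year, w in at_bat_weights(record).items()}
    print(name, weights, round(combined_average(record), 3))
```

Most of Jeter's at-bats fall in 1996, the year in which both players hit better, while most of Justice's fall in 1995, so the two combined averages weight the years very differently.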
  • 20. Definition: A study is called a statistical experiment if the investigator controls the allocation of units to treatments or factor-level combinations, and this allocation is done in a randomized fashion. Otherwise the study is called observational.
      • Causation can only be established via a statistical experiment. Thus, a relation between salary increases and productivity does not imply that salary increases cause increased productivity.
      • Observational studies cannot establish causation, unless there is additional corroborating evidence. For example, the link between smoking and health has been established through observational studies.
  • 22. A statistical experiment involving several factors is called a factorial experiment if all factor-level combinations are considered. Thus,

                          Factor B
      Factor A      1       2       3       4
         1        Tr11    Tr12    Tr13    Tr14
         2        Tr21    Tr22    Tr23    Tr24

      is a factorial experiment if all 8 treatments are included in the study.
      Of interest in factorial experiments is the comparison of the levels within each factor.
  • 23. Main Effects and Interactions
      Definition: When there are synergistic effects among the levels of two different factors, i.e., when a change in the level of factor A has different effects at the different levels of factor B, we say that there is interaction between the two factors. The absence of interaction is called additivity.
      Example: An experiment considers two types of corn, used for bio-fuel, and two types of fertilizer. The following two tables give possible population mean yields for the four combinations of seed type and fertilizer type.
  • 24.
                              Fertilizer I              Fertilizer II               Row averages                      Main row effects
      Seed A                  $\mu_{11} = 107$          $\mu_{12} = 111$            $\mu_{1\cdot} = 109$              $\alpha_1 = -0.25$
      Seed B                  $\mu_{21} = 109$          $\mu_{22} = 110$            $\mu_{2\cdot} = 109.5$            $\alpha_2 = 0.25$
      Column averages         $\mu_{\cdot 1} = 108$     $\mu_{\cdot 2} = 110.5$     $\mu_{\cdot\cdot} = 109.25$
      Main column effects     $\beta_1 = -1.25$         $\beta_2 = 1.25$

      Here the factors interact.
  • 25.
                              Fertilizer I              Fertilizer II               Row averages                      Main row effects
      Seed A                  $\mu_{11} = 107$          $\mu_{12} = 111$            $\mu_{1\cdot} = 109$              $\alpha_1 = -1$
      Seed B                  $\mu_{21} = 109$          $\mu_{22} = 113$            $\mu_{2\cdot} = 111$              $\alpha_2 = 1$
      Column averages         $\mu_{\cdot 1} = 108$     $\mu_{\cdot 2} = 112$       $\mu_{\cdot\cdot} = 110$
      Main column effects     $\beta_1 = -2$            $\beta_2 = 2$

      Here the factors do not interact.
  • 26. Under additivity:
      • There is an indisputably best level for each factor, and
      • The best factor-level combination is that of the best level of factor A with the best level of factor B.
      What is the best level of each factor in the above design?
      Under additivity, the comparison of the levels of each factor is based on the main effects:
      $\alpha_i = \mu_{i\cdot} - \mu_{\cdot\cdot}, \quad \beta_j = \mu_{\cdot j} - \mu_{\cdot\cdot}$
      See the main effects in the above two designs.
      Under additivity,
      $\mu_{ij} = \mu_{\cdot\cdot} + \alpha_i + \beta_j$
      See the above design.
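The main-effect formulas can be checked against the two corn designs above. The sketch below recomputes the averages and main effects from the four cell means of each table, then tests whether every cell mean equals the overall mean plus its row and column effects. The function names are illustrative.

```python
def main_effects(cell_means):
    """Return (overall mean, row effects alpha, column effects beta)."""
    a, b = len(cell_means), len(cell_means[0])
    row_avg = [sum(row) / b for row in cell_means]
    col_avg = [sum(cell_means[i][j] for i in range(a)) / a for j in range(b)]
    overall = sum(row_avg) / a
    alpha = [r - overall for r in row_avg]
    beta = [c - overall for c in col_avg]
    return overall, alpha, beta

def is_additive(cell_means, tol=1e-9):
    """Check mu_ij == overall + alpha_i + beta_j for every cell."""
    overall, alpha, beta = main_effects(cell_means)
    return all(
        abs(cell_means[i][j] - (overall + alpha[i] + beta[j])) < tol
        for i in range(len(alpha))
        for j in range(len(beta))
    )

# Rows are seeds A and B; columns are fertilizers I and II.
first_design = [[107, 111], [109, 110]]
second_design = [[107, 111], [109, 113]]

for design in (first_design, second_design):
    print(main_effects(design), "additive:", is_additive(design))
```

In the second design the decomposition reproduces every cell mean, so the best combination is simply the best seed paired with the best fertilizer; in the first design it does not.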
  • 27. When the factors interact, the cell means are not given in terms of the main effects as above. The difference
      $\gamma_{ij} = \mu_{ij} - (\mu_{\cdot\cdot} + \alpha_i + \beta_j)$
      quantifies the interaction effect. For example, in the above non-additive design,
      $\gamma_{11} = \mu_{11} - \mu_{\cdot\cdot} - \alpha_1 - \beta_1 = 107 - 109.25 + 0.25 + 1.25 = -0.75.$
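Substituting the definitions of the main effects into the interaction formula gives an equivalent form, interaction = cell mean - row average - column average + overall mean. This rearrangement is not stated on the slide; it follows by algebra. The sketch below uses it to compute all four interaction effects of the non-additive corn design.

```python
def interaction_effects(cell_means):
    """gamma_ij = mu_ij - mu_i. - mu_.j + mu_.. (rearranged definition)."""
    a, b = len(cell_means), len(cell_means[0])
    row_avg = [sum(row) / b for row in cell_means]
    col_avg = [sum(cell_means[i][j] for i in range(a)) / a for j in range(b)]
    overall = sum(row_avg) / a
    return [
        [cell_means[i][j] - row_avg[i] - col_avg[j] + overall for j in range(b)]
        for i in range(a)
    ]

# Rows are seeds A and B; columns are fertilizers I and II.
non_additive = [[107, 111], [109, 110]]
additive = [[107, 111], [109, 113]]

gamma = interaction_effects(non_additive)
print(gamma)
print(interaction_effects(additive))
```

The interaction effects sum to zero along every row and every column, so in a 2 x 2 design a single value such as the one for the first cell determines all four.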
  • 28. Data Versions of Main Effects and Interactions
      Data from a two-factor factorial experiment use three subscripts:

                                                     Factor B
      Factor A     1                                 2                                 3
         1         $x_{11k},\ k = 1,\dots,n_{11}$    $x_{12k},\ k = 1,\dots,n_{12}$    $x_{13k},\ k = 1,\dots,n_{13}$
         2         $x_{21k},\ k = 1,\dots,n_{21}$    $x_{22k},\ k = 1,\dots,n_{22}$    $x_{23k},\ k = 1,\dots,n_{23}$
  • 29. Sample versions of main effects and interactions are defined using
      $\bar{x}_{ij} = \frac{1}{n_{ij}} \sum_{k=1}^{n_{ij}} x_{ijk}$
      instead of $\mu_{ij}$:
      • Sample main row and column effects: $\hat{\alpha}_i = \bar{x}_{i\cdot} - \bar{x}_{\cdot\cdot}, \quad \hat{\beta}_j = \bar{x}_{\cdot j} - \bar{x}_{\cdot\cdot}$
      • Sample interaction effects: $\hat{\gamma}_{ij} = \bar{x}_{ij} - (\bar{x}_{\cdot\cdot} + \hat{\alpha}_i + \hat{\beta}_j)$
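The sample versions can be sketched on a small 2 x 3 data set with unequal cell sizes. Two things here are assumptions rather than content of the slides: the observations are invented, and the row, column, and overall averages are taken as unweighted averages of the cell means, mirroring the population definitions (the slide does not spell this out).

```python
from statistics import mean

# Hypothetical observations x_ijk: data[i][j] holds the values for cell (i, j).
data = [
    [[10.0, 12.0], [14.0, 15.0, 16.0], [20.0]],
    [[11.0, 13.0, 12.0], [18.0, 20.0], [25.0, 27.0]],
]

# Cell means: average over k within each cell.
cell = [[mean(obs) for obs in row] for row in data]
a, b = len(cell), len(cell[0])

# Assumption: marginal averages are unweighted averages of the cell means.
row_avg = [mean(row) for row in cell]
col_avg = [mean(cell[i][j] for i in range(a)) for j in range(b)]
overall = mean(row_avg)

alpha_hat = [r - overall for r in row_avg]   # sample main row effects
beta_hat = [c - overall for c in col_avg]    # sample main column effects
gamma_hat = [                                # sample interaction effects
    [cell[i][j] - (overall + alpha_hat[i] + beta_hat[j]) for j in range(b)]
    for i in range(a)
]

print(cell)
print(alpha_hat, beta_hat)
print(gamma_hat)
```

With these definitions the sample main effects sum to zero within each factor, and the sample interaction effects sum to zero along every row and column.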
  • 30. Sample versions of main effects and interactions estimate their population counterparts but, in general, they are not equal to them. Thus, even if the data have come from an additive design, the sample interaction effects will, in general, not be zero.
      The interaction plot is a graphical technique that can help assess whether the sample interaction effects are significantly different from zero.
      • For each level of, say, factor B, the interaction plot traces the cell means along the levels of factor A. See http://stat.psu.edu/~mga/401/fig/CloudSeedInterPlot.pdf for an example.
      • For data coming from additive designs, these traces (or profiles) should be approximately parallel.
  • 31. Probability and Statistics
      Probability plays a central role in statistics, but the two differ:
      • In a probability problem, the properties of the population of interest are assumed known, whereas statistics is concerned with learning those properties.
      • Thus probability uses properties of the population to infer those of the sample, while statistical inference does the opposite.
  • 32. Example (Examples of Probability Questions)
      • If 5% of electrical components have a certain defect, what are the chances that a batch of 500 such components will contain fewer than 20 defective ones?
      • If 60% of all batteries last more than 1500 hours of operation, what are the chances that in a sample of 100 batteries there will be at least 80 that last more than 1500 hours?
      • If the highway mileage achieved by 2011 Toyota Prius cars has population mean and standard deviation of 51 and 1.5 miles per gallon, respectively, what are the chances that in a sample of 10 cars the average highway mileage is less than 50 miles per gallon?
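The three probability questions can be answered numerically once a model is assumed. The sketch below is one possible treatment, not the course's worked solution: it assumes the components (and the batteries) behave independently, so the counts are binomial, and it assumes the mileage population is normal, so the mean of 10 cars is normal with standard deviation 1.5/sqrt(10).

```python
from math import comb, sqrt
from statistics import NormalDist

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Question 1: fewer than 20 defectives among 500, i.e. P(X <= 19).
p_defects = sum(binom_pmf(k, 500, 0.05) for k in range(20))

# Question 2: at least 80 of 100 batteries last more than 1500 hours.
p_batteries = sum(binom_pmf(k, 100, 0.60) for k in range(80, 101))

# Question 3: sample mean of 10 cars below 50 mpg (normal population assumed).
mean_of_10 = NormalDist(mu=51, sigma=1.5 / sqrt(10))
p_mileage = mean_of_10.cdf(50)

print(p_defects, p_batteries, p_mileage)
```

In each case the population properties (5%, 60%, mean 51 and standard deviation 1.5) are taken as known, and the calculation describes what a sample is likely to look like.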
  • 33. Figure: The reverse actions of Probability and Statistics.
      In spite of this difference, statistical inference itself would not be possible without probability. Read also Example 1.9.3, p. 41, and the paragraph above it.
  • 34. Go to previous lesson: http://www.stat.psu.edu/~mga/401/course.info/lesson1.pdf
      Go to next lesson: http://www.stat.psu.edu/~mga/401/course.info/lesson3.pdf
      Go to the Stat 401 home page: http://www.stat.psu.edu/~mga/401/course.info/