19 Inference


Published on

  • Be the first to comment

  • Be the first to like this

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

19 Inference

  1. 1. Stat310 Introduction to inference Hadley Wickham Saturday, 27 March 2010
  2. 2. Continuity correction Applies only to sums, not means! Important only where n smaller. Saturday, 27 March 2010
  3. 3. Homework Learn the change of variable steps! If X and Y are independent, and you know f(x) and f(y), what is f(x, y)? Common pattern Saturday, 27 March 2010
  4. 4. What proportion of Rice students smoke marijuana? Saturday, 27 March 2010
  5. 5. 1. Introduction to inference 2. A survey 3. Estimators and their properties 4. Feedback Saturday, 27 March 2010
  6. 6. Data vs. Distributions Random experiments produce data. A repeatable random experiment has some underlying distribution. We want to go from the data to say something about the underlying distribution. So far we have been doing the opposite. Saturday, 27 March 2010
  7. 7. Inference Up to now: Given a sequence of random variables with distribution F, what can we say about the mean of a sample? What we really want: Given the mean of a sample, what can we say about the underlying sequence of random variables? Saturday, 27 March 2010
  8. 8. Marijuana Up to now: If I told you that smoking pot was Binomially distributed with probability p, you could tell me what I would expect to see if I sampled n people at random. What we want: If I sample n people at random and find out m of them smoke pot, what does that tell me about p? Saturday, 27 March 2010
  9. 9. Your turn Let’s I selected 10 people at random from the Rice campus and asked them if they’d smoked pot in the last month. Two said yes. Let Yi be 1 if person i smoked pot and 0 otherwise. What’s a good guess for distribution of Yi? Saturday, 27 March 2010
  10. 10. 3.5e−08 3.0e−08 2.5e−08 2.0e−08 prob 1.5e−08 1.0e−08 5.0e−09 0.0e+00 0.0 0.2 0.4 0.6 0.8 1.0 p Saturday, 27 March 2010
  11. 11. Plug-in principle A good first guess at the true mean, variance, or any other parameter of a distribution is simply to estimate it from the data. We’ll learn more sophisticated ways after the break. Saturday, 27 March 2010
  12. 12. Problems If I picked someone at random out of the Rice campus directory, and asked them if they’d smoked pot in the last month, how likely do you think I am to get an honest answer? How can we get around this? Saturday, 27 March 2010
  13. 13. Your turn Compare your survey to someone with the other colour survey. What is the difference? How could you use the answers to estimate how many people smoke pot? Saturday, 27 March 2010
  14. 14. Questions 1. Have you smoked pot in the last month? 2. Flip a coin. If it’s heads, write yes. Otherwise write yes if you’ve smoked pot in the last month, and no otherwise. 3. In the last month, how many of the following activities have you done? One list contains smoking pot, the other doesn’t. Saturday, 27 March 2010
  15. 15. Results 1. The proportion of yeses in Q1. 2. The proportion of yeses in (Q2 - 0.5) * 2 3. (Q3a/4 - Q3b/5) / n. Saturday, 27 March 2010
  16. 16. Your turn Fill in the survey. Make sure no one else can see your answers! Saturday, 27 March 2010
  17. 17. Definitions Parameter space: set of all possible parameter values Estimator: process/function which takes data and gives best guess for parameter Point estimate: estimator for a single value Saturday, 27 March 2010
  18. 18. Your turn With a partner, brainstorm as many different properties that you could use to decide which estimator is best. Remember that there is some true unknown proportion that we are trying to estimate. Saturday, 27 March 2010
  19. 19. Important properties of an estimator Unbiased Small variance Consistent Sufficient Saturday, 27 March 2010
  20. 20. ˆ =θ E(θ) Unbiased ˆ1 ) < V ar(θ2 ) V ar(θ ˆ Minimum variance Common problem is to find UMVE (unbiased minimum variance estimator) across all possible estimators Saturday, 27 March 2010
  21. 21. Low bias, low variance Low bias, high variance High bias, low variance High bias, high variance Saturday, 27 March 2010
  22. 22. Consistent n=10 n=5 n=1 Saturday, 27 March 2010
  23. 23. Results Saturday, 27 March 2010
  24. 24. Your turn How can we describe this as a random experiment? Do you have any concerns about applying the results of this survey to the entire Rice campus? Saturday, 27 March 2010
  25. 25. Experiments Remember, the context is always the random experiment. For this case the randomness comes from picking the person at random. Repeating the experiment does not mean asking each person again (the answer would be the same) but asking a different set of people. Saturday, 27 March 2010
  26. 26. Next time (After the test and spring recess) Method of moments Maximum likelihood Saturday, 27 March 2010
  27. 27. Feedback Saturday, 27 March 2010