# Stats Workshop2010

Paul Garthwaite's Presentation on Statistics

### Stats Workshop2010

1. Introduction to Statistical Analysis. Paul Garthwaite, MCT Mathematics & Statistics ([email_address], http://statistics.open.ac.uk/advisory.html)
2. The Scientific Method
   - Deductive reasoning:
     - from the general to the specific ("top-down" approach)
3. Two theories:
   - In a pig's digestive system, all phosphate ions are the same, regardless of what they were bound to.
   - If you are diabetic, losing weight will help you live longer.
4. Study Design (deductive reasoning)
5. Hypothesis testing is like a court of law: you aim to disprove the null hypothesis. The hypothesis of a court: the person in the dock is innocent. The aim is to gather evidence that is inconsistent with this hypothesis. We reject the hypothesis (and decide the person is guilty) if the evidence makes the hypothesis unlikely (beyond all reasonable doubt).
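
The court-of-law logic can be sketched with a permutation test. This is an illustration only, not a method taken from the slides: the bone-strength numbers are invented and `permutation_p_value` is a hypothetical helper name. The null hypothesis ("innocent") is that diet makes no difference, so the group labels could be shuffled freely; the p-value is the share of shuffles that produce a difference at least as large as the one observed.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Null hypothesis: the group labels are irrelevant, so any relabelling
    of the pooled data is as likely as the one actually observed.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (at_least_as_extreme + 1) / (n_shuffles + 1)


# Invented bone-strength measurements for two phosphate diets (not real data).
diet_1 = [41.2, 39.8, 43.5, 40.1, 42.7, 38.9, 41.8, 40.6]
diet_2 = [44.9, 46.1, 43.8, 47.2, 45.5, 44.0, 46.8, 45.1]

p_value = permutation_p_value(diet_1, diet_2)
print(f"p-value: {p_value:.4f}")
```

A small p-value means the evidence is hard to reconcile with the null hypothesis; a large one means only that the case has not been proven, not that the null hypothesis is true.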
6. Inductive Reasoning
   - From a set of specific observations to broader generalizations and theories ("bottom-up" approach)
7. Observational Study (inductive reasoning)
8. Observational studies could feed into inductive reasoning. Pilot studies have a place in forming hypotheses. Some disciplines (e.g. psychology) seem to disapprove of observational studies. Presumably such studies are written up as if the hypotheses were decided before gathering the data. (A dangerous practice!)
9. Statistical Design
   - A study can be:
     - Observational: analyse existing data (inductive)
     - Experimental: produce new data (deductive)
   - Relies on random sampling
     - Obtain information about the whole from analysing the part (inferential statistics)
   - Experimental design:
     - randomly allocates conditions/treatments to subjects to observe their response
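
Random allocation of treatments to subjects can be sketched in a few lines. The helper name `randomise_allocation`, the subject labels and the compound names below are all hypothetical; the sketch assumes a simple design with (near-)equal group sizes and no blocking.

```python
import random


def randomise_allocation(subjects, treatments, seed=None):
    """Randomly assign subjects to treatments in (near-)equal group sizes."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)  # the random order decides who gets what
    allocation = {treatment: [] for treatment in treatments}
    for i, subject in enumerate(shuffled):
        allocation[treatments[i % len(treatments)]].append(subject)
    return allocation


pigs = [f"pig_{i:02d}" for i in range(1, 13)]              # 12 hypothetical subjects
compounds = ["phosphate_A", "phosphate_B", "phosphate_C"]  # invented labels

allocation = randomise_allocation(pigs, compounds, seed=2010)
for compound, group in allocation.items():
    print(compound, sorted(group))
```

Fixing the seed makes the allocation reproducible, which is useful when the design has to be documented.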
10. Warning
    - Poor designs can lead to:
      - Inefficient use of collected data
      - Difficult statistical analysis
      - Inability to draw meaningful conclusions
11. Use Common Sense
    - Think about questions your research might answer.
    - Can you gather data related to those questions?
    - Using common sense, would the data answer those questions?
    - Pigs and phosphates: feed pigs different phosphate compounds and see if their bone strengths differ?
    - Diabetes and diet: use patient notes to get age at death, age at diagnosis, and weight loss in first year after diagnosis.
12. Common sense made rigorous
    - In many ways, statistics just makes common sense rigorous.
    - Think about what covariates may be relevant and try to measure them (gender and age in many social contexts; smoking in medical studies; etc.)
    - Try to reduce random variation.
13. Gather lots of data
    - A decent experiment will generally form about a quarter of a PhD (perhaps more): four papers are enough for a PhD in most disciplines.
    - Designing an experiment, collecting data, analysing it, writing a paper, revising the paper, and so on, will take several months.
    - People typically do not spend enough time gathering data. The data drive the conclusions you can reach.
    - More data = firmer conclusions
14. How much data? (My rules of thumb.)
    - In a controlled experiment where the quantity of interest is a measurement, forty or so independent observations will typically enable modest-sized differences to be identified.
    - With observational data and questionnaire data, gathering 150 observations or more should typically be the aim: you want 25 observations in each category of interest.
    - More data are needed with counts than with measurements.
    - More data are needed with binary quantities (yes/no; cured/not cured; success/failure) than with Likert scores.
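
For comparison with these rules of thumb, a textbook normal-approximation formula gives the number of observations per group needed to compare two means: n = 2(z_{1-α/2} + z_{power})² / d², where d is the difference in means divided by the standard deviation. This is a standard approximation, not the author's rule; the 5% significance level and 80% power are conventional assumptions, and `n_per_group` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate observations per group for a two-sided, two-sample
    comparison of means, where effect_size = (difference in means) / SD."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


for d in (1.0, 0.8, 0.5, 0.3):
    print(f"standardised difference {d}: about {n_per_group(d)} per group")
```

Because the required n scales with 1/d², halving the difference you hope to detect roughly quadruples the data you need.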
15. Questionnaires
    - Likert scales are good: strongly agree / weakly agree / indifferent / disagree / strongly disagree.
    - Having five points on a Likert scale is often about right.
    - Code the values as 1, 2, 3, 4, 5 and it is usually OK to treat them as measurements.
    - Open-ended questions are hard to analyse.
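
Coding Likert responses as 1 to 5 and treating them as measurements might look like the sketch below. The category labels, the direction of the coding (1 = strongly disagree) and the responses are assumptions made for illustration.

```python
from statistics import mean, stdev

# Assumed coding: 1 = strongly disagree ... 5 = strongly agree.
LIKERT_CODES = {
    "strongly disagree": 1,
    "disagree": 2,
    "indifferent": 3,
    "agree": 4,
    "strongly agree": 5,
}

# Invented questionnaire responses (not real data).
responses = [
    "agree", "strongly agree", "indifferent", "agree",
    "disagree", "strongly agree", "agree", "indifferent",
]

scores = [LIKERT_CODES[response] for response in responses]
print(f"mean score: {mean(scores):.2f}, standard deviation: {stdev(scores):.2f}")
```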
16. Statistical Data Analysis
    - Turning data into information: first produce summary statistics (means, percentages, standard deviations), graphs, bar charts, cross-tabulations.
    - Try to get a feel for your data: what does it tell you? (If you feel you are non-numerate, work at becoming numerate.)
    - Try to form quantitative hypotheses that you think the data will refute. (e.g. "The proportions in the 'strongly agree' category are the same in these two sub-populations" or "As this quantity changes, the average value of this other quantity does not change".)
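
A cross-tabulation with row percentages is often the quickest way to get that first feel for questionnaire data. The sketch below uses only the Python standard library; the sub-populations and responses are invented.

```python
from collections import Counter

# Invented survey records: (sub-population, response). Not real data.
records = [
    ("undergraduate", "strongly agree"),
    ("undergraduate", "strongly agree"),
    ("undergraduate", "agree"),
    ("undergraduate", "indifferent"),
    ("postgraduate", "strongly agree"),
    ("postgraduate", "agree"),
    ("postgraduate", "agree"),
    ("postgraduate", "disagree"),
]

crosstab = Counter(records)                          # cell counts
row_totals = Counter(group for group, _ in records)  # observations per group
groups = sorted(row_totals)
answers = sorted({answer for _, answer in records})

for group in groups:
    cells = []
    for answer in answers:
        count = crosstab[(group, answer)]
        percent = 100 * count / row_totals[group]
        cells.append(f"{answer}: {count} ({percent:.0f}%)")
    print(f"{group} (n={row_totals[group]}): " + "; ".join(cells))
```

Comparing the "strongly agree" percentages across the two rows is exactly the kind of quantitative hypothesis the slide suggests forming.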
17. Common fundamental statistical methods
    - t-tests
    - Comparison of proportions
    - Contingency tables
    - Regression
    - Analysis of variance
    - It is worth knowing when these are useful.
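
As one example from this list, a 2x2 contingency table can be tested with Pearson's chi-squared statistic, which for a 2x2 table is equivalent to comparing two proportions. The sketch below is a minimal version without continuity correction; `chi_squared_2x2` is a hypothetical helper name and the counts are invented.

```python
import math


def chi_squared_2x2(table):
    """Pearson chi-squared test (no continuity correction) for a 2x2 table.

    table = [[a, b], [c, d]]; returns (statistic, p_value) on 1 d.f.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    # On 1 d.f., chi-squared is the square of a standard normal variable,
    # so P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Invented counts: rows = two sub-populations, columns = agree / do not agree.
statistic, p_value = chi_squared_2x2([[30, 20], [15, 35]])
print(f"chi-squared = {statistic:.2f}, p = {p_value:.4f}")
```

The approximation behind the p-value is usually considered unreliable when expected cell counts are small, which is one reason binary data need larger samples.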
18. Regression
    - In many ways regression is the most useful statistical method.
    - It lets you test whether one variable affects another (while controlling for other covariates if necessary).
    - It also describes the relationship.
    - Stepwise methods help you find/test which variables are important.
    - Generalised linear models add flexibility.
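
The simplest case, one explanatory variable fitted by ordinary least squares, can be written out directly. This is a minimal sketch with invented data and a hypothetical helper name `fit_line`; it does not cover covariates, stepwise selection or generalised linear models, for which a statistical package is the practical choice.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (intercept, slope, r_squared).
    """
    x_bar, y_bar = mean(x), mean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    r_squared = s_xy ** 2 / (s_xx * s_yy)  # share of variation explained
    return intercept, slope, r_squared


# Invented explanatory and response values (not real data).
x = [0, 2, 4, 6, 8, 10]
y = [10.1, 11.9, 14.2, 15.8, 18.1, 19.9]

intercept, slope, r_squared = fit_line(x, y)
print(f"y = {intercept:.2f} + {slope:.2f} * x  (R^2 = {r_squared:.3f})")
```

The slope describes the relationship (the average change in y per unit change in x); testing whether it differs from zero is the regression version of a hypothesis test.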
19. Advisory service
    - There is an advisory service that can help with:
      - Designing an experiment
      - How to approach the analysis of data
      - Choosing appropriate techniques
      - Interpreting results
      - Understanding outputs from statistical packages
    - Too few people ask for advice before gathering data.
20. Statistical Software
    - Packages are only tools ("number crunchers")
      - Most important is to choose an adequate method for your problem
      - Remember: garbage in → garbage out
21. Some Statistical Packages
    - General software (e.g. spreadsheets)
    - Specialised:
      - Genstat, Minitab, SAS, Statistica
      - SPSS
        - wide range of statistical procedures
        - good graphical capability
        - fairly easy to use (menu-driven option)
        - good help facility with case studies
22. Statistics Courses
    - M248: Analysing Data
      - Exploratory data analysis. Models for data. Estimation. Confidence intervals. Hypothesis testing. Regression and two-variable problems. (Minitab)
    - M249: Practical Modern Statistics
      - Medical statistics. Time series analysis. Multivariate statistics. Bayesian methods.
      - Focus on applications: SPSS and WinBUGS.
23. Statistics Courses
    - M343: Applications of Probability
      - Models to describe patterns in time and space. Epidemiological models. Genetics and stock-market price applications.
    - M346: Linear Statistical Modelling
      - ANOVA. Design of experiments. Linear regression. Generalized linear models. Diagnostic checking. Log-linear models. (GenStat)
24. The Stats-Advisory Service
    - Drop-in sessions (both in the Maths and Computing Building)
      - Mondays: 2:00 – 4:00 (M216)
      - Thursdays: 10:30 – 12:20 (M214)
    - Web:
      - http://statistics.open.ac.uk/advisory.html
    - E-mail:
      - [email_address]