# Statistics for Librarians, Session 4: Statistics best practices

Final session in a series of four seminars presented to University of North Texas librarians. This presentation brings together some best practices for gathering, organizing, analyzing, and presenting statistics and data.

Published in: Data & Analytics, Technology

### Statistics for Librarians, Session 4: Statistics best practices

1. 1. BEST PRACTICES FOR STATISTICS
2. 2. BEST PRACTICES: Know what you know and what you don’t know; Have a comparison group; Use validated measures; Have a Data Entry Plan; Get to know your data; If it doesn’t fit, change it; Place your bets before you collect the data; Use the best methods of analysis for your question & your data; Go beyond the p-value
3. 3. What is Statistics? Study of Data: Collecting, Organizing, Summarizing, Analyzing, Presenting, Storing & Sharing. Why is it Important? Make sense of the data; Explain what happens and (possibly) why; Make sound decisions; Know how close we are to the truth.
4. 4. PURPOSE OF STATISTICS: Results: Bias? Sampling Error? Invalid Measures? Random Error? Other Factors?
5. 5. BEST PRACTICE: KNOW WHAT YOU ALREADY KNOW, WHAT YOU WANT TO KNOW AND WHAT YOU DON’T KNOW
6. 6. STARTING WITH YOUR RESEARCH QUESTION: How do users differ when (searching, finding, selecting) (articles, books, Web sites)? What are the effects of ___________ on ____________? Which is better at improving _________? How are people (finding, selecting, using) _______? What are factors associated with ___________?
7. 7. KINDS OF VARIABLES: Independent: Subjects, Factors, Effects of…; Dependent: Objects, Outcomes, Effects on…
8. 8. LEVELS OF MEASUREMENT (NOIR): Nominal: counts by category; no meaning between the categories (Blue is not better than Red). Ordinal: ranks; scales; space between ranks is subjective. Interval: integers; no baseline; space between values is equal and objective, but discrete. Ratio: interval data with a baseline; space between is continuous.
9. 9. ANOTHER WAY: Qualitative: Counts by Categories, Ranks, Scales; Quantitative: Measurements, Composite scores, Simple Counts
10. 10. LIKERT-TYPE SCALE? Ordinal? Arbitrary, Few Levels, Individual Questions; Interval? Symmetrical, Many Levels, Composite Score
11. 11. BEST PRACTICE: HAVE A COMPARISON GROUP
12. 12. WAYS OF COMPARING… Time Periods; Other Libraries; National Surveys; Patron Types; Material Types
13. 13. KINDS OF COMPARISON: Expected ranks or ratios: Qualitative, Comparison; Two variables: Quantitative, Correlations; Samples or Groups: Quantitative or Qualitative, Paired or Not Paired
14. 14. BEST PRACTICE: USE A VALID MEASURE
15. 15. VALIDITY OF MEASURES: Are you actually measuring what you are trying to measure?
16. 16. USE A TOOL WITH ESTABLISHED VALIDITY: Approaches and Study Skills Inventory for Students (ASSIST); User Engagement Scale (UES)
17. 17. ESTABLISH VALIDITY OF MEASURES: Reliability: consistency; Content or Face Validity: common sense; Construct Validity: based on theory; Criterion Validity: comparison with other valid measures
18. 18. BEST PRACTICE: HAVE A DATA PLAN
19. 19. GOAL OF DATA COLLECTION IN STATISTICS: Reliability; Bias
20. 20. BIAS: Systematic (not random) deviation from the true value (Statistics.com). Selection Bias; Measurement: Observer Bias, Non-response Bias; Analysis Bias
21. 21. DATA INPUT: Have a data entry plan; Train the inputters; Use data validation tricks; Double-entry
22. 22. BEST PRACTICE: GET TO KNOW YOUR DATA
23. 23. EXPLORATORY DATA ANALYSIS: Central Tendency; Spread; Error
24. 24. MEASURES OF CENTRAL TENDENCY: Mean: average; for quantitative data; Excel function: =Average(range). Median: middle; for quantitative or rank data; Excel function: =Median(range). Mode: most common; primarily for qualitative data; Excel function: =Mode(range).
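
For readers working outside Excel, the three measures on this slide can be sketched with Python's standard `statistics` module. The `years` and `formats` lists are made-up illustration data, not values from the presentation.

```python
import statistics

# Hypothetical sample: years of service for nine staff members
# (illustration only, not data from the slides).
years = [1, 2, 2, 3, 5, 8, 13, 2, 21]

mean_years = statistics.mean(years)      # counterpart of Excel =Average(range)
median_years = statistics.median(years)  # counterpart of Excel =Median(range)
mode_years = statistics.mode(years)      # counterpart of Excel =Mode(range)

# In Python the mode can also be taken over category labels (qualitative data).
formats = ["book", "article", "article", "website", "article"]
mode_format = statistics.mode(formats)

print(f"mean={mean_years:.2f} median={median_years} mode={mode_years}")
print(f"most common format: {mode_format}")
```

The mean is pulled upward by the two long-serving staff members, while the median and mode stay near the bulk of the data, which is why the slide pairs each measure with a data type.
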
26. 26. DISTRIBUTION OR SPREAD OF QUALITATIVE DATA: Tables: Counts, Percentages/Ratios, Averages of Counts; Excel: Pivot Tables
27. 27. PIVOT TABLES IN EXCEL: Select Data: Highlight table, Insert->Pivot Table; Select Variables: Categories (Row Labels), Values; Change Settings: Percentage of Grand Total, Average
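
What a pivot table computes for qualitative data (counts per category and each count as a percentage of the grand total) can be sketched in a few lines of Python. The `records` list and its patron and material labels are hypothetical.

```python
from collections import Counter

# Hypothetical checkout records as (patron type, material type) pairs;
# illustration only, not data from the slides.
records = [
    ("Undergraduate", "Book"), ("Undergraduate", "Article"),
    ("Graduate", "Article"), ("Graduate", "Article"),
    ("Faculty", "Book"), ("Undergraduate", "Book"),
    ("Graduate", "Book"), ("Faculty", "Article"),
]

# Row labels = patron type; value = count of records in each category.
counts = Counter(patron for patron, _ in records)
grand_total = sum(counts.values())

# Same counts expressed as "percentage of grand total".
percent_of_total = {k: 100 * v / grand_total for k, v in counts.items()}

for patron in sorted(counts):
    print(f"{patron:<14} {counts[patron]:>3} {percent_of_total[patron]:>6.1f}%")
```
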
28. 28. DEMONSTRATION OF PIVOT TABLES FOR SPREAD OF QUALITATIVE DATA
29. 29. GRAPH & CHART RULES OF THUMB: Trends: connection across the X-axis; Categorical Comparisons: Grouped, Stacked, Relative Stacked; Categorical: Few Categories, Differences are Wide
30. 30. QUANTITATIVE DISTRIBUTIONS: Stem & Leaf; Histogram; Distribution graphs
31. 31. EXPLORATORY DATA ANALYSIS: John W. Tukey, Exploratory Data Analysis. Examining your data visually: Stem & Leaf, Hinges, Box plots, Scatter plots, etc.
32. 32. STEM-AND-LEAF: Stem = first digit(s); Leaf = last digit. Stem 0: 01112222222222222233333344445556666677788899; Stem 1: 0000000011122223333356778899; Stem 2: 00122234444799; Stem 3: 0245. Years at UNT (values shown on the slide, in sorted order): 0, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 4, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 6, 7, 7, 7, 8, 8, 11, 11, 12, 12, 12, 12, 13, 13, 13, 13, 13, 15, 16, 17, 17, 18, 18, 19, 29, 29, 30, 32, 34, 35
33. 33. FROM STEM-AND-LEAF TO HISTOGRAMS
34. 34. Stem-and-leaf with counts: Stem 0: 1122223334445555666666677777899 (count 31); Stem 1: 000011122222222333346677889 (count 27); Stem 2: 0122234468 (count 10); Stem 3: 1112355888 (count 11); Stem 4: 12 (count 2). Range counts: 0-9: 31; 10-19: 27; 20-29: 10; 30-39: 11; 40-49: 2. [Figure: Histogram of Years at UNT, with bars for the ranges 0-9 through 40-49]
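
The step from a stem-and-leaf display to a histogram can be sketched as follows: each stem is a range of ten, and the number of leaves on a stem is the height of that range's bar. The helper `stem_and_leaf` and the `years` sample are hypothetical and are not the data set shown on the slides.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group non-negative integers by stem (tens and above) and leaf (last digit)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)

# Hypothetical years-of-service values (illustration only).
years = [0, 2, 3, 3, 5, 11, 12, 12, 16, 18, 24, 29, 30, 35]

plot = stem_and_leaf(years)
for stem in sorted(plot):
    leaves = "".join(str(leaf) for leaf in plot[stem])
    # The leaf count for a stem is the histogram count for that range.
    low, high = stem * 10, stem * 10 + 9
    print(f"{stem} | {leaves:<6} count={len(plot[stem])} range={low}-{high}")
```
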
35. 35. HISTOGRAMS IN EXCEL: Analysis Toolpak: Options, Add-ins, Manage Add-ins; Set ranges: Equal Size Ranges, Ceiling (“more”); Create Histogram: Data, Data Analysis, Histogram; Create Graph For Histogram: Insert Bar Chart, Highlight histogram, Select bars & Format Selection, Gap Width=0%. Example range ceilings: 9, 19, 29, 39, 49
36. 36. DEMONSTRATION OF HISTOGRAM IN EXCEL
37. 37. SPREAD OF QUANTITATIVE DATA: How variable is the data? Range; Quantiles; Standard Deviation
38. 38. RANGE & QUARTILES
39. 39. PRESENTATION OF SPREAD: Box plots: Median; Upper & lower quartiles; Outliers
40. 40. STANDARD DEVIATION: Measure of dispersion of data; square root of the average squared deviation from the mean
41. 41. WHAT DOES THE SD TELL YOU? Greater variation, less certainty; Lower variation, more certainty
42. 42. SPREAD IN EXCEL: Range: =MIN(range), =MAX(range); Quantiles: =PERCENTILE.INC(range, %), =QUARTILE.INC(range, {1,2,3,4}); Standard Deviation: =STDEV.S(range)
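
The same spread measures can be sketched in Python. The `data` list is hypothetical. `statistics.quantiles` with `method="inclusive"` is used on the assumption that it is the closest standard-library counterpart to Excel's inclusive quartile functions, and `statistics.stdev` is the sample standard deviation, as `STDEV.S` is.

```python
import statistics

# Hypothetical sample (illustration only, not data from the slides).
data = [2, 4, 4, 5, 7, 9, 10, 12, 15]

low, high = min(data), max(data)   # counterparts of MIN(range) and MAX(range)
data_range = high - low

# The "inclusive" method treats the sample minimum and maximum as the
# 0th and 100th percentiles and interpolates in between.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

# Sample standard deviation (divides by n - 1).
sd = statistics.stdev(data)

print(f"range={data_range} quartiles=({q1}, {q2}, {q3}) sd={sd:.2f}")
```
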
43. 43. NORMAL DISTRIBUTION
44. 44. SKEWED DISTRIBUTIONS
45. 45. DEMONSTRATION OF DISTRIBUTIONS: Distribution of the Population: The “Truth”; N is the number of samples; n is the number of items in each sample; Watch the cumulative mean & medians slowly merge to the population
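
The live demonstration is not reproduced in the transcript, so the following is only a rough stand-in: it draws N samples of n items each from a skewed population and tracks how the running average of the sample means settles near the population mean. The exponential population, its mean of 10, the seed, and the values of `N` and `n` are all assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(42)      # fixed seed so the run is repeatable

POPULATION_MEAN = 10.0       # assumed "truth" for this sketch
N = 2000                     # number of samples drawn
n = 30                       # number of items in each sample

sample_means = []
checkpoints = []
for i in range(1, N + 1):
    # One sample from a right-skewed (exponential) population.
    sample = [rng.expovariate(1 / POPULATION_MEAN) for _ in range(n)]
    sample_means.append(statistics.fmean(sample))
    if i in (1, 10, 100, 1000, N):
        # Cumulative mean of all sample means seen so far.
        checkpoints.append((i, statistics.fmean(sample_means)))

for i, running_mean in checkpoints:
    print(f"after {i:>4} samples: cumulative mean = {running_mean:.2f}")
```
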
46. 46. BEST PRACTICE: IF IT DOESN’T FIT, CHANGE IT (Transformation of data)
47. 47. WHY TRANSFORM? [Figures: histogram of Years at UNT by range (0-9 through 30-39); histogram of Log10(Years at UNT)]
48. 48. HOW TRANSFORMATION WORKS: Y = a + bx; Log(Y) = Log(a + bx); 1/Y = 1/(a + bx)
49. 49. HOW TO BECOME NORMAL: Evaluate the distribution of raw data; Select a transformation method; Transform the data; Normally Distributed?; Statistically Test Transformed Data; Express the result in the terms of the transformation
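
A minimal sketch of the transform-then-report cycle, assuming a log10 transformation and a made-up, right-skewed `years` sample. Reporting could stay on the log scale or be back-transformed; the back-transformed mean of the logs is the geometric mean, which is shown here for comparison with the ordinary mean.

```python
import math
import statistics

# Hypothetical right-skewed values (illustration only). All are positive,
# because the logarithm of zero is undefined.
years = [1, 1, 2, 2, 3, 4, 6, 9, 15, 40]

# Transform the data.
log_years = [math.log10(y) for y in years]
mean_log = statistics.fmean(log_years)

# Back-transform: 10 ** mean(log10(y)) is the geometric mean of y.
back_transformed = 10 ** mean_log
arithmetic_mean = statistics.fmean(years)

print(f"mean of log10 values:  {mean_log:.3f}")
print(f"back-transformed mean: {back_transformed:.2f}")
print(f"arithmetic mean:       {arithmetic_mean:.2f}")
```
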
50. 50. BEST PRACTICE: PLACE YOUR BETS BEFORE YOU START
51. 51. INFERENTIAL STATISTICS: Tests of hypotheses: Associations, Expectations; Accounts for uncertainty: Random error, Confidence interval
52. 52. HYPOTHESIS TESTING: Your Hypothesis (H1); Null Hypothesis (H0)
53. 53. EXAMPLE HYPOTHESIS: H1: UNT Libraries provides access to >=75% of journal articles cited by UNT PACS faculty in journal articles published between 2008-2011. H0: UNT Libraries provides access to <75% of those articles.
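
One way such a hypothesis could be tested is an exact one-sided binomial test at the 75% boundary, sketched below. This reads the slide as H1: >=75% and H0: <75%, which is an assumption about its layout. The choice of test, the helper `binomial_upper_tail`, and the counts are illustrative assumptions too, not the presenter's method or results.

```python
from math import comb

def binomial_upper_tail(k, n, p0):
    """P(X >= k) for X ~ Binomial(n, p0): a one-sided exact p-value."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))

# Hypothetical counts (illustration only, not results from the presentation).
n_cited = 200        # cited articles sampled
n_accessible = 165   # of those, articles the library provides access to
p0 = 0.75            # boundary between the two hypotheses

observed = n_accessible / n_cited
p_value = binomial_upper_tail(n_accessible, n_cited, p0)

print(f"observed proportion = {observed:.3f}")
print(f"one-sided p-value at p0 = {p0}: {p_value:.4f}")
```

A small p-value here would say that an access rate this high is unlikely if the true rate sat at the 75% boundary.
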
54. 54. HYPOTHESIS TESTING: p; Sample Size; Central Tendency; Spread; Distribution; Significance Level
55. 55. TESTING HYPOTHESES
56. 56. BEST PRACTICE: CHOOSE THE BEST METHOD FOR YOUR QUESTION AND DATA
57. 57. KNOW THE TESTS: Assumptions; Limitations; Appropriate data type; What the test tests
58. 58. FACTORS ASSOCIATED WITH CHOICE OF STATISTICAL METHOD: Variable Type; What is being compared; Independence of units; Underlying variance in the population; Distribution; Sample size; Number of comparison groups
59. 59. USE A FLOW CHART
60. 60. BEST PRACTICE: GOING BEYOND THE P-VALUE
61. 61. AND THE P-VALUE SAYS… Much about the distributions; More about the H0 than H1; Little about size of differences
62. 62. MORE USEFUL STATISTICS: Effect Sizes: tell the real story; Confidence Intervals: state your certainty
63. 63. EFFECT SIZES OF QUANTITATIVE DATA: Correlations: Cohen’s guidelines for Pearson’s r (Small: r > .10; Medium: r > .30; Large: r > .50). Differences from the mean: standardized; weighted against the standard deviation; Cohen’s d: $d = \frac{x_1 - x_2}{s}$
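
Cohen's d from this slide can be sketched as below, taking x1 and x2 as the two group means. The slide shows only s in the denominator; using the pooled sample standard deviation for s is an assumption, as are the helper name `cohens_d` and the made-up score lists.

```python
import statistics

def cohens_d(group1, group2):
    """Standardized mean difference using the pooled sample standard deviation."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    # Pooled SD: variances weighted by their degrees of freedom (an assumption).
    pooled_sd = (((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)) ** 0.5
    return (statistics.fmean(group1) - statistics.fmean(group2)) / pooled_sd

# Hypothetical quiz scores (illustration only, not data from the slides).
trained = [78, 85, 90, 82, 88]
untrained = [70, 75, 80, 72, 78]

d = cohens_d(trained, untrained)
print(f"Cohen's d = {d:.2f}")
```
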
64. 64. EFFECT SIZES OF QUALITATIVE DATA: Based on Contingency table. Odds ratio: odds of event A divided by odds of event B; case-control studies. Relative risk: uses probabilities rather than odds; experiments, RCTs. Contingency table (Test A/B): Yes row: Yes 10, No 15, Total 25; No row: Yes 50, No 25, Total 75; Totals: Yes 60, No 40, Total 100
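
Both effect sizes can be worked out from the slide's contingency table. The slide does not say which axis is the group and which is the outcome, so reading rows as groups and columns as outcomes is an assumption made here for illustration.

```python
# Cell counts from the slide's table. Treating rows as groups and
# columns as the outcome is an assumption; the slide does not label them.
a, b = 10, 15   # row "Yes": outcome yes, outcome no
c, d = 50, 25   # row "No":  outcome yes, outcome no

# Odds ratio: odds of the outcome in row 1 divided by the odds in row 2.
odds_ratio = (a / b) / (c / d)

# Relative risk: probability of the outcome in row 1 divided by row 2.
relative_risk = (a / (a + b)) / (c / (c + d))

print(f"odds ratio    = {odds_ratio:.2f}")
print(f"relative risk = {relative_risk:.2f}")
```

Under that reading the odds ratio is about 0.33 and the relative risk is 0.60, which shows how the two measures can differ noticeably on the same table.
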
65. 65. CONFIDENCE INTERVALS: Point estimates: Single value, Mean; Intervals: Degree of uncertainty, Range of certainty around the point estimate; Based on: Point estimate (e.g. mean), Confidence level (usually .95), Standard deviation; Expressed as: The mean score of the students who had the IL training was 83.5, with a 95% CI from 78.3 to 89.4.
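
A confidence interval built from the three ingredients on this slide (point estimate, confidence level, standard deviation) can be sketched as follows. It uses a normal approximation, which is an assumption; for small samples a t-based multiplier would give a somewhat wider interval. The helper name and the `scores` list are hypothetical and are not the data behind the slide's example.

```python
import statistics

def mean_confidence_interval(values, confidence=0.95):
    """Normal-approximation CI for a mean: mean +/- z * (SD / sqrt(n))."""
    n = len(values)
    mean = statistics.fmean(values)
    standard_error = statistics.stdev(values) / n ** 0.5
    # z multiplier for a two-sided interval (about 1.96 for 95%).
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean - z * standard_error, mean, mean + z * standard_error

# Hypothetical test scores (illustration only).
scores = [72, 78, 81, 83, 85, 86, 88, 90, 91, 95]

lower, mean, upper = mean_confidence_interval(scores)
print(f"mean = {mean:.1f}, 95% CI from {lower:.1f} to {upper:.1f}")
```
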
66. 66. STATISTICAL ANALYSIS: Noise; Signal
67. 67. BEST PRACTICES: Know what you know and what you don’t know; Have a comparison group; Use validated measures; Have a Data Entry Plan; Get to know your data; If it doesn’t fit, change it; Place your bets before you collect the data; Use the best methods of analysis for your question & your data; Go beyond the p-value
68. 68. RESOURCES: Rice Virtual Lab in Statistics; Excel Tutorials for Statistical Analysis; Khan Academy - videos; Basic Research Methods for Librarians; Descriptive Statistical Techniques for Librarians