Statistics for Librarians, Session 4: Statistics best practices

Final session in a series of four seminars presented to University of North Texas librarians. This presentation brings together some best practices for gathering, organizing, analyzing, and presenting statistics and data.

Published in: Data & Analytics, Technology

  1. BEST PRACTICES FOR STATISTICS
  2. BEST PRACTICES: Know what you know and what you don’t know; have a comparison group; use validated measures; have a data entry plan; get to know your data; if it doesn’t fit, change it; place your bets before you collect the data; use the best methods of analysis for your question & your data; go beyond the p-value.
  3. What is Statistics? Study of data: collecting, organizing, summarizing, analyzing, presenting, storing & sharing. Why is it important? Make sense of the data; explain what happens and (possibly) why; make sound decisions; know how close we are to the truth.
  4. PURPOSE OF STATISTICS: Results: Bias? Sampling error? Invalid measures? Random error? Other factors?
  5. BEST PRACTICE: KNOW WHAT YOU ALREADY KNOW, WHAT YOU WANT TO KNOW AND WHAT YOU DON’T KNOW
  6. STARTING WITH YOUR RESEARCH QUESTION: How do users differ when (searching, finding, selecting) (articles, books, Web sites)? What are the effects of ___________ on ____________? Which is better at improving _________? How are people (finding, selecting, using) _______? What are factors associated with ___________?
  7. KINDS OF VARIABLES: Independent: subjects; factors; effects of… Dependent: objects; outcomes; effects on…
  8. LEVELS OF MEASUREMENT (NOIR): Nominal: counts by category; no meaning between the categories (Blue is not better than Red). Ordinal: ranks; scales; space between ranks is subjective. Interval: integers; no baseline; space between values is equal and objective, but discrete. Ratio: interval data with a baseline; space between is continuous.
  9. ANOTHER WAY: Qualitative: counts by categories; ranks; scales. Quantitative: measurements; composite scores; simple counts.
  10. LIKERT-TYPE SCALE? Arbitrary, few levels, individual questions: ordinal? Symmetrical, many levels, composite score: interval?
  11. BEST PRACTICE: HAVE A COMPARISON GROUP
  12. WAYS OF COMPARING… Time periods; other libraries; national surveys; patron types; material types.
  13. KINDS OF COMPARISON: Expected ranks or ratios: qualitative; comparison. Two variables: quantitative; correlations. Samples or groups: quantitative or qualitative; paired or not paired.
  14. BEST PRACTICE: USE A VALID MEASURE
  15. VALIDITY OF MEASURES: Are you actually measuring what you are trying to measure?
  16. USE A TOOL WITH ESTABLISHED VALIDITY: Approaches and Study Skills Inventory for Students (ASSIST); User Engagement Scale (UES).
  17. ESTABLISH VALIDITY OF MEASURES: Reliability: consistency. Content or face validity: common sense. Construct validity: based on theory. Criterion validity: comparison with other valid measures.
  18. BEST PRACTICE: HAVE A DATA PLAN
  19. GOAL OF DATA COLLECTION IN STATISTICS: Reliability; bias.
  20. BIAS: Systematic (not random) deviation from the true value (Statistics.com). Selection bias. Measurement: observer bias; non-response bias. Analysis bias.
  21. DATA INPUT: Have a data entry plan; train the inputters; use data validation tricks; double-entry.
  22. BEST PRACTICE: GET TO KNOW YOUR DATA
  23. EXPLORATORY DATA ANALYSIS: Central tendency; spread; error.
  24. MEASURES OF CENTRAL TENDENCY: Mean: average; for quantitative data; Excel function: =AVERAGE(range). Median: middle; for quantitative or rank data; Excel function: =MEDIAN(range). Mode: most common; primarily for qualitative data; Excel function: =MODE(range).
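The three measures on slide 24 can also be computed outside Excel. The following is a minimal sketch using Python's standard `statistics` module; the sample values are made up for illustration and do not come from the presentation.

```python
import statistics

# Hypothetical sample (illustrative values only, not data from the slides)
years = [2, 4, 4, 5, 7, 9, 11]

mean_years = statistics.mean(years)      # counterpart of Excel =AVERAGE(range)
median_years = statistics.median(years)  # counterpart of Excel =MEDIAN(range)
mode_years = statistics.mode(years)      # counterpart of Excel =MODE(range)

print("mean:", mean_years, "median:", median_years, "mode:", mode_years)
```

When the mean sits well above the median, as it does here, that is an early hint that the data may be skewed to the right.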
  25. SPREAD & DISTRIBUTION
  26. DISTRIBUTION OR SPREAD OF QUALITATIVE DATA: Tables: counts; percentages/ratios; averages of counts. Excel: pivot tables.
  27. PIVOT TABLES IN EXCEL: Select data: highlight table; Insert->Pivot Table. Select variables: categories (Row Labels); values. Change settings: percentage of grand total; average.
  28. DEMONSTRATION OF PIVOT TABLES FOR SPREAD OF QUALITATIVE DATA
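A pivot table that counts rows per category and shows each count as a percentage of the grand total can be approximated in a few lines of Python. This is only a sketch: the patron types and their frequencies are hypothetical, and `Counter` stands in for the pivot table rather than reproducing Excel's behavior.

```python
from collections import Counter

# Hypothetical patron types for ten transactions (illustrative only)
patron_types = [
    "Undergraduate", "Graduate", "Faculty", "Undergraduate", "Undergraduate",
    "Graduate", "Staff", "Undergraduate", "Faculty", "Graduate",
]

counts = Counter(patron_types)   # counts by category (the "Row Labels")
total = sum(counts.values())     # grand total
percent_of_total = {cat: 100 * n / total for cat, n in counts.items()}

# Print categories from most to least common
for category, n in counts.most_common():
    print(f"{category:<15} {n:>3} {percent_of_total[category]:5.1f}%")
```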
  29. GRAPH & CHART RULES OF THUMB: Trends: connection across the X-axis. Categorical comparisons: grouped; stacked; relative stacked. Categorical: few categories; differences are wide.
  30. QUANTITATIVE DISTRIBUTIONS: Stem & leaf; histogram; distribution graphs.
  31. EXPLORATORY DATA ANALYSIS: John W. Tukey, Exploratory Data Analysis. Examining your data visually: stem & leaf; hinges; box plots; scatter plots, etc.
  32. STEM-AND-LEAF (stem = first digit(s), leaf = last digit):
      Stem | Leaf
      0    | 01112222222222222233333344445556666677788899
      1    | 0000000011122223333356778899
      2    | 00122234444799
      3    | 0245
      Years at UNT (values listed on the slide): 0, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 4, 4, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 6, 7, 7, 7, 8, 8, 11, 11, 12, 12, 12, 12, 12, 13, 13, 13, 13, 13, 15, 16, 17, 17, 18, 18, 19, 29, 29, 30, 32, 34, 35
  33. FROM STEM-AND-LEAF TO HISTOGRAMS
  34. Stem | Leaf | Count
      0    | 1122223334445555666666677777899 | 31
      1    | 000011122222222333346677889     | 27
      2    | 0122234468                      | 10
      3    | 1112355888                      | 11
      4    | 12                              | 2
      Range | Count
      0-9   | 31
      10-19 | 27
      20-29 | 10
      30-39 | 11
      40-49 | 2
      [Figure: Histogram of Years at UNT, bins 0-9 through 40-49]
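Going from a stem-and-leaf display to a histogram amounts to counting how many values fall into each range of ten. The sketch below shows that step in Python with a short, hypothetical list of years (not the data from the slides); `decade_label` is a helper name introduced here for illustration.

```python
from collections import Counter

# Hypothetical years of service (illustrative only)
years = [0, 3, 5, 8, 12, 15, 15, 19, 21, 27, 33, 41]

def decade_label(value, width=10):
    """Return the label of the bin a value falls into, e.g. 15 -> '10-19'."""
    low = (value // width) * width
    return f"{low}-{low + width - 1}"

bin_counts = Counter(decade_label(y) for y in years)

# Text histogram: one row per bin, in ascending order
for low in range(0, max(years) + 1, 10):
    label = f"{low}-{low + 9}"
    print(f"{label:>6} {'#' * bin_counts[label]} {bin_counts[label]}")
```

Each bin here plays the role of one stem, and the bar length plays the role of the row of leaves.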
  35. HISTOGRAMS IN EXCEL: Analysis ToolPak: Options; Add-ins; Manage Add-ins. Set ranges: equal size ranges; ceiling (“more”); bin ceilings shown: 9, 19, 29, 39, 49. Create histogram: Data; Data Analysis; Histogram. Create graph for histogram: insert bar chart; highlight histogram; select bars & Format Selection; gap width = 0%.
  36. DEMONSTRATION OF HISTOGRAM IN EXCEL
  37. SPREAD OF QUANTITATIVE DATA: How variable is the data? Range; quantiles; standard deviation.
  38. RANGE & QUARTILES
  39. PRESENTATION OF SPREAD: Box plots: median; upper & lower quartiles; outliers.
  40. STANDARD DEVIATION: Measure of dispersion of data. Square root of the average squared deviation from the mean.
  41. WHAT DOES THE SD TELL YOU? Greater variation, less certainty. Lower variation, more certainty.
  42. SPREAD IN EXCEL: Range: =MIN(range); =MAX(range). Quantiles: =PERCENTILE.INC(range, k), with k between 0 and 1; =QUARTILE.INC(range, quart), with quart in {1,2,3,4}. Standard deviation: =STDEV.S(range).
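The measures of spread on slide 42 have close counterparts in Python's standard library. This sketch assumes a made-up sample; `method="inclusive"` is chosen because it interpolates between the minimum and maximum of the data, which should correspond to the `.INC` family of Excel functions, and `statistics.stdev` is the sample standard deviation, like `STDEV.S`.

```python
import statistics

# Hypothetical sample (illustrative values only, not data from the slides)
years = [2, 4, 4, 5, 7, 9, 11]

# Range: smallest and largest value
value_range = (min(years), max(years))

# Quartiles: three cut points that split the sorted data into four parts
q1, q2, q3 = statistics.quantiles(years, n=4, method="inclusive")

# Sample standard deviation (divides by n - 1)
sample_sd = statistics.stdev(years)

print("range:", value_range)
print("quartiles:", q1, q2, q3)
print("sample SD:", round(sample_sd, 3))
```

The middle quartile `q2` is the median, and the distance `q3 - q1` is the interquartile range drawn as the box in a box plot.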
  43. NORMAL DISTRIBUTION
  44. SKEWED DISTRIBUTIONS
  45. DEMONSTRATION OF DISTRIBUTIONS: Distribution of the population: the “Truth”. N is the number of samples; n is the number of items in each sample. Watch the cumulative mean & medians slowly merge to the population.
  46. BEST PRACTICE: IF IT DOESN’T FIT, CHANGE IT (transformation of data)
  47. WHY TRANSFORM? [Figures: histogram of Years at UNT (bins 0-9, 10-19, 20-29, 30-39) and histogram of Log10(Years at UNT)]
  48. HOW TRANSFORMATION WORKS: Y = a + bx; Log(Y) = Log(a + bx); 1/Y = 1/(a + bx)
  49. HOW TO BECOME NORMAL: (1) Evaluate the distribution of raw data. (2) Select a transformation method. (3) Transform the data. (4) Normally distributed? (5) Statistically test transformed data. (6) Express the result in the terms of the transformation.
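The transformation steps on slide 49 can be sketched in Python with a base-10 logarithm, the transformation shown on slide 47. The values below are hypothetical and strictly positive; a real data set containing zeros (such as 0 years) would need a different approach, because the logarithm of 0 is undefined.

```python
import math
import statistics

# Hypothetical right-skewed, strictly positive values (illustrative only)
raw = [1, 10, 10, 100, 1000]

# Transform the data
logged = [math.log10(v) for v in raw]

# Compare centre measures before and after the transformation
raw_mean, raw_median = statistics.mean(raw), statistics.median(raw)
log_mean, log_median = statistics.mean(logged), statistics.median(logged)

# Express the result in original units: 10 ** (mean of logs) is the geometric mean
back_transformed_mean = 10 ** log_mean

print("raw:    mean", raw_mean, "median", raw_median)
print("logged: mean", round(log_mean, 2), "median", log_median)
print("back-transformed mean:", round(back_transformed_mean, 2))
```

In the raw data the mean is pulled far above the median by the largest value; after the log transformation the two sit much closer together, which is one informal sign that the distribution has become more symmetric.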
  50. BEST PRACTICE: PLACE YOUR BETS BEFORE YOU START
  51. INFERENTIAL STATISTICS: Tests of hypotheses: associations; expectations. Accounts for uncertainty: random error; confidence interval.
  52. HYPOTHESIS TESTING: Your hypothesis (H1); null hypothesis (H0).
  53. EXAMPLE HYPOTHESIS: UNT Libraries provides access to… >=75%* or <75%* (*…of journal articles cited by UNT PACS faculty in journal articles published between 2008-2011).
  54. HYPOTHESIS TESTING: p; sample size; central tendency; spread; distribution; significance level.
  55. TESTING HYPOTHESES
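One way to test a hypothesis like the 75% example on slide 53 is an exact one-sided binomial test. The sketch below is an assumption-laden illustration, not the presenter's method: the sample counts are invented, the null value is taken at the 75% boundary, and `binomial_upper_tail` is a helper written here for the purpose.

```python
from math import comb

def binomial_upper_tail(successes, trials, p_null):
    """P(X >= successes) when X ~ Binomial(trials, p_null)."""
    return sum(
        comb(trials, k) * p_null ** k * (1 - p_null) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Hypothetical sample: 170 of 200 cited articles were accessible
accessible, sampled = 170, 200

# Significance level, fixed before the data are collected ("place your bets")
alpha = 0.05

# Probability of seeing 170 or more if the true share were exactly 75%
p_value = binomial_upper_tail(accessible, sampled, p_null=0.75)
reject_null = p_value < alpha

print(f"p = {p_value:.5f}, reject H0: {reject_null}")
```

A small p-value says the observed count would be unusual if access were only 75%; it does not say how far above 75% the true share is, which is where effect sizes and confidence intervals come in.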
  56. BEST PRACTICE: CHOOSE THE BEST METHOD FOR YOUR QUESTION AND DATA
  57. KNOW THE TESTS: Assumptions; limitations; appropriate data type; what the test tests.
  58. FACTORS ASSOCIATED WITH CHOICE OF STATISTICAL METHOD: Variable type; what is being compared; independence of units; underlying variance in the population; distribution; sample size; number of comparison groups.
  59. USE A FLOW CHART
  60. BEST PRACTICE: GOING BEYOND THE P-VALUE
  61. AND THE P-VALUE SAYS… Much about the distributions; more about the H0 than H1; little about size of differences.
  62. MORE USEFUL STATISTICS: Effect sizes: tell the real story. Confidence intervals: state your certainty.
  63. EFFECT SIZES OF QUANTITATIVE DATA: Correlations: Cohen’s guidelines for Pearson’s r (effect size small: r > .10; medium: r > .30; large: r > .50). Differences from the mean: standardized; weighted against the standard deviation; Cohen’s d: $d = \frac{\bar{x}_1 - \bar{x}_2}{s}$
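Cohen's d from slide 63 can be sketched as a short Python function. The slide gives only a generic standard deviation s; using the pooled standard deviation of the two groups is an assumption made here, and the two groups of scores are hypothetical.

```python
import math
import statistics

def cohens_d(group_1, group_2):
    """Difference in means divided by the pooled sample standard deviation."""
    n1, n2 = len(group_1), len(group_2)
    var1 = statistics.variance(group_1)  # sample variance (n - 1 denominator)
    var2 = statistics.variance(group_2)
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (statistics.mean(group_1) - statistics.mean(group_2)) / pooled_sd

# Hypothetical scores for two small groups (illustrative only)
trained = [2, 4, 6]
untrained = [1, 3, 5]

d = cohens_d(trained, untrained)
print("Cohen's d:", d)
```

Because d is expressed in standard deviation units, it can be compared across studies that used different measurement scales.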
  64. EFFECT SIZES OF QUALITATIVE DATA: Based on contingency table. Odds ratio: odds of event A divided by odds of event B; case-control studies. Relative risk: uses probabilities rather than odds; experiments, RCTs.
      Test A/B | Yes | No | Total
      Yes      | 10  | 15 | 25
      No       | 50  | 25 | 75
      Totals   | 60  | 40 | 100
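Both effect sizes on slide 64 can be worked out from the four inner cells of the contingency table. The sketch below uses the slide's counts, but the reading of the table is an assumption: rows are treated as the two groups being compared and the "Yes" column as the event of interest.

```python
# Cell counts from the slide's 2x2 table (row/column meaning is assumed)
a, b = 10, 15   # row "Yes": column Yes, column No
c, d = 50, 25   # row "No":  column Yes, column No

# Odds ratio: odds of the event in the first row divided by odds in the second
odds_ratio = (a / b) / (c / d)

# Relative risk: probability of the event in the first row divided by
# the probability in the second row
relative_risk = (a / (a + b)) / (c / (c + d))

print("odds ratio:", round(odds_ratio, 3))
print("relative risk:", round(relative_risk, 3))
```

The two figures differ (about 0.33 versus 0.6) because odds and probabilities diverge when the event is common, which is why the choice between them depends on the study design.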
  65. CONFIDENCE INTERVALS: Point estimates: single value; mean. Intervals: degree of uncertainty; range of certainty around the point estimate. Based on: point estimate (e.g. mean); confidence level (usually .95); standard deviation. Expressed as: The mean score of the students who had the IL training was 83.5 with a 95% CI of 78.3 to 89.4.
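The three ingredients listed on slide 65 (point estimate, confidence level, standard deviation) are enough to sketch a confidence interval for a mean. This example uses a normal approximation from Python's standard library, which is an assumption made for simplicity; with a sample this small a t-based interval would be wider. The scores are hypothetical.

```python
import math
import statistics

# Hypothetical scores (illustrative only)
scores = [2, 4, 4, 5, 7, 9, 11]
confidence = 0.95

mean_score = statistics.mean(scores)                             # point estimate
standard_error = statistics.stdev(scores) / math.sqrt(len(scores))

# Two-sided critical value from the standard normal distribution (about 1.96)
z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)

margin = z * standard_error
ci_low, ci_high = mean_score - margin, mean_score + margin

print(f"mean {mean_score} with a 95% CI of {ci_low:.2f} to {ci_high:.2f}")
```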
  66. STATISTICAL ANALYSIS: Noise; signal.
  67. BEST PRACTICES: Know what you know and what you don’t know; have a comparison group; use validated measures; have a data entry plan; get to know your data; if it doesn’t fit, change it; place your bets before you collect the data; use the best methods of analysis for your question & your data; go beyond the p-value.
  68. RESOURCES: Rice Virtual Lab in Statistics; Excel Tutorials for Statistical Analysis; Khan Academy - videos; Basic Research Methods for Librarians; Descriptive Statistical Techniques for Librarians.
