Statistics for Librarians, Session 2: Descriptive statistics

The second in a series of four seminars presented to University of North Texas librarians. This presentation focuses on organizing and presenting basic descriptive statistics, including measures of central tendency and variation.

  1. 1. EXPLORATORY DATA ANALYSIS: DESCRIPTIVE STATISTICS
  2. 2. REVIEW
  3. 3. Results Bias? Sampling Error? Invalid Measures? Random Error? Other Factors? PURPOSE OF STATISTICS
  4. 4. VARIABLES
       Independent: Subjects, Factors, "Effects of…"
       Dependent: Objects, Outcomes, "Effects on…"
  5. 5. SCALES OF DATA (NOIR)
       Nominal
       • Counts by category
       • Binary (Yes/No)
       • No meaning between the categories (Blue is not better than Red)
       Ordinal
       • Ranks
       • Scales
       • Space between ranks is subjective
       Interval
       • Integers
       • Zero is just another value; it does not mean “absence of”
       • Space between values is equal and objective, but discrete
       Ratio
       • Interval data with a baseline
       • Zero (0) means “absence of”
       • Space between values is continuous
       • Includes simple counts
  6. 6. ANOTHER WAY
       Qualitative
       • Counts by Categories
       • Ranks
       • Scales
       Quantitative
       • Measurements
       • Composite scores
       • Simple Counts
  7. 7. EXAMPLE DATA SET PACS FACULTY CITATION ANALYSIS
  8. 8. RESEARCH QUESTION Does UNT Libraries provide access to the resources used by PACS faculty, based on references in their published works?
  9. 9. PACS STUDY VARIABLES (IV, DV)
       Faculty
       • Department
       • Years at UNT
       Published
       • # published by type
       • Rankings of journals
       Cited
       • # cited by type
       • Rankings of journals
       • UNT accessible
  10. 10. PACS STUDY VARIABLES BY SCALE
       Qualitative
       • # of publications by type
       • # of citations by type
       • # references available
       Quantitative
       • Years at UNT
       • Years since PhD
  11. 11. EXPLORATORY DATA ANALYSIS GETTING TO KNOW YOUR DATA, INTIMATELY
  12. 12. DISTRIBUTIONS
  13. 13. QUALITATIVE DATA Tables •Counts •Percentages/Ratios •By row and column Excel •Pivot Tables
  14. 14. TABLES
       Department | Num Faculty | % of Faculty
       Anthropology | 20 | 18%
       Behavior Analysis | 17 | 15%
       Criminal Justice | 18 | 16%
       Public Administration | 19 | 17%
       Rehab, Social Work, & Addictions | 18 | 16%
       Sociology | 21 | 19%
       Totals | 113 | 100%

       Department | Article | % Articles | Other
       Anthropology | 73 | 61% | 46
       Behavior Analysis | 65 | 81% | 15
       Criminal Justice | 54 | 69% | 24
       Public Administration | 64 | 58% | 47
       Rehabilitation, Social Work, and Addictions | 49 | 82% | 11
       Sociology | 83 | 62% | 50
       Totals | 388 | 67% | 193

       Availability | # Refs | %
       Available | 586 | 79.62%
       Title not avail | 134 | 17.66%
       Year not avail | 23 | 2.72%
       Grand Total | 743 | 100.00%
  15. 15. ACTIVITY 1
       Worksheet (blank cells to be filled in):
       Department | Article | Article % | Book | Book % | Other | Total
       Anthropology | 1152 | | 666 | | | 2012
       Behavior Analysis | 1412 | | 289 | | | 1740
       Criminal Justice | 1220 | | 624 | | | 2003
       Public Administration | 966 | | 561 | | | 1724
       Rehabilitation, Social Work, and Addictions | 852 | | 365 | | | 1282
       Sociology | 2238 | | 1558 | | | 3970
       Totals | | | | | |

       Completed:
       Department | Article | Article % | Book | Book % | Other | Total
       Anthropology | 1152 | 57% | 666 | 33% | 194 | 2012
       Behavior Analysis | 1412 | 81% | 289 | 17% | 39 | 1740
       Criminal Justice | 1220 | 61% | 624 | 31% | 159 | 2003
       Public Administration | 966 | 56% | 561 | 33% | 197 | 1724
       Rehabilitation, Social Work, and Addictions | 852 | 66% | 365 | 28% | 65 | 1282
       Sociology | 2238 | 56% | 1558 | 39% | 174 | 3970
       Totals | 7840 | (avg) 63% | 4063 | 30% | 828 | 12731
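One way to check the Activity 1 answers is to compute the row totals and row percentages directly. The sketch below uses Python, which the seminar itself does not use (the slides work in Excel). The counts are copied from the Activity 1 table, and the names `references` and `row_percentages` are illustrative only.

```python
# Reference counts by type for each department, copied from the Activity 1 table.
references = {
    "Anthropology": {"Article": 1152, "Book": 666, "Other": 194},
    "Behavior Analysis": {"Article": 1412, "Book": 289, "Other": 39},
    "Criminal Justice": {"Article": 1220, "Book": 624, "Other": 159},
    "Public Administration": {"Article": 966, "Book": 561, "Other": 197},
    "Rehabilitation, Social Work, and Addictions": {"Article": 852, "Book": 365, "Other": 65},
    "Sociology": {"Article": 2238, "Book": 1558, "Other": 174},
}


def row_percentages(row):
    """Return each type's share of the row total, rounded to a whole percent."""
    total = sum(row.values())
    return {kind: round(100 * count / total) for kind, count in row.items()}


totals = {dept: sum(row.values()) for dept, row in references.items()}
percents = {dept: row_percentages(row) for dept, row in references.items()}
grand_total = sum(totals.values())

# Two different "overall" article percentages:
# the mean of the six department percentages, and the pooled share of all references.
mean_article_pct = sum(p["Article"] for p in percents.values()) / len(percents)
all_articles = sum(row["Article"] for row in references.values())
pooled_article_pct = 100 * all_articles / grand_total

for dept, row in references.items():
    print(f"{dept}: total={totals[dept]}, article={percents[dept]['Article']}%, "
          f"book={percents[dept]['Book']}%")
print(f"mean of department article %: {mean_article_pct:.1f}")
print(f"pooled article %: {pooled_article_pct:.1f}")
```

The "(avg)" figures in the Totals row appear to be the mean of the department percentages rather than the pooled share of all references; the two differ slightly, which is why the sketch reports both.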
  16. 16. GRAPHS
       [Chart: % Articles by Department; value axis 0% to 100%; categories: Anthropology, Behavior Analysis, Criminal Justice, Public Administration, Rehabilitation, Social Work, and Addictions, Sociology]
       [Chart: % of Faculty]
  17. 17. GRAPH & CHART RULES OF THUMB
       • Trends: connection across the X-axis
       • Categorical comparisons: grouped, stacked, relative stacked
       • Categorical: few categories, differences are wide
  18. 18. ACTIVITY 2
       Draw a bar graph of References by Type, using the completed table from Activity 1 (slide 15).
       [Bar chart: series Article, Book, and Other; value axis 0 to 5000]
  19. 19. QUANTITATIVE DISTRIBUTIONS
       • Stem & Leaf
       • Histogram
       • Distribution graphs
  20. 20. EXPLORATORY DATA ANALYSIS
       • John W. Tukey, Exploratory Data Analysis
       • Examining your data visually
       • Stem & Leaf
       • Hinges
       • Box plots
       • Scatter plots, etc.
  21. 21. STEM-AND-LEAF
       Stem = first digit(s); Leaf = last digit
       Stem | Leaf
       0 | 1122223334445555666666677777899
       1 | 000011122222222333346677889
       2 | 0122234468
       3 | 1112355888
       4 | 12
  22. 22. ACTIVITY 3
       Create a stem-and-leaf table for Years at UNT.
       Stem | Leaf
       0 | 01112222222222222233333344445556666677788899
       1 | 0000000011122223333356778899
       2 | 00122234444799
       3 | 0245
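The stem-and-leaf construction above can also be sketched in code. This is a minimal Python illustration, assuming whole-number values where the stem is the tens digit and the leaf is the ones digit; the function name `stem_and_leaf` and the sample values are hypothetical, not the PACS data.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group whole numbers by tens digit (stem) and collect ones digits (leaves)."""
    rows = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # 24 -> stem 2, leaf 4
        rows[stem].append(str(leaf))
    # Join the leaves of each stem into one string, stems in ascending order.
    return {stem: "".join(leaves) for stem, leaves in sorted(rows.items())}


# Hypothetical sample values.
years = [1, 3, 3, 7, 10, 12, 12, 15, 21, 24, 30]
table = stem_and_leaf(years)
for stem, leaves in table.items():
    print(stem, "|", leaves)
```

Stems with no values are simply absent from the result; a fuller version might print them as empty rows so that gaps in the data stay visible.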
  23. 23. FROM STEM-AND-LEAF TO HISTOGRAMS
  24. 24. HISTOGRAM OF YEARS AT UNT
       Stem | Leaf | Count
       0 | 1122223334445555666666677777899 | 31
       1 | 000011122222222333346677889 | 27
       2 | 0122234468 | 10
       3 | 1112355888 | 11
       4 | 12 | 2

       Range | Count
       0-9 | 31
       10-19 | 27
       20-29 | 10
       30-39 | 11
       40-49 | 2
       [Histogram: bins 0-9 through 40-49; count axis 0 to 40]
  25. 25. ACTIVITY 4
       Create a histogram of the Years at UNT.
       Stem | Leaf | Count
       0 | 01112222222222222233333344445556666677788899 | 44
       1 | 0000000011122223333356778899 | 28
       2 | 00122234444799 | 14
       3 | 0245 | 4
       [Histogram: Years at UNT; bins 0-9 through 30-39; count axis 0 to 50]
  26. 26. PIVOT TABLES
       Select Data
       • Highlight table
       • Insert -> Pivot Table
       Select Variables
       • Categories (Row Labels)
       • Values
       Change Settings
       • Percentage of Grand Total
       • Average
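Moving from a stem-and-leaf table to a histogram amounts to counting the leaves on each stem. The sketch below does this in Python for the Activity 4 rows; the leaf strings are copied from the slide, while the function name `bin_counts` and the text-bar output are illustrative choices rather than anything shown in the seminar.

```python
# Stem-and-leaf rows for Years at UNT, copied from Activity 4.
stem_leaf = {
    0: "01112222222222222233333344445556666677788899",
    1: "0000000011122223333356778899",
    2: "00122234444799",
    3: "0245",
}


def bin_counts(rows):
    """Count the leaves on each stem and label each bin with its range of values."""
    counts = {}
    for stem, leaves in sorted(rows.items()):
        low = stem * 10
        counts[f"{low}-{low + 9}"] = len(leaves)
    return counts


histogram = bin_counts(stem_leaf)
for label, count in histogram.items():
    # One '#' per observation gives a quick text histogram.
    print(f"{label:>5} | {'#' * count} {count}")
```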
  27. 27. DEMONSTRATION OF PIVOT TABLES IN EXCEL
  28. 28. HISTOGRAMS IN EXCEL
       Analysis ToolPak
       • Options
       • Add-ins
       • Manage Add-ins
       Set ranges
       • Equal spacing
       • Enter the highest # for each range
       • Ceiling (“more”)
       Create Histogram
       • Data
       • Data Analysis
       • Histogram
       Create Graph
       • Insert Bar Chart
       • Highlight histogram
       • Select bars & Format Selection
       • Gap Width = 0%
  29. 29. DEMONSTRATION OF HISTOGRAM IN EXCEL
  30. 30. MEASURES OF CENTRAL TENDENCY
       • Mean: average
       • Median: middle
       • Mode: most common
  31. 31. CENTRAL TENDENCY BY SCALES
       Quantitative: Mean, Median
       Qualitative: Median (not for Nominal data), Mode
  32. 32. ACTIVITY 5
       • # Available: Mode
       • # References by Type: Mode
       • Years Since PhD: Mean, Median
       • Years at UNT: Mean, Median
  33. 33. MEAN
       Sum of all the values divided by the count of values:
       $\bar{X} = \frac{\sum X}{n}$
       $\bar{X}$ = sample mean
       $\sum$ = “sum of…”
       $X$ = values of the variable
       $n$ = number of values
  34. 34. EXCEL FUNCTIONS FOR MEASURES OF CENTRAL TENDENCY
       • Mean: =AVERAGE(range)
       • Median: =MEDIAN(range)
       • Mode: =MODE(range)
  35. 35. SPREAD (REVIEW)
       How variable is the data?
       Quantitative
       • Range
       • Quartiles or Quintiles
       • Standard Deviation
       Qualitative
       • Distribution Tables
       • Bar Graphs
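For readers working outside Excel, the same three measures are available in Python's standard `statistics` module. The sketch below is only an illustration with hypothetical values, not the PACS data; the comments pair each call with the Excel function named on the slide.

```python
import statistics

# Hypothetical quantitative sample (for example, years of service).
years = [2, 3, 3, 5, 8, 10, 11]

mean = statistics.mean(years)      # counterpart of =AVERAGE(range)
median = statistics.median(years)  # counterpart of =MEDIAN(range)
mode = statistics.mode(years)      # counterpart of =MODE(range)
print(mean, median, mode)

# For nominal data only the mode is meaningful; it also works on labels.
reference_types = ["Article", "Book", "Article", "Other", "Article"]
most_common_type = statistics.mode(reference_types)
print(most_common_type)
```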
  36. 36. RANGE & QUARTILES
  37. 37. PRESENTATION OF SPREAD
       • Box plots: median, upper & lower quartiles, outliers
       • Cross-tabulations
       • Bar graphs
  38. 38. BOXPLOT IN EXCEL
       Set parameters
       • Median
       • Quartile 1
       • Minimum
       • Maximum
       • Quartile 3
       Use Excel functions
       • =MEDIAN(range)
       • =QUARTILE.INC(range, 1)
       • =MIN(range)
       • =MAX(range)
       • =QUARTILE.INC(range, 3)
       Insert Chart
       • Highlight both columns
       • Select a bar chart
       • Switch the columns & rows
       • Modify the formats of each element
       • YouTube tutorial
  39. 39. STANDARD DEVIATION
       • Measure of dispersion of data
       • Square root of the average squared deviation from the mean (for a sample, the sum of squared deviations is divided by n − 1)
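The five values a box plot needs can be computed in a few lines. The sketch below uses Python's `statistics.quantiles` with `method="inclusive"`, which should correspond to Excel's QUARTILE.INC (treat that equivalence as an assumption to verify on your own data). The outlier check uses the common 1.5 × IQR convention, which the slides do not specify; the function names and sample values are hypothetical.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    # "inclusive" treats the data as containing the population min and max.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


def iqr_outliers(values):
    """Values more than 1.5 * IQR beyond the quartiles (an assumed convention)."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]


# Hypothetical values, not the PACS data.
years = [1, 2, 4, 7, 9, 12, 15, 20, 42]
summary = five_number_summary(years)
outliers = iqr_outliers(years)
print(summary)
print(outliers)
```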
  40. 40. STANDARD DEVIATION WORKED OUT
       Years since PhD ($X$) | Mean ($\bar{X}$) | Difference from Mean ($X - \bar{X}$) | Difference from Mean Squared ($(X - \bar{X})^2$)
       1 | 14.86 | -13.86 | 192.216
       1 | 14.86 | -13.86 | 192.216
       2 | 14.86 | -12.86 | 165.4876
       14 | 14.86 | -0.86 | 0.746837
       16 | 14.86 | 1.14 | 1.290047
       41 | 14.86 | 26.14 | 683.0802
       42 | 14.86 | 27.14 | 736.3518
       n = 81 | 14.86 | 0.00 (sum) | 9931.506 (sum)
  41. 41. WORK IT OUT
       $s = \sqrt{\frac{9931.506}{81 - 1}} = \sqrt{\frac{9931.506}{80}} = \sqrt{124.1438} = 11.14198$
  42. 42. SPREAD IN EXCEL
       Range
       • =MIN(range)
       • =MAX(range)
       Quantiles
       • =PERCENTILE.INC(range, k), where k is the percentile as a fraction between 0 and 1
       • =QUARTILE.INC(range, quart), where quart is 1, 2, 3, or 4
       Standard Deviation
       • =STDEV.S(range)
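The last step of the worked example can be reproduced from the totals reported on the slide (sum of squared differences 9931.506, n = 81). The Python sketch below does that, and then runs the full calculation on a small hypothetical sample, comparing a hand-written function against the standard library's `statistics.stdev`. The function name `sample_sd` and the sample values are illustrative assumptions.

```python
import math
import statistics


def sample_sd(values):
    """Sample standard deviation: sqrt(sum of squared deviations / (n - 1))."""
    mean = sum(values) / len(values)
    sum_of_squares = sum((x - mean) ** 2 for x in values)
    return math.sqrt(sum_of_squares / (len(values) - 1))


# Final step of the slide's worked example, using its reported totals.
slide_sum_of_squares = 9931.506
slide_n = 81
slide_variance = slide_sum_of_squares / (slide_n - 1)
slide_sd = math.sqrt(slide_variance)
print(round(slide_variance, 4), round(slide_sd, 5))

# Hypothetical small sample to exercise the whole calculation.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_sd(sample), statistics.stdev(sample))
```

Dividing by n − 1 rather than n matches the sample formula on the slide and Excel's STDEV.S.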
  43. 43. WHAT DOES THE STANDARD DEVIATION TELL YOU? Greater variation, less certainty Lower variation, more certainty
  44. 44. FROM HISTOGRAMS TO FREQUENCY DISTRIBUTIONS
  45. 45. NORMAL DISTRIBUTIONS
  46. 46. NORMAL DISTRIBUTION
  47. 47. SKEWED DISTRIBUTIONS
  48. 48. BIVARIATE ANALYSIS
  49. 49. SCATTER PLOT Relationship of two variables Quantitative Only
  50. 50. CORRELATIONS Direct • As x increases, y increases Indirect • As x increases, y decreases No Correlation
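The direction of a relationship can be put into a single number. The slide does not name a specific measure; the sketch below assumes Pearson's correlation coefficient, a common choice for two quantitative variables, where values near +1 indicate a direct relationship, values near −1 an indirect one, and values near 0 little or no linear relationship. The data and the function name `pearson_r` are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of x and y around their means.
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)
    return covariation / math.sqrt(variation_x * variation_y)


# Hypothetical data, not the PACS study.
x = [1, 2, 3, 4, 5]
direct = [2, 4, 6, 8, 10]    # as x increases, y increases
indirect = [10, 8, 6, 4, 2]  # as x increases, y decreases

r_direct = pearson_r(x, direct)
r_indirect = pearson_r(x, indirect)
print(r_direct, r_indirect)
```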
  51. 51. DEMONSTRATION OF SCATTER PLOT IN EXCEL
       Select Data
       • Highlight both columns
       Insert graph
       • Scatter
       • Layout 9
       Change Labels
       • X-axis label
       • Y-axis label
  52. 52. CROSS-TABULATIONS Qualitative Two Variables Fewer Categories Row Percentage Column Percentage Pivot Tables in Excel
  53. 53. CONTINGENCY TABLE
       Test A/B | Yes | No | Total
       Yes | 10 | 15 | 25
       No | 50 | 25 | 75
       Totals | 60 | 40 | 100
       Simple Cross-tab
       • Two Binomial Variables
       Powerful Statistics
       • Odds Ratios & Risk Ratios
  54. 54. IMPORTANCE OF DESCRIPTIVE STATISTICS
       Describes: Population, Sample, Results
       Compares: Sample to Population, Sub-groups, Correlations
       Summarizes: Central Tendency, Spread
  55. 55. PROGRESSION FROM DESCRIPTIVE TO INFERENTIAL STATISTICS
       Central Tendency -> Spread -> Distributions -> Probability -> Inferential Statistics
  56. 56. RESOURCES
       • Rice Virtual Lab in Statistics
       • Excel Tutorials for Statistical Analysis
       • Khan Academy (videos)
       • Basic Research Methods for Librarians (ebook)
       • Descriptive Statistical Techniques for Librarians (ebook)
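The odds ratio and risk ratio mentioned above can be worked out from the four cells of the table. The slide does not say which variable is the grouping and which is the outcome, so the Python sketch below makes an explicit assumption: rows are the two groups and columns are the outcome. Under a different reading of the table the numbers would change.

```python
# Cell counts from the slide's 2x2 table.
# Assumption: rows are groups (Yes/No on the first variable) and
# columns are the outcome (Yes/No on the second variable).
a, b = 10, 15  # row "Yes": outcome Yes, outcome No
c, d = 50, 25  # row "No":  outcome Yes, outcome No

# Risk = share of each row with outcome Yes (the row percentage as a fraction).
risk_yes_row = a / (a + b)
risk_no_row = c / (c + d)
risk_ratio = risk_yes_row / risk_no_row

# Odds = outcome Yes count divided by outcome No count within each row.
odds_yes_row = a / b
odds_no_row = c / d
odds_ratio = odds_yes_row / odds_no_row

print(f"risk: {risk_yes_row:.2f} vs {risk_no_row:.2f}, risk ratio {risk_ratio:.2f}")
print(f"odds: {odds_yes_row:.2f} vs {odds_no_row:.2f}, odds ratio {odds_ratio:.2f}")
```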
