# Statistics for Librarians, Session 2: Descriptive statistics

The second in a series of four seminars presented to University of North Texas librarians. This presentation focuses on organizing and presenting basic descriptive statistics, including measures of central tendency and variation.

### Transcript

• 1. EXPLORATORY DATA ANALYSIS: DESCRIPTIVE STATISTICS
• 2. REVIEW
• 3. PURPOSE OF STATISTICS Results: Bias? Sampling Error? Invalid Measures? Random Error? Other Factors?
• 4. VARIABLES Independent: Subjects, Factors, Effects of… Dependent: Objects, Outcomes, Effects on…
• 5. SCALES OF DATA (NOIR) Nominal: •Counts by category •Binary (Yes/No) •No meaning between the categories (Blue is not better than Red) Ordinal: •Ranks •Scales •Space between ranks is subjective Interval: •Integers •Zero is just another value – doesn’t mean “absence of” •Space between values is equal and objective, but discrete Ratio: •Interval data with a baseline •Zero (0) means “absence of” •Space between is continuous •Includes simple counts
• 6. ANOTHER WAY Qualitative: • Counts by Categories • Ranks • Scales Quantitative: • Measurements • Composite scores • Simple Counts
• 7. EXAMPLE DATA SET PACS FACULTY CITATION ANALYSIS
• 8. RESEARCH QUESTION Does UNT Libraries provide access to the resources used by PACS faculty, based on references in their published works?
• 9. PACS STUDY VARIABLES Faculty: •Department •Years at UNT Published: •# published by type •Rankings of journals Cited: •# cited by type •Rankings of journals •UNT accessible IV DV
• 10. PACS STUDY VARIABLES BY SCALE Qualitative: •# of publications by type •# of citations by type •# references available Quantitative: •Years at UNT •Years since PhD
• 11. EXPLORATORY DATA ANALYSIS GETTING TO KNOW YOUR DATA, INTIMATELY
• 12. DISTRIBUTIONS
• 13. QUALITATIVE DATA Tables •Counts •Percentages/Ratios •By row and column Excel •Pivot Tables
• 14. TABLES

| Department | Num Faculty | % of Faculty |
| --- | --- | --- |
| Anthropology | 20 | 18% |
| Behavior Analysis | 17 | 15% |
| Criminal Justice | 18 | 16% |
| Public Administration | 19 | 17% |
| Rehab, Social Work, & Addictions | 18 | 16% |
| Sociology | 21 | 19% |
| Totals | 113 | 100% |

| Department | Article | % Articles | Other |
| --- | --- | --- | --- |
| Anthropology | 73 | 61% | 46 |
| Behavior Analysis | 65 | 81% | 15 |
| Criminal Justice | 54 | 69% | 24 |
| Public Administration | 64 | 58% | 47 |
| Rehabilitation, Social Work, and Addictions | 49 | 82% | 11 |
| Sociology | 83 | 62% | 50 |
| Totals | 388 | 67% | 193 |

| Availability | # Refs | % |
| --- | --- | --- |
| Available | 586 | 79.62% |
| Title not avail | 134 | 17.66% |
| Year not avail | 23 | 2.72% |
| Grand Total | 743 | 100.00% |

• 15. ACTIVITY 1

| Department | Article | Article % | Book | Book % | Other | Total |
| --- | --- | --- | --- | --- | --- | --- |
| Anthropology | 1152 | | 666 | | | 2012 |
| Behavior Analysis | 1412 | | 289 | | | 1740 |
| Criminal Justice | 1220 | | 624 | | | 2003 |
| Public Administration | 966 | | 561 | | | 1724 |
| Rehabilitation, Social Work, and Addictions | 852 | | 365 | | | 1282 |
| Sociology | 2238 | | 1558 | | | 3970 |
| Totals | | | | | | |

| Department | Article | Article % | Book | Book % | Other | Total |
| --- | --- | --- | --- | --- | --- | --- |
| Anthropology | 1152 | 57% | 666 | 33% | 194 | 2012 |
| Behavior Analysis | 1412 | 81% | 289 | 17% | 39 | 1740 |
| Criminal Justice | 1220 | 61% | 624 | 31% | 159 | 2003 |
| Public Administration | 966 | 56% | 561 | 33% | 197 | 1724 |
| Rehabilitation, Social Work, and Addictions | 852 | 66% | 365 | 28% | 65 | 1282 |
| Sociology | 2238 | 56% | 1558 | 39% | 174 | 3970 |
| Totals | 7840 | (avg) 63% | 4063 | 30% | 828 | 12731 |
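The percentage columns in Activity 1 are row percentages: each count divided by that department's total. A minimal Python sketch of the calculation, using the counts from the completed table; the names `references` and `row_percentages` are illustrative, not part of the presentation.

```python
# Reference counts per department from slide 15: (articles, books, other).
references = {
    "Anthropology": (1152, 666, 194),
    "Behavior Analysis": (1412, 289, 39),
    "Criminal Justice": (1220, 624, 159),
    "Public Administration": (966, 561, 197),
    "Rehabilitation, Social Work, and Addictions": (852, 365, 65),
    "Sociology": (2238, 1558, 174),
}


def row_percentages(articles, books, other):
    """Return (article %, book %, row total), percentages rounded to whole numbers."""
    total = articles + books + other
    return round(100 * articles / total), round(100 * books / total), total


table = {dept: row_percentages(*counts) for dept, counts in references.items()}

for dept, (article_pct, book_pct, total) in table.items():
    print(f"{dept}: {article_pct}% articles, {book_pct}% books, {total} references")
```

The same arithmetic is what a pivot table set to show values as a percentage of the row total would produce.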
• 16. GRAPHS (Charts: "% Articles by Department" and "% of Faculty"; categories: Anthropology, Behavior Analysis, Criminal Justice, Public Administration, Rehabilitation, Social Work, and Addictions, Sociology; axis from 0% to 100%.)
• 17. GRAPH & CHART RULES OF THUMB Trends: connection across the X-axis. Categorical comparisons: grouped, stacked, relative stacked. Categorical: few categories, differences are wide.
• 18. ACTIVITY 2 Draw a bar graph of References by Type

| Department | Article | Article % | Book | Book % | Other | Total |
| --- | --- | --- | --- | --- | --- | --- |
| Anthropology | 1152 | 57% | 666 | 33% | 194 | 2012 |
| Behavior Analysis | 1412 | 81% | 289 | 17% | 39 | 1740 |
| Criminal Justice | 1220 | 61% | 624 | 31% | 159 | 2003 |
| Public Administration | 966 | 56% | 561 | 33% | 197 | 1724 |
| Rehabilitation, Social Work, and Addictions | 852 | 66% | 365 | 28% | 65 | 1282 |
| Sociology | 2238 | 56% | 1558 | 39% | 174 | 3970 |
| Totals | 7840 | (avg) 63% | 4063 | 30% | 828 | 12731 |

(Bar chart: axis from 0 to 5000; series Other, Book, Article.)
• 19. QUANTITATIVE DISTRIBUTIONS Stem & Leaf Histogram Distribution graphs
• 20. EXPLORATORY DATA ANALYSIS • John W. Tukey • Exploratory Data Analysis • Examining your data visually. • Stem & Leaf • Hinges • Box plots • Scatter plots, etc.
• 21. STEM-AND-LEAF

| Stem (first digit(s)) | Leaf (last digit) |
| --- | --- |
| 0 | 1122223334445555666666677777899 |
| 1 | 000011122222222333346677889 |
| 2 | 0122234468 |
| 3 | 1112355888 |
| 4 | 12 |

• 22. ACTIVITY 3 Create a stem-and-leaf table for Years at UNT.

| Stem | Leaf |
| --- | --- |
| 0 | 01112222222222222233333344445556666677788899 |
| 1 | 0000000011122223333356778899 |
| 2 | 00122234444799 |
| 3 | 0245 |
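A stem-and-leaf table can be built mechanically: sort the values, then split each one into its tens digit (stem) and ones digit (leaf). The sketch below assumes non-negative whole numbers below 100 and uses a short illustrative sample rather than the full data set; `stem_and_leaf` is a hypothetical helper name.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem (tens digit) to a string of its sorted leaves (ones digits)."""
    stems = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        stems[stem].append(str(leaf))
    return {stem: "".join(leaves) for stem, leaves in stems.items()}


# Illustrative sample only, not the full Years at UNT data.
years = [1, 1, 2, 14, 16, 41, 42]
display = stem_and_leaf(years)

for stem, leaves in display.items():
    print(stem, leaves)
```

Note that this sketch skips stems with no values (2 and 3 here); a hand-drawn table would usually list them with an empty leaf.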
• 23. FROM STEM-AND-LEAF TO HISTOGRAMS
• 24.

| Stem | Leaf | Count |
| --- | --- | --- |
| 0 | 1122223334445555666666677777899 | 31 |
| 1 | 000011122222222333346677889 | 27 |
| 2 | 0122234468 | 10 |
| 3 | 1112355888 | 11 |
| 4 | 12 | 2 |

| Range | Count |
| --- | --- |
| 0-9 | 31 |
| 10-19 | 27 |
| 20-29 | 10 |
| 30-39 | 11 |
| 40-49 | 2 |

(Chart: "Histogram of Years at UNT"; bars for 0-9, 10-19, 20-29, 30-39, 40-49; count axis from 0 to 40.)

• 25. ACTIVITY 4 Create a histogram of the Years at UNT

| Stem | Leaf |
| --- | --- |
| 0 | 01112222222222222233333344445556666677788899 |
| 1 | 0000000011122223333356778899 |
| 2 | 00122234444799 |
| 3 | 0245 |

| Stem | Leaf | Count |
| --- | --- | --- |
| 0 | 01112222222222222233333344445556666677788899 | 44 |
| 1 | 0000000011122223333356778899 | 28 |
| 2 | 00122234444799 | 14 |
| 3 | 0245 | 4 |

(Chart: "Years at UNT"; bars for 0-9, 10-19, 20-29, 30-39; count axis from 0 to 50.)
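Going from a stem-and-leaf table to a histogram amounts to counting the values that fall in each equal-width range. A minimal sketch, assuming non-negative whole numbers and a bin width of 10 as on the slides; the sample and the name `bin_counts` are illustrative.

```python
from collections import Counter


def bin_counts(values, width=10):
    """Count values per equal-width range, keeping empty ranges so gaps stay visible."""
    counts = Counter((value // width) * width for value in values)
    top = (max(values) // width) * width
    return {
        f"{low}-{low + width - 1}": counts[low]
        for low in range(0, top + width, width)
    }


# Illustrative sample only, not the full data set.
years = [1, 1, 2, 14, 16, 41, 42]
histogram = bin_counts(years)

for label, count in histogram.items():
    print(f"{label:>5} {'#' * count} ({count})")
```

Keeping the empty 20-29 and 30-39 ranges matters: dropping them would hide the gap in the distribution.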
• 26. PIVOT TABLES Select Data •Highlight table •Insert->Pivot Table Select Variables •Categories (Row Labels) •Values Change Settings •Percentage of Grand Total •Average
• 27. DEMONSTRATION OF PIVOT TABLES IN EXCEL
• 28. HISTOGRAMS IN EXCEL Analysis ToolPak: •Options •Add-ins •Manage Add-ins Set ranges: •Equal spacing •Enter the highest # for each range •Ceiling (“more”) Create Histogram: •Data •Data Analysis •Histogram Create Graph: •Insert Bar Chart •Highlight histogram •Select bars & Format Selection •Gap Width=0%
• 29. DEMONSTRATION OF HISTOGRAM IN EXCEL
• 30. MEASURES OF CENTRAL TENDENCY Mean: • Average Median: • Middle Mode: • Most Common
• 31. CENTRAL TENDENCY BY SCALES Quantitative: Mean, Median Qualitative: Median (not for nominal data), Mode
• 32. ACTIVITY 5 # Available: Mode # References by Type: Mode Years Since PhD: Mean, Median Years at UNT: Mean, Median
• 33. MEAN Sum of all the values divided by the count of values: $\bar{X} = \frac{\sum X}{n}$, where $\bar{X}$ = sample mean, $\sum$ = “sum of…”, $X$ = values of the variable, $n$ = number of values
• 34. EXCEL FUNCTIONS FOR MEASURES OF CENTRAL TENDENCY Mean: •=AVERAGE(range) Median: •=MEDIAN(range) Mode: •=MODE(range)
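For readers working outside Excel, the three measures have direct counterparts in Python's standard `statistics` module. The values below are hypothetical and chosen so the results are easy to check by hand.

```python
import statistics

# Hypothetical values, chosen for easy hand-checking.
values = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(values)      # sum of values / count of values = 30 / 6
median = statistics.median(values)  # middle of the sorted values; averages 3 and 5
mode = statistics.mode(values)      # most common value

print(f"mean={mean}, median={median}, mode={mode}")
```

With an even number of values, the median is the average of the two middle values, which is also how Excel's MEDIAN behaves.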
• 35. SPREAD (REVIEW) Quantitative •Range •Quartiles or Quintiles •Standard Deviation Qualitative •Distribution Tables •Bar Graphs How variable is the data?
• 36. RANGE & QUARTILES
• 37. PRESENTATION OF SPREAD • Box plots • Median • Upper & lower quartiles • Outliers • Cross-tabulations • Bar graphs
• 38. BOXPLOT IN EXCEL Set parameters •Median •Quartile 1 •Minimum •Maximum •Quartile 3 Use Excel functions •Median(range) •Quartile.inc(range,1) •Min(range) •Max(range) •Quartile.inc(range,3) Insert Chart •Highlight both columns •Select a bar chart •Switch the columns & rows •Modify the formats of each element •YouTube tutorial
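The five parameters a box plot needs can also be computed in Python. This sketch uses `statistics.quantiles` with `method="inclusive"`, which interpolates between the sorted values and should agree with QUARTILE.INC; the sample is illustrative and `five_number_summary` is a hypothetical helper name.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, quartile 1, median, quartile 3, maximum)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


# Illustrative sample only, not the full data set.
sample = [1, 1, 2, 14, 16, 41, 42]
minimum, q1, median, q3, maximum = five_number_summary(sample)

print(f"min={minimum}, Q1={q1}, median={median}, Q3={q3}, max={maximum}")
```

The box spans Q1 to Q3 with a line at the median, and the whiskers extend toward the minimum and maximum.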
• 39. STANDARD DEVIATION •Measure of dispersion of data •Square root of the variance, that is, of the average squared difference from the mean
• 40. STANDARD DEVIATION WORKED OUT

| Years since PhD ($X$) | Mean ($\bar{X}$) | Difference from Mean ($X - \bar{X}$) | Difference from Mean Squared ($(X - \bar{X})^2$) |
| --- | --- | --- | --- |
| 1 | 14.86 | -13.86 | 192.216 |
| 1 | 14.86 | -13.86 | 192.216 |
| 2 | 14.86 | -12.86 | 165.4876 |
| 14 | 14.86 | -0.86 | 0.746837 |
| 16 | 14.86 | 1.14 | 1.290047 |
| 41 | 14.86 | 26.14 | 683.0802 |
| 42 | 14.86 | 27.14 | 736.3518 |
| n=81 | 14.86 | 0.00 | 9931.506 |

• 41. WORK IT OUT $s = \sqrt{\frac{9931.506}{81 - 1}} = \sqrt{\frac{9931.506}{80}} = \sqrt{124.1438} = 11.14198$
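The worked example can be checked in a few lines of Python. The first part reuses the sum of squared differences and n from the slide; the second part is a general sketch of the same sample formula (dividing by n - 1), compared against `statistics.stdev` on an illustrative sample. `sample_std` is a hypothetical helper name.

```python
import math
import statistics

# Figures taken from the worked example: sum of squared differences and n.
sum_of_squares = 9931.506
n = 81
s = math.sqrt(sum_of_squares / (n - 1))
print(f"s = {s:.5f}")


def sample_std(values):
    """Sample standard deviation: square root of squared deviations over n - 1."""
    mean = sum(values) / len(values)
    squared_differences = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_differences / (len(values) - 1))


# Illustrative sample only, not the full data set.
sample = [1, 1, 2, 14, 16, 41, 42]
print(sample_std(sample), statistics.stdev(sample))
```

Dividing by n - 1 rather than n is what distinguishes the sample standard deviation (Excel's STDEV.S) from the population version.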
• 42. SPREAD IN EXCEL Range: • =MIN(range) • =MAX(range) Quantiles: • =PERCENTILE.INC(range, k), with k between 0 and 1 • =QUARTILE.INC(range, {1,2,3,4}) Standard Deviation: • =STDEV.S(range)
• 43. WHAT DOES THE STANDARD DEVIATION TELL YOU? Greater variation, less certainty Lower variation, more certainty
• 44. FROM HISTOGRAMS TO FREQUENCY DISTRIBUTIONS
• 45. NORMAL DISTRIBUTIONS
• 46. NORMAL DISTRIBUTION
• 47. SKEWED DISTRIBUTIONS
• 48. BIVARIATE ANALYSIS
• 49. SCATTER PLOT Relationship of two variables Quantitative Only
• 50. CORRELATIONS Direct • As x increases, y increases Indirect • As x increases, y decreases No Correlation
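One common way to put a number on the direction and strength of a relationship is Pearson's correlation coefficient r, which the slides do not name; it is used here as an assumption. The sketch computes r by hand for hypothetical data: values near +1 indicate a direct relationship, near -1 an indirect one, and near 0 no linear correlation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_deviation / math.sqrt(ss_x * ss_y)


# Hypothetical data for illustration.
x = [1, 2, 3, 4, 5]
direct = [2, 4, 6, 8, 10]    # y increases as x increases
indirect = [10, 8, 6, 4, 2]  # y decreases as x increases

r_direct = pearson_r(x, direct)
r_indirect = pearson_r(x, indirect)
print(r_direct, r_indirect)
```

In Excel, the equivalent calculation is available through the CORREL function.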
• 51. DEMONSTRATION OF SCATTER PLOT IN EXCEL Select Data: •Highlight both columns Insert graph: •Scatter •Layout 9 Change Labels: •X-axis label •Y-axis label
• 52. CROSS-TABULATIONS Qualitative Two Variables Fewer Categories Row Percentage Column Percentage Pivot Tables in Excel
• 53. CONTINGENCY TABLE

| Test A/B | Yes | No | Total |
| --- | --- | --- | --- |
| Yes | 10 | 15 | 25 |
| No | 50 | 25 | 75 |
| Totals | 60 | 40 | 100 |

Simple Cross-tab. Two Binomial Variables. Powerful Statistics: •Odds Ratios & Risk Ratios
• 54. IMPORTANCE OF DESCRIPTIVE STATISTICS Describes: Population, Sample, Results. Compares: Sample to Population, Sub-groups, Correlations. Summarizes: Central Tendency, Spread.
• 55. PROGRESSION FROM DESCRIPTIVE TO INFERENTIAL STATISTICS Central Tendency → Spread → Distributions → Probability → Inferential Statistics
• 56. RESOURCES • Rice Virtual Lab in Statistics • Excel Tutorials for Statistical Analysis • Khan Academy (videos) • Basic Research Methods for Librarians (ebook) • Descriptive Statistical Techniques for Librarians (ebook)