Beginner's Guide to Getting Public Data into the Classroom
1. BEGINNER’S GUIDE TO GETTING PUBLIC DATA INTO THE CLASSROOM
PRESENTED OCTOBER 17, 2015
SOCIETY FOR SCIENCE AND THE PUBLIC TEACHER CONFERENCE
WASHINGTON DC
Shawn Handran, Ph.D.
2. About me
10 years in academic research
Montana State BS, Washington Univ. in St. Louis PhD, Harvard Medical School post-doc
7 years in biotechnology
Genomics, bioinformatics, HT screening & imaging
4 years in Non-Profit sector
Foundation/fundraising database research
4th year of teaching at FCS
AP Biology, AP Statistics, Biotechnology
3. Getting public data into the classroom
Stimulate intrinsic interest
Keep barriers to entry low
Get to a good comfort level
Gradual release
5. Stimulate intrinsic interest
Teachers often mandate the parameters
Too much control stifles creativity
Give students ownership
Ownership drives interest level and engagement
Provide guidance
Yes, some parameters are still required!
8. Keep barriers to entry low
Datasets
Ease of access
Dataset format
Data analysis
Cost
Ease of use
9. Dataset barriers to entry
Ease of access
HTML tables
Downloadable files
Copy and paste
Query database/forms
PDF
(Listed in order of increasing difficulty)
10. Dataset barriers to entry
Dataset format
HTML: use the Import HTML table function (IMPORTHTML in Google Sheets)
Text format (csv, tab) or Excel (xls, xlsx)
Query forms
Simple database files (e.g., Access)
Complex database files
(Listed in order of increasing difficulty)
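For the text formats listed above, a few lines of code are often enough to get a file into an analyzable form. The sketch below is a minimal Python illustration (Python is not part of the original talk), and the team names and numbers in `SAMPLE_CSV` are made up for the example.

```python
import csv
import io

# Hypothetical sample data standing in for a downloaded .csv file.
SAMPLE_CSV = """team,wins,losses
Alpha,90,72
Bravo,81,81
Charlie,70,92
"""

def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

rows = load_rows(SAMPLE_CSV)
for row in rows:
    wins, losses = int(row["wins"]), int(row["losses"])
    # Winning percentage = wins / games played.
    print(row["team"], round(wins / (wins + losses), 3))
```

For a tab-delimited file, passing `delimiter="\t"` to `csv.DictReader` should be the only change needed.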
11. Keep Barriers to Entry Low
TUTORIAL 1:
IMPORT HTML TABLE
INTO GOOGLE SHEETS
12. MLB 2014 AL team summary stats
Baseball-Reference.com
http://goo.gl/5RU0Gt
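In Google Sheets, the import is typically a single formula of the form `=IMPORTHTML(url, "table", index)`. For readers who want to see what such an import does under the hood, here is a rough Python sketch that uses only the standard library. It is an illustration, not the tutorial's method: the `TableParser` class name and the small `SAMPLE_HTML` snippet are invented for the example, and a real page would first have to be downloaded.

```python
from html.parser import HTMLParser

# Hypothetical HTML snippet; a real page would be downloaded first.
SAMPLE_HTML = """
<table>
  <tr><th>Team</th><th>W</th><th>L</th></tr>
  <tr><td>Alpha</td><td>90</td><td>72</td></tr>
  <tr><td>Bravo</td><td>81</td><td>81</td></tr>
</table>
"""

class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows
        self._row = None    # row currently being read, if any
        self._cell = None   # text pieces of the current cell, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

parser = TableParser()
parser.feed(SAMPLE_HTML)
table_rows = parser.rows
print(table_rows)
```

The first row holds the column headers and the remaining rows hold the data, which mirrors how the table lands in a spreadsheet.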
18. Two functions of data analysis
Data handling
Data visualization
Most programs do both, but some do not do both well
You’ll often use multiple programs
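The split between the two functions can be seen even in a tiny script. The following is a minimal sketch in Python (not a tool from the talk): the first step handles the data by summarizing it, and the second visualizes the summary as a crude text bar chart. The group names, scores, and the `text_bar` helper are all made up for illustration.

```python
from statistics import mean

# Hypothetical scores, invented purely for illustration.
scores = {"Group A": [72, 85, 90], "Group B": [60, 75, 81]}

# Data handling: reduce each group's raw values to one summary number.
averages = {group: mean(values) for group, values in scores.items()}

# Data visualization: draw one text bar per group, one '#' per 5 points.
def text_bar(value, points_per_mark=5):
    return "#" * int(value // points_per_mark)

for group, avg in averages.items():
    print(f"{group:8} {text_bar(avg):20} {avg:.1f}")
```

In practice the handling step might happen in a spreadsheet and the visualization step in a different program, which is why more than one tool is often involved.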
19. Data analysis: cost vs. ease of use
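[Figure: programs plotted by cost (Free to $$$) against ease of use (Hard to Easy). Programs shown: R, Stata, SAS, OpenOffice, StatCrunch, Google Sheets, Numbers, Gapminder, JMP, Excel, Publisher, Minitab, Fathom*, Tableau Public, SPSS, Illustrator, Tableau.]
*Discontinued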
20. Spreadsheet/graphing programs
Advantages
Free or close to free (except Excel)
Good selection of canned graphs
Disadvantages
Challenging for students to learn
Requires a lot of wizard-level hacking/tweaking
Winners: Google Sheets, MS Excel
21. Statistical programs
Advantages
Handles large datasets faster and better than Excel
Designed for statistical analysis
Handles variables seamlessly
More graph options and better graph editing tools than Excel
About the same learning curve as Excel for simple functions
Disadvantages
Moderate to high cost, even with academic pricing
More sophisticated graphs or analyses require mad skills
Poor graphic export options
Winners: JMP, Minitab
23. Graphic design programs
Advantages
Perfect control over every graphic element
Final output looks stunning and is scalable
Disadvantages
Zero data handling and analysis capability
Huge learning curve
Expensive
Winner: Adobe Illustrator
Runner up: Microsoft Publisher (poor man’s Illustrator)
24. Tableau Public
Advantages
Free including 10GB online storage
Handles humongous datasets
Interactive with mouse-over information
Easy to use for simple datasets and graphs
Disadvantages
Everything you create is public
Data handling is limited, and removing variables can be tedious (but not always)
31. Get to a good comfort level
Getting started: Survey of public datasets
Getting help: Learn from data experts
Getting acquainted: Make new friends
Disclaimer: these lists are by no means exhaustive!
32. Getting started: public datasets
Data.gov (186,000+ data sets)
http://www.data.gov/
Big Machine Learning (BigML) blog post
http://blog.bigml.com/list-of-public-data-sources-fit-for-machine-learning/
33. Getting started: public datasets
Gapminder Offline software (free)
http://www.gapminder.org/downloads/
Pre-loaded data!
Cake walk easy to use!
Dynamic and awesome looking!
34. Getting started: public datasets
HTML tables
http://www.baseball-reference.com/
http://www.billboard.com/archive/charts
http://apps.who.int/gho/data/?theme=home
36. Getting started: public datasets
Copy and Paste
https://gssdataexplorer.norc.org/ (easy)
http://espn.go.com/mlb/statistics (tedious)
37. Getting help: Learn from data experts
David McCandless http://www.informationisbeautiful.net/
Andy Kirk http://www.visualisingdata.com/blog/
Hans Rosling http://www.gapminder.org/videos/
Edward Tufte http://www.edwardtufte.com/tufte/
38. Getting acquainted: make friends
Here at this conference
On social media networks
You’ll have better luck on LinkedIn and G+
Don’t be afraid to reach out
40. Gradual release
Model
Don’t just show it—demo it live
Encourage
Preferably in-class computer time/activities
Release and nudge
More nudging → higher quality of final product
41. Student Project: Billboard Top100
Student level: 12 (AP Statistics)
International student
Public sources:
Billboard Top 100, Radio, Digital 2014
http://www.billboard.com/archive/charts/2014
Moderate amount of nudging
Mostly for language and cultural help
43. Student Project: MLB Hitting Stats
Student level: 12 (AP Statistics)
Local student
Public source:
http://espn.go.com/mlb/statistics
Low amount of nudging
Student had an excellent grasp of statistical analysis
44. Less nudging, less complexity
[Figure: MLB 2013 Home vs. Away Statistics. One chart compares Batting Average and On Base Percentage for Home and Away (scale 0.000 to 0.400); a second chart, "Home vs. Away On Base Percentage", uses a scale of 0.21 to 0.48.]
Dataset size: 808
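The two statistics in this project are simple ratios, so the comparison can be sketched in a few lines. The Python below is an illustration only, not the student's analysis: it uses the standard definitions (batting average = H / AB; on-base percentage = (H + BB + HBP) / (AB + BB + HBP + SF)), and the season totals in `splits` are invented for the example.

```python
# Hypothetical season totals, invented for illustration (not the student's data).
splits = {
    "Home": {"AB": 2700, "H": 710, "BB": 260, "HBP": 30, "SF": 25},
    "Away": {"AB": 2750, "H": 690, "BB": 240, "HBP": 28, "SF": 22},
}

def batting_average(s):
    """Hits divided by at-bats."""
    return s["H"] / s["AB"]

def on_base_percentage(s):
    """(H + BB + HBP) / (AB + BB + HBP + SF)."""
    reached = s["H"] + s["BB"] + s["HBP"]
    chances = s["AB"] + s["BB"] + s["HBP"] + s["SF"]
    return reached / chances

for label, s in splits.items():
    print(f"{label}: AVG {batting_average(s):.3f}, OBP {on_base_percentage(s):.3f}")
```

Swapping in real totals from a public source is the only change needed to repeat the home vs. away comparison.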
45. Tableau possibilities…
I recreated the same dataset in ~2 hours of work on Tableau Public and visualized 11K data points
https://goo.gl/ffreyb
52. Getting public data into the classroom
Get the students interested in something important to them (not to you)
Keep the barriers to entry low
Gapminder, Google Sheets, Excel, Tableau
Get yourself trained and prepared
You don’t need to be an expert!
Model it for them, then let them do it