
Feb. 2016 Demystifying Digital Humanities - Workshop 3



Slides from Demystifying Digital Humanities Workshop 3 (Data Wrangling: Programming on the Whiteboard), taught at the University of Miami Libraries in February 2016.

Published in: Education

Slide 1. Data Wrangling II: Programming on the Whiteboard
February 26, 2016
Paige Morgan, Digital Humanities Librarian

Slide 2. Starting Activity: Open Syllabus Project
http://opensyllabusproject.org/

Slide 3. Open Syllabus Project
• Use the syllabus explorer to examine the data.
• Keep track of each step you take as you drill down.
• Goal: develop a research question based on your explorations.
• What other data would you need to answer this research question?

Slide 4. Last week...
• The work of creating usable data
• Forms that this data might take:
  • Markup languages
  • Spreadsheets and relational databases (e.g., MySQL)
  • Non-relational databases (RDF/Linked Open Data)

Slide 5. This week:
• Caveat Curator (challenges of working with data)
• Programming on the Whiteboard, i.e., conceptualizing the specific steps that you need to take to accomplish your goals

Slide 6. Goals/Takeaways
• A better understanding of the workflow for dealing with data
• How to start small and scale up effectively
• Greater ability to talk about what you’re trying to do

Slide 7. Why this focus on data?
• Understanding your data, and your intended actions, is a key skill for developing any digital project (big or small).
• You may have one big project, but your data may support several small or intermediary projects.

Slide 8. Image: Josh Lee, @wtrsld, via Twitter, January 2014.

Slide 9. What if your data is crowdsourced?

Slide 10. You can require a particular format for submissions.

Slide 11. You can even put programmatic limits on the formats available for submission.
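
The sketch below shows one way to put programmatic limits on submission formats. It is written under stated assumptions: the field names (`title`, `year`, `format`), the regular expressions, and the helper `validate_submission` are hypothetical. It does not describe any particular crowdsourcing platform. Each field gets a pattern that the submitted value must match in full.

```python
import re

# Hypothetical rules for a crowdsourced submission form. Field names and
# patterns are illustrative assumptions. Values are assumed to be strings,
# as they would arrive from a web form.
SUBMISSION_RULES = {
    "title": re.compile(r".+"),                         # any non-empty text
    "year": re.compile(r"1[5-9]\d{2}|20\d{2}"),         # four digits, 1500-2099
    "format": re.compile(r"book|pamphlet|periodical"),  # controlled vocabulary
}


def validate_submission(record):
    """Return a list of problems; an empty list means the submission passes."""
    problems = []
    for field, pattern in SUBMISSION_RULES.items():
        value = record.get(field, "").strip()
        if not value:
            problems.append(f"missing field: {field}")
        elif not pattern.fullmatch(value):
            problems.append(f"bad value for {field}: {value!r}")
    return problems


print(validate_submission({"title": "Untitled", "year": "c. 1800", "format": "Book"}))
```

The example call flags both the hedged year ("c. 1800") and the capitalized format ("Book"). Rejecting them at the form is one option. Accepting them and scrubbing later is another.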

Slide 12. But in the end, you’re probably still going to need to scrub and/or format the data.

Slide 13. This is true even for data from supposedly reputable sources, like government or media organizations.
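
A typical scrubbing task is bringing inconsistently written dates into one format. This minimal sketch assumes a column of dates written in a handful of known layouts with English month names. It tries each layout in turn and returns ISO 8601 (YYYY-MM-DD). The list of layouts and the helper name `normalize_date` are hypothetical.

```python
from datetime import datetime

# Hypothetical date layouts found in a messy column; extend as you find more.
# Order matters: "03/04/2016" is read month-first because "%m/%d/%Y" comes first.
CANDIDATE_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d %B %Y", "%B %d, %Y"]


def normalize_date(raw):
    """Return the date as YYYY-MM-DD, or None if no known layout matches."""
    cleaned = " ".join(raw.split())  # collapse stray spaces and line breaks
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue  # this layout does not fit; try the next one
    return None  # leave unmatched values for a human to review


for value in ["02/26/2016", " 26  February 2016 ", "spring 2016"]:
    print(repr(value), "->", normalize_date(value))
```

Returning `None` instead of guessing keeps ambiguous or vague values ("spring 2016") visible for manual review.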

Slide 14. Example: Doctor Who Villains dataset
http://tinyurl.com/doctorwhovillains

Slide 15. Data Dictionaries
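
A data dictionary records, for each column, what it holds and how it should be read. The sketch below writes one down as a list of small records and uses it to check rows. The column names are hypothetical placeholders. They are not the actual columns of the Doctor Who Villains dataset. The names `FieldSpec` and `check_row` are likewise illustrative.

```python
from dataclasses import dataclass


@dataclass
class FieldSpec:
    """One data-dictionary entry: a column's name, expected type, and meaning."""
    name: str
    kind: type
    description: str
    required: bool = True


# Hypothetical data dictionary for a small table of villains.
DATA_DICTIONARY = [
    FieldSpec("villain", str, "Name of the villain as credited"),
    FieldSpec("first_year", int, "Year of first on-screen appearance"),
    FieldSpec("appearances", int, "Number of episodes featuring the villain"),
    FieldSpec("notes", str, "Free-text comments", required=False),
]


def check_row(row):
    """Compare one CSV-style row (all values are strings) with the dictionary."""
    problems = []
    for spec in DATA_DICTIONARY:
        value = row.get(spec.name, "").strip()
        if not value:
            if spec.required:
                problems.append(f"{spec.name}: required but empty")
            continue
        try:
            spec.kind(value)  # e.g. int("1963") succeeds, int("early 60s") fails
        except ValueError:
            problems.append(f"{spec.name}: expected {spec.kind.__name__}, got {value!r}")
    return problems


print(check_row({"villain": "", "first_year": "early 60s", "appearances": "3"}))
```

Writing the dictionary down before collecting data also makes the planned tasks explicit. For example, storing `first_year` as a number is what allows sorting or counting by year later.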

Slide 16. If you think about your data and the tasks you need to accomplish, it’s easier to determine what sort of language or platform your project needs.

Slide 17. Pseudocode
• Used by programmers to break down a complex task into single steps
• Easily adaptable for use by non-programmers

Slide 18. Pseudocode Example (Visible Prices)
• Computer has a file that contains prices from different texts.
• Computer must know that each price amount is connected with an object, and with a bibliographical record.
• Users can input a price amount, and the computer will retrieve all objects that match the price and display them to the user, along with bibliographical information.
• (More complex): Computer is able to retrieve prices linked with certain categories (clothing, food, etc.).
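
The Visible Prices pseudocode translates almost line by line into a short program, with one function per step. The following sketch assumes a hypothetical CSV layout with columns `price`, `object`, `category`, and `source`. The sample rows are placeholders, not the project's actual data. Prices are kept as text exactly as they appear in the sources, which is a simplifying assumption. Converting pounds, shillings, and pence into a single unit would be a separate step.

```python
import csv
import io

# Hypothetical sample file: one row per price mention (placeholder data).
SAMPLE_CSV = """price,object,category,source
2s 6d,pair of gloves,clothing,Text A
2s 6d,loaf of bread,food,Text B
5s,bonnet,clothing,Text A
"""


def load_prices(csv_text):
    """Step 1: the computer has a file of prices; read each row into a dict."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def find_by_price(rows, amount):
    """Steps 2-3: each price is tied to an object and a bibliographic record;
    return every row whose price matches the user's input."""
    wanted = " ".join(amount.split())  # tolerate extra spaces in the input
    return [row for row in rows if row["price"] == wanted]


def find_by_category(rows, category):
    """Step 4 (more complex): retrieve prices linked with a category."""
    return [row for row in rows if row["category"] == category.lower()]


def show(rows):
    """Display each match along with its bibliographic information."""
    for row in rows:
        print(f"{row['price']}: {row['object']} [source: {row['source']}]")


rows = load_prices(SAMPLE_CSV)
show(find_by_price(rows, "2s  6d"))
# 2s 6d: pair of gloves [source: Text A]
# 2s 6d: loaf of bread [source: Text B]
```

Keeping each pseudocode bullet as its own function makes it easy to check the program against the whiteboard plan, one step at a time.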

Slide 19. It is likely that your data will have a longer life span than any specific project you create.

Slide 20. In many instances, it may be more useful to focus on curating the data as much as on any single project.

Slide 21. Getting Data
• Figshare
• Datahub.io
• Project websites
• APIs

Slide 22. Cleaning Data
• OpenRefine http://openrefine.org/
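
One of OpenRefine's clustering methods groups variant spellings by computing a normalized "fingerprint" key for each value and collecting values whose keys collide. The code below is a simplified approximation of that idea, not OpenRefine's own implementation. It trims and lowercases, folds accents to ASCII, drops punctuation, and sorts the unique words. The names `fingerprint` and `cluster` are illustrative.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Simplified fingerprint key in the spirit of OpenRefine's clustering."""
    value = value.strip().lower()
    # Fold accented characters to plain ASCII (e.g. "é" becomes "e").
    value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    value = re.sub(r"[^\w\s]", "", value)  # drop punctuation
    tokens = sorted(set(value.split()))     # dedupe and sort the words
    return " ".join(tokens)


def cluster(values):
    """Group values whose fingerprints collide; keep only groups with variants."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [group for group in groups.values() if len(set(group)) > 1]


names = ["Morgan, Paige", "Paige Morgan", "paige  morgan.", "University of Miami"]
print(cluster(names))
```

As in OpenRefine, clustering only proposes candidate groups. A person still decides which variant becomes the canonical value.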

Slide 23. Key DH Values
• Adaptive
• Sustainable/resource-aware
• Collaborative
• Social

Slide 24. Key skills
• Thinking flexibly about your data (and potential project)
• Are there portions of your dataset that could be extracted for use in a particular tool?
• How can you adjust your data in order to show it to people (and be better able to talk, write, and present about your research interests)?
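
Extracting a portion of a dataset for a particular tool often means keeping only some columns and, optionally, some rows. This minimal sketch uses Python's standard `csv` module. The helper `extract_columns`, its `where` filter, and the sample column names are hypothetical. The sketch does not assume any particular tool's import format.

```python
import csv
import io


def extract_columns(csv_text, columns, where=None):
    """Return a new CSV holding only `columns`, optionally only rows passing `where`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    # extrasaction="ignore" silently drops the columns we did not ask for.
    writer = csv.DictWriter(out, fieldnames=columns, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for row in reader:
        if where is None or where(row):
            writer.writerow(row)
    return out.getvalue()


# Hypothetical sample: keep titles and places for a map, dropping everything else.
sample = "title,year,place,notes\nA,1811,London,x\nB,1832,Bath,y\n"
print(extract_columns(sample, ["title", "place"]))
print(extract_columns(sample, ["title", "place"], where=lambda r: int(r["year"]) < 1820))
```

The full dataset stays untouched. Each extract is a disposable view that can be regenerated whenever the source data is cleaned further.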

Slide 25. And now, it’s your turn...

Slide 26. Group Activity
• What questions can you ask and answer with this data as it is?
• What data would you need in order to ask & answer other research questions?
• What are the steps that you would need to take in order to answer those research questions?

Slide 27. Next steps
• What’s the smallest version of your dataset possible? (useful for testing out tools)
• Possible tools to examine (as ways of presenting your data)
  • Omeka (http://www.omeka.net)
  • Scalar (http://scalar.usc.edu)
  • Simile (http://www.simile-widgets.org)
  • Google Fusion Tables (https://support.google.com/fusiontables/answer/2571232)
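
One way to make a small test version of a dataset is to draw a reproducible random sample of rows while keeping their original order. This is a minimal sketch, and the helper name `smallest_useful_sample` and its parameters are hypothetical. A fixed seed means the same rows come back on every run, so results from different tools stay comparable.

```python
import random


def smallest_useful_sample(rows, n, seed=0):
    """Return n rows chosen at random but reproducibly, kept in original order."""
    if n >= len(rows):
        return list(rows)  # the dataset is already small enough
    rng = random.Random(seed)  # fixed seed: same sample on every run
    picked = sorted(rng.sample(range(len(rows)), n))
    return [rows[i] for i in picked]


rows = [f"record {i}" for i in range(1, 101)]
print(smallest_useful_sample(rows, 5))
```

Sampling at random instead of taking the first few rows helps when a file is sorted (by date, say), because the opening rows may not be representative.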

Slide 28. Thank you!
• Questions? Ideas? Book a consult at http://paigecmorgan.youcanbook.me
