Slides from Demystifying Digital Humanities Workshop 3: Data Wrangling: Programming on the Whiteboard -- taught at the University of Miami Libraries in February, 2016
3. Open Syllabus Project
• Use the syllabus explorer to examine
the data.
• Keep track of each step you take as
you drill down.
• Goal: develop a research question
based on your explorations.
• What other data would you need to
answer this research question?
4. Last week...
• The work of creating usable data
• Forms that this data might take:
• Markup languages
• Spreadsheets (MySQL & relational
DBs)
• Non-relational databases (RDF/Linked
Open Data)
5. This week:
• Caveat Curator (challenges of working
with data)
• Programming on the Whiteboard, i.e.,
conceptualizing the specific steps that
you need to take to accomplish your
goals
6. Goals/Takeaways
• A better understanding of the
workflow for dealing with data
• How to start small and scale up
effectively
• Greater ability to talk about what
you’re trying to do
7. Why this focus on data?
• Understanding your data, and your
intended actions, is a key skill for
developing any digital project (big or
small).
• You may have one big project – but
your data may support several
small/intermediary projects.
16. If you are thinking about
your data, and the tasks that
you need to accomplish,
then it’s easier to determine
what sort of language or
platform your project needs.
17. Pseudocode
• Used by programmers to break down a
complex task into single steps
• Easily adaptable for use by
non-programmers
18. Pseudocode Example (Visible Prices)
• Computer has a file that contains prices from different
texts.
• Computer must know that each price amount is
connected with an object, and with a bibliographical
record.
• Users can input a price amount, and computer will
retrieve all objects that match the price, and display
them to the user, along with bibliographical information.
• (More complex): Computer is able to retrieve prices
linked with certain categories (clothing, food, etc.)
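One possible translation of the pseudocode above into Python, offered as a minimal sketch rather than the project's actual implementation. The record layout, the field names, the function names, and the sample rows are all assumptions made for illustration. The slide specifies only that each price is linked to an object and to a bibliographical record, and an in-memory list stands in for the file.

```python
from dataclasses import dataclass


@dataclass
class PriceRecord:
    """Hypothetical record: one price tied to an object and its source."""
    amount: str    # price as written in the text (format is an assumption)
    item: str      # the object being priced
    category: str  # e.g. "clothing" or "food"
    source: str    # bibliographical information


# Made-up rows for illustration only; not real Visible Prices data.
RECORDS = [
    PriceRecord("5s", "pair of gloves", "clothing", "Hypothetical Novel A (1810)"),
    PriceRecord("5s", "leg of mutton", "food", "Hypothetical Novel B (1822)"),
    PriceRecord("1s", "loaf of bread", "food", "Hypothetical Novel A (1810)"),
]


def find_by_price(records, amount):
    """Retrieve every record whose price matches the user's input."""
    return [r for r in records if r.amount == amount]


def find_by_category(records, category):
    """The more complex step: retrieve records in a given category."""
    return [r for r in records if r.category == category]


def display(records):
    """Format each object with its price and bibliographical information."""
    return "\n".join(f"{r.amount}: {r.item} ({r.source})" for r in records)


print(display(find_by_price(RECORDS, "5s")))
```

Each function corresponds to one bullet of the pseudocode, so the whiteboard steps remain visible in the code.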
19. It is likely that your data will
have a longer life span than
any specific project you
create.
20. In many instances, it may be
useful to focus on data
curation as much as on any
single project.
24. Key skills
• Thinking flexibly about your data (and
potential project)
• Are there portions of your dataset that
could be extracted for use in a particular
tool?
• How can you adjust your data in order to
show it to people (and be better able to
talk/write/present about your research
interests)?
26. Group Activity
• What questions can you ask and
answer with this data as it is?
• What data would you need in order to
ask & answer other research
questions?
• What are the steps that you would
need to take in order to answer those
research questions?
27. Next steps
• What’s the smallest version of your
dataset possible? (useful for testing out
tools)
• Possible tools to examine (as ways of
presenting your data)
• Omeka (http://www.omeka.net)
• Scalar (http://scalar.usc.edu)
• Simile (http://www.simile-widgets.org)
• Google Fusion Tables
(https://support.google.com/fusiontables/answer/2571232)
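As a rough illustration of the "smallest version of your dataset" step above, the sketch below keeps the header row and the first few data rows of a CSV, so a tool can be tested on a tiny file first. The column names, the sample rows, and the function name `smallest_sample` are assumptions for illustration. A real dataset would be read from a file rather than from the inline string used here.

```python
import csv
import io


def smallest_sample(rows, n=5):
    """Return the header row plus the first n data rows."""
    rows = list(rows)
    return rows[: n + 1]


# Stand-in for a real file; the columns and values are made up.
FULL_CSV = """title,year,field
Hypothetical Text A,1810,literature
Hypothetical Text B,1822,history
Hypothetical Text C,1835,literature
Hypothetical Text D,1841,economics
"""

reader = csv.reader(io.StringIO(FULL_CSV))
sample = smallest_sample(reader, n=2)

# Write the sample back out as CSV text, ready to load into a tool.
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(sample)
print(out.getvalue())
```

Taking the first rows is the simplest choice. If the file is sorted, a random selection may give a more representative test set.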