
Dmdh winter 2015 session #2

Demystifying Digital Humanities Winter 2015 Workshop 2

  1. 1. Winter 2015: Session #2 Programming on the Whiteboard (Sarah Kremen-Hicks & Brian Gutierrez)
  2. 2. Previously, at DMDH... • The work of creating usable data • Forms that this data might take: • markup language • spreadsheets
  3. 3. Workshop #2 • Caveat Curator (challenges of working with data) • Programming on the whiteboard, i.e., conceptualizing the specific steps that you need to take to accomplish your goals
  4. 4. Why this focus on data? • Understanding your data, and your intended actions, is a key skill for working with any programming language or platform. • This is true whether you are the programmer or whether you are working with professional programmers.
  5. 5. Programming languages are like human languages in that they both have phrases, patterns, and rules.
  6. 6. Programming languages are unlike human languages in that they aren’t for communicating with people.
  7. 7. They are also unlike human languages in that every programming utterance does something, i.e., causes an action to occur.
  8. 8. You can get used to patterns – even unfamiliar ones.
  9. 9. The shift is in getting used to thinking in terms of every single action.
  10. 10. Our subject matter today is all actions that you’ll need to think about before you work with...
  11. 11. Image: Josh Lee, @wtrsld, via Twitter, January 2014.
  12. 12. Even when you’re just experimenting, you need to prep your data.
  13. 13. You may know your dataset in detail already from your research, but your computer is concerned with different levels of detail.
  14. 14. Becoming aware of those levels of detail is not only helpful for your project ideas...
  15. 15. ...it’s also a useful skill for working with programming languages, where a stray /> or ; can break your program or website.
  16. 16. Caveat Curator
  17. 17. Data only works if your computer can read it.
  18. 18. But my data is just text! (Isn’t that easy?)
  19. 19. (Remember, your computer is fairly stupid).
  20. 20. Formatted text is often full of characters that your computer can’t parse correctly.
  21. 21. The┘re┘sÜlt ís that yoÜr te┘xt might come┘ oÜt looking like┘this whe┘n yoÜ ope┘n it in a programming e┘nvironme┘nt.
  22. 22. So you need to convert it to plain text, without any of the formatting details encoded in MS Word fonts.
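One common source of garbled characters like these is an encoding mismatch: bytes written in one encoding (for example UTF-8) are read back with another (for example Windows-1252). The Python sketch below is illustrative only and assumes that case. It simulates the mismatch and replaces Word's typographic punctuation with plain ASCII; the `SMART_TO_PLAIN` table is a hypothetical, deliberately incomplete mapping.

```python
# Typographic characters Word often inserts, mapped to plain-text equivalents.
# Hypothetical and deliberately incomplete: extend it for your own corpus.
SMART_TO_PLAIN = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # non-breaking space
}
PLAIN_TABLE = str.maketrans(SMART_TO_PLAIN)


def show_mojibake(text: str) -> str:
    """Simulate UTF-8 bytes being read with the wrong (Windows-1252) decoder."""
    return text.encode("utf-8").decode("cp1252", errors="replace")


def to_plain_text(text: str) -> str:
    """Replace typographic punctuation with plain ASCII equivalents."""
    return text.translate(PLAIN_TABLE)


if __name__ == "__main__":
    print(show_mojibake("don\u2019t"))                    # donâ€™t
    print(to_plain_text("don\u2019t \u201cquote\u201d"))  # don't "quote"
```

A translation table keeps every substitution visible in one place, which also serves as a record of what the cleaning step changed.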
  23. 23. But even that can produce unexpected errors.
  24. 24. Maybe you want to work with sailing data and ports of call:
  25. 25. The ship you’re interested in leaves the Ivory Coast for St. Helena...
  26. 26. But when you create your map, you get this:
  27. 27. The latitude/longitude coordinate is the significant datum.
  28. 28. The city name is just the human-readable component.
  29. 29. Each datum needs to be unique.
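The slides don't say where the mis-plotted point landed. One plausible way this happens is when records are looked up or merged by place name alone. The Python sketch below, under that assumption, shows why a unique identifier (hypothetical values such as `sthelena_island`) should carry the coordinates, with the name kept only as a label. The coordinates are approximate and for illustration only.

```python
from collections import namedtuple

# The coordinates are the significant datum; the name is only a human-readable label.
Place = namedtuple("Place", ["place_id", "name", "lat", "lon"])

# Hypothetical identifiers; approximate coordinates, for illustration only.
places = [
    Place("sthelena_island", "St. Helena", -15.92, -5.72),  # island in the South Atlantic
    Place("sthelena_ca", "St. Helena", 38.51, -122.47),     # town in California
]

# Keying by name silently keeps only the last "St. Helena".
by_name = {p.name: p for p in places}

# Keying by a unique identifier keeps both records distinct.
by_id = {p.place_id: p for p in places}

print(len(by_name))               # 1
print(len(by_id))                 # 2
print(by_name["St. Helena"].lat)  # 38.51: the California town, not the island
```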
  30. 30. Figuring out what sort of unique configuration will work best involves at least some experimentation.
  31. 31. To experiment effectively, you’ll want to keep careful records.
  32. 32. If you develop categories of information, you’ll want to keep a record of what each category means, and what its limits are.
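One way to keep such a record is a machine-readable codebook that sits alongside the data and can check it. This is a minimal Python sketch; the field names (`port_name`, `port_role`) and allowed values are hypothetical, and `None` marks a free-text field.

```python
# A codebook: what each field means and which values it may take.
# Field names and allowed values are hypothetical; None marks free text.
CODEBOOK = {
    "port_name": {
        "meaning": "Human-readable name of the port, as written in the source",
        "allowed": None,
    },
    "port_role": {
        "meaning": "The port's role in this voyage",
        "allowed": {"departure", "stopover", "arrival"},
    },
}


def check_record(record: dict) -> list:
    """List every field the codebook doesn't define or whose value is outside its limits."""
    problems = []
    for field, value in record.items():
        entry = CODEBOOK.get(field)
        if entry is None:
            problems.append(f"undocumented field: {field}")
        elif entry["allowed"] is not None and value not in entry["allowed"]:
            problems.append(f"{field}: {value!r} is not a documented value")
    return problems


print(check_record({"port_name": "St. Helena", "port_role": "arrival"}))  # []
print(check_record({"port_role": "layover", "tonnage": 300}))
```

Because the meanings and the limits live in the same structure, updating a category's definition and updating its validation happen together.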
  33. 33. Cleaning and structuring your data is a foundational issue, and what it involves changes depending on the format in which your data is available.
  34. 34. What if your data is crowdsourced?
  35. 35. You can require a particular format for submissions
  36. 36. You can even put programmatic limits on the formats available for submission
  37. 37. But in the end, you’re still going to need to scrub and/or reformat the data.
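Even with format requirements in place, submissions usually need a scrubbing pass. A minimal Python sketch, assuming a hypothetical `date` field that contributors were asked to enter as YYYY-MM-DD: it collapses stray whitespace in text fields and flags dates that don't match, rather than silently guessing at a fix.

```python
import re

# Hypothetical submission rule: dates must be entered as YYYY-MM-DD.
DATE_FORMAT = re.compile(r"\d{4}-\d{2}-\d{2}")


def scrub(submission: dict):
    """Return (cleaned copy, list of problems) for one crowdsourced submission."""
    cleaned = {}
    for key, value in submission.items():
        if isinstance(value, str):
            value = " ".join(value.split())  # collapse stray spaces, tabs, and newlines
        cleaned[key] = value

    problems = []
    date = cleaned.get("date")
    if not isinstance(date, str) or not DATE_FORMAT.fullmatch(date):
        problems.append(f"date {date!r} does not match YYYY-MM-DD")  # flag it; don't guess
    return cleaned, problems


print(scrub({"name": "  Jane   Doe\n", "date": "1800-01-01"}))
print(scrub({"name": "Jane Doe", "date": "1 Jan 1800"}))
```

Flagging instead of auto-correcting keeps a human in the loop for ambiguous entries, which matters when the same value could be read more than one way.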
  38. 38. This is true even for data from supposedly reputable sources, like government or media organizations.
  39. 39. Example: Doctor Who Villains dataset http://tinyurl.com/doctorwhovillains
  40. 40. This step is no fun!
  41. 41. But it’s absolutely necessary.
  42. 42. Break!
  43. 43. Working with multiple types of data: GIS and the Spatial Turn
  44. 44. GIS technology has paved the way for analyzing qualitative data associated with cultural experiences
  45. 45. “A good map is worth a thousand words, cartographers say, and they are right: because it produces a thousand words: it raises doubts, ideas. It poses new questions, and forces you to look for new answers.” (Moretti 1998, 3–4)
  46. 46. Literary texts are filled with subjective spatial data: an author's or character's articulation of geographically located dwellings, urban and rural landscapes, and performance spaces
  47. 47. Project: Mapping William Wordsworth's Conspicuous Consumption in The Prelude (Brian R. Gutierrez)
  48. 48. Objective: to map the visual culture events referenced in Wordsworth’s autobiographical poem The Prelude (as well as the ones not referenced)
  49. 49. Problem to solve: Prove that literary galleries, specifically Joseph Boydell’s “Shakespeare Gallery,” shaped the dramaturgical choices in the only play written by Wordsworth. He reads Shakespeare not through a personal copy of the play, but through the visual and performative texts of that time.
  50. 50. Data: place-names, indirect references, and all non-referenced visual cultural events
  51. 51. Access to data: Project Gutenberg, digital archive of British newspapers and periodicals
  52. 52. What to do with that data? Map it!!
  53. 53. First data set: Literary spatial articulations
  54. 54. Wordsworth mentions the following place names and references: "Oh wonderous power of words, how sweet they are / According to the meaning which they bring-- / Vauxhall and Ranelagh, I then had heard / Of your green groves and wilderness of lamps, / Your gorgeous ladies, fairy cataracts, / And pageant fireworks" (119-125); "Half-rural Sadler's Wells" (267)
  55. 55. First, I need to know what and where these places were in order to identify them as spatial data (e.g., Vauxhall and Ranelagh).
  56. 56. Second, if I'm interested in visual cultural experiences, I need to identify what kind of event occurred there: gallery, play, etc.
  57. 57. Third, how would I access the data? Answer: place names in a book are not under copyright. However, if I wanted to display passages from the text when a viewer clicks on a place name, I would have to think about copyright; since the text is on Project Gutenberg (PG), that's covered.
  58. 58. Fourth, I would have to locate any indirect references to visual cultural phenomena. For example, Wordsworth mentions two actresses by name, Mary Robinson and Sarah Siddons. Since I cannot map a person, I need to investigate which plays they were in, and at which theaters, during that period of his life (it's an autobiography).
  59. 59. Fifth, I need to research what special events were occurring at other places he mentions. For that, I look to newspapers such as The Times and to various periodicals.
  60. 60. Sixth, because I am going to create a map using ArcGIS, I need to put my data in an Excel spreadsheet so that the program can read it.
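ArcGIS can generally import point data from a table with latitude and longitude columns; a CSV file, which Excel opens directly, is one plain-text way to prepare such a table. The Python sketch below assumes hypothetical column names and leaves the coordinates blank (`None`) until they are researched; `missing_coordinates` flags rows that can't be plotted yet.

```python
import csv
import io

# Hypothetical column names; match them to whatever your GIS import expects.
FIELDS = ["place_name", "poem_lines", "venue_type", "latitude", "longitude"]


def missing_coordinates(rows):
    """Names of places that still lack a latitude or longitude."""
    return [r["place_name"] for r in rows
            if r.get("latitude") is None or r.get("longitude") is None]


def write_places_csv(rows, handle):
    """Write a header row, then one row per place, as plain-text CSV."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)  # None values are written as empty cells


rows = [  # coordinates still to be researched
    {"place_name": "Vauxhall", "poem_lines": "119-125", "venue_type": "pleasure garden",
     "latitude": None, "longitude": None},
    {"place_name": "Ranelagh", "poem_lines": "119-125", "venue_type": "pleasure garden",
     "latitude": None, "longitude": None},
    {"place_name": "Sadler's Wells", "poem_lines": "267", "venue_type": "theatre",
     "latitude": None, "longitude": None},
]

print(missing_coordinates(rows))  # ['Vauxhall', 'Ranelagh', "Sadler's Wells"]
buffer = io.StringIO()
write_places_csv(rows, buffer)
print(buffer.getvalue())
```

To write a real file, open it with `open("places.csv", "w", newline="", encoding="utf-8")` and pass that handle instead of the `StringIO` buffer.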
  61. 61. What is the relationship between the data?
  62. 62. Analyze the qualitative data: Humanist skill = DHumanist skill
  63. 63. Programming on the whiteboard involves looking at the categories of information, and thinking about how they interact.
  64. 64. Categories • Place names • Poetic lines • Genre of visual/cultural event • Spatial data (latitude/longitude)
  65. 65. Return to the source of original data—the literary text—to examine how the author is describing these phenomena
  66. 66. Why use ArcGIS?
  67. 67. Benefits of ArcGIS • It allows the overlay of historical maps • Trainings were available and accessible (through DHSI and UW courses) • As a software program, ArcGIS is established enough to be considered robust • Available through the UW software suite
  68. 68. Disadvantages of ArcGIS • Available only for PCs • Proprietary file format (even if input data is open-access, the end result is not) • Available only on an annual subscription model (and prohibitively expensive for scholars without campus-granted access)
  69. 69. In Franco Moretti’s Atlas of the European Novel 1800-1900 (1998), he calls for a “literary geography,” predicated on the creation of “readerly maps” and the use of those maps as analytical tools.
  70. 70. Caveats? The pursuit of mapping data may exclude complex social spaces (e.g., gendered domestic environments)
  71. 71. Caveats? Cartographical representations should not be divorced from their primary texts
  72. 72. Break!
  73. 73. Project: Visualizing Prosody (Sarah Kremen-Hicks) x / |x /|xx / | x / |x / Sir Walter Vivian all a summer's day / x | / x | x / | x / | x / Gave his broad lawns until the set of sun
  74. 74. Marking up a poem for metrical scansion is encoding it with data. What can a computer do with that data?
  75. 75. Computers are good at counting things – like iambs.
  76. 76. Is it possible to predict deviations from a metrical norm based on author or lyric classification?
  77. 77. Will authors show a tendency for particular types of metrical substitution?
  78. 78. Prepping the Data • For proof of concept, start with one author (Alfred, Lord Tennyson) • Get Tennyson’s poems from Project Gutenberg • Hand-mark representative poems for prosody
  79. 79. Programming on the Whiteboard: What should the computer do?
  80. 80. Computer tasks • Count feet per line • Recognize | as a foot boundary • Recognize a carriage return as a line boundary • Supply foot boundaries at the beginning/end of lines • Count the number of areas contained within foot boundaries for each line
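The tasks above can be sketched in Python, assuming the slide's notation ('x' unstressed, '/' stressed, '|' foot boundary) and one verse line per line of text. This illustrates the whiteboard steps; it is not the project's actual code.

```python
def feet_per_line(scansion: str) -> list:
    """Count metrical feet on each non-blank line of a hand-marked scansion.

    Notation (from the slides): 'x' unstressed, '/' stressed, '|' foot boundary.
    """
    counts = []
    for line in scansion.splitlines():          # line break = line boundary
        if not line.strip():
            continue                            # skip blank lines (e.g. stanza breaks)
        bounded = "|" + line + "|"              # supply boundaries at both ends of the line
        areas = bounded.split("|")              # '|' = foot boundary
        feet = [a for a in areas if a.strip()]  # count only areas that contain marks
        counts.append(len(feet))
    return counts


print(feet_per_line("x / |x /|xx / | x / |x /"))  # [5]
```

Supplying the outer boundaries first means a line marked with or without a leading or trailing '|' is counted the same way.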
  81. 81. These steps involve recognizing each metrical foot as a unit that contains particular accentual-syllabic data. x / |x /|xx / | x / |x / Sir Walter Vivian all a summer's day
  82. 82. Computer tasks, cont’d. • Identify the most common number of feet per line • Supply a report on lines (by number) that deviate • Calculate the rate of deviation/adherence • Mode = paradigm (the most common value is treated as the norm)
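A sketch of these steps, assuming the feet-per-line counts have already been computed as a list of integers (one per line). The most common value (the mode) is treated as the paradigm; on a tie, the value encountered first wins.

```python
from collections import Counter


def line_length_report(counts: list):
    """Treat the most common feet-per-line value as the norm and report deviating lines.

    Returns (norm, deviating line numbers counted from 1, adherence rate).
    On a tie, Counter.most_common returns the value encountered first.
    """
    norm = Counter(counts).most_common(1)[0][0]
    deviant = [n for n, feet in enumerate(counts, start=1) if feet != norm]
    adherence = (len(counts) - len(deviant)) / len(counts)
    return norm, deviant, adherence


print(line_length_report([5, 5, 4, 5, 6]))  # (5, [3, 5], 0.6)
```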
  83. 83. After recognizing the foot as a unit, the computer can calculate what patterns of data each foot contains.
  84. 84. Computer tasks, cont’d. • Identify the most common foot type • Identify markings within foot boundaries • Compare markings to foot dictionary to identify type
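Comparing the markings to a "foot dictionary" can be as simple as a lookup table. In this Python sketch the dictionary is a hypothetical starting set of common English feet; whitespace inside a foot is ignored, and unrecognized patterns are labeled "unknown" so they can be reviewed by hand.

```python
# Hypothetical starting dictionary of common English feet.
FOOT_DICTIONARY = {
    "x/": "iamb",
    "/x": "trochee",
    "xx/": "anapest",
    "/xx": "dactyl",
    "x/x": "amphibrach",
    "//": "spondee",
    "xx": "pyrrhic",
}


def classify_feet(line: str) -> list:
    """Split a scanned line at '|' and look up each foot's markings in the dictionary."""
    markings = ["".join(area.split()) for area in line.split("|") if area.strip()]
    return [FOOT_DICTIONARY.get(m, "unknown") for m in markings]


print(classify_feet("x / |x /|xx / | x / |x /"))
# ['iamb', 'iamb', 'anapest', 'iamb', 'iamb']
```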
  85. 85. These tasks identify each line as a unit composed of one or more feet. x / |x /|xx / | x / |x / Sir Walter Vivian all a summer's day (iambic pentameter with third-foot anapestic substitution)
  86. 86. Still more computing tasks! • Identify the most common foot type within a poem • Supply a report on feet (by line and foot number) that deviate • Calculate rate of deviation/adherence • Mode = paradigm
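At the poem level, the same mode-as-paradigm idea applies to foot types. This Python sketch assumes each line has already been turned into a list of foot names (for instance by a foot-dictionary lookup); it reports the norm, every deviating foot as (line number, foot number, type), and the adherence rate.

```python
from collections import Counter


def foot_report(poem_feet: list):
    """poem_feet: one list of foot names (e.g. 'iamb') per line of the poem."""
    all_feet = [foot for line in poem_feet for foot in line]
    norm = Counter(all_feet).most_common(1)[0][0]  # mode = paradigm
    deviations = [
        (line_no, foot_no, foot)
        for line_no, line in enumerate(poem_feet, start=1)
        for foot_no, foot in enumerate(line, start=1)
        if foot != norm
    ]
    adherence = (len(all_feet) - len(deviations)) / len(all_feet)
    return norm, deviations, adherence


poem = [
    ["iamb", "iamb", "anapest", "iamb", "iamb"],
    ["trochee", "iamb", "iamb", "iamb", "iamb"],
]
print(foot_report(poem))  # ('iamb', [(1, 3, 'anapest'), (2, 1, 'trochee')], 0.8)
```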
  87. 87. Just as the feet contain patterns, the lines contain patterns that can be analyzed as well.
  88. 88. Still more computing tasks! • Report on types of deviations arranged by most to least common • Information should include location (line/foot number), as well as prevalence of substitution type
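Starting from a list of deviations in (line number, foot number, foot type) form, this Python sketch groups substitutions by type and ranks them from most to least common, keeping their locations and their prevalence as a share of all feet. The input format is an assumption; the slides don't specify one.

```python
from collections import defaultdict


def substitution_summary(deviations, total_feet):
    """Group deviating feet by type, ranked from most to least common.

    deviations: (line_no, foot_no, foot_type) tuples; total_feet: feet in the poem.
    """
    locations = defaultdict(list)
    for line_no, foot_no, foot_type in deviations:
        locations[foot_type].append((line_no, foot_no))
    # Stable sort: types with equal counts keep the order in which they first appeared.
    ranked = sorted(locations.items(), key=lambda item: len(item[1]), reverse=True)
    return [
        {
            "type": foot_type,
            "count": len(locs),
            "prevalence": len(locs) / total_feet,
            "locations": locs,
        }
        for foot_type, locs in ranked
    ]


for row in substitution_summary([(1, 3, "anapest"), (2, 1, "trochee"), (4, 1, "trochee")], 20):
    print(row)
```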
  89. 89. Deviations and their placement within each line and each poem should display certain patterns unique to each author (I hope!)
  90. 90. Current status: I’m investigating using the Natural Language Toolkit to tokenize each foot, and to establish syllables, feet, and lines as a unique hierarchy.
  91. 91. Applicable Values • Iterative development • Failure as valuable • Collaboration
  92. 92. If you are thinking about your data, and the tasks that you need to accomplish, then it’s easier to determine what sort of language or platform your project needs.
  93. 93. There are countless tutorials, online courses, etc., for almost any programming language or platform. (We’re giving you a cheat sheet, too; and http://www.dmdh.org is your friend. So is Google.)
  94. 94. Learning them can be a slow process, especially at first.
  95. 95. However, knowing what tasks you’re working towards makes it easier to understand the purpose of the introductory lessons.
  96. 96. It’s also easier to think about how the first rules you learn for any language or platform might affect your goals.
  97. 97. And now, it’s your turn...
  98. 98. For this activity, we recommend that you pair up, or form small groups to work together.
  99. 99. Group Activity • What do you need to do with your data? • What units might that data exist in? • What categories do you need to create? • What relationships need to exist between the units and categories?
  100. 100. Upcoming Workshops! • Crash Course on R: Feb 4, 12:30-2:00 (location TBD) • Spring Workshops on Project Ideation and Development: April 11th and April 25th
  101. 101. DMDH content is developed by Paige Morgan, Sarah Kremen-Hicks, and Brian Gutierrez, with generous support from the Simpson Center for the Humanities at the University of Washington. Content is available under a Creative Commons Attribution-NonCommercial 3.0 Unported License. Please contact Sarah at sarahkh@uw.edu with questions.
