
Dmdh winter 2015 session #1



Demystifying Digital Humanities Winter 2015 Workshop 1



  1. DMDH Winter 2015 Session #1: Exploring Programming in the Digital Humanities
  2. Programming is complex enough that just figuring out what you want to do and what sort of language you need is work.
  3. Thinking that you ought to be able to do everything almost immediately is a recipe for feeling terrible.
  4. Being aware that it is genuine work, and not just work for newbies, matters.
  5. There will always be new programs and platforms that you will want to experiment with.
  6. Working with technology means periodically starting from scratch -- a bit like working with a new time period or culture, or figuring out how to teach a new class.
  7. What can programming languages do?
  8. Programming languages can...
  9. They can also do all these things in combination.
  10. Example #1 • find all the statements in quotes ("") from a novel • count how many words are in each statement • put the statements in order from fewest words to most • write all the statements from the novel to a text file
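The four steps of Example #1 can be sketched in Python. This is a minimal sketch under stated assumptions. The novel is assumed to be plain text that marks speech with straight double quotes; curly quotes or nested quotations would need a different pattern. The function names and the output file `statements.txt` are hypothetical.

```python
import os
import re
import tempfile

def extract_quotes(text):
    """Return every passage enclosed in straight double quotes."""
    # Non-greedy (.+?) stops at the first closing quote after each opening one.
    return re.findall(r'"(.+?)"', text, flags=re.DOTALL)

def sort_by_word_count(statements):
    """Order statements from fewest words to most (ties keep their order)."""
    return sorted(statements, key=lambda s: len(s.split()))

def write_statements(statements, path):
    """Write one statement per line, prefixed with its word count."""
    with open(path, "w", encoding="utf-8") as f:
        for s in statements:
            f.write(f"{len(s.split())}\t{s}\n")

if __name__ == "__main__":
    # Hypothetical sample text standing in for a novel.
    novel = ('"Good morning," said Anne. '
             '"Are you coming to the ball tonight?" he asked. "Yes."')
    ordered = sort_by_word_count(extract_quotes(novel))
    out_path = os.path.join(tempfile.gettempdir(), "statements.txt")
    write_statements(ordered, out_path)
    print(ordered)
```

On the sample text, the statements come out ordered as `Yes.` (1 word), `Good morning,` (2 words), and `Are you coming to the ball tonight?` (7 words).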
  11. Example #2 • allow a user to type in some information, e.g., "Benedict Cumberbatch" • compare "Benedict Cumberbatch" to a much larger file • retrieve any data that matches the information • print the retrieved information on screen
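Here is a minimal Python sketch of Example #2. It assumes the "much larger file" is a tab-separated list with one record per line. The sample records and the name `find_matches` are hypothetical. The query is hard-coded so the script runs without a keyboard, and a comment marks where `input()` would go in an interactive version.

```python
# Hypothetical sample standing in for "a much larger file":
# one tab-separated record per line (title, actor, role).
CAST_RECORDS = [
    "Sherlock (2010)\tBenedict Cumberbatch\tSherlock Holmes",
    "Sherlock (2010)\tMartin Freeman\tJohn Watson",
    "The Imitation Game (2014)\tBenedict Cumberbatch\tAlan Turing",
]

def find_matches(query, records):
    """Return every record that contains the query, ignoring case."""
    q = query.casefold()
    return [r for r in records if q in r.casefold()]

if __name__ == "__main__":
    # An interactive version could read the query with input("Search: ").
    query = "Benedict Cumberbatch"
    for record in find_matches(query, CAST_RECORDS):
        title, actor, role = record.split("\t")
        print(f"{actor} played {role} in {title}")
```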
  12. Example #3 • "read" two texts -- say, two plays by Seneca • search for any words that the two plays have in common • print the words that they have in common on screen • calculate what percentage of the words in each play are shared • print that percentage onscreen
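Example #3 maps naturally onto set operations. This is a sketch, not a definitive method. It assumes "percentage of the words" means distinct word forms (vocabulary) rather than every occurrence. The two short strings are hypothetical placeholders, not lines from Seneca.

```python
import re

def vocabulary(text):
    """Set of distinct lower-cased words (letters only, any alphabet)."""
    return set(re.findall(r"[^\W\d_]+", text.lower()))

def shared_words(text_a, text_b):
    """Words that appear in both texts."""
    return vocabulary(text_a) & vocabulary(text_b)

def percent_shared(text, shared):
    """Percentage of a text's distinct words that are in `shared`."""
    vocab = vocabulary(text)
    return 100 * len(vocab & shared) / len(vocab) if vocab else 0.0

if __name__ == "__main__":
    # Hypothetical placeholders; real use would read each play from a file.
    play_a = "The king is dead, the queen mourns."
    play_b = "The queen is silent."
    common = shared_words(play_a, play_b)
    print(sorted(common))
    print(f"{percent_shared(play_a, common):.1f}% of play A's words are shared")
    print(f"{percent_shared(play_b, common):.1f}% of play B's words are shared")
```

Counting every occurrence instead of distinct words would give different percentages. Choosing between the two is exactly the kind of interpretive decision the next slides discuss.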
  13. Example #4 • if the user is located in geographic location Z, e.g., 45th and University, go to an online address and retrieve some text • print that text on the user’s tablet screen • receive input from the user and respond
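The location check in Example #4 can be sketched with the haversine formula, which gives the great-circle distance between two latitude/longitude points. Everything specific here is an assumption:

- the coordinates standing in for 45th and University are unverified approximate placeholders;
- the 100-meter radius is arbitrary;
- `CONTENT_URL` is a hypothetical address.

`fetch_text` uses the standard library's `urllib.request.urlopen`. The script itself only reports what it would retrieve, and the last step (receiving input and responding) is left out.

```python
import math
import urllib.request

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, in meters

# Hypothetical stand-ins: approximate coordinates for "location Z",
# an arbitrary radius, and a placeholder address for the content.
TARGET = (47.6613, -122.3130)
RADIUS_M = 100
CONTENT_URL = "https://example.org/location-z.txt"


def haversine_m(a, b):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def is_at_target(position, target=TARGET, radius_m=RADIUS_M):
    """True if the position lies within radius_m of the target."""
    return haversine_m(position, target) <= radius_m


def fetch_text(url):
    """Retrieve a text resource over HTTP(S) using the standard library."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    here = (47.6614, -122.3131)  # hypothetical GPS reading from the tablet
    if is_at_target(here):
        # A real app would call fetch_text(CONTENT_URL) and display the result.
        print(f"At location Z: would retrieve {CONTENT_URL}")
    else:
        print("Not at location Z")
```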
  14. However... • In Example #1, the computer focuses on things that characters say. But what if you want to isolate speeches from just one character? • In Example #2, how does the computer know how much text to print? Will it just print "Benedict Cumberbatch" 379 times, because that's how often it appears in the larger file?
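The first question has to be answered with explicit rules. Here is a minimal sketch, assuming the novel uses straight double quotes and marks speakers with the pattern `"Speech," said Name`. It captures each name together with its quotation, then keeps only one character's lines. The regular expression and `speeches_by` are hypothetical, and real novels use many more attribution patterns than this one handles.

```python
import re

# Assumes dialogue tagged in one common pattern: "Speech," said Name.
# Other patterns (Name said, untagged lines, curly quotes) are not handled.
ATTRIBUTED_SPEECH = re.compile(r'"([^"]+)"\s*said\s+([A-Z]\w*)')

def speeches_by(text, speaker):
    """Return the quoted speeches attributed to one named speaker."""
    return [speech for speech, name in ATTRIBUTED_SPEECH.findall(text)
            if name == speaker]

if __name__ == "__main__":
    # Hypothetical sample dialogue.
    passage = ('"I am not afraid," said Mary. "You should be," said Tom. '
               '"Perhaps," said Mary. "Nobody asked."')
    print(speeches_by(passage, "Mary"))
```

The untagged `"Nobody asked."` is silently skipped. That is a reminder that whatever the rule leaves out, the computer leaves out too.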
  15. These are the areas of programming where critical thinking and humanities skills become vital.
  16. The Difference • Humans are good at differentiating between material in complex and sophisticated ways. • Computers are good at not differentiating between material unless they’ve been specifically instructed to do so.
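Here is a small Python illustration of the point, using a hypothetical word list. Left to itself, the computer treats every difference in spelling as meaningful. Told to ignore case, it merges words that a human reader would keep apart. The function name is illustrative.

```python
def distinct_words(words, ignore_case=False):
    """Count distinct words; case is ignored only when explicitly requested."""
    if ignore_case:
        words = [w.casefold() for w in words]
    return len(set(words))

if __name__ == "__main__":
    # "Bath" the city and "bath" the tub: a human reader tells them apart.
    sample = ["Bath", "bath", "BATH"]
    print(distinct_words(sample))                    # 3: every spelling differs
    print(distinct_words(sample, ignore_case=True))  # 1: city and tub now merge
```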
  17. Computers work with data. You work with data, too -- but in most cases, you'll have to make your data readable by a computer.
  18. How to make your data machine-readable • Annotate it with a markup language • Organize it in patterns that the computer can understand • Add data that is not explicitly readable in the current format (e.g., hardbound/softbound binding; language: English; date of record creation)
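As a sketch of the third point, the facts a cataloguer would read off a physical book can be written out as explicit fields, so a program can filter on them. The field names, bindings, and dates are hypothetical sample data. `select` is an illustrative helper, not part of any standard.

```python
import json

# Hypothetical catalogue records: facts a human infers from the physical
# book (binding, language) are stored as explicit, labeled fields.
records = [
    {"title": "Songs of Innocence", "binding": "hardbound",
     "language": "English", "record_created": "2015-01-10"},
    {"title": "Songs of Experience", "binding": "softbound",
     "language": "English", "record_created": "2015-01-12"},
]

def select(records, **criteria):
    """Return records whose fields match every keyword criterion."""
    return [r for r in records
            if all(r.get(field) == value for field, value in criteria.items())]

if __name__ == "__main__":
    for r in select(records, binding="hardbound", language="English"):
        print(json.dumps(r))
```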
  19. Depending on the data you have, and the way you annotate or structure it, different things become possible.
  20. For instance, sometimes it may be enough to know that a tile covers 9 square inches. But sometimes you need to know that it is 3” x 3”.
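The tile example is a question of granularity, sketched here in Python. If a record stores width and height, the area can be computed whenever it is needed. A record that stores only "9 square inches" cannot tell a 3 x 3 tile from a 1 x 9 one. The `Tile` class is hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tile:
    """A rectangular tile with its dimensions recorded in inches."""
    width_in: float
    height_in: float

    @property
    def area_sq_in(self):
        # The area can always be derived from the dimensions...
        return self.width_in * self.height_in

if __name__ == "__main__":
    square = Tile(3, 3)
    strip = Tile(1, 9)
    # ...but not the reverse: both tiles cover 9 square inches.
    print(square.area_sq_in, strip.area_sq_in)  # 9 9
    print(square == strip)                      # False
```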
  21. Your goal is to make the data As Simple As Possible -- but not so simple that it stops being useful.
  22. Depending on the data you work with, the work of structuring or annotating becomes more challenging, but also more useful.
  23. The work of creating data is social.
  24. In other words, how can others use it?
  25. Many computer languages have governing bodies that establish standards for their use: • the World Wide Web Consortium (W3C) • the TEI Technical Council
  26. BREAK!
  27. Data Examples • Annotated (Markup Languages: HTML, TEI) • Structured (MySQL) • Combination (Semantic Web)
  28. Markup: HTML <i>This text is italic.</i> = This text is italic.
  29. Markup: HTML <a href="">This text</a> will take you to a webpage. = This text will take you to a webpage.
  30. Markup: HTML Anything can be data -- and markup languages provide instructions for how computers should treat that data.
  31. Markup: HTML HTML is a markup language used to structure and format text on webpages. <p> separates text into paragraphs. <em> marks text as emphasized (browsers usually display it in italics). These are just a few of the HTML formatting instructions that you can use.
  32. HTML Syntax Rules • Opening and closing tags: <tag> and </tag> • Attributes (second-level information) defined using ="" • Comments: <!-- -->
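The three syntax rules can be seen from a program's point of view. Python's standard-library `html.parser.HTMLParser` reports each opening tag (with its attributes), closing tag, run of text, and comment as it reads. This is a minimal sketch; the class name `TagLogger` and the link target `page.html` are hypothetical.

```python
from html.parser import HTMLParser

SAMPLE = ('<p><i>This text is italic.</i> <a href="page.html">This text</a>'
          ' will take you to a webpage.</p><!-- note to self -->')

class TagLogger(HTMLParser):
    """Record each opening tag, closing tag, text run, and comment."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        self.events.append(("open", tag, dict(attrs)))

    def handle_endtag(self, tag):
        self.events.append(("close", tag))

    def handle_data(self, data):
        if data.strip():  # skip whitespace-only text between tags
            self.events.append(("text", data.strip()))

    def handle_comment(self, data):
        self.events.append(("comment", data.strip()))

if __name__ == "__main__":
    parser = TagLogger()
    parser.feed(SAMPLE)
    parser.close()
    for event in parser.events:
        print(event)
```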
  33. Markup languages are popular in digital humanities because lots of humanists work with texts.
  34. Without markup languages, the things that a computer can search for are limited.
  35. Ctrl + F: any text in iambic pentameter.
  36. Markup: TEI With markup, the things you can search for are limited only by your interpretation.
  37. Markup: TEI TEI (Text Encoding Initiative)
  38. Poetry w/ TEI
      <text xmlns="" xml:id="d1">
        <body xml:id="d2">
          <div1 type="book" xml:id="d3">
            <head>Songs of Innocence</head>
            <pb n="4"/>
            <div2 type="poem" xml:id="d4">
              <head>Introduction</head>
              <lg type="stanza">
                <l>Piping down the valleys wild, </l>
                <l>Piping songs of pleasant glee, </l>
                <l>On a cloud I saw a child, </l>
                <l>And he laughing said to me: </l>
              </lg>
              <!-- remaining stanzas omitted -->
            </div2>
          </div1>
        </body>
      </text>
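Once the poem is encoded, a program can pull out exactly the units the markup names. Below is a minimal sketch with Python's standard `xml.etree.ElementTree`. It works on a copy of the excerpt that is closed off as well-formed XML and omits the `xmlns` attribute, so tags can be matched by bare name. Real TEI documents declare the TEI namespace, and queries must then include it. The function `poem_lines` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the excerpt, without the xmlns attribute.
TEI_POEM = """
<text xml:id="d1">
  <body xml:id="d2">
    <div1 type="book" xml:id="d3">
      <head>Songs of Innocence</head>
      <pb n="4"/>
      <div2 type="poem" xml:id="d4">
        <head>Introduction</head>
        <lg type="stanza">
          <l>Piping down the valleys wild, </l>
          <l>Piping songs of pleasant glee, </l>
          <l>On a cloud I saw a child, </l>
          <l>And he laughing said to me: </l>
        </lg>
      </div2>
    </div1>
  </body>
</text>
"""

def poem_lines(tei_xml):
    """Return (title, [verse lines]) for each <div2 type="poem">."""
    root = ET.fromstring(tei_xml.strip())
    poems = []
    for poem in root.iter("div2"):
        if poem.get("type") != "poem":
            continue
        title = poem.findtext("head", default="").strip()
        lines = [line.text.strip() for line in poem.iter("l") if line.text]
        poems.append((title, lines))
    return poems

if __name__ == "__main__":
    for title, lines in poem_lines(TEI_POEM):
        print(title)
        for line in lines:
            print("  " + line)
```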
  39. Grammar w/ TEI
      <entry>
        <form>
          <orth>pamplemousse</orth>
        </form>
        <gramGrp>
          <gram type="pos">noun</gram>
          <gram type="gen">masculine</gram>
        </gramGrp>
      </entry>
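Each grammatical fact sits in its own labeled element, so a program can turn the entry into a lookup table. This is a minimal `xml.etree.ElementTree` sketch. The function name and the flattened dictionary layout are hypothetical choices.

```python
import xml.etree.ElementTree as ET

ENTRY = """<entry>
  <form>
    <orth>pamplemousse</orth>
  </form>
  <gramGrp>
    <gram type="pos">noun</gram>
    <gram type="gen">masculine</gram>
  </gramGrp>
</entry>"""

def entry_to_dict(entry_xml):
    """Flatten a dictionary entry into {'orth': ..., <gram type>: value}."""
    entry = ET.fromstring(entry_xml)
    result = {"orth": entry.findtext("form/orth")}
    for gram in entry.iter("gram"):
        # The type attribute ("pos", "gen") becomes the dictionary key.
        result[gram.get("type")] = gram.text
    return result

if __name__ == "__main__":
    print(entry_to_dict(ENTRY))
```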
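  40. TEI’s syntax rules are essentially the same as HTML’s (tags, attributes, comments), though TEI is an XML language and is therefore stricter: every element must be closed (either <l></l> or a self-closing tag like <pb/>), and tag names are case-sensitive. Your normal browser also can’t display TEI the way it displays HTML.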
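XML's stricter rules can be checked mechanically, as sketched below with Python's `xml.etree.ElementTree`. Its parser raises `ParseError` on markup that is not well-formed XML. Web browsers, by contrast, silently repair most broken HTML. The helper name is hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed_xml(markup):
    """True if the string parses as a single well-formed XML element."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

if __name__ == "__main__":
    examples = [
        "<l>Piping down the valleys wild,</l>",  # closed properly
        "<l>Piping down the valleys wild,",      # never closed
        "<L>Piping down the valleys wild,</l>",  # case mismatch: L vs l
        '<pb n="4"/>',                           # self-closing is fine
    ]
    for markup in examples:
        print(is_well_formed_xml(markup), markup)
```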
  41. TEI is meant to be a highly social language -- meaning that the committee that maintains its standards wants it to be something that anyone can use.
  42. In order for TEI to successfully encode texts, it has to be adaptable to individual projects.
  43. Anything that you can isolate (and wrap in tags) can (theoretically) then be manipulated to serve your project.
  44. TEI can be used to encode more than just text:
      <div type="shot">
        <view>BBC World symbol</view>
        <sp>
          <speaker>Voice Over</speaker>
          <p>Monty Python's Flying Circus tonight comes to you live
            from the Grillomat Snack Bar, Paignton.</p>
        </sp>
      </div>
      <div type="shot">
        <view>Interior of a nasty snack bar. Customers around, preferably
          real people. Linkman sitting at one of the plastic tables.</view>
        <sp>
          <speaker>Linkman</speaker>
          <p>Hello to you live from the Grillomat Snack Bar.</p>
        </sp>
      </div>
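Encoding shots, speakers, and speeches makes each of them retrievable on its own. Below is a minimal `xml.etree.ElementTree` sketch. The excerpt has two sibling `<div>` elements and XML allows only one root, so the sample wraps them in a `<body>` element. That wrapper is an assumption about the surrounding document. The function `shots` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The two shots from the excerpt, wrapped in one root element.
SCRIPT = """<body>
<div type="shot">
  <view>BBC World symbol</view>
  <sp>
    <speaker>Voice Over</speaker>
    <p>Monty Python's Flying Circus tonight comes to you live
      from the Grillomat Snack Bar, Paignton.</p>
  </sp>
</div>
<div type="shot">
  <view>Interior of a nasty snack bar. Customers around, preferably
    real people. Linkman sitting at one of the plastic tables.</view>
  <sp>
    <speaker>Linkman</speaker>
    <p>Hello to you live from the Grillomat Snack Bar.</p>
  </sp>
</div>
</body>"""

def tidy(text):
    """Collapse runs of whitespace (including line breaks) to single spaces."""
    return " ".join((text or "").split())

def shots(script_xml):
    """Yield (camera view, [(speaker, speech), ...]) for each shot."""
    root = ET.fromstring(script_xml)
    for shot in root.iter("div"):
        if shot.get("type") != "shot":
            continue
        view = tidy(shot.findtext("view"))
        lines = [(tidy(sp.findtext("speaker")), tidy(sp.findtext("p")))
                 for sp in shot.iter("sp")]
        yield view, lines

if __name__ == "__main__":
    for view, lines in shots(SCRIPT):
        print(f"[{view}]")
        for speaker, speech in lines:
            print(f"  {speaker}: {speech}")
```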
  45. Or, you could encode all of Stephenie Meyer’s Twilight according to its emotional register.
  46. Whether you include or exclude some aspect of the text in your markup can be very important from an academic perspective.
  47. The challenge of creating good data is one reason that collaboration is so important to digital scholarship.
  48. Data Collaboration • Avoid reinventing the wheel (has the markup for this text already been done?) • Consider the labor involved vs. the outcome (and future use of the data you create)
  49. Structured Data
  50. Study Scenario #1 • You study urban espresso stands: their hours, brands of coffee, whether or not they sell pastries, and how far the espresso stands are from major roadways.
  51. What Types of Data? • Binary (pastries: y/n) • Unordered (hours; coffee brands) • Derived/subservient (hours + proximity to roadways; take cards? Which cards?)
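One way to make these types of data explicit is to give each its own kind of field: a boolean for binary data, a set for unordered data, a dependent field, and a derived check that combines two fields. This is a hedged Python sketch. The class, field names, sample stands, and thresholds are all hypothetical, and closing times after midnight are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EspressoStand:
    name: str
    opens: int                    # opening hour, 24-hour clock
    closes: int                   # closing hour, same day (no past-midnight)
    coffee_brands: frozenset      # unordered: no brand comes "first"
    sells_pastries: bool          # binary: yes or no
    distance_to_road_m: float
    takes_cards: bool = False
    # Subservient field: only meaningful when takes_cards is True.
    cards_accepted: frozenset = frozenset()

    def __post_init__(self):
        if self.cards_accepted and not self.takes_cards:
            raise ValueError("cards_accepted requires takes_cards=True")

    def open_at(self, hour):
        """True if the stand is open at the given whole hour."""
        return self.opens <= hour < self.closes


def late_and_roadside(stand, hour=21, max_distance_m=20):
    """Derived category combining hours with proximity to a roadway.
    The 9 p.m. and 20-meter thresholds are arbitrary assumptions."""
    return stand.open_at(hour) and stand.distance_to_road_m <= max_distance_m


if __name__ == "__main__":
    stands = [
        EspressoStand("Stand A", 7, 22, frozenset({"Brand X"}), True, 10.0),
        EspressoStand("Stand B", 7, 14, frozenset({"Brand Y", "Brand Z"}),
                      False, 25.0, takes_cards=True,
                      cards_accepted=frozenset({"Visa"})),
    ]
    print([s.name for s in stands if late_and_roadside(s)])
```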
  52. Study Scenario #2 • You study female characters in novels written between 1700 and 1850. Encoding a whole novel just to study female characters isn’t practical for you.
  53. What types of data might you collect in this case?
  54. Both scenarios involve aggregating information, rather than encoding it.
  55. Structured Data: Example #1 (MySQL)

      | ID  | Name            | Location                       | Hours                | Coffee Brand         | Pastries (Y/N) | Distance from Street |
      |-----|-----------------|--------------------------------|----------------------|----------------------|----------------|----------------------|
      | 008 | Java the Hut    | 56 Farringdon Road, London, UK | 7:00 a.m.-2:00 p.m.  | Square Mile Roasters | N              | 25 meters            |
      | 009 | Prufrock Coffee | 18 Shoreditch High Street      | 7:00 a.m.-10:00 p.m. | Monmouth             | Y              | 10 meters            |
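The espresso-stand table can be queried with SQL. This sketch uses Python's built-in `sqlite3` module as a stand-in for MySQL; basic `CREATE TABLE`, `INSERT`, and `SELECT` statements look much the same in both. The table and column names are hypothetical. Storing distance as a number rather than as text like "25 meters" is what lets the query compare distances.

```python
import sqlite3

# The two records from the table.
ROWS = [
    ("008", "Java the Hut", "56 Farringdon Road, London, UK",
     "7:00 a.m.-2:00 p.m.", "Square Mile Roasters", "N", 25),
    ("009", "Prufrock Coffee", "18 Shoreditch High Street",
     "7:00 a.m.-10:00 p.m.", "Monmouth", "Y", 10),
]

def build_db():
    """Create an in-memory table and load the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE stands (
            id TEXT PRIMARY KEY,      -- text, so 008 keeps its leading zeros
            name TEXT NOT NULL,
            location TEXT,
            hours TEXT,
            coffee_brand TEXT,
            pastries TEXT CHECK (pastries IN ('Y', 'N')),
            distance_m INTEGER        -- distance from street, in meters
        )
    """)
    conn.executemany("INSERT INTO stands VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)
    return conn

def stands_with_pastries_within(conn, max_m):
    """Pastry-selling stands at most max_m meters from the street."""
    return conn.execute(
        "SELECT name, distance_m FROM stands "
        "WHERE pastries = 'Y' AND distance_m <= ? ORDER BY distance_m",
        (max_m,),
    ).fetchall()

if __name__ == "__main__":
    conn = build_db()
    print(stands_with_pastries_within(conn, 20))
    conn.close()
```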
  56. Structured Data: Example #2 (RDF)
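RDF, the data model behind the Semantic Web, states every fact as a subject-predicate-object triple. Triples can point to other resources that have facts of their own. As a rough illustration only (plain Python tuples in a set, not an RDF library), the espresso-stand data might be re-expressed this way. The `ex:` identifiers and the helper functions are hypothetical.

```python
# Each fact is one (subject, predicate, object) triple. The "ex:" names are
# hypothetical; real RDF identifies subjects and predicates with full URIs.
TRIPLES = {
    ("ex:stand008", "ex:name", "Java the Hut"),
    ("ex:stand008", "ex:serves", "ex:SquareMileRoasters"),
    ("ex:stand008", "ex:sellsPastries", "false"),
    ("ex:stand009", "ex:name", "Prufrock Coffee"),
    ("ex:stand009", "ex:serves", "ex:Monmouth"),
    ("ex:stand009", "ex:sellsPastries", "true"),
    ("ex:SquareMileRoasters", "ex:label", "Square Mile Roasters"),
    ("ex:Monmouth", "ex:label", "Monmouth"),
}

def match(triples, s=None, p=None, o=None):
    """Return matching triples, sorted; None acts as a wildcard."""
    return sorted(t for t in triples
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))

def names_of_stands_with_pastries(triples):
    """Follow links: pastry-selling stands, then each stand's name."""
    stands = [s for s, _, _ in match(triples, p="ex:sellsPastries", o="true")]
    return [name for st in stands
            for _, _, name in match(triples, s=st, p="ex:name")]

def brand_labels(triples, stand):
    """Follow a link from a stand to a brand, then to that brand's label."""
    brands = [o for _, _, o in match(triples, s=stand, p="ex:serves")]
    return [label for b in brands
            for _, _, label in match(triples, s=b, p="ex:label")]

if __name__ == "__main__":
    print(names_of_stands_with_pastries(TRIPLES))
    print(brand_labels(TRIPLES, "ex:stand008"))
```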
  57. How your data is (or can be) structured will influence the technology that you (can) use to work with it.
  58. Digital humanists see creating machine-readable data as valuable scholarship, and consider it vital to make that labor transparent.
  59. Exercise: You Create the Data!
  60. Your data determines your project.
  61. Every project has data: text objects, images, tags, geographical coordinates, categories, records, creator metadata, etc.
  62. Even if you’re not planning to learn any programming skills, you are still working with data.
  63. Next time: Programming on the Whiteboard, January 24, 9:30, CMU 202 • Cleaning data before you work with it! • Identifying specific programming tasks • How access affects your project idea • Flash project development • Homework: bring some data to work with.