1. Text into Data Prof Alvarado MDST 3705 19 February 2013
2. Business • Quiz 1 graded – Let me know if you have questions • Readings – Apologies for mis-posting!
3. Review • Last week, we took a 30,000-foot view of the use of databases in the digital humanities – We found that databases are everywhere • Databases form the foundation of all projects – Even if a database management system is not used • Relational databases are sophisticated and mature choices for foundations
4. Overview • We began this course by looking at code as language – Code structured like natural language – Code implies, models, and creates a world • We then looked at the opposite process – looking at language, and the products of culture, as code – We called this “reverse engineering” • Today we continue this and look specifically at text
5. What do you remember when you read a book?
6. We remember scenes, images, plot lines, values, etc. We sometimes remember verbatim passages. We don’t normally remember the words.
7. We get much of our culture through books (and other "cultural models," in Colby's words)
8. Like cigarettes, books are a delivery mechanism (not of nicotine, but of culture)
9. Colby's theory CULTURE TEXTS
10. If texts contain cultural meanings . . . How do we get to them? How do we represent them?
11. Models of Text
12. Competing Approaches • A common approach to modeling text is to use XML – XML is like HTML, but more general – It allows you to mark up a text • XML assumes a text is like a tree – An “ordered hierarchy of content objects” • XML was also specifically designed to work with text
13. XML looks like this. Notice how the element names reference units, not layout or style.
14. Text as Tree
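As a minimal sketch of the tree model, the snippet below parses a small XML fragment with Python's standard `xml.etree.ElementTree` and prints its hierarchy. The element names (`poem`, `stanza`, `line`) and the sample text are invented for illustration and do not come from the slide; the point is only that every element sits inside exactly one parent.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element names describe units of text, not layout.
SAMPLE = """
<poem title="Example">
  <stanza n="1">
    <line n="1">first line of verse</line>
    <line n="2">second line of verse</line>
  </stanza>
  <stanza n="2">
    <line n="3">third line of verse</line>
  </stanza>
</poem>
""".strip()

def outline(element, depth=0):
    """Return (depth, tag) pairs for the element and its descendants, in document order."""
    rows = [(depth, element.tag)]
    for child in element:
        rows.extend(outline(child, depth + 1))
    return rows

root = ET.fromstring(SAMPLE)
tree_rows = outline(root)
for depth, tag in tree_rows:
    print("  " * depth + tag)
```

Each `line` has one `stanza` as its parent, and each `stanza` has the `poem`. That single-parent rule is what makes the structure a tree.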
15. XML turns out to be very useful for defining the physical or logical structure of a text, but not for figures and meanings. Texts are actually more like networks.
16. This image shows three "figures" in the text of an Old French poem. Note how they do not "nest" neatly into the structure of the text, but instead cross-cut it. It is hard to model this kind of data with XML.
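The cross-cutting problem can be made concrete with a small sketch. It assumes that both structural units and figures are recorded as half-open word-offset spans `(start, end)`. The offsets and labels below are made up, not taken from the poem. Two spans "cross" when they overlap but neither contains the other, which is exactly the case a single XML hierarchy cannot express directly.

```python
def crosses(a, b):
    """True if spans a and b overlap without one containing the other."""
    (a_start, a_end), (b_start, b_end) = a, b
    overlap = a_start < b_end and b_start < a_end
    a_in_b = b_start <= a_start and a_end <= b_end
    b_in_a = a_start <= b_start and b_end <= a_end
    return overlap and not (a_in_b or b_in_a)

# Hypothetical half-open word offsets [start, end) for two verse lines.
lines = {"line 1": (0, 8), "line 2": (8, 16)}
# Hypothetical figures: A starts in line 1 and ends in line 2; B sits inside line 2.
figures = {"figure A": (5, 11), "figure B": (9, 12)}

conflicts = [
    (fig, unit)
    for fig, f_span in figures.items()
    for unit, u_span in lines.items()
    if crosses(f_span, u_span)
]
print(conflicts)
```

Here `figure B` nests cleanly inside `line 2` and could be an ordinary child element, while `figure A` straddles the line boundary and conflicts with both lines.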
17. Relational databases are a better choice for this since they are more abstract. The problem is, what data model to use? How do you model text in a relational database?
18. Liu and Smith argue for a radical model, in which text is parsed at the word level. Each word gets its own record.
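A minimal sketch of the word-per-record idea, using Python's built-in `sqlite3`. The table name `word`, its columns, the simple `\w+` tokenizer, and the sample sentence are all assumptions for illustration; they are not the schema used by Liu and Smith.

```python
import re
import sqlite3

def load_words(conn, text):
    """Store each word of `text` as one row, keyed by its running position."""
    conn.execute(
        "CREATE TABLE word (position INTEGER PRIMARY KEY, line_no INTEGER, form TEXT)"
    )
    position = 0
    for line_no, line in enumerate(text.splitlines(), start=1):
        for form in re.findall(r"\w+", line):
            position += 1
            conn.execute(
                "INSERT INTO word VALUES (?, ?, ?)", (position, line_no, form)
            )
    return position

conn = sqlite3.connect(":memory:")
count = load_words(conn, "the knight rode out\nthe queen looked on")
rows = conn.execute(
    "SELECT position, line_no, form FROM word ORDER BY position"
).fetchall()
print(count, rows[:3])
```

Because every word has its own key, anything else (grammar, annotations, figures) can point at individual words rather than at whole lines.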
19. The Princeton Charrette Project used a database-driven application called Figura. It was designed to represent the critical edition of an Old French poem along with the figural annotations of the text made by scholars. A “figure” is a figure of speech or rhetorical device, like rhyming or the use of chiasmus.
20. The database stored information about grammar, manuscript images, figures, and other data that had been accumulated over the years prior to building the database.
21. At the heart of the database is the text model that links figures to text.
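One way such a link could look, sketched with `sqlite3`: a `figure_word` link table joins figures to individual words, so a figure may span lines and a word may belong to several figures. The three table names, their columns, and the sample rows are hypothetical; the actual Figura schema is not shown in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE word (position INTEGER PRIMARY KEY, line_no INTEGER, form TEXT);
CREATE TABLE figure (figure_id INTEGER PRIMARY KEY, figure_type TEXT);
-- Link table: one row per (figure, word) pair.
CREATE TABLE figure_word (
    figure_id INTEGER REFERENCES figure(figure_id),
    position INTEGER REFERENCES word(position),
    PRIMARY KEY (figure_id, position)
);
""")
# Invented sample: an A-B / B-A pattern spread over two lines.
words = [(1, 1, "alpha"), (2, 1, "beta"), (3, 2, "beta"), (4, 2, "alpha")]
conn.executemany("INSERT INTO word VALUES (?, ?, ?)", words)
conn.execute("INSERT INTO figure VALUES (1, 'chiasmus')")
conn.executemany("INSERT INTO figure_word VALUES (1, ?)", [(p,) for p in (1, 2, 3, 4)])

# How many lines does each figure touch?
lines_touched = conn.execute("""
    SELECT f.figure_type, COUNT(DISTINCT w.line_no)
    FROM figure f
    JOIN figure_word fw ON fw.figure_id = f.figure_id
    JOIN word w ON w.position = fw.position
    GROUP BY f.figure_id
""").fetchall()
print(lines_touched)
```

The figure crosses a line boundary without any special handling, which is the case that is awkward to express in a single XML hierarchy.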
22. In my model and in Liu & Smith’s, the text becomes a database. The readable text is just a query, as are the index, the table of contents, etc.
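To illustrate "the readable text is just a query," here is a sketch that rebuilds both the running text and a simple word index from a word table. The `word` table layout and the sample rows are assumptions made for this example.

```python
import sqlite3
from itertools import groupby

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE word (position INTEGER PRIMARY KEY, line_no INTEGER, form TEXT)"
)
sample = [(1, 1, "the"), (2, 1, "knight"), (3, 1, "rode"),
          (4, 2, "the"), (5, 2, "queen"), (6, 2, "watched")]
conn.executemany("INSERT INTO word VALUES (?, ?, ?)", sample)

def readable_text(conn):
    """Rebuild the running text: select words in order, then join them per line."""
    rows = conn.execute("SELECT line_no, form FROM word ORDER BY position")
    return "\n".join(
        " ".join(form for _, form in group)
        for _, group in groupby(rows, key=lambda r: r[0])
    )

def word_index(conn):
    """Build an index: each distinct form mapped to the lines where it occurs."""
    rows = conn.execute(
        "SELECT DISTINCT form, line_no FROM word ORDER BY form, line_no"
    )
    index = {}
    for form, line_no in rows:
        index.setdefault(form, []).append(line_no)
    return index

text = readable_text(conn)
index = word_index(conn)
print(text)
print(index)
```

The same rows yield two different "views" of the text. Neither view is stored; each is computed on demand.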
23. The database of words and figures can be read by a program to generate a visually rich and interactive edition on the web
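A rough sketch of how a program might turn such a database into web markup: each word becomes a `<span>`, and words that belong to a figure carry the figure type as a CSS class that a stylesheet or script could then highlight. The schema, the class-naming convention, and the sample rows are assumptions. For simplicity the sketch also assumes each word belongs to at most one figure.

```python
import html
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE word (position INTEGER PRIMARY KEY, form TEXT);
CREATE TABLE figure_word (figure_type TEXT, position INTEGER);
""")
conn.executemany("INSERT INTO word VALUES (?, ?)",
                 [(1, "alpha"), (2, "beta"), (3, "gamma")])
conn.execute("INSERT INTO figure_word VALUES ('rhyme', 2)")

def render_html(conn):
    """Emit one <span> per word; words in a figure get that figure type as a class."""
    rows = conn.execute("""
        SELECT w.form, fw.figure_type
        FROM word w LEFT JOIN figure_word fw ON fw.position = w.position
        ORDER BY w.position
    """)
    spans = []
    for form, figure_type in rows:
        css = f' class="{html.escape(figure_type)}"' if figure_type else ""
        spans.append(f"<span{css}>{html.escape(form)}</span>")
    return "<p>" + " ".join(spans) + "</p>"

page = render_html(conn)
print(page)
```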
24. But it can also be used to discover patterns in the text not visible to the reader. It can help us discover the cultural patterns that are “delivered” by the text to our brains.
25. The results of a query showing the relationship between proper nouns (agents) and figure types
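A query of this general kind might look like the sketch below, which counts how many distinct figures of each type involve each proper noun. The schema, the `pos` tag value `'proper_noun'`, the placeholder agent names, and all rows are invented for illustration; they are not the project's data or results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE word (position INTEGER PRIMARY KEY, form TEXT, pos TEXT);
CREATE TABLE figure (figure_id INTEGER PRIMARY KEY, figure_type TEXT);
CREATE TABLE figure_word (figure_id INTEGER, position INTEGER);
""")
# Placeholder agents and figures, not drawn from the poem.
conn.executemany("INSERT INTO word VALUES (?, ?, ?)", [
    (1, "AgentA", "proper_noun"), (2, "rides", "verb"),
    (3, "AgentB", "proper_noun"), (4, "waits", "verb"),
    (5, "AgentA", "proper_noun"),
])
conn.executemany("INSERT INTO figure VALUES (?, ?)", [
    (1, "chiasmus"), (2, "rhyme"), (3, "chiasmus"),
])
conn.executemany("INSERT INTO figure_word VALUES (?, ?)", [
    (1, 1), (1, 2), (2, 3), (2, 4), (3, 5), (3, 3),
])

# Cross-tabulate agents against figure types.
agent_by_figure = conn.execute("""
    SELECT w.form, f.figure_type, COUNT(DISTINCT f.figure_id) AS n
    FROM word w
    JOIN figure_word fw ON fw.position = w.position
    JOIN figure f ON f.figure_id = fw.figure_id
    WHERE w.pos = 'proper_noun'
    GROUP BY w.form, f.figure_type
    ORDER BY w.form, f.figure_type
""").fetchall()
for row in agent_by_figure:
    print(row)
```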
26. A structural reading of the data
27. Form and content are interwoven, each reinforcing the other. Form – the delivery system – is used to transmit the meaningful content, the stuff that remains in your brain after reading or hearing the story.
28. This is a "hypergraph" of the same data, also easily generated from the database by code.
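In a hypergraph, a single edge can connect any number of nodes. The sketch below assumes one plausible reading of the slide: each figure is a hyperedge and the agents it involves are its nodes. The `(figure_id, agent)` rows stand in for the output of a database query and are made up.

```python
from collections import defaultdict

# Hypothetical (figure_id, agent) rows, as a database query might return them.
rows = [(1, "AgentA"), (1, "AgentB"), (1, "AgentC"),
        (2, "AgentA"), (2, "AgentC"), (3, "AgentB")]

def build_hypergraph(rows):
    """Map each hyperedge (figure) to the set of nodes (agents) it connects."""
    edges = defaultdict(set)
    for figure_id, agent in rows:
        edges[figure_id].add(agent)
    return dict(edges)

def neighbors(edges, node):
    """Nodes that share at least one hyperedge with `node`."""
    found = set()
    for members in edges.values():
        if node in members:
            found |= members
    found.discard(node)
    return found

hypergraph = build_hypergraph(rows)
print(hypergraph)
print(sorted(neighbors(hypergraph, "AgentB")))
```

Figure 1 links three agents at once, something an ordinary graph edge, which joins exactly two nodes, cannot record in one step.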
29. Text is like this: http://anthonyflo.tumblr.com/post/7590868323/photographer-and-self-described-geek-of-maps