Mdst3705 2013-02-19-text-into-data
  • Transcript

    • 1. Text into Data. Prof Alvarado, MDST 3705, 19 February 2013
    • 2. Business • Quiz 1 graded – Let me know if you have questions • Readings – Apologies for mis-posting!
    • 3. Review • Last week, we took a 30,000-foot view of the use of databases in the digital humanities – We found that databases are everywhere • Databases form the foundation of all projects – Even if a database management system is not used • Relational databases are sophisticated and mature choices for foundations
    • 4. Overview • We began this course by looking at code as language – Code structured like natural language – Code implies, models, and creates a world • We then looked at the opposite process – looking at language, and the products of culture, as code – We called this “reverse engineering” • Today we continue this and look specifically at text
    • 5. What do you remember when you read a book?
    • 6. We remember scenes, images, plot lines, values, etc. We sometimes remember verbatim passages. We don’t normally remember the words.
    • 7. We get much of our culture through books (and other "cultural models" in Colby's words)
    • 8. Like cigarettes, books are a delivery mechanism (not of nicotine, but of culture)
    • 9. Colby's theory: CULTURE TEXTS
    • 10. If texts contain cultural meanings . . . How do we get to them? How do we represent them?
    • 11. Models of Text
    • 12. Competing Approaches • A common approach to modeling text is to use XML – XML is like HTML, but more general – It allows you to mark up a text • XML assumes a text is like a tree – An “ordered hierarchy of content objects” • XML was also specifically designed to work with text
    • 13. XML looks like this. Notice how the element names reference units, not layout or style.
    • 14. Text as Tree
    • 15. XML turns out to be very useful for defining the physical or logical structure of a text, but not for figures and meanings. Texts are actually more like networks.
    • 16. This image shows three "figures" in the text of an Old French poem. Note how they do not "nest" neatly into the structure of the text, but instead cross-cut it. It is hard to model this kind of data with XML.
    • 17. Relational databases are a better choice for this since they are more abstract. The problem is, what data model to use? How do you model text in a relational database?
    • 18. Liu and Smith argue for a radical model, in which text is parsed at the word level. Each word gets its own record.
    • 19. The Princeton Charrette Project used a database-driven application called Figura. It was designed to represent the critical edition of an Old French poem along with the figural annotations of the text made by scholars. A “figure” is a figure of speech or rhetorical device, like rhyming or the use of chiasmus.
    • 20. The database stored information about grammar, manuscript images, figures, and other data that had been accumulated over the years prior to building the database.
    • 21. At the heart of the database is the text model that links figures to text.
    • 22. In my model and in Liu & Smith’s, the text becomes a database. The readable text is just a query, as are the index, table of contents, etc.
    • 23. The database of words and figures can be read by a program to generate a visually rich and interactive edition on the web
    • 24. But it can also be used to discover patterns in the text not visible to the reader. It can help us discover the cultural patterns that are “delivered” by the text to our brains.
    • 25. The results of a query showing the relationship between proper nouns (agents) and figure types
    • 26. A structural reading of the data
    • 27. Form and content are interwoven, each reinforcing the other. Form – the delivery system – is used to transmit the meaningful content, the stuff that remains in your brain after reading or hearing the story.
    • 28. This is a "hypergraph" of the same data, also easily generated from the database by code
    • 29. Text is like this: http://anthonyflo.tumblr.com/post/7590868323/photographer-and-self-described-geek-of-maps
    • 30. A text is a signal. Culture is a transmitter.
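
Slides 16 to 22 describe a text model in which each word gets its own record, figures are linked to the words they cover, and the readable text is just a query. The following is a minimal sketch of that idea in Python with SQLite. It is not the Figura schema or Liu and Smith's model: the table names (`word`, `figure`, `figure_word`), the helper functions, the two sample lines, and the two figure annotations are all invented for illustration.

```python
import sqlite3
from itertools import groupby

# Invented sample lines, not taken from the Charrette text.
LINES = [
    "the knight rode into the forest",
    "into the forest rode the queen",
]

# Hypothetical schema: one row per word, one row per figure, and a link table.
SCHEMA = """
CREATE TABLE word (
    word_id  INTEGER PRIMARY KEY,  -- reading order across the whole text
    line_no  INTEGER NOT NULL,
    position INTEGER NOT NULL,     -- position within the line
    form     TEXT NOT NULL
);
CREATE TABLE figure (
    figure_id INTEGER PRIMARY KEY,
    type      TEXT NOT NULL
);
CREATE TABLE figure_word (         -- links each figure to the words it covers
    figure_id INTEGER NOT NULL REFERENCES figure(figure_id),
    word_id   INTEGER NOT NULL REFERENCES word(word_id),
    PRIMARY KEY (figure_id, word_id)
);
"""


def build_db(lines):
    """Load the text so that every word gets its own record."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    word_id = 0
    for line_no, line in enumerate(lines, start=1):
        for position, form in enumerate(line.split(), start=1):
            word_id += 1
            conn.execute(
                "INSERT INTO word VALUES (?, ?, ?, ?)",
                (word_id, line_no, position, form),
            )
    return conn


def add_figure(conn, figure_type, first_word, last_word):
    """Annotate a contiguous run of word ids as one figure."""
    cur = conn.execute("INSERT INTO figure (type) VALUES (?)", (figure_type,))
    conn.executemany(
        "INSERT INTO figure_word VALUES (?, ?)",
        [(cur.lastrowid, w) for w in range(first_word, last_word + 1)],
    )


def readable_text(conn):
    """Rebuild the readable lines from the word records."""
    rows = conn.execute(
        "SELECT line_no, form FROM word ORDER BY line_no, position"
    )
    return [
        " ".join(form for _, form in group)
        for _, group in groupby(rows, key=lambda row: row[0])
    ]


def cross_cutting_figures(conn):
    """Return (type, line count) for figures whose words span several lines."""
    return conn.execute("""
        SELECT f.type, COUNT(DISTINCT w.line_no) AS n_lines
        FROM figure AS f
        JOIN figure_word AS fw ON fw.figure_id = f.figure_id
        JOIN word AS w ON w.word_id = fw.word_id
        GROUP BY f.figure_id, f.type
        HAVING COUNT(DISTINCT w.line_no) > 1
        ORDER BY f.figure_id
    """).fetchall()


conn = build_db(LINES)
# Illustrative annotations only; word ids run 1..12 in reading order.
add_figure(conn, "chiasmus", 3, 10)    # "rode into the forest / into the forest rode"
add_figure(conn, "inversion", 10, 12)  # "rode the queen"

print(readable_text(conn))
print(cross_cutting_figures(conn))
```

With this sample data, `readable_text(conn)` returns the two original lines and `cross_cutting_figures(conn)` returns `[('chiasmus', 2)]`. The chiasmus starts in the middle of line 1 and ends in the middle of line 2, and it shares word 10 with the inversion without either figure containing the other. That kind of partial overlap is what a strict XML tree cannot nest, while a link table records it without difficulty.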
