Udhig0613
text analysis tools

Published in: Technology

Transcript

  • 1. New Tools in Digital Humanities UDHIG June 13 2006 Zoe Borovsky
  • 2. New tools
    • Text:
      • Juxta
      • TAPoR, HyperPo
      • WordHoard
    • Images:
      • Image Markup Tool
  • 3. Why digitize text?
    • Text analysis: discovering new knowledge by linking information together in interesting ways, not just showing overall trends.
    • “I think discovering new knowledge vs. showing trends is like the difference between a detective following clues to find the criminal vs. analysts looking at crime statistics to assess overall trends in car theft.” (Marti Hearst, 2003)
  • 4. Three volumes of sagas: hundreds of giants and giantesses
    • The verb “look” occurs more often near the words for and names of giantesses than near those of giants.
  • 5. Types of tools
    • Concordance, comparison, corpus, critical editions (Juxta)
    • Search (TAPoR, HyperPo, WordHoard)
      • Key words in context (KWIC)
      • Collocates (associations)
      • Markup: Lemma, Parts of speech, Speaker
  • 6. Juxta
    • Produces critical editions, comparing and collating multiple witnesses of a single work
    http://www.patacriticism.org/juxta/
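The idea of collating witnesses can be illustrated with a minimal sketch. This is not Juxta's own algorithm; it assumes a simple word-level comparison using Python's standard `difflib` module, and the function name `collate` and the sample sentences are hypothetical.

```python
import difflib


def collate(base: str, witness: str):
    """List the spans where a witness differs from the base text.

    Each result is (kind, base_words, witness_words), where kind is
    'replace', 'delete', or 'insert' as reported by difflib.
    """
    a, b = base.split(), witness.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    variants = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep only the points of variation
            variants.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return variants


# Toy witnesses, invented for illustration only.
print(collate("the quick brown fox", "the quick red fox"))
```

A real critical apparatus would compare more than two witnesses and handle punctuation and spelling normalization, which this sketch leaves out.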
  • 7. Juxta
    • Desktop Application: Mac, Windows and Unix/Linux (open source)
    • Input: plain text (UTF-8), or XML
    • Output: HTML critical apparatus
  • 8. The darker the color, the more variants that differ
  • 9. Toggle between texts
  • 10. Generate HTML
  • 11.  
  • 12. TAPoR
      • Web-based text analysis portal
      • Search and display using online tools
      • Input: XML, HTML, TEI, plain text
    http://test-tapor.mcmaster.ca/portal/portal
  • 13. TAPoR
    • Mostly English, some western European languages
    • Word Lists
    • KWIC (key word in context)
    • Collocates/co-occurrences: words that occur in proximity to the search word
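A word list and a key-word-in-context display can be sketched in a few lines. This is a rough illustration, not how TAPoR or HyperPo implement them; it assumes a crude lowercase, letters-only tokenizer, and the names `tokenize`, `word_list`, and `kwic` are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_list(text, top=10):
    """Return the most frequent words with their counts."""
    return Counter(tokenize(text)).most_common(top)


def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for every hit."""
    tokens = tokenize(text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Toy text, invented for illustration only.
sample = "The white whale swam. The white sail rose."
for left, key, right in kwic(sample, "white", width=2):
    print(f"{left:>12} | {key} | {right}")
```

The tokenizer only recognizes unaccented Latin letters, so it would need replacing for most non-English texts.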
  • 14. Word List HyperPo
  • 15. Key word in context, HyperPo
  • 16.
    • Co-occurrences
    • “white”
    • Add secondary corpus
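Counting co-occurrences can be sketched as a sliding window around each hit. The following is a minimal illustration under that assumption, not the method used by any of the tools named here; the function name `collocates`, the window size, and the sample text are hypothetical.

```python
import re
from collections import Counter


def collocates(text, keyword, window=2):
    """Count the words found within `window` tokens of each keyword hit."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword.lower():
            lo, hi = max(0, i - window), i + window + 1
            # Count every neighbour in the window except the hit itself.
            counts.update(
                t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i
            )
    return counts


# Toy text, invented for illustration only.
print(collocates("the white whale and the white sail", "white", window=1))
```

Running the same function over a secondary corpus and comparing the two counts is one simple way to see which associations are distinctive to a text.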
  • 17. WordHoard
    • Desktop application/server version
    • Texts are annotated or tagged by morphological, lexical, semantic, prosodic, and narratological criteria.
    http://wordhoard.northwestern.edu/userman/index.html
  • 18.
    • The downloadable version comes with texts
    • The open source version can be installed on your own server with your texts
  • 19. Sample WordHoard query
    • Shakespeare’s use of the word “love” over time
  • 20. Results….
  • 21. Image Markup Tool
    • Windows only
    http://www.tapor.uvic.ca/~mholmes/image_markup/
  • 22. Image Markup Tool
    • Input: an image that you want to make available on a web page with annotations directly on the image
    • Example: Robert Watson’s Back to Nature
  • 23.  
  • 24. Image Markup Tool
    • Output: sample
    • A copy of your XML data file with an added XSL stylesheet declaration
    • A copy of the image file you're marking up (usually reduced to a size suitable for a Web page -- you can control this size in the Options / Web view preferences window).
    • An XSLT file (copied from the web_view folder in the program folder, with some variables modified to suit your data).
    • A JavaScript file (copied from the web_view folder in the program folder).
    • A CSS stylesheet file (copied from the web_view folder in the program folder).
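The "added XSL stylesheet declaration" in the first output item is a standard `xml-stylesheet` processing instruction placed before the root element. A minimal sketch of adding one with Python's standard `xml.dom.minidom` follows; it does not reproduce what the Image Markup Tool itself writes, and the function name `add_stylesheet`, the element name `annotations`, and the file name `web_view.xsl` are hypothetical.

```python
from xml.dom import minidom


def add_stylesheet(xml_text, href):
    """Return the XML with an xml-stylesheet declaration before the root."""
    doc = minidom.parseString(xml_text)
    pi = doc.createProcessingInstruction(
        "xml-stylesheet", f'type="text/xsl" href="{href}"'
    )
    # The declaration must come before the document's root element.
    doc.insertBefore(pi, doc.documentElement)
    return doc.toxml()


# Hypothetical data file content and stylesheet name.
print(add_stylesheet("<annotations/>", "web_view.xsl"))
```

A browser that opens the resulting XML file applies the referenced XSLT, which is how the data file, stylesheet, JavaScript, and CSS listed above end up rendered as one web page.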
