text analysis tools

Published in: Technology

  • 1. New Tools in Digital Humanities. UDHIG, June 13, 2006. Zoe Borovsky
  • 2. New tools
    • Text:
      • Juxta
      • TAPoR, HyperPo
      • WordHoard
    • Images:
      • Image Markup Tool
  • 3. Why digitize text?
    • Text analysis: discovering new knowledge by linking information together in interesting ways, not just showing overall trends.
    • “I think discovering new knowledge vs. showing trends is like the difference between a detective following clues to find the criminal vs. analysts looking at crime statistics to assess overall trends in car theft.” (Marti Hearst, 2003)
  • 4. Three volumes of sagas, hundreds of giants and giantesses: the verb “look” occurs more often near words and names for giantesses than near those for giants.
  • 5. Types of tools
    • Concordance, comparison, corpus, critical editions (Juxta)
    • Search (TAPoR, HyperPo, WordHoard)
      • Key words in context (KWIC)
      • Collocates (associations)
      • Markup: Lemma, Parts of speech, Speaker
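A key-word-in-context listing can be sketched in a few lines. This is only a minimal illustration, not how TAPoR, HyperPo, or WordHoard implement it; the function name `kwic`, the `width` parameter, and the sample sentence are all made up for the example.

```python
import re

def kwic(text, keyword, width=3):
    """Return (left, keyword, right) context tuples for each hit of keyword."""
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok == target:
            # Up to `width` tokens on each side of the hit
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits

# Invented sample sentence, not taken from any saga
sample = "The giantess looked at him. He did not look back at the giantess."
for left, kw, right in kwic(sample, "giantess", width=2):
    print(f"{left:>20} [{kw}] {right}")
```

Aligning the keyword in a column, as the print loop does, is what makes recurring neighbors easy to spot by eye.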
  • 6. Juxta
    • Produces critical editions, comparing and collating multiple witnesses of a single work
  • 7. Juxta
    • Desktop Application: Mac, Windows and Unix/Linux (open source)
    • Input: plain text (UTF-8), or XML
    • Output: HTML critical apparatus
  • 8. The darker the color, the more variants differ at that point
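The idea behind the shading can be approximated with Python's standard `difflib`: for each word of a base text, count how many other witnesses differ there. This is a rough sketch under that assumption and is not Juxta's actual collation algorithm; `variant_counts` and the sample witnesses are hypothetical.

```python
from difflib import SequenceMatcher

def variant_counts(base, witnesses):
    """For each token of base, count the witnesses whose text differs there."""
    base_tokens = base.split()
    counts = [0] * len(base_tokens)
    for witness in witnesses:
        matcher = SequenceMatcher(a=base_tokens, b=witness.split(), autojunk=False)
        for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
            # "replace" and "delete" cover base tokens the witness changes or drops;
            # pure insertions have no base token to shade, so they are skipped here
            if tag in ("replace", "delete"):
                for i in range(i1, i2):
                    counts[i] += 1
    return list(zip(base_tokens, counts))

# Invented witnesses for illustration only
base = "the quick brown fox jumps"
witnesses = ["the quick red fox jumps", "the slow red fox leaps"]
for token, n in variant_counts(base, witnesses):
    print(f"{token:<8} {'#' * n}")
```

A higher count corresponds to a darker shade: a word changed in two witnesses gets two marks, an unchanged word gets none.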
  • 9. Toggle between texts
  • 10. Generate HTML
  • 11.  
  • 12. TAPoR
    • Web-based text analysis portal
    • Search and display using online tools
    • Input: XML, HTML, TEI, plain text
  • 13. TAPoR
    • Mostly English, some western European languages
    • Word Lists
    • KWIC (key word in context)
    • Collocates/co-occurrences: words that occur in proximity to a search term
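Word lists and collocates both reduce to counting. The sketch below assumes a simple regex tokenizer and a fixed window of neighboring words; it is not TAPoR's implementation, and the names `word_list`, `collocates`, and `window`, along with the sample phrase, are invented for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def word_list(text):
    """Return (word, count) pairs, most frequent first."""
    return Counter(tokenize(text)).most_common()

def collocates(text, keyword, window=2):
    """Count the words that occur within `window` tokens of each keyword hit."""
    tokens = tokenize(text)
    target = keyword.lower()
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo, hi = max(0, i - window), i + window + 1
            # Count every neighbor in the window except the hit itself
            counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts

# Invented sample phrase
sample = "white whale, white sail, and the white whale again"
print(word_list(sample)[:2])
print(collocates(sample, "white", window=1))
```

A real tool would typically also filter out very common function words before ranking collocates.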
  • 14. Word list, HyperPo
  • 15. Key word in context, HyperPo
  • 16.
    • Co-occurrences
    • “white”
    • Add secondary corpus
  • 17. WordHoard
    • Desktop application/server version
    • Texts are annotated or tagged by morphological, lexical, semantic, prosodic, and narratological criteria.
  • 18. The downloadable version comes with texts. The open source version can be installed on your own server with your texts.
  • 19. Sample WordHoard query
    • Shakespeare’s use of the word “love” over time
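A query like this amounts to computing a word's relative frequency per work and ordering the works by date. The sketch below is not WordHoard's query interface: the titles, years, and texts are placeholders rather than real data, and the small set of word forms stands in for the lemma tagging that WordHoard's annotated texts provide.

```python
import re

def relative_frequency(text, forms):
    """Occurrences of any of `forms` per 1,000 tokens of text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in forms)
    return 1000 * hits / len(tokens)

def frequency_over_time(works, forms):
    """Return (year, title, rate) tuples sorted by year."""
    return [
        (year, title, relative_frequency(text, forms))
        for title, year, text in sorted(works, key=lambda w: w[1])
    ]

# Placeholder works: titles, years, and texts are invented for the example
works = [
    ("Placeholder work B", 1605, "hate and war and love"),
    ("Placeholder work A", 1595, "love loves love and more"),
]
# Hypothetical stand-in for a lemma lookup
love_forms = {"love", "loves", "loved", "loving"}

for year, title, rate in frequency_over_time(works, love_forms):
    print(year, title, rate)
```

Normalizing per 1,000 tokens keeps long and short works comparable.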
  • 20. Results…
  • 21. Image Markup Tool (Windows only)
  • 22. Image Markup Tool
    • Input: an image that you want to make available on a web page with annotations directly on the image
    • Example: Robert Watson’s Back to Nature
  • 23.  
  • 24. Image Markup Tool
    • Output: sample
    • A copy of your XML data file with an added XSL stylesheet declaration
    • A copy of the image file you're marking up (usually reduced to a size suitable for a Web page; you can control this size in the Options / Web view preferences window).
    • An XSLT file (copied from the web_view folder in the program folder, with some variables modified to suit your data).
    • A JavaScript file (copied from the web_view folder in the program folder).
    • A CSS stylesheet file (copied from the web_view folder in the program folder).
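The "added XSL stylesheet declaration" in the first output item is an `xml-stylesheet` processing instruction placed before the root element, which tells a browser which XSLT file to apply. The sketch below shows one way to add such a declaration with Python's standard library. It is not the Image Markup Tool's own code; the function name, the `annotations` element, and the file name `view.xsl` are hypothetical.

```python
from xml.dom import minidom

def add_stylesheet_declaration(xml_text, xslt_href):
    """Return the XML with an xml-stylesheet instruction before the root element."""
    doc = minidom.parseString(xml_text)
    pi = doc.createProcessingInstruction(
        "xml-stylesheet", f'type="text/xsl" href="{xslt_href}"'
    )
    # The instruction must come before the document's root element
    doc.insertBefore(pi, doc.documentElement)
    return doc.toxml()

# Hypothetical data file content and stylesheet name
data = "<annotations><note>example</note></annotations>"
result = add_stylesheet_declaration(data, "view.xsl")
print(result)
```

When the resulting XML file is opened in a browser alongside the XSLT, JavaScript, and CSS files listed above, the browser applies the transformation to render the annotated image page.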