One of the key appeals of digital humanities at small liberal arts colleges has been its role as an avenue for undergraduate research in the humanities. In this seminar, a panel of undergraduates will share their research, their goals and challenges, and what they have learned from the process of digital humanities research. A moderated discussion on undergraduate research in the digital humanities will follow. Details are here: http://www.nitle.org/live/events/137-undergraduates-collaborating-in-digital-humanities
2. Panelists
• Moderator: Janet Simons, Associate Director of Instructional Technology, Co-Director, Digital Humanities Initiative (DHi), Hamilton College
• John Burnett, Wheaton College
• Sarah Schultz, Hamilton College
• Amanda Kleintop, University of Richmond
• Gabrielle Kirilloff, University of Pittsburgh
• Janis Chinn, University of Pittsburgh
3. John Burnett, Wheaton College
• Wheaton College Digital History Project
• http://wheatoncollege.edu/digital-history-project/
4. Sarah Schultz, Hamilton College
• Agha Shahid Ali Poetry Project
• http://asa.dhinitiative.org/demo/index.html
5. Amanda Kleintop, University of Richmond
• History Honors Thesis, “Networks of Resistance: Black Virginians Remember Civil War Loyalties”
7. Gabrielle Kirilloff, University of Pittsburgh
• How Digital Tools Impact Research Questions and Methodologies in Literary Studies
• http://ft.obdurodon.org
• http://gk.obdurodon.org
8. Research overview
• Previous research
– Speech as agency
– Speech hierarchies
• Research question
– What are the correlations among speech, gender, and moral alignment?
9. Research methodologies
• Why XML?
– Unique, descriptive tags
– Processing a large number of tales
– Creating multiple views of the same data
• Learning XML and related technologies
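The points above about descriptive tags and multiple views of the same data can be illustrated with a minimal Python sketch. The element name `speech` and the attributes `gender` and `alignment` are hypothetical, chosen only for illustration; the project's actual markup schema is not given in these slides, and the sample tale is invented.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Invented sample; tag and attribute names are assumptions, not the project's schema.
SAMPLE_TALE = """<tale>
  <speech gender="female" alignment="good">Where are you going</speech>
  <speech gender="male" alignment="evil">Come here</speech>
  <speech gender="female" alignment="good">Not today</speech>
  <narration>She walked on.</narration>
</tale>"""


def speech_word_counts(xml_text):
    """Count spoken words per (gender, alignment) pair in one encoded tale."""
    root = ET.fromstring(xml_text.strip())
    counts = Counter()
    for speech in root.iter("speech"):
        key = (speech.get("gender"), speech.get("alignment"))
        # itertext() gathers text even if the speech contains nested markup.
        counts[key] += len("".join(speech.itertext()).split())
    return counts


if __name__ == "__main__":
    for (gender, alignment), words in speech_word_counts(SAMPLE_TALE).items():
        print(gender, alignment, words)
```

Because the tags carry the descriptive information, the same function can be run unchanged over many tales, and a different grouping key would give a different view of the same data.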
16. Research Question
• To what extent do Twitter users exercise register shifting when communicating with Twitter users at large, non-verified users, and verified users?
17. Linguistic Register
• Situation-specific variety of language
• Spoken Register
– Unconscious effort
– Acquired naturally
• Written Register
– Conscious effort
– Acquired through study
18. Corpus Building
• Python script collects tweets from the public timeline
• Shell scripting and Perl filter down the corpus
• XML encoded, accessed via XQuery
• English tweets
• US and Canada
• Currently 519,018 tweets
• 98% accurate filtering to English-only text
• Current Statistics:
– Total words: 5,375,767
– Unique words: 654,755
– Type-token ratio: 0.12
– Average tweet length (words): 10.36
– Average tweet length (characters): 60.39
– Total tweets: 519,018
– Total authors: 483,940
– Total verified authors: 687
– Total non-verified authors: 483,253
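The corpus statistics above can be computed along the following lines. This is a minimal sketch, assuming whitespace tokenization and case folding; the slides do not say how the project tokenized tweets or counted unique words, and the function name `corpus_stats` is hypothetical.

```python
def corpus_stats(tweets):
    """Summary statistics for a list of tweet strings.

    Assumes whitespace tokenization and lowercasing; the project's
    actual tokenization is not described in the slides.
    """
    tokens = [word.lower() for tweet in tweets for word in tweet.split()]
    total = len(tokens)
    unique = len(set(tokens))
    n_tweets = len(tweets)
    return {
        "total_words": total,
        "unique_words": unique,
        # Type-token ratio: unique word forms divided by total words.
        "type_token_ratio": unique / total if total else 0.0,
        "avg_words_per_tweet": total / n_tweets if n_tweets else 0.0,
        "avg_chars_per_tweet": (
            sum(len(tweet) for tweet in tweets) / n_tweets if n_tweets else 0.0
        ),
    }


if __name__ == "__main__":
    print(corpus_stats(["the cat sat", "The dog"]))
    # The reported figures are mutually consistent under these definitions:
    print(round(654755 / 5375767, 2))   # type-token ratio
    print(round(5375767 / 519018, 2))   # average words per tweet
```

The last two lines simply recompute the reported type-token ratio and average tweet length from the reported totals.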
19. What’s next?
• Judge tweets on relative register based on:
– Expletives and profanity
– Rate of non-dictionary word usage
– Average word length of dictionary words
– Appropriate capitalization
– Standard punctuation
– Leetspeak
– Chatspeak
– Ratio of function words within a tweet
• Potential additions:
– Analysis of word n-grams and character bigrams
– Prescriptive use of ‘whom’ over ‘who’
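A few of the planned measures could be sketched per tweet as follows. This is only an illustration under stated assumptions: the function-word list is a tiny placeholder, the dictionary is passed in by the caller, tokens are split on whitespace with surrounding punctuation stripped, and none of the names come from the project itself. How the features would be weighted into a register judgment is left open, as it is in the slides.

```python
import string

# Tiny placeholder list; a real analysis would use a fuller inventory.
FUNCTION_WORDS = {
    "the", "a", "an", "of", "to", "in", "and", "or", "but",
    "is", "are", "that", "it", "for", "on", "with",
}


def register_features(tweet, dictionary):
    """Compute a few per-tweet register indicators.

    `dictionary` is any set of lowercase word forms treated as standard.
    """
    tokens = [t.strip(string.punctuation).lower() for t in tweet.split()]
    tokens = [t for t in tokens if t]
    known = [t for t in tokens if t in dictionary]
    n = len(tokens)
    return {
        "non_dictionary_rate": (n - len(known)) / n if n else 0.0,
        "avg_dictionary_word_length": (
            sum(len(t) for t in known) / len(known) if known else 0.0
        ),
        "function_word_ratio": (
            sum(t in FUNCTION_WORDS for t in tokens) / n if n else 0.0
        ),
        # Crude proxies for capitalization and punctuation conventions.
        "starts_capitalized": tweet[:1].isupper(),
        "ends_with_terminal_punctuation": tweet.rstrip().endswith((".", "!", "?")),
    }


if __name__ == "__main__":
    print(register_features("The cat is gr8.", {"the", "cat", "is"}))
```

Stripping punctuation also removes the leading "@" and "#" of mentions and hashtags, which is a simplification a fuller treatment of tweets would need to revisit.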
20. Motivations
• “…The Telegraph quoted an actor and a television producer emitting typically brainless "Kids Today" plaints about how modern modes of communication, especially Twitter, are degrading the English language, so that "the sentence with more than one clause is a problem for us", and "words are getting shortened".” –Mark Liberman, Language Log, 2011
21. Motivations
• Impossible without DH
• Quantifiable and repeatable results
• Empowering to build and manipulate tools to work with data