DiaView: visualise cultural change in diachronic corpora, David Beavan, UCLDH, DH2012


Talk given at Digital Humanities 2012 (DH2012) in Hamburg, Germany on 18 July 2012.
Web site: http://www.scottishcorpus.ac.uk/corpus/diaview/
Video: http://lecture2go.uni-hamburg.de/konferenzen/-/k/13916
Abstract: http://www.dh2012.uni-hamburg.de/conference/programme/abstracts/diaview-visualise-cultural-change-in-diachronic-corpora/

This paper will introduce and demonstrate DiaView, a new tool to investigate and visualise word usage in diachronic corpora. DiaView highlights cultural change over time by exposing salient lexical items from each decade or year, and providing them to the user in an effortless visualisation. This is made possible by examining large quantities of diachronic textual data, in this case the Google Books corpus (Michel et al. 2010) of one million English books. This paper will introduce the methods and technologies at its core, perform a demonstration of the tool and discuss further possibilities.



Transcript

  • 1. DiaView: Visualise Cultural Change in Diachronic Corpora
      David Beavan
      UCL Centre for Digital Humanities
      @DavidBeavan
      www.scottishcorpus.ac.uk/corpus/diaview
  • 2. Google Books corpus/Ngram Viewer
      http://books.google.com/ngrams/
  • 3. Google Books corpus
      • OCR quality variable, particularly poor in 1700s (difficulties with long-s: ſ)
      • Does not evenly sample across genres (data collection fairly opportunistic)
      • Chronological placement questionable (implicit metadata not always correct)
      • Very large data set (155 billion tokens)
  • 4. DiaView uses
      • English One Million corpus: “Books with low OCR quality were removed, and serials were removed.”
      • 1850 to present (avoids long-s)
      • 98 billion tokens (still very large)
      • Filter out very infrequently used words (or keep large sample of most frequently used)
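
The filtering described on this slide could look roughly like the sketch below. It is not DiaView's actual code: the `(word, year, count)` record layout, the function name `filter_counts`, and the `top_n` / `min_total` parameters are all assumptions made for illustration.

```python
from collections import Counter


def filter_counts(records, start_year=1850, top_n=None, min_total=None):
    """Keep records from start_year onwards, then drop rare words.

    records: iterable of (word, year, count) tuples (hypothetical layout).
    top_n: if given, keep only the top_n most frequent words overall.
    min_total: otherwise, if given, keep words with at least this many
               occurrences across all retained years.
    """
    # Step 1: discard early years (the slide uses 1850 to avoid long-s OCR errors).
    recent = [(w, y, c) for (w, y, c) in records if y >= start_year]

    # Step 2: total occurrences per word across the retained years.
    totals = Counter()
    for word, _, count in recent:
        totals[word] += count

    # Step 3: decide which words survive the frequency filter.
    if top_n is not None:
        keep = {word for word, _ in totals.most_common(top_n)}
    elif min_total is not None:
        keep = {word for word, total in totals.items() if total >= min_total}
    else:
        keep = set(totals)

    return [(w, y, c) for (w, y, c) in recent if w in keep]
```

Either threshold serves the same purpose: very rare words have unstable proportions, so a handful of extra occurrences in one year would otherwise look like a large change.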
  • 5. DiaView concept
      • Quick and easy to use
      • Aggregate and summarise data
      • Promote browsing and opportunistic discovery
      • Help identify cultural trends across time
      • Highlight salient or ‘interesting’ terms
      • Provide links to more in-depth analysis
      • Inspect corpus by decade or year
      • Ability to work with any corpora or any dataset
  • 6. DiaView method/measuring salience
      Proportion of term occurrences in entire corpus
      vs
      Proportion of term occurrences in particular year
  • 7. Word ‘and’: 100 of 1000 words in entire corpus = 10%
      • Year 1: 45 of 500 words = 9% = -10% of corpus proportion (10%)
      • Year 2: 55 of 500 words = 11% = +10% of corpus proportion (10%)
    Word ‘sausage’: 20 of 1000 words in entire corpus = 2%
      • Year 1: 4 of 500 words = 0.8% = -60% of corpus proportion (2%)
      • Year 2: 16 of 500 words = 3.2% = +60% of corpus proportion (2%)
    Rank for salience by year, ignoring underuse (negative percentages are dropped)
      • Year 1: (no terms)
      • Year 2: ‘sausage’ (+60%), ‘and’ (+10%)
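
The arithmetic on this slide can be reproduced with a few lines of Python. This is a minimal sketch of the comparison as the slides describe it (a term's proportion in one year relative to its proportion in the whole corpus); the function name `salience` and its parameter names are assumptions, not part of DiaView.

```python
def salience(term_in_year, words_in_year, term_in_corpus, words_in_corpus):
    """Percentage by which a term's share of one year exceeds (positive)
    or falls short of (negative) its share of the entire corpus."""
    corpus_proportion = term_in_corpus / words_in_corpus
    year_proportion = term_in_year / words_in_year
    return (year_proportion - corpus_proportion) / corpus_proportion * 100


# Counts taken from the slide: (term, year, count in year, words in year,
# count in corpus, words in corpus).
examples = [
    ("and", "Year 1", 45, 500, 100, 1000),
    ("and", "Year 2", 55, 500, 100, 1000),
    ("sausage", "Year 1", 4, 500, 20, 1000),
    ("sausage", "Year 2", 16, 500, 20, 1000),
]

for term, year, in_year, year_size, in_corpus, corpus_size in examples:
    score = salience(in_year, year_size, in_corpus, corpus_size)
    # Rounding hides floating-point noise in the printed percentage.
    print(f"{term!r} {year}: {round(score, 1):+}%")
```

Because the score is relative to the term's own corpus proportion, a rare word such as ‘sausage’ can outrank a very common word such as ‘and’ even though its raw count is far lower.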
  • 8. DiaView method
      • Word frequency alone does not dictate salience (extraordinary overuse does)
      • Traverse entire corpus by year/decade
      • Calculate salience for each type
      • Rank types according to salience
      • Apply visual style to word lists
      • Create links back to Ngram Viewer for in-depth analysis
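
The traverse, calculate, and rank steps listed on this slide can be sketched as follows. This is an illustration under stated assumptions rather than DiaView's implementation: the nested `{year: {word: count}}` input, the function name `rank_salient_terms`, and the `top_n` cut-off are hypothetical, and the visual styling and Ngram Viewer links are left out.

```python
from collections import defaultdict


def rank_salient_terms(counts, top_n=10):
    """Rank each year's overused types by salience.

    counts: {year: {word: count}} (hypothetical layout).
    Returns {year: [(word, salience_percent), ...]}, most salient first.
    Types used less than their corpus-wide proportion are ignored.
    """
    # Corpus-wide totals per type, and overall corpus size.
    corpus_totals = defaultdict(int)
    for year_counts in counts.values():
        for word, count in year_counts.items():
            corpus_totals[word] += count
    corpus_size = sum(corpus_totals.values())

    ranking = {}
    for year, year_counts in counts.items():
        year_size = sum(year_counts.values())
        scored = []
        for word, count in year_counts.items():
            corpus_proportion = corpus_totals[word] / corpus_size
            year_proportion = count / year_size
            score = (year_proportion - corpus_proportion) / corpus_proportion * 100
            if score > 0:  # keep overuse only, as the slides describe
                scored.append((word, score))
        scored.sort(key=lambda pair: pair[1], reverse=True)
        ranking[year] = scored[:top_n]
    return ranking


# Toy data: the counts for 'and' and 'sausage' follow the worked example in
# the slides; '(other)' is a made-up filler standing in for all other words.
toy = {
    "Year 1": {"and": 45, "sausage": 4, "(other)": 451},
    "Year 2": {"and": 55, "sausage": 16, "(other)": 429},
}
ranking = rank_salient_terms(toy)
```

On the toy data, ‘sausage’ ranks above ‘and’ in Year 2, and neither word appears in the Year 1 list because both are underused there. Grouping by decade instead of year would only require keying `counts` by decade.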
  • 9. www.scottishcorpus.ac.uk/corpus/diaview
  • 10. www.scottishcorpus.ac.uk/corpus/diaview
  • 11. DiaView: Visualise Cultural Change in Diachronic Corpora
      David Beavan
      UCL Centre for Digital Humanities
      @DavidBeavan
      www.scottishcorpus.ac.uk/corpus/diaview