Glimpses Through the Clouds: collocates in a new light, David Beavan, University of Glasgow, DH2008
 


Talk given at Digital Humanities 2008 (DH2008) in Oulu, Finland on 27 June 2008.
Web site: http://www.scottishcorpus.ac.uk/corpus/bnc/
Abstract: http://www.ekl.oulu.fi/dh2008/Digital%20Humanities%202008%20Book%20of%20Abstracts.pdf

This paper demonstrates a web-based, interactive data visualisation, allowing users to quickly inspect and browse the collocational relationships present in a corpus. The software is inspired by tag clouds, first popularised by on-line photograph sharing website Flickr (www.flickr.com). A paper based on a prototype of this Collocate Cloud visualisation was given at Digital Resources for the Humanities and Arts 2007. The software has since matured, offering new ways of navigating and inspecting the source data. It has also been expanded to analyse additional corpora, such as the British National Corpus (http://www.natcorp.ox.ac.uk/), which will be the focus of this talk.


    Presentation Transcript

    • Glimpses through the clouds: Collocates in a new light. David Beavan, Department of English Language
    • What are clouds?
    • Cloud properties
        Alphabetical listing of items
          • Good for navigation
          • Quickly locate or discount a known item
          • Limited number of items (Flickr tag cloud = 150)
        Font size shows popularity
          • Good for browsing
          • Often-used tags ‘jump out at you’
          • Limited usefulness if less popular terms are sought
    • Word frequency cloud
        Shares properties with tag clouds
          • Words listed alphabetically: good for navigation
          • Font size shows frequency of word: good for browsing
        Restricted view
          • Summarises the document as a whole
          • Does not give insight into the usage or context of each word
        For this we need to look at co-occurrences/collocates
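The two cloud properties described on these slides (alphabetical order for navigation, font size for frequency) can be sketched in a few lines of Python. This is an illustrative sketch only, not the software shown in the talk: the function name `word_frequency_cloud`, the linear size scaling, the point-size range and the simple regular-expression tokeniser are all assumptions. The default limit of 150 items echoes the Flickr figure quoted on the slide.

```python
import re
from collections import Counter


def word_frequency_cloud(text, max_items=150, min_pt=10, max_pt=36):
    """Return (word, font_size) pairs in alphabetical order.

    Hypothetical helper: font size grows linearly with frequency,
    which is an assumption rather than the talk's actual scaling.
    """
    words = re.findall(r"[a-z']+", text.lower())
    top = Counter(words).most_common(max_items)
    if not top:
        return []
    counts = [count for _, count in top]
    lo, hi = min(counts), max(counts)
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    cloud = [
        (word, round(min_pt + (count - lo) / span * (max_pt - min_pt)))
        for word, count in top
    ]
    return sorted(cloud)  # alphabetical listing aids navigation


sample = "the blue sky and the blue sea and the grey cloud"
for word, size in word_frequency_cloud(sample):
    print(f"{word}: {size}pt")
```

As the slide notes, such a cloud summarises the text as a whole but says nothing about how each word is used in context.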
    • Our corpus [figure: dot diagram]
    • Collocates of ‘blue’ [figure: dot diagram, built up over four slides]
    • Co-occurrences as a cloud
        Using the British National Corpus (BNC)
          • Popular and well known
          • 100 million word corpus
          • British English
          • Compiled in early 1990s
          • Wide range of genres
          • Written and spoken data
          • 2007 XML edition
    • Co-occurrence clouds
          • 100 most frequent co-occurring word pairs
          • Rendered as a cloud
          • Inherit cloud benefits of navigation and exploration
          • Allow user to create new clouds from visible words
        What’s missing
          • KWIC concordance of word pairs
          • Measure of collocation strength
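A co-occurrence cloud of the kind listed above could be built along the following lines. This is a minimal sketch under stated assumptions, not the talk's implementation: the slides do not say how co-occurrence is defined, so a fixed window of tokens on either side of the node word is assumed, and the function names and tokeniser are hypothetical. The default of 100 items follows the slide.

```python
import re
from collections import Counter


def cooccurrences(text, node, window=4):
    """Count words within `window` tokens either side of `node`.

    Assumption: the window ignores sentence boundaries.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        counts.update(left + right)
    return counts


def cooccurrence_cloud(text, node, max_items=100, window=4):
    """Most frequent co-occurring words, listed alphabetically with counts."""
    top = cooccurrences(text, node, window).most_common(max_items)
    return sorted(top)


sample = "Blue sky over a blue sea. Her blue eyes matched the blue sky."
print(cooccurrence_cloud(sample, "blue", window=1))
```

Calling `cooccurrence_cloud` again with any word from the result as the new node mirrors the "create new clouds from visible words" step.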
    • Our corpus [figure: dot diagram]
    • Collocates of ‘brown’ [figure: dot diagram]
    • Collocate clouds
          • 100 most frequent co-occurring word pairs
          • Rendered as a cloud
          • Inherit cloud benefits of navigation and exploration
          • Allow user to create new clouds from visible words
          • KWIC concordance of word pairs
          • Measure of collocation strength
    • Future
        Advantages
          • Easy to interpret and use
          • Lowers the barrier to corpus analysis
          • Iterative nature promotes browsing and investigation
        Improvements
          • Allow use of stopwords / filter words
          • Configure ‘size’ of cloud
          • Show POS
          • Group words under their headword
          • Make your own?
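The two additions that turn a co-occurrence cloud into a collocate cloud, a KWIC concordance and a strength measure, might look roughly like this. The slides do not name the measure used, so a simple mutual information score, log2(O × N / (f(node) × f(collocate))), is assumed here purely for illustration. The function names, window size and tokeniser are likewise hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def kwic(tokens, node, collocate, window=4, context=3):
    """Return concordance lines where `collocate` is within `window` of `node`."""
    lines = []
    for i, token in enumerate(tokens):
        if token != node:
            continue
        nearby = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if collocate in nearby:
            left = " ".join(tokens[max(0, i - context):i])
            right = " ".join(tokens[i + 1:i + 1 + context])
            lines.append(f"{left} [{token}] {right}".strip())
    return lines


def mutual_information(tokens, node, collocate, window=4):
    """Assumed strength measure: log2(O * N / (f(node) * f(collocate)))."""
    n = len(tokens)
    freq = Counter(tokens)
    observed = 0
    for i, token in enumerate(tokens):
        if token == node:
            nearby = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
            observed += nearby.count(collocate)
    if observed == 0:
        return float("-inf")  # the pair never co-occurs
    return math.log2(observed * n / (freq[node] * freq[collocate]))


sample = "Blue sky over a blue sea. Her blue eyes matched the blue sky."
tokens = tokenize(sample)
for line in kwic(tokens, "blue", "sky", window=1, context=2):
    print(line)
print(mutual_information(tokens, "blue", "sky", window=1))
```

A score like this could drive a second visual property of each word in the cloud, leaving font size to show raw frequency, though the slides do not specify how the strength measure is displayed.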
    • Glimpses through the clouds: Collocates in a new light. David Beavan, d.beavan@englang.arts.gla.ac.uk