Glimpses Through the Clouds: collocates in a new light, David Beavan, University of Glasgow, DH2008

Talk given at Digital Humanities 2008 (DH2008) in Oulu, Finland on 27 June 2008.
Web site: http://www.scottishcorpus.ac.uk/corpus/bnc/
Abstract: http://www.ekl.oulu.fi/dh2008/Digital%20Humanities%202008%20Book%20of%20Abstracts.pdf

This paper demonstrates a web-based, interactive data visualisation that allows users to quickly inspect and browse the collocational relationships present in a corpus. The software is inspired by tag clouds, first popularised by the online photo-sharing website Flickr (www.flickr.com). A paper based on a prototype of this Collocate Cloud visualisation was given at Digital Resources for the Humanities and Arts 2007. The software has since matured, offering new ways of navigating and inspecting the source data. It has also been expanded to analyse additional corpora, such as the British National Corpus (http://www.natcorp.ox.ac.uk/), which will be the focus of this talk.
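The abstract does not say how collocates are extracted from the corpus. As a rough illustration only, the sketch below counts every word within a fixed window either side of a node word and keeps the most frequent, in the spirit of the "100 most frequent co-occurring word pairs" described on the slides. The function names (`tokenise`, `collocates`), the naive lowercase tokenisation and the default window of four words either side are all hypothetical assumptions, not details taken from the talk.

```python
import re
from collections import Counter


def tokenise(text):
    """Split text into lowercase word tokens.

    A deliberately naive stand-in for proper corpus tokenisation.
    """
    return re.findall(r"[a-z']+", text.lower())


def collocates(tokens, node, span=4, top_n=100):
    """Return up to top_n (word, count) pairs co-occurring with node.

    A word co-occurs with node if it lies within `span` tokens to the
    left or right of an occurrence of node. span=4 is an assumption;
    top_n=100 mirrors the "100 most frequent" figure on the slides.
    """
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        start = max(0, i - span)
        end = min(len(tokens), i + span + 1)
        for j in range(start, end):
            if j != i:  # skip the node occurrence itself
                counts[tokens[j]] += 1
    return counts.most_common(top_n)


if __name__ == "__main__":
    sample = "The blue sky and the blue sea. A dark blue coat hung by the door."
    for word, count in collocates(tokenise(sample), "blue", span=2):
        print(word, count)
```

Raw frequency ranks function words such as ‘the’ highly, as even this toy sentence shows. The stopword filtering listed under ‘Improvements’ on slide 15 targets exactly that.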

Transcript

  • 1. Glimpses through the clouds: Collocates in a new light. David Beavan, Department of English Language
  • 2. What are clouds?
  • 3. Cloud properties
      ◦ Alphabetical listing of items: good for navigation; quickly locate or discount a known item; limited number of items (Flickr tag cloud = 150)
      ◦ Font size shows popularity: good for browsing; often used tags ‘jump out at you’; limited usefulness if less popular terms are sought
  • 4. Word frequency cloud
      ◦ Shares properties with tag clouds: words listed alphabetically (good for navigation); font size shows frequency of word (good for browsing)
      ◦ Restricted view: summarises the document as a whole; does not give insight into the usage or context of each word. For this we need to look at co-occurrences/collocates.
  • 5. Our corpus (diagram)
  • 6. Collocates of ‘blue’ (diagram)
  • 7. Collocates of ‘blue’ (diagram)
  • 8. Collocates of ‘blue’ (diagram)
  • 9. Collocates of ‘blue’ (diagram)
  • 10. Co-occurrences as a cloud
      ◦ Using the British National Corpus (BNC): popular and well known; 100-million-word corpus; British English; compiled in the early 1990s; wide range of genres; written and spoken data; 2007 XML edition
  • 11. Co-occurrence clouds
      ◦ Co-occurrence clouds: 100 most frequent co-occurring word pairs; rendered as a cloud; inherit cloud benefits of navigation and exploration; allow the user to create new clouds from visible words
      ◦ What’s missing: KWIC (key word in context) concordance of word pairs; measure of collocation strength
  • 12. Our corpus (diagram)
  • 13. Collocates of ‘brown’ (diagram)
  • 14. Collocate clouds
      ◦ Collocate clouds: 100 most frequent co-occurring word pairs; rendered as a cloud; inherit cloud benefits of navigation and exploration; allow the user to create new clouds from visible words; KWIC concordance of word pairs; measure of collocation strength
  • 15. Future
      ◦ Advantages: easy to interpret and use; lowers the barrier to corpus analysis; iterative nature promotes browsing and investigation
      ◦ Improvements: allow use of stopwords / filter words; configure ‘size’ of cloud; show POS; group words under their headword; make your own?
  • 16. Glimpses through the clouds: Collocates in a new light. David Beavan, d.beavan@englang.arts.gla.ac.uk
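Slides 3, 4 and 11 describe the two visual conventions a collocate cloud inherits from tag clouds: items are listed alphabetically for navigation, and font size shows frequency for browsing. The sketch below illustrates that layout step under stated assumptions. The talk does not say how sizes were computed or rendered, so the linear scaling, the 10pt to 36pt range, the HTML output and the function names `font_sizes` and `cloud_html` are all hypothetical.

```python
from html import escape


def font_sizes(word_counts, min_pt=10, max_pt=36):
    """Map each word to a font size scaled linearly by its count.

    The least frequent word gets min_pt and the most frequent max_pt;
    both bounds are illustrative assumptions.
    """
    if not word_counts:
        return {}
    counts = [count for _, count in word_counts]
    lo, hi = min(counts), max(counts)
    sizes = {}
    for word, count in word_counts:
        if hi == lo:
            sizes[word] = float(max_pt)  # all counts equal: no contrast to show
        else:
            sizes[word] = min_pt + (count - lo) * (max_pt - min_pt) / (hi - lo)
    return sizes


def cloud_html(word_counts, min_pt=10, max_pt=36):
    """Render (word, count) pairs as an HTML fragment.

    Words are listed alphabetically (for navigation) and sized by
    frequency (for browsing), following the tag-cloud conventions.
    """
    sizes = font_sizes(word_counts, min_pt, max_pt)
    spans = [
        f'<span style="font-size:{sizes[word]:.0f}pt">{escape(word)}</span>'
        for word in sorted(sizes)
    ]
    return '<div class="cloud">' + " ".join(spans) + "</div>"


if __name__ == "__main__":
    print(cloud_html([("sky", 1), ("sea", 3), ("dark", 5)]))
```

Linear scaling is the simplest choice. Tag clouds often use logarithmic scaling instead, so that a handful of very frequent words do not dwarf the rest.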
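Slides 11 and 14 list two features that a collocate cloud adds over a co-occurrence cloud: a KWIC concordance for each word pair and a measure of collocation strength. The slides do not name the statistic used. The sketch below therefore substitutes pointwise mutual information (MI), one widely used collocation measure, and pairs it with a plain KWIC listing. The function names `mi_score` and `kwic`, the window of eight positions (four words either side) and the context width are hypothetical assumptions for illustration.

```python
import math


def mi_score(observed, node_freq, coll_freq, corpus_size, window=8):
    """Pointwise mutual information (MI) for a node/collocate pair.

    observed: co-occurrences of the pair within the window (must be > 0)
    node_freq, coll_freq: corpus frequencies of each word
    corpus_size: total tokens in the corpus
    window: collocate positions per node occurrence
            (8 = four words either side; an assumption)
    """
    # Expected co-occurrences if the two words were distributed independently.
    expected = node_freq * coll_freq * window / corpus_size
    return math.log2(observed / expected)


def kwic(tokens, node, collocate, span=4, context=5):
    """Key word in context (KWIC) lines for node where collocate is within span."""
    lines = []
    for i, token in enumerate(tokens):
        if token != node:
            continue
        nearby = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
        if collocate not in nearby:
            continue
        left = " ".join(tokens[max(0, i - context):i])
        right = " ".join(tokens[i + 1:i + 1 + context])
        # Right-align the left context so the node words line up in a column.
        lines.append(f"{left:>30} [{token}] {right}")
    return lines


if __name__ == "__main__":
    tokens = "the blue sky and the blue sea and the deep blue sea".split()
    for line in kwic(tokens, "blue", "sea", span=1, context=3):
        print(line)
    # Hypothetical frequencies, purely to show the arithmetic.
    print(round(mi_score(10, 100, 100, 80_000), 2))  # prints 3.32
```

MI is known to favour rare word pairs, so corpus tools often combine it with a minimum frequency threshold or use a measure such as log-likelihood instead. The talk does not state which trade-off the Collocate Cloud makes.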
