MW2011: Klavans, J.  +, Computational Linguistics in Museums: Applications for Cultural Datasets

As museums continue to develop more sophisticated techniques for managing and analyzing cultural data, many are beginning to encounter challenges when trying to deal with the nuances of language and automated processing tools. How might user-generated comments be harvested and processed to determine the nature of the comment? Is it possible to use existing collection documentation to derive relations between similar objects? How can we train systems to automatically recognize (disambiguate) different meanings of the same word? Can automated language processing lead to more compelling browsing interfaces for online collections?

Luckily, a good deal of expertise and many tools exist within the field of computational linguistics that can be applied to these problems to achieve meaningful results. Informed by previous work in computational linguistics and relevant project experience, the authors will address a number of these questions and offer insight into how answers that affect museum practice might be found. Authors will share tools and resources that museum software developers can use to prototype and experiment with these techniques without being experts in language processing themselves. In addition, the authors will describe the work of the T3: Text, Tags, Trust research project and how they have applied these tools to a large shared dataset of object metadata and social tags collected by the Steve.museum project.

Specific challenges regarding batch-processing tools and large datasets will be addressed. Best practices and algorithms will be shared for dealing with a number of sticky issues. Directions for future research and promising application areas will also be discussed.

A presentation from Museums and the Web 2011 (MW2011)

  • Take this seriously.
  • In presenting this paper, start with something not in the paper.
  • Still need to finish
  • Words, words, words.

MW2011: Klavans, J. +, Computational Linguistics in Museums: Applications for Cultural Datasets Presentation Transcript

  • 1. Your spoken paper cannot be the same as your written paper
    Read more: Museums and the Web 2011 (MW2011): Presentation Guidelines | conference.archimuse.com
  • 2. Computational Linguistics in Museums: Applications for Cultural Datasets
    Judith Klavans
    Susan Chun
    Robert Stein
    Raul Guerra
  • 3. Computational Linguistics
    Language - Words, Words, Words
    Use
    Meaning
    Syntax
    Shape of words
    Sounds
  • 4. Applications
    Speech synthesis – 1980s Talking Machines for the Blind
    Intelligent search – pre-Google
    Finding names – who, what, where
    Translation
    Speech recognition
    Answering Questions – What is Watson?
  • 5. Domains for Computational Linguistics
    Healthcare – interpreting patient records
    Government – helping people find information
    International Affairs – cross-language translation
    Law – analyzing Enron scandal email
    Marketing – Opinions on products
    Museums – analyzing text and tags associated with objects for better access
  • 6. Computational Linguistics for Metadata Building
    +
  • 7. Computational Linguistics in Museums: Applications for Cultural Datasets
    Judith Klavans
    Susan Chun
    Robert Stein
    Raul Guerra
  • 8. Interdisciplinary Research
    Computational Linguistics
    in Museums
  • 9. Text, Tags, Trust
    Funded in 2008 by IMLS
    With the University of Maryland and a collaborative of museum partners
    Studying the relationships between social tags, scholarly text and resources, and the application of trust networks to improve access to museum collections.
  • 10. MW 2011 Contributions
    Which Computational Linguistic tools can or should be applied to tags?
    How do these tools impact tag analysis?
    What results differ from the initial steve.museum results from Trant 2007?
    So what – for CL?
    So what – for Museums?
  • 11. Hard Challenges
    • What do these words really mean?
    • How can tags be related to other tags?
      across languages
      across users
    • How are tags over museum objects related to tags over anything else?
    • How can they be used?
  • Finding a Needle in the Haystack
  • 14. Gallery Label
    This canvas was the first one Gauguin painted during the two months he spent in Provence.... Gauguin had rebelled against Impressionism's reliance on the visible world, and he altered nature's shapes and colors to suggest his own more subjective reaction to the landscape.
    While the rural subject and acidic colors show the influence of van Gogh, this image is more indebted to Paul Cézanne. In his careful integration of the haystack and farm buildings, Gauguin has echoed Cézanne's emphasis on geometric form.
  • 15. Tools for Tags
    Morphological Analysis – Conflate when possible
    Cats, cat
    Haystacks, haystack
    Painting, paint?
    What words are verbs, nouns, adjectives?
    How should multi-word tags be handled?
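The conflation idea on this slide can be illustrated with a small sketch. It is not the T3 project's tooling: it is a deliberately naive plural stripper written for illustration, and the function names `conflate` and `group_variants` are hypothetical. It merges "Cats" with "cat" and "Haystacks" with "haystack", but leaves "Painting" and "paint" apart, since the slide itself marks that pair as an open question.

```python
from collections import defaultdict


def conflate(tag: str) -> str:
    """Lowercase a tag and strip a simple plural ending from its last word.

    Naive rules for illustration only; a real system would use a
    morphological analyzer or lemmatizer instead.
    """
    words = tag.lower().split()
    if not words:
        return ""
    last = words[-1]
    if len(last) > 3 and last.endswith("ies"):
        last = last[:-3] + "y"            # cities -> city
    elif len(last) > 3 and last.endswith(("ches", "shes", "sses", "xes")):
        last = last[:-2]                  # churches -> church
    elif len(last) > 2 and last.endswith("s") and not last.endswith(("ss", "us", "is")):
        last = last[:-1]                  # cats -> cat
    words[-1] = last
    return " ".join(words)


def group_variants(tags):
    """Map each conflated form to the set of raw tags that produced it."""
    groups = defaultdict(set)
    for tag in tags:
        groups[conflate(tag)].add(tag)
    return dict(groups)


if __name__ == "__main__":
    raw = ["Cats", "cat", "Haystacks", "haystack", "Painting", "paint"]
    for form, variants in sorted(group_variants(raw).items()):
        print(form, sorted(variants))
```

Only the last word of a multi-word tag is normalized here, which is one possible answer to the slide's question about multi-word tags, not a recommendation.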
  • 16. Raw Tags or Tokens
  • 17. Results
    25%
    93%
    68%
  • 18.
    1. NN=25205
    2. JJ=6319
    3. NNS=4041
    4. NN_NN=2257
    5. JJ_NN=1792
    6. VBG=1043
    7. VBN=727
    8. NP=708
    9. OD_NN=454
    10. JJ_NNS=413
  • 19. Top 10 POS Patterns:
    1. NN=6706
    2. NN_NN=1713
    3. JJ_NN=1194
    4. JJ=921
    5. NNS=757
    6. JJ_NNS=303
    7. NN_NNS=300
    8. VBG=238
    9. NP=209
    10. VBN_NN=202
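Pattern counts like the ones on these two slides can be produced by joining the part-of-speech labels of each tag with underscores and tallying the results. The sketch below assumes the tags have already been run through a tagger; the sample data and the function names `pos_pattern` and `top_patterns` are hypothetical, and the labels follow what appear to be Brown-style codes (NN, JJ, NNS) as used on the slides.

```python
from collections import Counter


def pos_pattern(tagged_tag):
    """Join the POS labels of one social tag, e.g. JJ_NN for 'rural landscape'."""
    return "_".join(pos for _token, pos in tagged_tag)


def top_patterns(tagged_tags, n=10):
    """Return the n most frequent POS patterns with their counts."""
    counts = Counter(pos_pattern(t) for t in tagged_tags)
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical, hand-labelled sample; real input would come from a POS tagger.
    sample = [
        [("haystack", "NN")],
        [("farm", "NN")],
        [("yellow", "JJ")],
        [("haystacks", "NNS")],
        [("farm", "NN"), ("buildings", "NNS")],
        [("rural", "JJ"), ("landscape", "NN")],
        [("blue", "JJ"), ("sky", "NN")],
    ]
    for rank, (pattern, count) in enumerate(top_patterns(sample), start=1):
        print(f"{rank}. {pattern}={count}")
```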
  • 20. Hard Challenges
    • What do these words really mean?
    • How can tags be related to other tags?
      across languages
      across users
    • How are tags over museum objects related to tags over anything else?
    • How can they be used?
  • Why Part of Speech?
    • Integral to most language processing pipelines
    • Precursor to parsing.
    • However, for social tags, parsing is not a meaningful step.
    Research:
    • Understand the nature of this kind of descriptive tagging.
    • Link part of speech information with other lexical resources for disambiguation
  • You shall know a word
    by the company it keeps.
    J.R. Firth
    Gold
    Orange
    Necklace
    Ripe
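One reading of the word list on this slide is that "orange" means a color next to "gold" and "necklace" and a fruit next to "ripe". The sketch below shows that idea under stated assumptions: the cue sets in `ORANGE_SENSES`, the sample objects, and the function names are all invented for illustration and are not taken from the project's data or methods.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence(objects):
    """Count how often each pair of tags appears on the same object."""
    counts = defaultdict(Counter)
    for tags in objects:
        unique = sorted({t.lower() for t in tags})
        for a, b in combinations(unique, 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def guess_sense(tag, context_tags, sense_cues):
    """Pick the sense whose cue words overlap most with the other tags.

    Returns None when no cue word is present in the context.
    """
    context = {t.lower() for t in context_tags} - {tag.lower()}
    scores = {sense: len(context & cues) for sense, cues in sense_cues.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# Hypothetical cue words for the two senses of "orange".
ORANGE_SENSES = {
    "color": {"gold", "necklace", "yellow"},
    "fruit": {"ripe", "fruit", "peel"},
}

if __name__ == "__main__":
    print(guess_sense("orange", ["orange", "gold", "necklace"], ORANGE_SENSES))
    print(guess_sense("orange", ["orange", "ripe"], ORANGE_SENSES))
    pairs = cooccurrence([["orange", "gold"], ["orange", "ripe"], ["Orange", "gold"]])
    print(pairs["orange"].most_common())
```

In practice the cue sets would be derived from co-occurrence counts or a lexical resource rather than written by hand.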
  • 26. What About “New England”
    Idioms / lexicalized phrases are more difficult
    Heuristic comparison to Wikipedia Titles matched 46% (30% distinct) of multiword tags
    E.g. “Grapes of Wrath”, “Irish Wolfhound”, “Franco-Prussian War”
    *Klavans and Golbeck, 2010
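A heuristic comparison of this kind can be sketched as an exact lookup of normalized multiword tags in a set of titles. The slide does not describe the matching procedure, so the normalization below is an assumption, as is the reading of the two percentages as all tag instances versus distinct tags. The tiny title list stands in for a real Wikipedia title dump, and `normalize` and `match_multiword_tags` are hypothetical names.

```python
def normalize(phrase):
    """Lowercase, treat underscores as spaces, and collapse whitespace."""
    return " ".join(phrase.lower().replace("_", " ").split())


def match_multiword_tags(tags, titles):
    """Count multiword tags that exactly match a title after normalization."""
    title_index = {normalize(t) for t in titles}
    normalized = [normalize(t) for t in tags]
    multiword = [t for t in normalized if " " in t]
    distinct = set(multiword)
    return {
        "multiword": len(multiword),
        "matched": sum(1 for t in multiword if t in title_index),
        "distinct": len(distinct),
        "distinct_matched": len(distinct & title_index),
    }


if __name__ == "__main__":
    # Stand-in for a title list; a real run would load titles from a dump file.
    titles = ["Grapes of Wrath", "Irish Wolfhound", "Franco-Prussian War", "New England"]
    tags = ["Grapes of Wrath", "grapes of wrath", "Irish Wolfhound", "blue sky", "haystack"]
    stats = match_multiword_tags(tags, titles)
    print(stats)
    print(f"matched: {stats['matched'] / stats['multiword']:.0%}")
    print(f"distinct matched: {stats['distinct_matched'] / stats['distinct']:.0%}")
```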
  • 27. Wish List - Better ways to tame the proliferation of rich but “noisy” content
    Clustering over tags for similarity
    Clustering over tags and terms from text
    Matching over existing terms to identify meaningful units
    Apply machine learning techniques to guess meaning
    Bigrams, Trigrams, Thesauri, Corpus Analysis
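As a starting point for the n-gram item on this wish list, the sketch below extracts bigrams and trigrams from a piece of text and counts them. The tokenizer is a simple regular expression chosen for illustration, the sample sentence is invented, and `ngrams` is a hypothetical helper name.

```python
import re
from collections import Counter


def ngrams(text, n):
    """Return all runs of n consecutive lowercase word tokens in the text."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


if __name__ == "__main__":
    # Invented sample text; real input might be gallery labels or catalogue entries.
    text = "The haystack and the farm buildings echo the haystack and the sky."
    print(Counter(ngrams(text, 2)).most_common(3))
    print(Counter(ngrams(text, 3)).most_common(3))
```

Frequent n-grams found in collection text could then be compared with multiword tags to identify meaningful units, as the wish list suggests.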
  • 28. Acknowledgements
    Steve.museum project members
    T3 and steve.museum museum partners
    University of Maryland, T3 group
    IMA Museum
    ... and other participants
  • 29. Thank You!
    Questions?