MW2011: Klavans, J. +, Computational Linguistics in Museums: Applications for Cultural Datasets


As museums develop more sophisticated techniques for managing and analyzing cultural data, many are encountering challenges with the nuances of language and with automated processing tools. How might user-generated comments be harvested and processed to determine the nature of each comment? Can existing collection documentation be used to derive relations between similar objects? How can systems be trained to recognize (disambiguate) the different meanings of the same word automatically? Can automated language processing lead to more compelling browsing interfaces for online collections? Fortunately, the field of computational linguistics offers a good deal of expertise and tools that can be applied to these problems to achieve meaningful results. Informed by previous work in computational linguistics and relevant project experience, the authors will address several of these questions, offering insight into how answers that affect museum practice might be found. The authors will share tools and resources that museum software developers can use to prototype and experiment with these techniques without being experts in language processing themselves. In addition, the authors will describe the work of the T3: Text, Tags, Trust research project and how they have applied these tools to a large shared dataset of object metadata and social tags collected by the Steve.museum project. Specific challenges regarding batch-processing tools and large datasets will be addressed, and best practices and algorithms for dealing with a number of sticky issues will be shared. Directions for future research and promising application areas will also be discussed. A presentation from Museums and the Web 2011 (MW2011).


Your spoken paper cannot be the same as your written paper. (Read more: Museums and the Web 2011 (MW2011): Presentation Guidelines, conference.archimuse.com)

Computational Linguistics in Museums: Applications for Cultural Datasets. Judith Klavans, Susan Chun, Robert Stein, Raul Guerra

Computational Linguistics. Language: words, words, words. Use, meaning, syntax, shape of words, sounds.

Applications: speech synthesis (1980s talking machines for the blind); intelligent search (pre-Google); finding names (who, what, where); translation; speech recognition; answering questions ("What is Watson?").

Domains for computational linguistics: healthcare (interpreting patient records); government (helping people find information); international affairs (cross-language translation); law (analyzing the Enron scandal email); marketing (opinions on products); museums (analyzing text and tags associated with objects for better access).

Computational Linguistics for Metadata Building

Interdisciplinary Research: Computational Linguistics in Museums

Text, Tags, Trust (T3): funded in 2008 by IMLS, in partnership with the University of Maryland and a collaborative of museum partners. The project studies the relationships between social tags, scholarly text and resources, and the application of trust networks to improve access to museum collections.

MW2011 contributions: Which computational linguistics tools can or should be applied to tags? How do these tools affect tag analysis? How do the results differ from the initial steve.museum results (Trant 2007)? So what for CL? So what for museums?

How can tags be related to other tags? Across languages; across users.
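
One common way to relate tags across users is co-occurrence: two tags are treated as related when different users apply them to the same object. The following is a minimal sketch of that idea, not the T3 project's method. It assumes tag assignments arrive as hypothetical (user, object_id, tag) triples. The user field is ignored here, although it could be used to weight pairs.

```python
from collections import Counter
from itertools import combinations


def tag_cooccurrence(assignments):
    """Count how often two tags are applied to the same object.

    `assignments` is an iterable of (user, object_id, tag) triples, a
    hypothetical input format. Tags are stripped and lowercased, so
    "Haystack" and "haystack " count as the same tag.
    """
    tags_by_object = {}
    for _user, object_id, tag in assignments:
        tags_by_object.setdefault(object_id, set()).add(tag.strip().lower())

    pairs = Counter()
    for tags in tags_by_object.values():
        # Sorting gives each unordered pair a single canonical key.
        for a, b in combinations(sorted(tags), 2):
            pairs[(a, b)] += 1
    return pairs


if __name__ == "__main__":
    sample = [
        ("u1", "obj1", "Haystack"),
        ("u2", "obj1", "farm"),
        ("u1", "obj2", "farm"),
        ("u2", "obj2", "haystack "),
        ("u3", "obj3", "Provence"),
    ]
    for (a, b), n in tag_cooccurrence(sample).most_common():
        print(a, b, n)
```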

Gallery label: This canvas was the first one Gauguin painted during the two months he spent in Provence.... Gauguin had rebelled against Impressionism's reliance on the visible world, and he altered nature's shapes and colors to suggest his own more subjective reaction to the landscape. While the rural subject and acidic colors show the influence of van Gogh, this image is more indebted to Paul Cézanne. In his careful integration of the haystack and farm buildings, Gauguin has echoed Cézanne's emphasis on geometric form.

Tools for tags. Morphological analysis: conflate forms when possible (cats, cat; haystacks, haystack; painting, paint?). Which words are verbs, nouns, adjectives? How should multi-word tags be handled?
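
Conflation can be prototyped without a full morphological analyzer. The sketch below uses a deliberately conservative suffix rule. This rule is an assumption made for illustration, not the tool used in the project. A plural is merged with its singular only when the singular also occurs as a tag. Forms ending in "-ing" are never merged, because "painting" is often a noun rather than a form of "paint". A production pipeline would more likely use an established lemmatizer.

```python
from collections import defaultdict


def conflate_plurals(tags):
    """Group tag variants under a shared key (a conservative, hypothetical rule).

    A tag ending in "es" or "s" is keyed to its stem only when that stem is
    itself attested as a tag. No rule touches "-ing" forms, so "painting"
    stays separate from "paint".
    """
    vocab = {t.strip().lower() for t in tags if t.strip()}
    groups = defaultdict(set)
    for tag in vocab:
        key = tag
        for suffix in ("es", "s"):
            if tag.endswith(suffix) and tag[: -len(suffix)] in vocab:
                key = tag[: -len(suffix)]
                break
        groups[key].add(tag)
    return dict(groups)


if __name__ == "__main__":
    tags = ["Cats", "cat", "Haystacks", "haystack", "painting", "paint", "glass"]
    for key, variants in sorted(conflate_plurals(tags).items()):
        print(key, sorted(variants))
```

Requiring the singular to be attested keeps the rule from producing non-words such as "glas" from "glass".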

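Part-of-speech (POS) tags follow Brown Corpus conventions: NN singular noun, NNS plural noun, NP proper noun, JJ adjective, VBG gerund or present participle, VBN past participle, OD ordinal numeral. An underscore joins the tags of the words in a multi-word tag.

1. NN: 25,205
2. JJ: 6,319
3. NNS: 4,041
4. NN_NN: 2,257
5. JJ_NN: 1,792
6. VBG: 1,043
7. VBN: 727
8. NP: 708
9. OD_NN: 454
10. JJ_NNS: 413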

Top 10 POS patterns:
1. NN: 6,706
2. NN_NN: 1,713
3. JJ_NN: 1,194
4. JJ: 921
5. NNS: 757
6. JJ_NNS: 303
7. NN_NNS: 300
8. VBG: 238
9. NP: 209
10. VBN_NN: 202
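
Pattern counts of this kind can be produced by tagging each word of a tag and joining the word-level tags with underscores. In the following minimal sketch, a tiny hypothetical lexicon (`TOY_LEXICON`) stands in for a real part-of-speech tagger. It illustrates only the counting step and does not reproduce the figures above. A single-word tag such as "painting" has no sentence context, so any tagger must guess between VBG and NN.

```python
from collections import Counter

# Hypothetical toy lexicon standing in for a real POS tagger.
# Labels follow the Brown-style tags used in the counts above.
TOY_LEXICON = {
    "haystack": "NN",
    "farm": "NN",
    "landscape": "NN",
    "buildings": "NNS",
    "colors": "NNS",
    "acidic": "JJ",
    "rural": "JJ",
    "painting": "VBG",  # out of context, a tagger must guess VBG or NN
}


def pos_pattern(tag, lexicon=TOY_LEXICON, unknown="UNK"):
    """Map a (possibly multi-word) tag to a pattern such as 'JJ_NNS'."""
    return "_".join(lexicon.get(word, unknown) for word in tag.lower().split())


def top_patterns(tags, n=10):
    """Return the n most frequent POS patterns as (pattern, count) pairs."""
    return Counter(pos_pattern(t) for t in tags).most_common(n)


if __name__ == "__main__":
    sample = ["haystack", "farm buildings", "acidic colors", "rural landscape", "Haystack"]
    for pattern, count in top_patterns(sample):
        print(pattern, count)
```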

However, for social tags, parsing is not a meaningful step. Research:

Link part-of-speech information with other lexical resources for disambiguation.
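
Part-of-speech information narrows the set of candidate senses when a lexical resource indexes senses by word and part of speech. WordNet, for example, keeps noun senses and verb senses separate. The following minimal sketch uses a small hypothetical sense inventory (`SENSES`) and a hypothetical mapping from fine-grained tags to coarse keys (`COARSE_POS`) in place of such a resource. When the POS gives no match, it falls back to every sense of the word.

```python
# Hypothetical miniature sense inventory keyed by (word, coarse POS).
# A real system would consult a lexical resource such as WordNet instead.
SENSES = {
    ("painting", "NN"): ["a picture made with paint", "the art of applying paint"],
    ("painting", "VB"): ["applying paint to a surface"],
    ("still", "JJ"): ["not moving"],
    ("still", "NN"): ["an apparatus for distilling liquids", "a photograph from a film"],
}

# Collapse fine-grained tags (as in the counts above) to the coarse key.
COARSE_POS = {"NN": "NN", "NNS": "NN", "NP": "NN", "VBG": "VB", "VBN": "VB", "JJ": "JJ"}


def candidate_senses(word, pos):
    """Return the senses compatible with `pos`; fall back to all senses of `word`."""
    word = word.lower()
    key = (word, COARSE_POS.get(pos, pos))
    if key in SENSES:
        return SENSES[key]
    return [sense for (w, _p), senses in SENSES.items() if w == word for sense in senses]


if __name__ == "__main__":
    print(candidate_senses("painting", "VBG"))
    print(candidate_senses("still", "RB"))  # no entry for this POS: all senses returned
```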

What about "New England"? Idioms and lexicalized phrases are more difficult. A heuristic comparison to Wikipedia titles matched 46% (30% distinct) of multi-word tags, e.g. "Grapes of Wrath", "Irish Wolfhound", "Franco-Prussian War" (Klavans and Golbeck, 2010).
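
The exact heuristic is not spelled out here. The sketch below assumes the simplest version: an exact match, after case-folding, against a list of article titles. Wikipedia title dumps write spaces as underscores, so underscores are mapped to spaces first. The function reports the match rate over all multi-word tag occurrences and over distinct multi-word tags. This is one plausible reading of "46% (30% distinct)". The function names and input format are hypothetical.

```python
def normalize(title):
    """Case-fold, map underscores to spaces, and collapse runs of whitespace."""
    return " ".join(title.replace("_", " ").lower().split())


def wikipedia_match_rates(tags, titles):
    """Share of multi-word tags that exactly match a title after normalization.

    Returns (rate over all occurrences, rate over distinct tags). `tags` keeps
    duplicates; `titles` could be read from a Wikipedia title dump (assumed).
    """
    title_set = {normalize(t) for t in titles}
    multi = [n for n in (normalize(t) for t in tags) if " " in n]
    if not multi:
        return 0.0, 0.0
    distinct = set(multi)
    all_rate = sum(t in title_set for t in multi) / len(multi)
    distinct_rate = sum(t in title_set for t in distinct) / len(distinct)
    return all_rate, distinct_rate


if __name__ == "__main__":
    titles = ["New_England", "Irish_Wolfhound", "Franco-Prussian_War"]
    tags = ["New England", "new england", "Irish Wolfhound", "blue sky", "haystack"]
    print(wikipedia_match_rates(tags, titles))  # -> (0.75, 0.6666666666666666)
```

In the first rate, a frequently repeated tag that matches counts once per occurrence. That is how the two rates can diverge. Exact matching also misses variants that differ only by a leading article or by punctuation.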



Raw Tags or Tokens

Results: 25%, 93%, 68%


… precursor to parsing.


Wish list: better ways to tame the proliferation of rich but "noisy" content. Clustering over tags for similarity; clustering over tags and terms from text; matching over existing terms to identify meaningful units; applying machine learning techniques to guess meaning; bigrams, trigrams, thesauri, corpus analysis.
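
One wish-list item, clustering over tags for similarity, can be prototyped with character trigrams. Tags whose trigram sets overlap strongly are grouped, measured by Jaccard similarity above a threshold. This catches spelling and spacing variants such as "haystack", "haystacks", and "hay stack". The following is a minimal sketch. The 0.5 threshold, the single-link grouping via union-find, and the function names are all assumptions. Purely orthographic similarity can also merge unrelated tags that happen to share spelling.

```python
from itertools import combinations


def trigrams(tag):
    """Character trigrams of a tag; spaces are dropped so 'hay stack' ~ 'haystack'."""
    s = "  " + tag.lower().replace(" ", "") + " "
    return {s[i:i + 3] for i in range(len(s) - 2)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_tags(tags, threshold=0.5):
    """Single-link clustering: tags linked by an above-threshold pair share a cluster."""
    uniq = sorted({t.strip().lower() for t in tags if t.strip()})
    grams = {t: trigrams(t) for t in uniq}
    parent = {t: t for t in uniq}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in combinations(uniq, 2):
        if jaccard(grams[a], grams[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for t in uniq:
        clusters.setdefault(find(t), []).append(t)
    return sorted(clusters.values())


if __name__ == "__main__":
    print(cluster_tags(["haystack", "Haystacks", "hay stack", "farm", "farms", "Provence"]))
```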

Acknowledgements: Steve.museum project members; T3 and steve.museum museum partners; University of Maryland T3 group; IMA; and other participants.

Thank You! Questions?

Editor's Notes

Take this seriously.
In presenting this paper, start with something not in the paper.
Still need to finish.
Words, words, words.
