Reusing Linguistic Resources
Tasks and Goals for a Linked Data Approach


              Marieke van Erp
              marieke@cs.vu.nl



                    LDL2012
Introduction

• BA, MA & PhD in computational linguistics/
  information extraction
  @Tilburg University

• Since 2009: SemWeb group
  @VU University Amsterdam
Why Reuse Linguistic Resources?
• Linguistic resources are
  expensive to create
• ...and difficult to use for
  ‘outsiders’

• How can we reach out to the
  ‘outside world’?



                                 Image Source: http://cyberbrethren.com/wp-content/uploads/2012/02/language1.jp
Make reuse easier!


• Increased visibility
• Social value:
     • stimulates collaboration
     • accelerates innovation
• External quality control




                                  Image Source: http://th02.deviantart.net/fs71/PRE/i/2010/146/b/3/
                                            DON__T_PANIC_by_VigilantMeadow.jpg
What’s holding us back?



• Fear?
• Habit?




             Image Source: http://mindfulbalance.files.wordpress.com/2011/02/hesitate1.jpg
Practical Constraints

1. Task specificity
2. Formats
3. Different conceptual
   models
4. No machine-readable
   definitions
5. Lack of metadata



                          Image Source: http://bogdankipko.com/wp-content/uploads/2011/12/barriers.jpg
1. Task-specificity


• Resources are often geared
  towards one specific task
  e.g., part-of-speech tagging,
  named entity recognition

• How can we make our
  resources more flexible?



                                  Image Source: http://thelearnersguild.files.wordpress.com/2008/07/the-informal-
                                                                learners-toolkit1.jpg
2. Formats

• XML, inline XML, CSV, one
  word per line, one sentence
  per line, slashtags, ARFF,




                                Image Source: http://www.elec-intro.com/EX/05-13-03/kf_compact_data.jpg
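To make the format problem concrete, here is a minimal sketch of converting between two of the formats listed above: slashtags ("word/TAG") and one word per line. The function names and the tab-separated output layout are assumptions chosen for illustration, not part of any particular toolkit.

```python
def slashtags_to_pairs(sentence: str) -> list[tuple[str, str]]:
    """Split whitespace-separated 'word/TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in sentence.split():
        # Split on the LAST slash so that a word such as 'and/or' survives intact.
        word, sep, tag = token.rpartition("/")
        if not sep:
            # No slash at all: treat the token as an untagged word.
            word, tag = token, ""
        pairs.append((word, tag))
    return pairs


def pairs_to_lines(pairs: list[tuple[str, str]]) -> str:
    """Render (word, tag) pairs as one tab-separated token per line."""
    return "\n".join(f"{word}\t{tag}" for word, tag in pairs)


example = "President/NNP Obama/NNP signed/VBD the/DT act/NN"
pairs = slashtags_to_pairs(example)
one_per_line = pairs_to_lines(pairs)
print(one_per_line)
```

Every pair of formats needs a small converter like this one, which is part of why a shared exchange format is attractive.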
3. Conceptual Models
• An NP is an NP is an NP?
• “President Obama signed the
  National Defense
  Authorization Act after
  months of debate”
  • NE: “President Obama”?
  • NE: “Obama”?

                                Image Source: http://www.w3.org/2001/sw/BestPractices/WNET/wordnet-
                                                        sw-20040713-fig01.png
4. Lack of Machine-Readable Definitions
• For integration or reuse,
  manual effort is needed
  • time-consuming
  • difficult to track definitions
  • not scalable




                                   Image Source: http://www.barcode1.co.uk/images/samplejplarge.jpg
5. Lack of Metadata

• Can I trust this data provider?
• How was this data created?
• How many annotators?
  • for the entire data set?
  • per instance?
• If generated automatically,
  what were the parameters?



                                    Image Source: http://darwin-online.org.uk/converted/published/
                                           1859_Origin_F373/1859_Origin_F373_fig02.jpg
A Linked Data Approach
• Linked Data is not a magic
  solution to all problems

• ...but it is better than what
  we’ve got at this moment




                                  Image Source: http://linkeddata.org/static/images/lod-
                                          datasets_2009-07-14_cropped.png
1. Using RDF

• RDF is not inherently better
  than some other formats, but
  it is used by many

• + SPARQL makes it easy to
  retrieve data



                                 Image Source: http://www.247ha.com/images/rdf.jpg
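As a rough illustration of why triples plus a query language make retrieval easy, the sketch below stores a few part-of-speech annotations as subject-predicate-object triples and answers a query by pattern matching. It is a toy stand-in for an RDF store and a SPARQL basic graph pattern, not an implementation of either. The `http://example.org/corpus/` namespace and the `word` and `pos` predicates are hypothetical.

```python
EX = "http://example.org/corpus/"  # hypothetical namespace, for illustration only

# Each annotation is a (subject, predicate, object) triple.
triples = {
    (EX + "token1", EX + "word", "President"),
    (EX + "token1", EX + "pos", "NNP"),
    (EX + "token2", EX + "word", "Obama"),
    (EX + "token2", EX + "pos", "NNP"),
    (EX + "token3", EX + "word", "signed"),
    (EX + "token3", EX + "pos", "VBD"),
}


def match(pattern, store):
    """Return the triples that fit the pattern; None is a wildcard, like a query variable."""
    return [
        triple for triple in store
        if all(want is None or want == have for want, have in zip(pattern, triple))
    ]


# Roughly: SELECT ?word WHERE { ?t ex:pos "NNP" . ?t ex:word ?word }
proper_nouns = sorted(
    word
    for token, _, _ in match((None, EX + "pos", "NNP"), triples)
    for _, _, word in match((token, EX + "word", None), triples)
)
print(proper_nouns)
```

The same two-pattern join works unchanged if more predicates are added to the store, which is the flexibility the slide points at.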
2. Mapping Annotations
• A single conceptual
  model for all linguistic
  resources is not going
  to happen

• ...but can we spot the
  similarities between
  models and utilise
  them?


                             Image Source: http://www.webology.org/2006/v3n3/images/sample.JPG
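One way to spot similarities between models is to align annotations that overlap in the text and whose labels map to a shared coarse category. The sketch below does this for the "President Obama" versus "Obama" example from the Conceptual Models slide. The two resources, their label sets and the `SHARED` mapping table are all assumptions made up for illustration.

```python
from dataclasses import dataclass

TEXT = ("President Obama signed the National Defense "
        "Authorization Act after months of debate")


@dataclass(frozen=True)
class Span:
    start: int   # character offset, inclusive
    end: int     # character offset, exclusive
    label: str   # label in the resource's own scheme


# Two hypothetical resources annotating the same sentence differently.
resource_a = [Span(0, 15, "PER")]      # "President Obama"
resource_b = [Span(10, 15, "PERSON")]  # "Obama"

# Assumed mapping from resource-specific labels to a shared coarse label.
SHARED = {"PER": "person", "PERSON": "person",
          "ORG": "organisation", "ORGANIZATION": "organisation"}


def overlaps(a: Span, b: Span) -> bool:
    """True if the two spans share at least one character."""
    return a.start < b.end and b.start < a.end


def compatible(a: Span, b: Span) -> bool:
    """True if both labels are known and map to the same shared label."""
    shared_a = SHARED.get(a.label)
    return shared_a is not None and shared_a == SHARED.get(b.label)


def align(xs, ys):
    """Pair up spans that overlap and carry compatible labels."""
    return [(x, y) for x in xs for y in ys if overlaps(x, y) and compatible(x, y)]


for a, b in align(resource_a, resource_b):
    print(TEXT[a.start:a.end], "<->", TEXT[b.start:b.end])
```

The alignment does not decide which span is right; it only records that the two models are talking about the same thing.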
3. Grounding
• It’s only linked data if you link
  it to other sources

• Added bonus: automatic
  sense disambiguation + access
  to a wealth of extra
  knowledge about your data
  item


                                      Image Source: http://mj-services.com/wallpaper/More_WallPaper/Trees/Giants,
                                        %20Calaveras%20State%20Park%20-%201600x1200%20-%20ID%2015.jpg
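Grounding can be sketched as emitting a link from a mention in your data to an identifier in an external source. The example below writes two N-Triples-style statements for the mention "Obama", one of which points at a DBpedia resource. The corpus URI, the offset-based fragment and the `anchorText` and `refersTo` predicates are placeholders invented for this sketch. A real resource would use an established vocabulary instead.

```python
VOCAB = "http://example.org/vocab/"  # hypothetical vocabulary, for illustration only


def ntriple(subject: str, predicate: str, obj: str, literal: bool = False) -> str:
    """Format one statement in N-Triples style; obj is a URI unless literal=True."""
    if literal:
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        rendered = f'"{escaped}"'
    else:
        rendered = f"<{obj}>"
    return f"<{subject}> <{predicate}> {rendered} ."


# Hypothetical URI for the characters 10-15 ("Obama") of a document in our corpus.
mention = "http://example.org/corpus/doc1#char=10,15"

lines = [
    ntriple(mention, VOCAB + "anchorText", "Obama", literal=True),
    ntriple(mention, VOCAB + "refersTo", "http://dbpedia.org/resource/Barack_Obama"),
]
print("\n".join(lines))
```

Once the link exists, anything the external source knows about the entity becomes reachable from the mention.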
4. Define Your Metadata
• Include your data model
• Preferably give each instance’s
  provenance
    • collection
    • annotation/creation
    • previous versions
    • confidence


                                    Image Source: http://www.wineaustralia.com/australia/Portals/2/November%20E-
                                                     news/Wines%20of%20Provenance%20Final.jpg
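Per-instance provenance could look like the sketch below, which attaches the four items listed above (collection, annotation/creation, previous versions, confidence) to a single annotated instance and serialises it. The field names and all values are hypothetical. An established provenance vocabulary would be preferable in a published resource.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class Provenance:
    collection: str                  # where the source text came from
    created_by: str                  # manual annotation, or tool name plus parameters
    annotator_count: int             # how many annotators labelled this instance
    previous_version: Optional[str]  # identifier of the version this one replaces
    confidence: float                # confidence in the label, between 0 and 1


@dataclass
class AnnotatedInstance:
    text: str
    label: str
    provenance: Provenance


# All values below are made up for illustration.
instance = AnnotatedInstance(
    text="Obama",
    label="person",
    provenance=Provenance(
        collection="example-news-corpus",
        created_by="manual annotation",
        annotator_count=2,
        previous_version=None,
        confidence=0.9,
    ),
)

record = json.dumps(asdict(instance), indent=2, sort_keys=True)
print(record)
```

Keeping provenance on each instance, rather than only on the data set, lets a user filter by annotator count or confidence before reuse.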
Conclusions
• Look for similarities between
  resources
• Say where your resource
  comes from
• Use standards, or make it
  easy for others to convert
  your data to a standard
• Link to other data


                                  Image Source: http://efr0702.files.wordpress.com/2012/02/puzzle.jpg
Questions?



marieke@cs.vu.nl
http://www.cs.vu.nl/~marieke

                                  Image Source: http://www.amichelleblakeley.com/storage/question%20marks.jpg?
                                               __SQUARESPACE_CACHEVERSION=1295297003883
Acknowledgment

• This work is funded by
  NWO in the CATCH
  programme, grant
  640.004.801

Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 

Recently uploaded (20)

Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 

Ldl2012

  • 1. Reusing Linguistic Resources Tasks and Goals for a Linked Data Approach Marieke van Erp marieke@cs.vu.nl LDL2012
  • 2. Introduction • BA, MA & PhD compling/ information extraction @Tilburg University • Since 2009: SemWeb group @VU University Amsterdam
  • 3. Why Reuse Linguistic Resources? • Linguistic resources are expensive to create • ...and difficult to use for ‘outsiders’ • How can we reach out to the ‘outside world’? Image Source: http://cyberbrethren.com/wp-content/uploads/2012/02/language1.jp
  • 4. Make reuse easier! • Increased visibility • Social value: • stimulates collaboration • accelerates innovation • External quality control Image Source: http://th02.deviantart.net/fs71/PRE/i/2010/146/b/3/DON__T_PANIC_by_VigilantMeadow.jpg
  • 5. What’s holding us back? • Fear? • Habit? Image Source: http://mindfulbalance.files.wordpress.com/2011/02/hesitate1.jpg
  • 6. Practical Constraints 1. Task specificity 2. Formats 3. Different conceptual models 4. No machine-readable definitions 5. Lack of metadata Image Source: http://bogdankipko.com/wp-content/uploads/2011/12/barriers.jpg
  • 7. 1. Task-specificity • Resources are often geared towards one specific task, e.g., part-of-speech tagging, named entity recognition • How can we make our resources more flexible? Image Source: http://thelearnersguild.files.wordpress.com/2008/07/the-informal-learners-toolkit1.jpg
  • 8. 2. Formats • XML, inline XML, CSV, one word per line, one sentence per line, slashtags, ARFF, Image Source: http://www.elec-intro.com/EX/05-13-03/kf_compact_data.jpg
  • 9. 3. Conceptual Models • An NP is an NP is an NP? • “President Obama signed the National Defense Authorization Act after months of debate” • NE: “President Obama”? • NE: “Obama”? Image Source: http://www.w3.org/2001/sw/BestPractices/WNET/wordnet-sw-20040713-fig01.png
  • 10. 4. Lack of Machine-Readable definitions • For integration or reuse, manual effort is needed • time consuming • difficult to track definitions • not scalable Image Source: http://www.barcode1.co.uk/images/samplejplarge.jpg
  • 11. 5. Lack of Metadata • Can I trust this data provider? • How was this data created? • How many annotators? • for the entire data set? • per instance? • If generated automatically, what were the parameters? Image Source: http://darwin-online.org.uk/converted/published/1859_Origin_F373/1859_Origin_F373_fig02.jpg
  • 12. A Linked Data Approach • Linked Data is not a magic solution to all problems • ...but it is better than what we’ve got at this moment Image Source: http://linkeddata.org/static/images/lod-datasets_2009-07-14_cropped.png
  • 13. 1. Using RDF • RDF is not inherently better than some other formats, but it is used by many • + SPARQL makes it easy to retrieve data Image Source: http://www.247ha.com/images/rdf.jpg
  • 14. 2. Mapping Annotations • A single conceptual model for all linguistic resources is not going to happen • ...but can we spot the similarities between models and utilise that? Image Source: http://www.webology.org/2006/v3n3/images/sample.JPG
  • 15. 3. Grounding • It’s only linked data if you link it to other sources • Added bonus: automatic sense disambiguation + access to a wealth of extra knowledge about your data item Image Source: http://mj-services.com/wallpaper/More_WallPaper/Trees/Giants,%20Calaveras%20State%20Park%20-%201600x1200%20-%20ID%2015.jpg
  • 16. 4. Define Your Metadata • Include your data model • Preferably give each instance’s provenance • collection • annotation/creation • previous versions • confidence Image Source: http://www.wineaustralia.com/australia/Portals/2/November%20E-news/Wines%20of%20Provenance%20Final.jpg
  • 17. Conclusions • Look for similarities between resources • Say where your resource comes from • Use standards, or make it easy for others to convert your data to a standard • Link to other data Image Source: http://efr0702.files.wordpress.com/2012/02/puzzle.jpg
  • 18. Questions? marieke@cs.vu.nl http://www.cs.vu.nl/~marieke Image Source: http://www.amichelleblakeley.com/storage/question%20marks.jpg?__SQUARESPACE_CACHEVERSION=1295297003883
  • 19. Acknowledgment • This work is funded by NWO in the CATCH programme, grant 640.004.801
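
As a rough illustration of the step from slide 8 (formats such as slashtags) to slide 13 (using RDF), the sketch below rewrites slashtag-annotated text as N-Triples lines. It is not taken from the talk: the `http://example.org/` namespaces, the `form` and `pos` property names, and the function `slashtags_to_ntriples` are all hypothetical, and the input is assumed to be well-formed `word/TAG` pairs separated by whitespace.

```python
# Hypothetical namespaces for illustration only; they do not come from the slides.
BASE = "http://example.org/corpus/"
VOCAB = "http://example.org/vocab#"


def slashtags_to_ntriples(sentence, sent_id="s1"):
    """Turn 'word/TAG word/TAG ...' into a list of N-Triples lines.

    Assumes every whitespace-separated item contains at least one slash.
    """
    triples = []
    for i, item in enumerate(sentence.split(), start=1):
        # Split on the last slash so a word that itself contains '/' stays intact.
        word, _, tag = item.rpartition("/")
        token = f"<{BASE}{sent_id}/t{i}>"
        # Escape backslashes and double quotes for use inside an N-Triples literal.
        literal = word.replace("\\", "\\\\").replace('"', '\\"')
        triples.append(f'{token} <{VOCAB}form> "{literal}" .')
        triples.append(f'{token} <{VOCAB}pos> "{tag}" .')
    return triples


lines = slashtags_to_ntriples("Obama/NNP signed/VBD")
print("\n".join(lines))
```

Each token gets its own URI, so a later step could attach provenance (slide 16) or a link to an external resource (slide 15) as further triples about that same URI.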