PhD Day: Adaptive Entity Linking

The document discusses adaptive entity linking. It presents the motivation for entity linking as enabling reuse of web knowledge and as a first step for ontology learning. The problem is that current entity linking approaches do not work across all domains and text types. The proposed solution is to use linked data datasets and a framework called AELA for adaptive entity linking. Experiments were conducted on an annotated dataset to analyze how the definition of an entity changes across domains and to identify entity types.

PhD Day: Adaptive Entity Linking

  • 1. www.insight-centre.org
       Adaptive Entity Linking
       PhD Day – October/2013
       Bianca Pereira
  • 2. Agenda
       • Motivation
       • Problem
       • Proposed Solution
       • Experiments
       • Next Steps
  • 3. Motivation
       • Entity Linking creates links from mentions in text to entities from a structured knowledge base. It:
           – enables reusing knowledge already published on the web.
           – can be used as the first step for ontology learning and population algorithms.
  • 4. Problem
       • Entity Linking has been performed using generic approaches.
       • It does not work for all domains and types of text.
       • There is no clear definition of “entity”.
  • 5. Problem
       • Research Question: “How to adapt a general Entity Linking Approach to a Domain?”
       • Philosophical Question: “What is an Entity?”
  • 6. Proposed Solution
       • Usage of Linked Data datasets.
       • AELA, a Framework for Adaptive Entity Linking.
  • 7. Experiments
       • What is an Entity?
       • What has been identified as entities?
       • How to manually detect entities from text?
       • How does the definition of Entity change from one domain to another?
  • 8. Experiments
       • What is an Entity?
       • What has been identified as entities? (highlighted on the slide)
       • How to manually detect entities from text?
       • How does the definition of Entity change from one domain to another?
  • 9. Experiments
       • AIDA-CoNLL annotated dataset
           – 1,387 Reuters documents (some of them are tables)
           – Annotation of entities with links to Wikipedia.
  • 10. Experiments
       • AIDA-CoNLL annotated dataset
           – 1,387 Reuters documents (some of them are tables)
           – Annotation of entities with links to Wikipedia. (“entities” is highlighted on the slide and marked with a “?”)
  • 11. Experiments – AIDA CoNLL
       • Proper Nouns: 5576
           – Names starting with a capital letter
       • Acronyms: 712
           – Names with all letters in upper case
       • Others: 20
  • 12. AIDA CoNLL – Proper Nouns
       • German • British • European Commission • Germany • European Union • Britain • Commission
       • Franz Fischler • France • Spanish • Loyola de Palacio • Europe • Bonn • Hendrix
       • U.S. • Jimi Hendrix • English • Nottingham • Australian • China • Taiwan
       • Taipei • Taiwan Strait • Ukraine • Taiwanese • Lien Chan • Chinese • Foreign Ministry
  • 14. AIDA CoNLL – “Acronyms”
       • BRUSSELS • BSE • LONDON • BEIJING • FRANKFURT • GREEK • ATHENS
       • BAYERISCHE VEREINSBANK • SWEDISH • SWEDEN • JERUSALEM • TUNIS • KDPI • PUK
       • KDP • MANAMA • UAE • DUBAI • BEIRUT • AN-NAHAR • AS-SAFIR
       • AD-DIYAR • CME • CHICAGO • MONTGOMERY • SNET • PHOENIX • PARIS
  • 15. AIDA CoNLL – Others
       • interior ministry • neo-Nazi • neo-Nazism • post-Soviet • van der Sar
       • 1860 Munich • serie A • 1990 World Cup • 1992 European championship • 2,000 Guineas
       • 2000 Games • pan-Turkism • al-Akhbar • al-Ram • 1997 FED CUP
       • 1998 World Cup • 1995 World Cup • 1. FC Cologne • post-Communist • cocker spaniels
  • 18. AIDA CoNLL
       • SOCCER - GERMAN FIRST DIVISION RESULTS / STANDINGS. BONN 1996-12-06
         Results of German first division soccer matches played on Friday :
           Bochum 2 Bayer Leverkusen 2
           Werder Bremen 1 1860 Munich 1
           Karlsruhe 3 Freiburg 0
           Schalke 2 Hansa Rostock 0
         Standings ( tabulated under played, won, drawn, lost, goals for goals against points ) :
           Bayer Leverkusen 17 10 4 3 38 22 34
           Bayern Munich 16 9 6 1 26 14 33
           VfB Stuttgart 16 9 4 3 39 17 31
           Borussia Dortmund 16 9 4 3 33 17 31
           Karlsruhe 17 8 4 5 30 20 28
           VfL Bochum 16 7 6 3 23 21 27
           1. FC Cologne 16 8 2 6 31 27 26
           Schalke 04 17 7 4 6 25 26 25
           Werder Bremen 17 6 4 7 29 28 22
           MSV Duisburg 16 5 4 7 16 22 19
           SV 1860 Munich 17 4 6 7 25 31 18
           FC St. Pauli 15 5 3 7 21 28 18
           Fortuna Dusseldorf 16 5 3 8 13 24 18
           Hamburger SV 16 4 5 7 20 25 17
           Arminia Bielefeld 16 4 4 8 18 28 16
           FC Hansa Rostock 17 4 3 10 19 26 15
           Borussia Monchengladbach 16 4 3 9 12 22 15
           SC Freiburg 17 4 1 12 20 40 13
  • 19. AIDA CoNLL – Some findings
       • Syntactic structure does not help in all cases.
           – Proper nouns may not start with a capital letter.
           – Not all words with all letters in upper case are acronyms.
       • There may be some “mention boundary” problems even in manual annotation.
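The surface-form split behind these findings can be illustrated with a small classifier. This is a minimal sketch, assuming one possible reading of the rules on the slides: a name is considered only if it starts with a capital letter, and it counts as an acronym when all of its letters are upper case. The function name `surface_category` is hypothetical, and the exact rules used to obtain the counts on the slides are not given.

```python
def surface_category(mention: str) -> str:
    """Classify a mention string by its surface form only (hypothetical helper)."""
    # Anything that does not start with an upper-case letter falls into "other",
    # e.g. "van der Sar" or "1997 FED CUP".
    if not mention[:1].isupper():
        return "other"
    letters = [ch for ch in mention if ch.isalpha()]
    # All letters upper case: treated as an acronym, whether or not it really is one.
    if all(ch.isupper() for ch in letters):
        return "acronym"
    return "proper_noun"


for m in ["German", "BSE", "LONDON", "van der Sar", "1997 FED CUP"]:
    print(f"{m!r}: {surface_category(m)}")
```

Under these assumed rules, "LONDON" is labelled an acronym and "van der Sar" falls into "other", which is the kind of case the findings describe: capitalization alone does not decide what a mention is.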
  • 20. AIDA CoNLL
       • 5596 entities
       • 6308 different mention strings
  • 21. AIDA CoNLL
       • 1110 entities with name variations.
           http://en.wikipedia.org/wiki/New_York_Jets
               New York Jets; NY JETS
           http://en.wikipedia.org/wiki/Butch_Harmon
               Butch Harmon; Butch
           http://en.wikipedia.org/wiki/Norway
               Norway; Norwegian
           http://en.wikipedia.org/wiki/Cincinnati_Reds
               Cincinnati Reds; CINCINNATI Reds
           http://en.wikipedia.org/wiki/Republika_Srpska
               Bosnian Serb; Republika Srpska
           http://en.wikipedia.org/wiki/John_Smoltz
               John Smoltz; Smoltz
           http://en.wikipedia.org/wiki/Rede_Globo
               TV Globo; Globo
           http://en.wikipedia.org/wiki/London_Wasps
               London; Wasps
           http://en.wikipedia.org/wiki/Chicago_Cubs
               CHICAGO; CUBS; Chicago Cubs
           http://en.wikipedia.org/wiki/England_cricket_team
               ENGLAND; Englishmen
           http://en.wikipedia.org/wiki/Alexander_Downer
               Alexander Downer; Downer
           http://en.wikipedia.org/wiki/Wales
               Wales; Welsh
  • 22. AIDA CoNLL
       • 1110 entities with name variations (slide 21 repeated, with these entries highlighted):
           http://en.wikipedia.org/wiki/Norway
               Norway; Norwegian
           http://en.wikipedia.org/wiki/England_cricket_team
               ENGLAND; Englishmen
           http://en.wikipedia.org/wiki/Wales
               Wales; Welsh
  • 23. AIDA CoNLL – Some findings
       • Use of metonymy.
       • Disambiguation (Norway vs. Norwegians).
       • Mention of an entity using part of its name.
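Counting entities with name variations amounts to grouping mention strings by the entity they link to. The following is a minimal sketch, assuming the annotations are available as (mention string, Wikipedia URL) pairs; the function name `name_variations` and the toy sample are illustrative and not taken from the author's code.

```python
from collections import defaultdict


def name_variations(annotations):
    """Return {entity URL: set of mention strings} for entities linked from more than one string."""
    by_entity = defaultdict(set)
    for mention, entity in annotations:
        # Mention strings are compared as-is, so "Norway" and "Norwegian" are two variations.
        by_entity[entity].add(mention)
    return {entity: mentions for entity, mentions in by_entity.items() if len(mentions) > 1}


# Toy sample built from pairs shown on the slides (not the full dataset).
sample = [
    ("Norway", "http://en.wikipedia.org/wiki/Norway"),
    ("Norwegian", "http://en.wikipedia.org/wiki/Norway"),
    ("John Smoltz", "http://en.wikipedia.org/wiki/John_Smoltz"),
    ("Smoltz", "http://en.wikipedia.org/wiki/John_Smoltz"),
    ("BAGHDAD", "http://en.wikipedia.org/wiki/Baghdad"),
]

for entity, mentions in name_variations(sample).items():
    print(entity, sorted(mentions))
```

No case folding or stemming is applied here, on the assumption that the slides count raw strings (for example "Cincinnati Reds" and "CINCINNATI Reds" appear as separate variations).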
  • 24. AIDA CoNLL
       • 434 ambiguous mention strings (corpus level)
           French
               http://en.wikipedia.org/wiki/France
               http://en.wikipedia.org/wiki/France_national_football_team
           NORTHAMPTON
               http://en.wikipedia.org/wiki/Northampton
               http://en.wikipedia.org/wiki/Northampton_Town_F.C.
               http://en.wikipedia.org/wiki/Northamptonshire_County_Cricket_Club
               http://en.wikipedia.org/wiki/Northampton_Saints
           West
               http://en.wikipedia.org/wiki/Western_World
               http://en.wikipedia.org/wiki/American_League_West
           Volkswagen AG
               http://en.wikipedia.org/wiki/Volkswagen
               http://en.wikipedia.org/wiki/Volkswagen_Group
           EDMONTON
               http://en.wikipedia.org/wiki/Edmonton
               http://en.wikipedia.org/wiki/Edmonton_Oilers
           Rangers
               http://en.wikipedia.org/wiki/Texas_Rangers_(baseball)
               http://en.wikipedia.org/wiki/Rangers_F.C.
           Vatican
               http://en.wikipedia.org/wiki/Holy_See
               http://en.wikipedia.org/wiki/Vatican_Library
               http://en.wikipedia.org/wiki/Vatican_City
           Shell
               http://en.wikipedia.org/wiki/Shell_Turbo_Chargers
               http://en.wikipedia.org/wiki/Shell_Oil_Company
           Irish
               http://en.wikipedia.org/wiki/Republic_of_Ireland
               http://en.wikipedia.org/wiki/Republic_of_Ireland_national_football_team
               http://en.wikipedia.org/wiki/Northern_Ireland
  • 25. AIDA CoNLL
       • 190 ambiguous mention strings (document level)
           Document: 17 Iraq · Mention: BAGHDAD
               http://en.wikipedia.org/wiki/Baghdad
               http://en.wikipedia.org/wiki/Iraq
           Document: 965testa SOCCER · Mention: SILVA
               http://en.wikipedia.org/wiki/Mario_Silva
               http://en.wikipedia.org/wiki/Mauro_Silva
           Document: 1102testa SOCCER · Mention: WORLD CUP (highlighted on the slide)
               http://en.wikipedia.org/wiki/1998_FIFA_World_Cup
               http://en.wikipedia.org/wiki/FIFA_World_Cup
           Document: 791 PRESS · Mention: Chinese
               http://en.wikipedia.org/wiki/People’s_Republic_of_China
               http://en.wikipedia.org/wiki/Chinese_language
           Document: 179 Soccer · Mention: Liechenstein (highlighted on the slide)
               http://en.wikipedia.org/wiki/Liechtenstein_national_football_team
               http://en.wikipedia.org/wiki/Liechtenstein
           Document: 703 Cricket · Mention: Pakistan
               http://en.wikipedia.org/wiki/Pakistan_national_cricket_team
               http://en.wikipedia.org/wiki/Pakistan
           Document: 1323testb Frankfurt · Mention: Frankfurt
               http://en.wikipedia.org/wiki/Frankfurt_Stock_Exchange
               http://en.wikipedia.org/wiki/Frankfurt_am_Main
           Document: 1054testa CRICKET · Mention: ENGLAND
               http://en.wikipedia.org/wiki/England_cricket_team
               http://en.wikipedia.org/wiki/England
  • 26. AIDA CoNLL – Some findings
       • Even misspelled text is marked.
       • “Classes” and “instances” are annotated.
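The two ambiguity counts differ only in the scope over which a mention string is compared. As a minimal sketch, assuming annotations are available as (document id, mention string, Wikipedia URL) triples, the same grouping can be run per corpus or per document. The function name `ambiguous_mentions` and the document ids `doc_a` and `doc_b` are hypothetical; only document id "703" and the URLs come from the slides.

```python
from collections import defaultdict


def ambiguous_mentions(annotations, per_document=False):
    """Return mention strings (optionally scoped per document) that link to more than one entity."""
    targets = defaultdict(set)
    for doc_id, mention, entity in annotations:
        # Corpus level: key on the string alone. Document level: key on (document, string).
        key = (doc_id, mention) if per_document else mention
        targets[key].add(entity)
    return {key: entities for key, entities in targets.items() if len(entities) > 1}


# Toy sample; "doc_a" and "doc_b" are made-up ids used only for illustration.
sample = [
    ("703", "Pakistan", "http://en.wikipedia.org/wiki/Pakistan_national_cricket_team"),
    ("703", "Pakistan", "http://en.wikipedia.org/wiki/Pakistan"),
    ("doc_a", "French", "http://en.wikipedia.org/wiki/France"),
    ("doc_b", "French", "http://en.wikipedia.org/wiki/France_national_football_team"),
]

print(sorted(ambiguous_mentions(sample)))
print(sorted(ambiguous_mentions(sample, per_document=True)))
```

In this toy sample "French" is ambiguous only at corpus level, because each document uses it for a single entity, while "Pakistan" is ambiguous within one document.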
  • 27. AIDA CoNLL
       • 39 Classes
           http://dbpedia.org/ontology/Agent                   2579
           http://xmlns.com/foaf/0.1/Person                     426
           http://dbpedia.org/ontology/Place                    333
           http://dbpedia.org/ontology/City                     234
           http://dbpedia.org/ontology/Country                  194
           http://dbpedia.org/ontology/AdministrativeRegion      76
           http://dbpedia.org/ontology/Newspaper                 55   (highlighted on the slide)
           http://dbpedia.org/ontology/ArchitecturalStructure    39
           http://dbpedia.org/ontology/EthnicGroup               30   (highlighted on the slide)
           http://dbpedia.org/ontology/Airport                   21
           http://dbpedia.org/ontology/Event                     18   (highlighted on the slide)
           http://dbpedia.org/ontology/Island                    12
           http://dbpedia.org/ontology/Film                      10   (highlighted on the slide)
           http://dbpedia.org/ontology/BodyOfWater               10
  • 28. AIDA CoNLL – Some findings
       • Not only Person, Location and Organization.
  • 29. Experiments
       • How were those entities annotated?
       • Which Wikipedia pages were chosen as representing entities?
  • 30. Experiments
       • How were those entities annotated?
       • Which Wikipedia pages were chosen as representing entities?
       • What is the Annotation Guideline? (highlighted on the slide)
  • 31. Experiments
       • What is an Entity?
       • What has been identified as entities?
       • How to manually detect entities from text? (highlighted on the slide)
       • How does the definition of Entity change from one domain to another?
  • 32. Experiments
       • Survey on Annotation Guidelines
           – Question: “Is there any guideline for entity annotation?”
           – Search Strategy:
               • Papers from “entity annotation guidelines”.
               • Guidelines from annotated corpora provided by Entity Recognition, Disambiguation and Linking challenges.
  • 33. Experiments
       • Survey on Annotation Guidelines
           – Common Problems (differ from one domain to another)
               • Mention Boundaries
               • Name variations
               • Metonymy
           – Annotation Process
           – Evaluation
  • 34. Next Steps
       • Corpus Sampling for Annotation
       • Development of Annotation Guidelines
           – Domain/Task dependent
           – Iterative Process
       • Domains:
           – Touristic Domain (TripAdvisor corpus)
           – Electronics Domain
           – Other
  • 35. Next Steps
       • What is an Entity?
       • What has been identified as entities?
       • How to manually detect entities from text?
       • How does the definition of Entity change from one domain to another? (highlighted on the slide)
  • 36. Next Steps
       • What is an Entity?
       • What has been identified as entities?
       • How to manually detect entities from text?
       • How does the definition of Entity change from one domain to another?
       • How to identify the most frequent classes in a domain? (highlighted on the slide)