The NERD project
Upcoming SlideShare
Loading in...5
×
 

The NERD project

on

  • 2,513 views

Seminar at Ecole Centrale, Paris, France

Seminar at Ecole Centrale, Paris, France

Statistics

Views

Total Views
2,513
Views on SlideShare
1,553
Embed Views
960

Actions

Likes
4
Downloads
19
Comments
0

2 Embeds 960

http://www.linkedtv.eu 959
http://webcache.googleusercontent.com 1

Accessibility

Upload Details

Uploaded via as Adobe PDF

Usage Rights

CC Attribution License

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

The NERD project The NERD project Presentation Transcript

  • Giuseppe Rizzo <giuseppe.rizzo@eurecom.fr>
  • What is a Named Entity recognition task? A task that aims to locate and classify the name of a person or an organization, a location, a brand, a product, a numeric expression including time, date, money and percent in a textual document 12 March 2012 Seminar @ Ecole Centrale, Paris 2/21
  • History of NER benchmarks CoNLL 2003 and CoNLL 2005  schema (4 types): person, organization, location and miscellaneous  language independent task ACE 2004, ACE 2005 and ACE 2007  schema (7 types): person, organization, location, facility, weapon, vehicle and geo-political entity  entity recognition, not just name (e.g. description, pronoun)  find relationships among entities extracted TAC 2009 (Knowledge Base Track)  schema (3 types): person, organization and location  create a knowledge base from the named entities extracted ETAPE 2012 (Named Entity Task)  schema: Quaero (7 main types, 32 sub-types) 12/03/2012 - Multimedia Semantics and Interaction - Séminaire Ecole Centrale -3
  • NER Tools Standalone software  GATE  Stanford CoreNLP  Temis Web APIs 12/03/2012 - Multimedia Semantics and Interaction - Séminaire Ecole Centrale -4
  • Factual comparison of 10 Web NER tools Alchemy DBpe Evri Extr Lup Calais Saplo WM Yahoo Zemanta diaGranularity OEN OEN OED OEN OEN OEN OED OEN OEN OEDLanguage EN EN EN EN EN EN EN EN EN EN FR GR* IT FR FR SW FR GR PT* IT SP SP IT SP* PT RU SP SWQuota 30000 unl 3000 3000 unl 50000 1333 unl 5000 10000(calls/day)Sample C/C++ Java AS Java N/A Java Java Java JS C#Clients C# JS Java JS Perl PHP Java Java PHP PHP PHP JS Perl Python Perl PHP5 PHP Python Python Ruby RubyContent 150KB 452KB 8KB 32KB 20KB 8KB 26KB 80KB 7769KB 970KBchunk 12/03/2012 - Multimedia Semantics and Interaction - Séminaire Ecole Centrale -5
  • Alchemy DBpedia Evri Extr Lup Calais Saplo WM Yahoo ZemantaResponse JSON Factual comparison (II) HTML HTM HTML HTML JSON JSON JSON JSON XMLFormat MicroF JSON L JSON JSON MicroF XML XML JSON XML RDF JSO RDF RDFa RDF RDF XML N XML XML RDFEntity 324 320 5 34 319 95 5 7 13 81typenumberEntity N/A char N/A word range of char N/A POS range N/Aposition offset offset chars offset offset of charsClassif. Alchemy DBpedia Evri DBpe DBpedia OpenC N/A ESTER Yahoo FreeBaseOntologies FreeBase dia LinkedM alais Scema.org DBDefer. DBpeda DBpedia Evri DBpe DBpedia OpenC N/A DBpedia Wikipe WikipediaVocabulari FreeBase dia LinkedM alais Geonam dia IMDBes USCensus DB es MusicBrai UMBEL CIAFact nz OpenCyc book Amazon YAGO Wikicom YouTube MusicBrainz panies TechCrun CrunchBase ch ... 12 March 2012 Seminar @ Ecole Centrale, Paris 6/21
  • Human made benchmarks We performed two evaluation experiments:  WEKEX 2011  ISWC 2011 t = (entity, type, URI, relevant)  Each field has been rated by a Boolean value: true if correct, false otherwiseRizzo G., Troncy R. (2011), NERD: A Framework for Evaluating Named Entity Recognition Tools in the Web of Data.In: International Semantic Web Conference 2011 (ISWC11), Bonn, Germany. 12/03/2012 - Multimedia Semantics and Interaction - Séminaire Ecole Centrale -7
  • WEKEX 2011 Benchmark  Controlled experiment  4 human raters  10 English news articles (5 from BBC and 5 from The New York Times)  Each rater evaluated each article for 5 extractors 200 total evaluations  Fleisss kappa score  moderate agreement among ratersRizzo G., Troncy R. (2011), NERD: Evaluating Named Entity Recognition Tools in the Web of Data.In: (ISWC11) Workshop on Web Scale Knowledge Extraction (WEKEX11), Bonn, Germany. 12/03/2012 - Multimedia Semantics and Interaction - Séminaire Ecole Centrale -8
  • Results different behavior for different sources 12/03/2012 - Multimedia Semantics and Interaction - Séminaire Ecole Centrale -9
  • ISWC 2011 Benchmark Controlled experiment  10 human raters  2 English news articles from The New York Times  each rater evaluated each article for 6 extractors 120 total evaluations Fleisss kappa score  substantial agreement among raters 12/03/2012 - Multimedia Semantics and Interaction - Séminaire Ecole Centrale - 10
  • Results 12/03/2012 - Multimedia Semantics and Interaction - Séminaire Ecole Centrale - 11
  • What is NERD? ontology1 REST API2 UI3 The NERD ontology has been integrated in the NIF project, a EU FP7 in the context of the LOD2: Creating Knowledge out of Interlinked Data1 http://nerd.eurecom.fr/ontology2 http://nerd.eurecom.fr/api/application.wadl3 http://nerd.eurecom.fr/ 12 March 2012 Seminar @ Ecole Centrale, Paris 12/21
  • NERD Ontology Align the taxonomies used by the extractors 12/03/2012 - Multimedia Semantics and Interaction - Séminaire Ecole Centrale - 13
  • Building the NERD ontology NERD type Occurrence Person 10 Organization 10 Country 6 Company 6 Location 6 Continent 5 City 5 RadioStation 5 Album 5 Product 5 ... ... 12/03/2012 - Multimedia Semantics and Interaction - Séminaire Ecole Centrale - 14
  • NERD REST API /document /user GET, /annotation/{extractor} POST, JSON/RDF* /extraction PUT, /evaluation DELETE “entities” : [{ ... “entity”: “Tim Berners-Lee” , “type”: “Person” , “uri”: "http://dbpedia.org/resource/Tim_berners_lee", “nerdType”: "http://nerd.eurecom.fr/ontology#Person", “startChar”: 30, “endChar”: 45, “confidence”: 1, “relevance”: 0.5 }]Rizzo G., Troncy R. (2012), NERD: A Framework for Unifying Named Entity Recognition and Disambiguation Web ExtractionTools. In: European chapter of the Association for Computational Linguistics (EACL12), Avignon, France. 12/03/2012 - Multimedia Semantics and Interaction - Séminaire Ecole Centrale - 15
  • NIF: NLP Interchange Format Framework Different outputs for the NLP tools OpenCalais DBpedia Spotlight "_type": "Organization", "@URI": "http://dbpedia.org/resource/DBpedia", “name": "North Atlantic Treaty Organization", "@types": "DBpedia:Software,DBpedia:Work” "organizationtype": "governmental civilian", "@surfaceForm": "dbpedia", "nationality": "N/A", "@offset": "0", "_typeReference": "@support": "11", http://s.opencalais.com/1/type/em/e/Organization", "@similarityScore": "0.2387271374464035", ... … Manual effort required for integration or reuse  time consuming  need to capture the definition of the attributes used in the response format NIF uses RDF for representing NER results as Linked Data 12/03/2012 - Multimedia Semantics and Interaction - Séminaire Ecole Centrale - 16
  • Named Entities as textual annotations Lets consider the document: http://www.w3.org/DesignIssues/LinkedData.html The Semantic Web isnt just about putting data on the web. It is about making links, so that a person or machine can explore the web of data. With linked data, when you have some of it, you can find other, related, data.…. All the above plus, Use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff ... entities: { … [entity: W3C, startChar: 23107, endChar: 23110], … } 12/03/2012 - Multimedia Semantics and Interaction - Séminaire Ecole Centrale - 17
  • NERD meets NIF Model documents through a set of strings deferencable on the Web : offset_23107_ 23110 a str:String ; str:referenceContext :offset_0_26546 . Map string to entity : offset_23107_ 23110 sso:oen dbpedia:W3C. Classification dbpedia:W3C rdf:type nerd:Organization .Rizzo G, Troncy R., Hellmann S. and Bruemmer M. (2012), NERD meets NIF: Lifting NLP Extraction Results to the LinkedData Cloud. In: (LDOW12) Linked Data on the Web (WWW12), Lyon, France. 12/03/2012 - Multimedia Semantics and Interaction - Séminaire Ecole Centrale - 18
  • NERD Demo 12/03/2012 - Multimedia Semantics and Interaction - Séminaire Ecole Centrale - 19
  • NERD Timeline and Future Work beginning Comparison of named entity extractors NERD benchmarks NERD REST API and NERD ontology Lift NERD output results to the LOD cloud today NERD “smart” service: combining the best of all NER tools Dashboard for improving the NERD user experience 12/03/2012 - Multimedia Semantics and Interaction - Séminaire Ecole Centrale - 20
  • http://nerd.eurecom.fr @giusepperizzo @rtroncy #nerd http://www.slideshare.net/giusepperizzo12/03/2012 - Multimedia Semantics and Interaction - Séminaire Ecole Centrale - 21