STLab ISTC - CNR
ESWC-15 Open Knowledge
Extraction Challenge
2 June 2015 - Portorož, Slovenia
Andrea Giovanni Nuzzolese1, Anna Lisa Gentile2,
Valentina Presutti1, Aldo Gangemi1,3, Roberto Navigli4, Dario Garigliotti4
1STLab, Institute of Cognitive Science and Technology, National Research Council, Italy
2University of Sheffield, UK
3LIPN, Université Paris 13, Sorbonne Paris Cité, UMR CNRS, France
4University of Rome “La Sapienza”, Italy
• Semantic Web vision
• populate the Web with machine understandable data
• Most of the Web content consists of natural language text, e.g.,
Web sites, news, blogs, micro-posts, etc.
• Goal:
• extract relevant knowledge
• publish as Linked Data
Background
Motivations
• Problem
• existing Knowledge Extraction systems are often not directly reusable for
populating the Semantic Web (SW)
• lack of a “genuine” SW reference evaluation framework
• Tasks such as named entity recognition, relation extraction, frame
detection, etc.
• often do not provide output as Linked Data
• Aim of OKE challenge:
• reference framework for Knowledge Extraction from text
• produce output with respect to specific SW requirements
Task 1
Entity Recognition, Linking and Typing for Knowledge Base population
• Identifying entities in a sentence and creating an OWL individual
(owl:Individual statement) representing each of them
• entity: any discourse referent either named or anonymous
• Linking (owl:sameAs statement) such an individual to DBpedia
• Assigning a type (rdf:type statement) to each individual, selected
from a set of top-level DOLCE Ultra Lite classes
• i.e., dul:Person, dul:Place, dul:Organization and dul:Role
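The three kinds of statements above can be sketched as plain triples. This is an illustrative sketch only: the helper name and the `ex:` prefix are ours, and `owl:Individual` simply echoes the wording of the task description.

```python
# Hypothetical sketch (helper name and ex: prefix assumed, not from the
# challenge kit): the three statement types Task 1 asks for per entity.
DUL = "http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#"
DBPEDIA = "http://dbpedia.org/resource/"

def task1_statements(local_uri, dbpedia_name, dul_class):
    """Return the individual, DBpedia-link and typing triples for one entity."""
    return [
        (local_uri, "rdf:type", "owl:Individual"),       # OWL individual
        (local_uri, "owl:sameAs", DBPEDIA + dbpedia_name),  # DBpedia link
        (local_uri, "rdf:type", DUL + dul_class),        # DOLCE UL type
    ]

triples = task1_statements("ex:Florence_May_Harding",
                           "Florence_May_Harding", "Person")
for s, p, o in triples:
    print(s, p, o)
```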
Task 1: Example
• We want to recognize the following entities
• The results must be provided in NIF format
Florence May Harding studied at a school in Sydney, and with Douglas Robert Dundas, but in
effect had no formal training in either botany or art.
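NIF identifies each mention by character offsets into the context string. A minimal sketch of how those offsets are computed for the example sentence (the entity list here is assumed for illustration, and the vocabulary terms are simplified):

```python
# Sketch of NIF-style mention anchoring: begin/end character offsets into
# the context string (entity list assumed for illustration).
context = ("Florence May Harding studied at a school in Sydney, and with "
           "Douglas Robert Dundas, but in effect had no formal training "
           "in either botany or art.")

def mention_offsets(text, mention):
    """Locate a mention and return its (beginIndex, endIndex) pair."""
    begin = text.index(mention)
    return begin, begin + len(mention)

for m in ["Florence May Harding", "school", "Sydney", "Douglas Robert Dundas"]:
    b, e = mention_offsets(context, m)
    print(f"{m}: char={b},{e}")
```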
Task 1: Evaluation criteria
• Ability to recognize entities in a text
• only full matches are counted as correct
• Ability to assign the correct type
• evaluation will be carried out only on the 4 target DOLCE types
• Ability to link individuals to DBpedia 2014
• participants must link entities to DBpedia only when relevant
• We calculate the average Precision, Recall and F-measure
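A hedged sketch of the scoring: precision, recall and F-measure over gold versus predicted annotations, with exact-match counting (averaging and tie-breaking details of the actual evaluation are not specified here and are assumed):

```python
# Illustrative scoring sketch (function name ours; exact matches only,
# as the criteria above require).
def prf(gold, predicted):
    """Precision, recall and F-measure with full-match counting."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)  # only full matches count as correct
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical (mention, type, link) annotations for one sentence.
gold = {("Sydney", "dul:Place", "dbpedia:Sydney")}
pred = {("Sydney", "dul:Place", "dbpedia:Sydney"),
        ("school", "dul:Organization", None)}
print(prf(gold, pred))  # (0.5, 1.0, 0.666...)
```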
Task 2
• Identifying the type(s) of a given entity
• as expressed in a given definition
• Creating owl:Class statements for defining each of them
• as a new class in the knowledge base
• Creating an rdf:type statement
• between the given entity and each newly created class
• Aligning the identified types to a subset of DOLCE Ultra Lite
classes
Class Induction and Entity Typing for Vocabulary and Knowledge Base enrichment
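The statements Task 2 asks for can be sketched the same way; the identifiers below (the `oke:` class URI, the helper name) are ours and purely illustrative:

```python
# Hypothetical sketch: induce a new class from a type phrase found in the
# definition, type the entity with it, and align it to a DOLCE UL class.
def task2_statements(entity_uri, new_class_uri, dul_superclass):
    """Return class-induction, entity-typing and DOLCE-alignment triples."""
    return [
        (new_class_uri, "rdf:type", "owl:Class"),           # class induction
        (entity_uri, "rdf:type", new_class_uri),            # entity typing
        (new_class_uri, "rdfs:subClassOf", dul_superclass), # DOLCE alignment
    ]

stmts = task2_statements("dbpedia:Brian_Banner",
                         "oke:FictionalVillain", "dul:Person")
for t in stmts:
    print(*t)
```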
Task 2: Example
Brian Banner is a fictional villain from the Marvel Comics Universe created by Bill
Mantlo and Mike Mignola and first appearing in print in late 1985.
• Brian Banner will be given as the input target entity
• We want to recognize the following
• The results must be provided in NIF format
Task 2: Evaluation criteria
• Ability to recognize strings that describe the type of a target
entity
• Ability to align the identified type with the reference ontology,
i.e., DOLCE
Participants
• M. Röder, R. Usbeck and A. Ngonga Ngomo. CETUS — A
Baseline Approach to Type Extraction (Tasks 1-2)
• J. Plu, G. Rizzo and R. Troncy. A Hybrid Approach for Entity
Recognition and Linking (Task 1)
• S. Consoli and D. Reforgiato. Using FRED for Named Entity
Resolution, Linking and Typing for Knowledge Base population
(Tasks 1-2)
• J. Gao and S. Mazumdar. Exploiting Linked Open Data to Uncover
Entity Type (Task 2)
CETUS — A Baseline Approach to Type Extraction
• FOX
• Entity Recognition based on Stanford NER, Illinois, Balie and OpenNLP
• Entity Linking based on AGDISTIS, an open-source named entity
disambiguation system
Task 1
CETUS — A Baseline Approach to Type Extraction
Task 2
• Coreference resolution
• Entity replacement
• Pattern-based evidence recognition
• Ontology matching
A Hybrid Approach for Entity Recognition and Linking
• Hybrid: NER + NEL combining linguistics and semantics
• NER - output: set of entity candidates
• Stanford POS Tagger & Stanford NER
• Dictionary of Wikipedia titles built with respect to the dataset types
• In case of overlaps, the longest entity is considered
• NEL - output: each candidate is matched with a DBpedia referent
if any (NIL otherwise)
• distance function that compares the candidate with: title, redirect,
disambiguation page, and Page Rank of the Wikipedia resource
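The overlap rule above ("the longest entity is considered") can be sketched in a few lines; the function name and tie-breaking order are ours, not from the paper:

```python
# Sketch of longest-match overlap resolution over candidate mention spans
# (function name and tie-breaking assumed, not from the paper).
def resolve_overlaps(candidates):
    """candidates: list of (begin, end) spans; keep the longest per overlap."""
    # Prefer longer spans, then earlier ones.
    ordered = sorted(candidates, key=lambda s: (-(s[1] - s[0]), s[0]))
    kept = []
    for b, e in ordered:
        if all(e <= kb or b >= ke for kb, ke in kept):  # no overlap with kept
            kept.append((b, e))
    return sorted(kept)

# Span (0, 3) overlaps the longer span (0, 8): the longer one wins.
print(resolve_overlaps([(0, 3), (0, 8), (9, 13)]))  # [(0, 8), (9, 13)]
```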
Using FRED for Named Entity Resolution, Linking
and Typing for Knowledge Base population
The Black Hand might not have decided to barbarously assassinate Franz Ferdinand after he
arrived in Sarajevo on June 28th, 1914
• Recognizing lexical mentions of types
• feature extraction
• POS, lemma and NER, etc.
• Gazetteer
• semantic distance using LOD
• model selection
• Maximum Entropy Markov Model
• Class alignment
• syntactic and semantic similarity between recognised types and DOLCE
Exploiting Linked Open Data to Uncover Entity Type
Evaluation
1. GERBIL — General Entity Annotation Benchmark Framework, by Ricardo Usbeck, Michael Röder, Axel-Cyrille
Ngonga Ngomo, Ciro Baron, Andreas Both, Martin Brümmer, Diego Ceccarelli, Marco Cornolti, Didier Cherix,
Bernd Eickmann, Paolo Ferragina, Christiane Lemke, Andrea Moro, Roberto Navigli, Francesco Piccinno, Giuseppe
Rizzo, Harald Sack, René Speck, Raphaël Troncy, Jörg Waitelonis, and Lars Wesemann. In: 24th WWW Conference (WWW 2015).
• GERBIL offers
• an easy-to-use web-based platform
• agile comparison of annotators
• multiple datasets support
• uniform measuring approaches
• We use GERBIL1 as the benchmarking system for
evaluating precision, recall and F-measure
http://aksw.org/Projects/GERBIL.html
Thank you!
https://github.com/anuzzolese/oke-challenge
