TellMeFirst
A Knowledge Domain Discovery Framework
Giuseppe Futia (1), Antonio Vetrò (1), Giuseppe Rizzo (2)
(1) Nexa Center for Internet and Society, DAUIN, Politecnico di Torino
(2) Istituto Superiore Mario Boella (ISMB)
The Hague, Netherlands, Feb 12th 2016
Nexa Center for Internet & Society
Interdisciplinary Research
Digital Culture
Support to Policy
Community
http://nexa.polito.it/
@nexacenter
Agenda
What is TellMeFirst and how it works
How we build a generalist training set based on DBpedia and Wikipedia
What is a domain training set (with respect to the generalist one)
How we create a domain training set using a configurable pipeline
What is TellMeFirst and how it works
tellmefirst.polito.it
How TellMeFirst works
TellMeFirst Classifier
TellMeFirst compares the target document with a training set built from DBpedia and Wikipedia
In the training set, each DBpedia entity (e.g., Barack Obama) is represented by all the Wikipedia paragraphs in which it appears as a wikilink (http://en.wikipedia.org/wiki/Barack_Obama)
A vector distance metric measures how similar each Wikipedia paragraph is to the target document (Mendes, 2011)
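The comparison described on this slide can be illustrated with a minimal sketch. It assumes plain term-frequency vectors and cosine similarity, which may differ from the metric TellMeFirst actually uses. The function names and the toy training set are hypothetical and were written for this example only.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_entities(training_set, document):
    """Score each entity by its best-matching paragraph, highest score first."""
    doc_vec = Counter(tokenize(document))
    scores = {}
    for uri, paragraphs in training_set.items():
        scores[uri] = max(
            (cosine_similarity(Counter(tokenize(p)), doc_vec) for p in paragraphs),
            default=0.0,
        )
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy training set (assumption): entity URI -> paragraphs where it is a wikilink.
TRAINING_SET = {
    "http://dbpedia.org/resource/Barack_Obama": [
        "Barack Obama was elected president of the United States in 2008.",
    ],
    "http://dbpedia.org/resource/Colosseum": [
        "The Colosseum is an oval amphitheatre in the centre of Rome.",
    ],
}

ranking = rank_entities(TRAINING_SET, "An ancient amphitheatre in Rome")
```

Here `ranking` lists the Colosseum entity first, because its paragraph shares more terms with the target document than the Barack Obama paragraph does.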
How we build a generalist training set based on DBpedia and Wikipedia
Traditional approach (based on DBpedia Spotlight)
The DBpedia Extractor
It takes as input datasets built through the DBpedia Information Extraction Framework (such as labels, redirects, and disambiguations)
Its output is a list of “good” URIs that actually represent entities (excluding disambiguation and redirect pages)
The DBpedia/Wikipedia Mapper
It maps the “good” URIs onto the Wikipedia dump and then creates a Lucene index that defines the training set
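The filtering step performed by the DBpedia Extractor can be sketched as a set difference. This is an illustration under assumptions, not the extractor's actual code: the function name `select_good_uris` and the in-memory inputs are hypothetical, and the real datasets are large dump files rather than Python dictionaries.

```python
def select_good_uris(labels, redirects, disambiguations):
    """Return labelled URIs that are neither redirects nor disambiguation pages."""
    excluded = set(redirects) | set(disambiguations)
    return sorted(uri for uri in labels if uri not in excluded)


# Toy inputs (assumption): tiny stand-ins for the labels, redirects and
# disambiguations datasets.
labels = {
    "http://dbpedia.org/resource/Colosseum": "Colosseum",
    "http://dbpedia.org/resource/Coliseum": "Coliseum",
    "http://dbpedia.org/resource/Titus_(disambiguation)": "Titus (disambiguation)",
    "http://dbpedia.org/resource/Titus": "Titus",
}
redirects = {
    "http://dbpedia.org/resource/Coliseum": "http://dbpedia.org/resource/Colosseum",
}
disambiguations = {"http://dbpedia.org/resource/Titus_(disambiguation)"}

good_uris = select_good_uris(labels, redirects, disambiguations)
```

With these toy inputs, `good_uris` keeps only the Colosseum and Titus URIs, which are the ones a mapper would then look up in the Wikipedia dump.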
Training set - A Lucene Index
What is a domain training set
Domain training set
It contains a subset of the DBpedia entities indexed in the generalist training set
It is defined according to the domain of the documents that you need to classify
It is built by a software component driven by SPARQL queries and advanced services (e.g., Linked Data Recommenders), which produces a new list of “good” URIs
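A SPARQL query of the kind that could drive this component is sketched below. It is only an illustration: the category URI is a hypothetical example, selecting entities through `dct:subject` and `skos:broader` is one possible strategy rather than the one used in the project, and the `format` request parameter is assumed to be accepted by the public DBpedia endpoint. The code only builds the query and the request URL and performs no network call.

```python
from urllib.parse import urlencode

DBPEDIA_ENDPOINT = "http://dbpedia.org/sparql"  # public endpoint (assumed)

# Hypothetical category chosen for illustration only.
CATEGORY = "http://dbpedia.org/resource/Category:Ancient_Roman_buildings_and_structures_in_Rome"


def build_domain_query(category_uri, limit=100):
    """Select entities filed under a category or one of its direct subcategories."""
    return (
        "PREFIX dct: <http://purl.org/dc/terms/>\n"
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT DISTINCT ?entity WHERE {\n"
        f"  {{ ?entity dct:subject <{category_uri}> . }}\n"
        "  UNION\n"
        f"  {{ ?entity dct:subject ?sub . ?sub skos:broader <{category_uri}> . }}\n"
        f"}} LIMIT {limit}"
    )


def build_request_url(query):
    """Encode the query as a GET request URL for the endpoint."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return DBPEDIA_ENDPOINT + "?" + urlencode(params)


query = build_domain_query(CATEGORY, limit=10)
url = build_request_url(query)
```

The entity URIs returned by such a query would form the seed list of “good” URIs for the domain.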
How we build a domain training set using a configurable pipeline
Traditional approach (recap)
Domain training set - Pipeline (i)
Domain training set - Pipeline (ii)
Domain Index - Pipeline (iii)
Domain Engine - SPARQL
Domain Engine - LDR
First implementation: Linked Data Recommender (LDR), developed by the SoftEng group of the Politecnico di Torino
Get all DBpedia categories of a DBpedia entity
Get the DBpedia entities related to a specific DBpedia entity and a DBpedia category
Pipeline: use LDR to get new entities from the resources retrieved with SPARQL queries
Example - Colosseum
The Colosseum or Coliseum (/kɒləˈsiːəm/ kol-ə-see-əm), also known as the
Flavian Amphitheatre (Latin: Amphitheatrum Flavium; Italian: Anfiteatro Flavio
[amfiteˈaːtro ˈflaːvjo] or Colosseo [kolosˈsɛːo]), is an oval amphitheatre in the
centre of the city of Rome, Italy. Built of concrete and sand, it is the largest
amphitheatre ever built and is considered one of the greatest works of
architecture and engineering ever.
The Colosseum is situated just east of the Roman Forum. Construction began
under the emperor Vespasian in 72 AD, and was completed in 80 AD under his
successor and heir Titus. Further modifications were made during the reign of
Domitian (81–96). These three emperors are known as the Flavian dynasty, and
the amphitheatre was named in Latin for its association with their family name
(Flavius).
Colosseum - Generalist training set
Colosseum - Domain training set
Comparison of results (i)
Titus, Vespasian, and Domitian are identified through the generalist training set and are directly mentioned in the text
Arch of Titus and Temple of Vespasian and Titus, obtained with the domain training set, are related to the emperors mentioned in the previous point, but refer to the cultural heritage of Ancient Rome
Comparison of results (ii)
The Flavian dynasty and Flavia entities are mentioned in the text, but they are not very relevant to the cultural heritage domain
The Great Fire of Rome is not strictly related to the entities mentioned in the text, but it is relevant from a historical point of view
Wrap up
TellMeFirst is a tool for classifying and enriching textual documents using a training set based on DBpedia and Wikipedia
We can build a training set for TellMeFirst with a configurable pipeline that selects a subset of all DBpedia entities
By driving this configurable pipeline, we can create a training set for a specific knowledge domain (such as cultural heritage)
Future developments
Define a training set for classifying scientific publications available in Open Access
Build a GUI that enables domain experts to create a domain training set without specific knowledge of the Linked Data framework
We are open to collaborations on TellMeFirst!
Acknowledgments
● Joint Open Lab of Telecom Italia (http://www.telecomitalia.com/tit/it/innovazione/i-luoghi-della-ricerca/joint-open-labs.html)
● Software Engineering Research Group (DAUIN), Politecnico di Torino (http://softeng.polito.it/)
Contacts
• Giuseppe Futia
– Mail: giuseppe.futia@polito.it
– Twitter: @giuseppe_futia
• Antonio Vetrò
– Mail: antonio.vetro@polito.it
– Twitter: @phisaz
• Giuseppe Rizzo
– Mail: giuseppe.rizzo@ismb.it
– Twitter: @giusepperizzo
Appendix