A. Elizabeth Cano (1), Andrea Varga (2), Matthew Rowe (3), Fabio Ciravegna (2), and Yulan He (4)
(1) Knowledge Media Institute, The Open University, Milton Keynes
(2) University of Sheffield, Sheffield
(3) Lancaster University, Lancaster
(4) Aston University, Birmingham
UK. 2013
Harnessing Linked Knowledge Sources for
Topic Classification in Social Media
INTRODUCTION
Social Media Streams - Risk in violent and criminal activities
INTRODUCTION
Research Questions:
o  Can semantic features help in topic classification (TC)?
o  Which knowledge source (KS) data and KS taxonomies
provide useful information for improving the TC of tweets?
OUTLINE
• Introduction
- Topic Classification (TC) of Microposts
- Related Work
- State of the art limitations
• Proposed Approach
• Experiments
• Findings
• Conclusions
INTRODUCTION
Difficulties of Topic Classification of microposts
o  Restricted number of characters
o  Irregular and ill-formed words
•  Mixing of upper- and lowercase letters
§  Makes it difficult to detect proper nouns and other part-of-speech tags.
•  Wide variety of language
§  E.g., “see u soon”
o  Event-dependent emerging jargon
•  Volatile jargon relevant to particular events
§  E.g., “Jan.25” (used during the Egyptian revolution)
o  High topical diversity
o  Sparse data
INTRODUCTION
Social Knowledge Sources (KS)
                DBpedia*        Yago2           Freebase
Resources       2.35 million    447 million     3.6 million
Classes         359             562,312         1,450
Properties      1,820           253,213,842     7,000
*Using the DBpedia ontology
o  Structured Semantic Web representation of data
•  Maintained by thousands of editors
§  E.g., DBpedia, derived from Wikipedia
§  Freebase
•  Evolves and adapts as knowledge changes [Syed et al., 2008]
o  Cover a broad range of topics
o  Characterise topics with a large number of resources
INTRODUCTION
Local and External Metadata of a Tweet
NER:Person   <http://dbpedia.org/resource/Barack_Obama>
NER:Country  <http://dbpedia.org/resource/Egypt>
NER:Person   <http://dbpedia.org/resource/Hosni_Mubarak>
PROPOSED APPROACH
o  State-of-the-art limitations
§  Use of a single knowledge source
§  Entities’ metadata is constrained by the NER service used (e.g. OpenCalais, Alchemy).
o  Our approach
§  Exploits multiple knowledge sources.
§  Enhances the entity metadata by deriving semantic graphs.
§  Leverages the graph structures surrounding entities present
in a KS for the TC task.
Exploiting Knowledge Sources for the Topic Classification of
Microposts
OUTLINE
• Introduction
• Proposed Approach
• Semantic Meta-graphs
• Weighting Schemas
• Enhancing TC with Semantic Features
• Experiments
• Findings
• Conclusions
PROPOSED APPROACH
Rationale
[Figure: two examples, labelled 1 and 2. One could be more indicative of War and Conflict; example 2 is not necessarily a good indicator of War and Conflict.]
Can the graph structure of existing knowledge sources provide an abstraction of the use of these entity types for representing a topic?
PROPOSED APPROACH
Framework for Topic Classification of Tweets
[Diagram components: Retrieve Articles (DB, FB, DB-FB); Retrieve Tweets (TW); Concept Enrichment; Derive Semantic Features; Build Cross-Source Topic Classifier; Annotate Tweets]
1.  Datasets Collection: SPARQL query for all resources from a given topic (e.g. War).
2.  Datasets Enrichment: from tweets and articles’ abstracts, extract entities and link them to resources in DBpedia and Freebase.
3.  Semantic Features Derivation.
4.  Build a topic classifier based on features derived from cross-sources.
PROPOSED APPROACH
Deriving Semantic Meta-Graphs
<dbpedia:Barack_Obama, rdf:type, yago:PresidentOfTheUnitedStates>
<dbpedia:Barack_Obama, dbo:birthPlace, dbpedia:Hawaii>
PROPOSED APPROACH
Definition 1- Resource Meta-graph
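The resource meta-graph of an entity is a sequence of tuples $G := (R, P, C, Y)$ where
•  $R$, $P$, $C$ are finite sets whose elements are resources, properties and classes;
•  $Y \subseteq R \times P \times C$ is a ternary relation representing a hypergraph with ternary edges;
•  the hypergraph of $Y$ is a tripartite graph $H(Y) = \langle V, D \rangle$ where the vertices are $V = R \cup P \cup C$ and the edges are $D = \{\{r, p, c\} \mid (r, p, c) \in Y\}$.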
PROPOSED APPROACH
Resource Meta-graph
The meta-graph of entity e is the aggregation of all resources, properties and classes related to this entity.
[Figure: meta-graph of Obama. Projecting on properties: birthPlace, author, spouse. Projecting on classes: LivingPeople, PresidentOfTheUnitedStates, Person, Author.]
How can we weight these graphs to reveal the semantic features that characterise Obama in the context of Violence?
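To make the meta-graph and its two projections concrete, here is a minimal Python sketch. The slides do not say how the ternary edges are materialised, so pairing every property of the entity with every class of the entity is an assumption made only for illustration; the function names and the toy triples are hypothetical too.

```python
def build_meta_graph(entity, triples):
    """Collect the properties and classes attached to `entity`.

    triples: iterable of (subject, predicate, object) statements.
    """
    props, classes = set(), set()
    for subj, pred, obj in triples:
        if subj != entity:
            continue
        if pred == "rdf:type":
            classes.add(obj)      # class membership of the entity
        else:
            props.add(pred)       # any other predicate counts as a property
    # Assumption: each property is paired with each class of the entity.
    edges = {(entity, p, c) for p in props for c in classes}
    return {"R": {entity}, "P": props, "C": classes, "Y": edges}


def project(graph, position):
    """Project the ternary edges on one position (1 = property, 2 = class)."""
    return {edge[position] for edge in graph["Y"]}


# Toy triples, illustrative only (not taken from a real endpoint).
TRIPLES = [
    ("dbpedia:Barack_Obama", "rdf:type", "yago:PresidentOfTheUnitedStates"),
    ("dbpedia:Barack_Obama", "rdf:type", "yago:LivingPeople"),
    ("dbpedia:Barack_Obama", "dbo:birthPlace", "dbpedia:Hawaii"),
    ("dbpedia:Barack_Obama", "dbo:spouse", "dbpedia:Michelle_Obama"),
    ("dbpedia:Hawaii", "rdf:type", "dbo:PopulatedPlace"),
]

graph = build_meta_graph("dbpedia:Barack_Obama", TRIPLES)
property_projection = project(graph, 1)
class_projection = project(graph, 2)
```

Projecting on position 1 gives the property view of the entity, and projecting on position 2 gives the class view.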
PROPOSED APPROACH
Weighting Semantic Features
Specificity
Measures the relative importance of a property $p \in G(e)$ to a given class $c \in G(e)$ in a KS graph $G_{KS}$:
$$\text{specificity}_{KS}(p, c) = \frac{N_p(R(c))}{N(R(c))}$$
PROPOSED APPROACH
Weighting Semantic Features
Generality
Captures the specialisation of a property p to a given class c, by computing the property’s frequency among other semantically related classes R’(c):
$$\text{generality}_{KS}(p, c) = \frac{N(R'(c))}{N_p(R'(c))}$$
where $N(R'(c))$ is the number of resources whose type is either c or a specialisation of c’s parent classes.
PROPOSED APPROACH
Weighting Semantic Features
$$SG(p, c) = \text{specificity}_{KS}(p, c) \times \text{generality}_{KS}(p, c)$$
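A small worked example may help to see how the two weights interact. The sketch below assumes that N_p(X) counts the resources in X that carry property p, that R(c) is the set of resources typed c, and that R'(c) is the set of resources typed c or one of its sibling classes. The toy data, the `related` mapping and the zero fallbacks are hypothetical choices, not part of the slides.

```python
def resources_of(classes, types):
    """Resources typed with at least one of the given classes."""
    return [r for r, cls in types.items() if cls & classes]


def count_with_property(p, resources, props):
    """N_p(X): how many resources in X carry property p (assumed reading)."""
    return sum(1 for r in resources if p in props.get(r, set()))


def specificity(p, c, types, props):
    r_c = resources_of({c}, types)
    return count_with_property(p, r_c, props) / len(r_c) if r_c else 0.0


def generality(p, c, types, props, related):
    # R'(c): resources typed c or a class sharing a parent with c.
    r_rel = resources_of({c} | related.get(c, set()), types)
    n_p = count_with_property(p, r_rel, props)
    return len(r_rel) / n_p if n_p else 0.0   # zero fallback is an assumption


def sg(p, c, types, props, related):
    return specificity(p, c, types, props) * generality(p, c, types, props, related)


# Toy knowledge source: four resources, three sibling classes.
TYPES = {"r1": {"President"}, "r2": {"President"},
         "r3": {"Chancellor"}, "r4": {"PrimeMinister"}}
PROPS = {"r1": {"birthPlace", "commander"}, "r2": {"birthPlace", "commander"},
         "r3": {"birthPlace"}, "r4": {"birthPlace"}}
RELATED = {"President": {"Chancellor", "PrimeMinister"}}

weight_commander = sg("commander", "President", TYPES, PROPS, RELATED)
weight_birthplace = sg("birthPlace", "President", TYPES, PROPS, RELATED)
```

In this toy setting `commander` occurs only on resources of class President, so it receives a higher SG weight than `birthPlace`, which occurs on every related class.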
PROPOSED APPROACH
Enhancing Feature Space with Semantic Features
Semantic Augmentation (A1)
•  Class features (A1+C): $F' = F + F_C$
•  Property features (A1+P): $F' = F + F_P$
•  Class + property features (A1+C+P): $F' = F + F_C + F_P$
Example:
$F$: president, obama, televised, statement, hosni, mubarak, resignation, cnn, says, egypt
$F_{A1+P}$: dbpedia:birth, dbpedia:state, …., dbpedia-owl:PopulatedPlace/populationDensity….
$F_{A1+C}$: PopulatedPlace, Office_holder, PresidentOfTheUnitedStates, Politician…
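The A1 augmentation can be sketched as a bag-of-features merge. This is only an illustration under stated assumptions: the `C:` and `P:` prefixes, the `META` lookup table and the function name are hypothetical, and the slides do not say how the semantic features are weighted once added.

```python
from collections import Counter


def augment_a1(words, entities, meta, use_classes=True, use_properties=True):
    """Return F' = F (+ F_C) (+ F_P) as a bag of features.

    meta maps an entity to a (classes, properties) pair.
    """
    bag = Counter(words)                      # F: lexical features
    for entity in entities:
        classes, properties = meta.get(entity, ((), ()))
        if use_classes:
            bag.update("C:" + c for c in classes)       # F_C
        if use_properties:
            bag.update("P:" + p for p in properties)    # F_P
    return bag


# Hypothetical lookup table for one linked entity.
META = {
    "dbpedia:Barack_Obama": (
        ["PresidentOfTheUnitedStates", "Politician"],
        ["dbpedia:birth"],
    ),
}
WORDS = ["president", "obama", "egypt"]

a1_class = augment_a1(WORDS, ["dbpedia:Barack_Obama"], META, use_properties=False)
a1_class_prop = augment_a1(WORDS, ["dbpedia:Barack_Obama"], META)
```

Prefixing the semantic features keeps them from colliding with lexical tokens that happen to share a name.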
PROPOSED APPROACH
Enhancing Feature Space with Semantic Features
Semantic Augmentation with Generalisation (A2)
This augmentation exploits the subsumption relation among classes within the DBpedia or Freebase ontologies. In this case we consider the set of parent classes of c.
•  Parent(c) features: $F' = F + F_{parent(c)}$
•  Parent(c) + property features: $F' = F + F_P + F_{parent(c)}$
Example:
$F$: president, obama, televised, statement, hosni, mubarak, resignation, cnn, says, egypt
$F_{A2+parent(c)}$: Place, Office_holder, President, Politician…
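A minimal sketch of the A2 generalisation step, assuming a precomputed child-to-parent mapping. The `PARENT_OF` table below is hypothetical and only mirrors the example on the slide; classes with no entry in the table are kept unchanged, which is also an assumption.

```python
from collections import Counter


def augment_a2(words, classes, properties, parent_of, use_properties=False):
    """Return F' = F + F_parent(c) (+ F_P) as a bag of features."""
    bag = Counter(words)
    # Replace each class by its parent; keep the class itself if no parent is known.
    bag.update("C:" + parent_of.get(c, c) for c in classes)
    if use_properties:
        bag.update("P:" + p for p in properties)
    return bag


# Hypothetical subsumption table (child -> parent).
PARENT_OF = {
    "PopulatedPlace": "Place",
    "PresidentOfTheUnitedStates": "President",
}
CLASSES = ["PopulatedPlace", "Office_holder", "PresidentOfTheUnitedStates", "Politician"]

a2_bag = augment_a2(["obama", "egypt"], CLASSES, ["dbpedia:birth"], PARENT_OF)
a2_bag_props = augment_a2(["obama", "egypt"], CLASSES, ["dbpedia:birth"], PARENT_OF,
                          use_properties=True)
```

Generalising to parent classes lets entities typed with different but related classes share a feature.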
OUTLINE
• Introduction
• Proposed Approach
• Experiments
• Dataset
• Baseline Features
• Results
• Findings
• Conclusions
PROPOSED APPROACH
Datasets
o  Twitter Dataset [Abel et al., 2011] (TW)
§  Collected over two months starting in November 2010.
§  Topically annotated.
§  Uses tweets labelled as “War & Conflict” (War), “Law & Crime” (Cri) and “Disaster & Accident” (DisAcc).
§  Multi-labelled dataset comprising 10,189 tweets.
o  DBpedia (DB) and Freebase (FB) Datasets
§  Queried the SPARQL endpoints for all resources from the categories and subcategories of the skos:Concept of War, Cri and DisAcc.
•  DBpedia: 9,465 articles
•  Freebase: 16,915 articles
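The slides do not show the SPARQL query itself. The sketch below builds one plausible query for the DBpedia side, relying on the `dct:subject` and `skos:broader` links that DBpedia uses for Wikipedia categories. The function name, the category name `War`, the language filter and the limit are all assumptions, and the path `skos:broader?` covers only the category itself and its direct subcategories.

```python
def topic_query(category, lang="en", limit=100):
    """Build a SPARQL query for resources filed under a DBpedia category."""
    return f"""
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX dbo: <http://dbpedia.org/ontology/>

SELECT DISTINCT ?resource ?abstract WHERE {{
  ?resource dct:subject ?cat .
  ?cat skos:broader? <http://dbpedia.org/resource/Category:{category}> .
  ?resource dbo:abstract ?abstract .
  FILTER (lang(?abstract) = "{lang}")
}}
LIMIT {limit}
""".strip()


war_query = topic_query("War")
```

The resulting string can be sent to any SPARQL 1.1 endpoint with a client of choice; no network call is made here.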
PROPOSED APPROACH
Experimental Setup A
1.  Use annotated tweets (TW) for training.
-  Baseline: Bag of Words (BoW), Bag of Entities (BoE), and Part of Speech tags (PoS).
-  Enhance features using the DBpedia and Freebase graphs.
2.  Train an SVM classifier on the TW corpus, using an 80%-20% train/test split over five independent runs.
3.  Compute Precision, Recall, and F-measure.
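The evaluation protocol above can be sketched without committing to a particular SVM library. The snippet assumes each tweet carries a set of gold labels and a set of predicted labels (the dataset is multi-labelled) and that the measures are computed per topic; the classifier itself is left out, and all names are hypothetical.

```python
import random


def split_80_20(items, seed):
    """Shuffle with a fixed seed and return (train, test) at an 80/20 cut."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def precision_recall_f1(gold, predicted, label):
    """Per-label precision, recall and F1 over parallel lists of label sets."""
    tp = sum(1 for g, p in zip(gold, predicted) if label in g and label in p)
    fp = sum(1 for g, p in zip(gold, predicted) if label not in g and label in p)
    fn = sum(1 for g, p in zip(gold, predicted) if label in g and label not in p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Five independent runs differ only in the shuffle seed.
runs = [split_80_20(range(10), seed) for seed in range(5)]

# Toy gold and predicted label sets for four tweets.
GOLD = [{"War"}, {"War", "Cri"}, {"Cri"}, {"DisAcc"}]
PRED = [{"War"}, {"Cri"}, {"War"}, {"DisAcc"}]
war_scores = precision_recall_f1(GOLD, PRED, "War")
```

Averaging the per-run scores over the five seeds would give the figures reported for one feature set.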
PROPOSED APPROACH
Results for TW dataset
PROPOSED APPROACH
Experimental Setup B
1.  Use labelled articles from DBpedia (DB) and Freebase (FB) for training.
-  Baseline: Bag of Words (BoW), Bag of Entities (BoE), and Part of Speech tags (PoS).
-  Enhance features using the DBpedia and Freebase graphs.
2.  Train an SVM classifier on the DB, FB, DB+FB and DB+FB+TW training corpora and test on TW, using an 80%-20% train/test split over five independent runs.
3.  Compute Precision, Recall, and F-measure.
PROPOSED APPROACH
Results for Training on KS articles, and Testing on TW
PROPOSED APPROACH
Factors contributing to the performance of a KS graph for TC
1.  Topic-Class Entropy
2.  Entity-Class Entropy
3.  Topic-Class-Property Entropy
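The slides name these three metrics without giving formulas. As a rough illustration only, the sketch below assumes each one is a Shannon entropy over a distribution of labels: for topic-class entropy, the classes of the entities observed in one topic. The function name and the toy label lists are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of the empirical distribution of `labels`."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Toy example: classes of the entities seen in two topics.
focused_topic = ["Person", "Person", "Person", "Person"]
mixed_topic = ["Person", "Person", "Place", "Place"]

focused_entropy = entropy(focused_topic)
mixed_entropy = entropy(mixed_topic)
```

Under this reading, a topic whose entities spread evenly over many classes has high entropy, while a topic dominated by a single class has entropy close to zero.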
PROPOSED APPROACH
Correlating Entropy metrics with the performance of the
cross-source TC classifiers.
Indicates that the higher the number of ambiguous
entities in a topic within a KS graph, the lower the
performance of the TC.
FINDINGS
1.  KSs combined with Twitter data provide complementary information for TC of tweets, outperforming the KS-only approaches and the approach using tweets only.
2.  A KS’s performance on TC depends on the coverage of the entities within that KS.
3.  When entities have low coverage in a KS, exploiting the
mapping between corresponding KSs’ ontologies is
beneficial.
CONCLUSIONS
•  Explored the task of topic classification of tweets
•  Exploited information in KSs (e.g. DBpedia, Freebase)
using semantic graphs for concepts and properties
surrounding an entity.
•  Presented the importance of considering graph
structures in KSs for the supervised classification of
tweets, by achieving significant improvement over
various state-of-the-art approaches using both single
KSs and Tweets only.
CONTACT US
A. Elizabeth Cano
•  http://people.kmi.open.ac.uk/cano/
Andrea Varga
•  http://sites.google.com/site/missandreavarga/
Matthew Rowe
•  http://lancs.ac.uk/staff/rowem/
Fabio Ciravegna
•  http://staffwww.dcs.shef.ac.uk/people/F.Ciravegna
Yulan He
•  http://www1.aston.ac.uk/eas/staff/dr-yulan-he

More Related Content

What's hot

Exploiting Entity Linking in Queries For Entity Retrieval
Exploiting Entity Linking in Queries For Entity RetrievalExploiting Entity Linking in Queries For Entity Retrieval
Exploiting Entity Linking in Queries For Entity RetrievalFaegheh Hasibi
 
Intelligent Methods in Models of Text Information Retrieval: Implications for...
Intelligent Methods in Models of Text Information Retrieval: Implications for...Intelligent Methods in Models of Text Information Retrieval: Implications for...
Intelligent Methods in Models of Text Information Retrieval: Implications for...inscit2006
 
Entity Retrieval (tutorial organized by Radialpoint in Montreal)
Entity Retrieval (tutorial organized by Radialpoint in Montreal)Entity Retrieval (tutorial organized by Radialpoint in Montreal)
Entity Retrieval (tutorial organized by Radialpoint in Montreal)krisztianbalog
 
Entity Retrieval (WWW 2013 tutorial)
Entity Retrieval (WWW 2013 tutorial)Entity Retrieval (WWW 2013 tutorial)
Entity Retrieval (WWW 2013 tutorial)krisztianbalog
 
Rules for inducing hierarchies from social tagging data
Rules for inducing hierarchies from social tagging dataRules for inducing hierarchies from social tagging data
Rules for inducing hierarchies from social tagging dataHang Dong
 
Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...
Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...
Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...Daniel Valcarce
 

What's hot (6)

Exploiting Entity Linking in Queries For Entity Retrieval
Exploiting Entity Linking in Queries For Entity RetrievalExploiting Entity Linking in Queries For Entity Retrieval
Exploiting Entity Linking in Queries For Entity Retrieval
 
Intelligent Methods in Models of Text Information Retrieval: Implications for...
Intelligent Methods in Models of Text Information Retrieval: Implications for...Intelligent Methods in Models of Text Information Retrieval: Implications for...
Intelligent Methods in Models of Text Information Retrieval: Implications for...
 
Entity Retrieval (tutorial organized by Radialpoint in Montreal)
Entity Retrieval (tutorial organized by Radialpoint in Montreal)Entity Retrieval (tutorial organized by Radialpoint in Montreal)
Entity Retrieval (tutorial organized by Radialpoint in Montreal)
 
Entity Retrieval (WWW 2013 tutorial)
Entity Retrieval (WWW 2013 tutorial)Entity Retrieval (WWW 2013 tutorial)
Entity Retrieval (WWW 2013 tutorial)
 
Rules for inducing hierarchies from social tagging data
Rules for inducing hierarchies from social tagging dataRules for inducing hierarchies from social tagging data
Rules for inducing hierarchies from social tagging data
 
Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...
Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...
Exploring Statistical Language Models for Recommender Systems [RecSys '15 DS ...
 

Viewers also liked

CIMAT 2011 - Problemáticas
CIMAT 2011 - ProblemáticasCIMAT 2011 - Problemáticas
CIMAT 2011 - Problemáticasmikealebrije
 
Centro De Innovacion En Productividad Presentacion
Centro De Innovacion En Productividad PresentacionCentro De Innovacion En Productividad Presentacion
Centro De Innovacion En Productividad PresentacionRamon Costa i Pujol
 
Concepts of IT-Based Modern Living
Concepts of IT-Based Modern LivingConcepts of IT-Based Modern Living
Concepts of IT-Based Modern Livingmatthiasvogt
 
Villa Victoria Mar Del Plata
Villa Victoria   Mar Del PlataVilla Victoria   Mar Del Plata
Villa Victoria Mar Del Platavirginiae
 
Curriculum Febbraio 2009
Curriculum Febbraio 2009Curriculum Febbraio 2009
Curriculum Febbraio 2009limpbizkit
 
Genano professional air decontamination
Genano professional air decontaminationGenano professional air decontamination
Genano professional air decontaminationpekka ilmaranta
 
Fuerza vital, cómo recuperarla
Fuerza vital, cómo recuperarlaFuerza vital, cómo recuperarla
Fuerza vital, cómo recuperarlaJaime Diaz
 
Dossier pédagogique Visages d'enfants par Anne Andrist
Dossier pédagogique Visages d'enfants par Anne AndristDossier pédagogique Visages d'enfants par Anne Andrist
Dossier pédagogique Visages d'enfants par Anne AndristAnne Andrist
 
Welcomm Presentation 2
Welcomm Presentation 2Welcomm Presentation 2
Welcomm Presentation 2Sonal Haja
 
Presentacion athagon ingame
Presentacion athagon ingamePresentacion athagon ingame
Presentacion athagon ingameAthagon
 
How Consumers Engage with Mobile Apps
How Consumers Engage with Mobile AppsHow Consumers Engage with Mobile Apps
How Consumers Engage with Mobile AppsSIXTY
 
120925 meroni polimi desis lab
120925 meroni polimi desis lab120925 meroni polimi desis lab
120925 meroni polimi desis labmakeacube
 
Green with liability
Green with liabilityGreen with liability
Green with liabilityFERMA
 
Tarjeta prepago BN E-credit Mástercard
Tarjeta prepago BN E-credit MástercardTarjeta prepago BN E-credit Mástercard
Tarjeta prepago BN E-credit MástercardBanco Nacional
 
The Search For Peace Pdrc
The Search For Peace PdrcThe Search For Peace Pdrc
The Search For Peace Pdrcibrahimrainbow
 

Viewers also liked (20)

CIMAT 2011 - Problemáticas
CIMAT 2011 - ProblemáticasCIMAT 2011 - Problemáticas
CIMAT 2011 - Problemáticas
 
Centro De Innovacion En Productividad Presentacion
Centro De Innovacion En Productividad PresentacionCentro De Innovacion En Productividad Presentacion
Centro De Innovacion En Productividad Presentacion
 
Concepts of IT-Based Modern Living
Concepts of IT-Based Modern LivingConcepts of IT-Based Modern Living
Concepts of IT-Based Modern Living
 
Actividad 1. módulo vi. sustentación. clcp
Actividad 1. módulo vi. sustentación. clcpActividad 1. módulo vi. sustentación. clcp
Actividad 1. módulo vi. sustentación. clcp
 
Villa Victoria Mar Del Plata
Villa Victoria   Mar Del PlataVilla Victoria   Mar Del Plata
Villa Victoria Mar Del Plata
 
Curriculum Febbraio 2009
Curriculum Febbraio 2009Curriculum Febbraio 2009
Curriculum Febbraio 2009
 
Genano professional air decontamination
Genano professional air decontaminationGenano professional air decontamination
Genano professional air decontamination
 
Auxiliar juveniles 1 Trim 2011
Auxiliar juveniles 1 Trim 2011Auxiliar juveniles 1 Trim 2011
Auxiliar juveniles 1 Trim 2011
 
Fuerza vital, cómo recuperarla
Fuerza vital, cómo recuperarlaFuerza vital, cómo recuperarla
Fuerza vital, cómo recuperarla
 
Dossier pédagogique Visages d'enfants par Anne Andrist
Dossier pédagogique Visages d'enfants par Anne AndristDossier pédagogique Visages d'enfants par Anne Andrist
Dossier pédagogique Visages d'enfants par Anne Andrist
 
Dsg Studie Emotions
Dsg Studie EmotionsDsg Studie Emotions
Dsg Studie Emotions
 
Welcomm Presentation 2
Welcomm Presentation 2Welcomm Presentation 2
Welcomm Presentation 2
 
Presentacion athagon ingame
Presentacion athagon ingamePresentacion athagon ingame
Presentacion athagon ingame
 
Master en Dirección y Gestión de Empresas de Moda
Master en Dirección y Gestión de Empresas de ModaMaster en Dirección y Gestión de Empresas de Moda
Master en Dirección y Gestión de Empresas de Moda
 
Reiner 940 HandJet printer
Reiner 940 HandJet printerReiner 940 HandJet printer
Reiner 940 HandJet printer
 
How Consumers Engage with Mobile Apps
How Consumers Engage with Mobile AppsHow Consumers Engage with Mobile Apps
How Consumers Engage with Mobile Apps
 
120925 meroni polimi desis lab
120925 meroni polimi desis lab120925 meroni polimi desis lab
120925 meroni polimi desis lab
 
Green with liability
Green with liabilityGreen with liability
Green with liability
 
Tarjeta prepago BN E-credit Mástercard
Tarjeta prepago BN E-credit MástercardTarjeta prepago BN E-credit Mástercard
Tarjeta prepago BN E-credit Mástercard
 
The Search For Peace Pdrc
The Search For Peace PdrcThe Search For Peace Pdrc
The Search For Peace Pdrc
 

Similar to Harnessing Linked Knowledge Sources for Topic Classification in Social Media

Contextual Ontology Alignment - ESWC 2011
Contextual Ontology Alignment - ESWC 2011Contextual Ontology Alignment - ESWC 2011
Contextual Ontology Alignment - ESWC 2011Mariana Damova, Ph.D
 
Effective Semantics for Engineering NLP Systems
Effective Semantics for Engineering NLP SystemsEffective Semantics for Engineering NLP Systems
Effective Semantics for Engineering NLP SystemsAndre Freitas
 
ESWC 2011 BLOOMS+
ESWC 2011 BLOOMS+ ESWC 2011 BLOOMS+
ESWC 2011 BLOOMS+ Prateek Jain
 
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...Seth Grimes
 
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...Seth Grimes
 
bridging formal semantics and social semantics on the web
bridging formal semantics and social semantics on the webbridging formal semantics and social semantics on the web
bridging formal semantics and social semantics on the webFabien Gandon
 
Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis
Extracting Relevant Questions to an RDF Dataset Using Formal Concept AnalysisExtracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis
Extracting Relevant Questions to an RDF Dataset Using Formal Concept AnalysisMathieu d'Aquin
 
EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...
EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...
EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...Holistic Benchmarking of Big Linked Data
 
Open IE tutorial 2018
Open IE tutorial 2018Open IE tutorial 2018
Open IE tutorial 2018Andre Freitas
 
Framester and WFD
Framester and WFD Framester and WFD
Framester and WFD Aldo Gangemi
 
Deep Learning for Search
Deep Learning for SearchDeep Learning for Search
Deep Learning for SearchBhaskar Mitra
 
Stretching the Life of Twitter Classifiers with Time-Stamped Semantic Graphs
Stretching the Life of Twitter Classifiers with Time-Stamped Semantic GraphsStretching the Life of Twitter Classifiers with Time-Stamped Semantic Graphs
Stretching the Life of Twitter Classifiers with Time-Stamped Semantic GraphsAmparo Elizabeth Cano Basave
 
Table Retrieval and Generation
Table Retrieval and GenerationTable Retrieval and Generation
Table Retrieval and Generationkrisztianbalog
 
Different Semantic Perspectives for Question Answering Systems
Different Semantic Perspectives for Question Answering SystemsDifferent Semantic Perspectives for Question Answering Systems
Different Semantic Perspectives for Question Answering SystemsAndre Freitas
 
How the Web can change social science research (including yours)
How the Web can change social science research (including yours)How the Web can change social science research (including yours)
How the Web can change social science research (including yours)Frank van Harmelen
 
Lecture 9 - Machine Learning and Support Vector Machines (SVM)
Lecture 9 - Machine Learning and Support Vector Machines (SVM)Lecture 9 - Machine Learning and Support Vector Machines (SVM)
Lecture 9 - Machine Learning and Support Vector Machines (SVM)Sean Golliher
 

Similar to Harnessing Linked Knowledge Sources for Topic Classification in Social Media (20)

Contextual Ontology Alignment - ESWC 2011
Contextual Ontology Alignment - ESWC 2011Contextual Ontology Alignment - ESWC 2011
Contextual Ontology Alignment - ESWC 2011
 
Effective Semantics for Engineering NLP Systems
Effective Semantics for Engineering NLP SystemsEffective Semantics for Engineering NLP Systems
Effective Semantics for Engineering NLP Systems
 
NLP & DBpedia
 NLP & DBpedia NLP & DBpedia
NLP & DBpedia
 
ESWC 2011 BLOOMS+
ESWC 2011 BLOOMS+ ESWC 2011 BLOOMS+
ESWC 2011 BLOOMS+
 
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Auto...
 
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
The Ins and Outs of Preposition Semantics:
 Challenges in Comprehensive Corpu...
 
bridging formal semantics and social semantics on the web
bridging formal semantics and social semantics on the webbridging formal semantics and social semantics on the web
bridging formal semantics and social semantics on the web
 
Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis
Extracting Relevant Questions to an RDF Dataset Using Formal Concept AnalysisExtracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis
Extracting Relevant Questions to an RDF Dataset Using Formal Concept Analysis
 
EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...
EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...
EARL: Joint Entity and Relation Linking for Question Answering over Knowledge...
 
AI_Session 21 First order logic.pptx
AI_Session 21 First order logic.pptxAI_Session 21 First order logic.pptx
AI_Session 21 First order logic.pptx
 
Quantifying the bias in data links
Quantifying the bias in data linksQuantifying the bias in data links
Quantifying the bias in data links
 
Open IE tutorial 2018
Open IE tutorial 2018Open IE tutorial 2018
Open IE tutorial 2018
 
Framester and WFD
Framester and WFD Framester and WFD
Framester and WFD
 
Deep Learning for Search
Deep Learning for SearchDeep Learning for Search
Deep Learning for Search
 
Stretching the Life of Twitter Classifiers with Time-Stamped Semantic Graphs
Stretching the Life of Twitter Classifiers with Time-Stamped Semantic GraphsStretching the Life of Twitter Classifiers with Time-Stamped Semantic Graphs
Stretching the Life of Twitter Classifiers with Time-Stamped Semantic Graphs
 
Table Retrieval and Generation
Table Retrieval and GenerationTable Retrieval and Generation
Table Retrieval and Generation
 
Different Semantic Perspectives for Question Answering Systems
Different Semantic Perspectives for Question Answering SystemsDifferent Semantic Perspectives for Question Answering Systems
Different Semantic Perspectives for Question Answering Systems
 
LDAvis
LDAvisLDAvis
LDAvis
 
How the Web can change social science research (including yours)
How the Web can change social science research (including yours)How the Web can change social science research (including yours)
How the Web can change social science research (including yours)
 
Lecture 9 - Machine Learning and Support Vector Machines (SVM)
Lecture 9 - Machine Learning and Support Vector Machines (SVM)Lecture 9 - Machine Learning and Support Vector Machines (SVM)
Lecture 9 - Machine Learning and Support Vector Machines (SVM)
 

More from Amparo Elizabeth Cano Basave

A Study of the Impact of Persuasive Argumentation in Political Debates
A Study of the Impact of Persuasive Argumentation in Political DebatesA Study of the Impact of Persuasive Argumentation in Political Debates
A Study of the Impact of Persuasive Argumentation in Political DebatesAmparo Elizabeth Cano Basave
 
Detecting child grooming behaviour patterns on social media
Detecting child grooming behaviour patterns on social mediaDetecting child grooming behaviour patterns on social media
Detecting child grooming behaviour patterns on social mediaAmparo Elizabeth Cano Basave
 
Volatile Classification of Point of Interests based on Social Activity Streams
Volatile Classification of Point of Interests based on Social Activity StreamsVolatile Classification of Point of Interests based on Social Activity Streams
Volatile Classification of Point of Interests based on Social Activity StreamsAmparo Elizabeth Cano Basave
 
Sensing 
Presence
(PreSense)
Ontology
–
 
User 
Modelling
 in 
the 
Semantic ...
Sensing 
Presence
(PreSense)
Ontology
–
 
User 
Modelling
 in 
the 
Semantic ...Sensing 
Presence
(PreSense)
Ontology
–
 
User 
Modelling
 in 
the 
Semantic ...
Sensing 
Presence
(PreSense)
Ontology
–
 
User 
Modelling
 in 
the 
Semantic ...Amparo Elizabeth Cano Basave
 
Entity-Based Semantics Emerging from Personal Awareness Streams
Entity-Based Semantics Emerging from Personal Awareness Streams Entity-Based Semantics Emerging from Personal Awareness Streams
Entity-Based Semantics Emerging from Personal Awareness Streams Amparo Elizabeth Cano Basave
 
Representing, Proving and Sharing Trustworthiness of Web Resources Using Vera...
Representing, Proving and Sharing Trustworthiness of Web Resources Using Vera...Representing, Proving and Sharing Trustworthiness of Web Resources Using Vera...
Representing, Proving and Sharing Trustworthiness of Web Resources Using Vera...Amparo Elizabeth Cano Basave
 
Veracity- Modeling and Proving Trustworthiness of Web Resources
Veracity- Modeling and Proving Trustworthiness of Web ResourcesVeracity- Modeling and Proving Trustworthiness of Web Resources



Harnessing Linked Knowledge Sources for Topic Classification in Social Media

  • 1. Harnessing Linked Knowledge Sources for Topic Classification in Social Media. A. Elizabeth Cano (Knowledge Media Institute, The Open University, Milton Keynes), Andrea Varga and Fabio Ciravegna (University of Sheffield, Sheffield), Matthew Rowe (Lancaster University, Lancaster), and Yulan He (Aston University, Birmingham), UK. 2013.
  • 2. INTRODUCTION Social Media Streams - Risk in violent and criminal activities
  • 3. INTRODUCTION Research Questions: (1) Can semantic features help in topic classification (TC)? (2) Which knowledge source (KS) data and KS taxonomies provide useful information for improving the TC of tweets?
  • 4. OUTLINE • Introduction - Topic Classification (TC) of Microposts - Related Work - State of the art limitations • Proposed Approach • Experiments • Findings • Conclusions
  • 5. INTRODUCTION Difficulties of topic classification of microposts: restricted number of characters; irregular and ill-formed words (mixing upper- and lowercase letters, which makes it difficult to detect proper nouns and other part-of-speech tags; wide variety of language, e.g. “see u soon”); event-dependent emerging jargon (volatile jargon relevant to particular events, e.g. “Jan.25”, used during the Egyptian revolution); high topical diversity; sparse data.
  • 6. INTRODUCTION Social Knowledge Sources (KS). DBpedia*: 2.35 million resources, 359 classes, 1,820 properties. Yago2: 447 million resources, 562,312 classes, 253,213,842 properties. Freebase: 3.6 million resources, 1,450 classes, 7,000 properties. (*Using the DBpedia ontology.) KSs offer a structured Semantic Web representation of data, maintained by thousands of editors (e.g. DBpedia, derived from Wikipedia; Freebase), which evolves and adapts as knowledge changes [Syed et al., 2008]. They cover a broad range of topics and characterise topics with a large number of resources.
  • 7. INTRODUCTION Local and External Metadata of a Tweet
  • 8. INTRODUCTION Local and External Metadata of a Tweet. NER:Country, NER:Person, NER:Person
  • 9. INTRODUCTION Local and External Metadata of a Tweet. NER:Country, NER:Person, NER:Person. <http://dbpedia.org/resource/Barack_Obama>, <http://dbpedia.org/resource/Egypt>, <http://dbpedia.org/resource/Hosni_Mubarak>
  • 10. PROPOSED APPROACH Exploiting Knowledge Sources for the Topic Classification of Microposts. State-of-the-art limitations: use of single knowledge sources; entities’ metadata is constrained by the NER service used (e.g. OpenCalais, Alchemy). Our approach: exploits multiple knowledge sources; enhances the entity metadata by deriving semantic graphs; leverages the graph structures surrounding entities present in a KS for the TC task.
  • 11. OUTLINE • Introduction • Proposed Approach • Semantic Meta-graphs • Weighting Schemas • Enhancing TC with Semantic Features • Experiments • Findings • Conclusions
  • 13. PROPOSED APPROACH Rationale: of tweets 1 and 2, tweet 1 could be more indicative of War and Conflict.
  • 14. PROPOSED APPROACH Rationale: tweet 2 is not necessarily a good indicator of War and Conflict.
  • 15. PROPOSED APPROACH Rationale (tweets 1 and 2): can the graph structure of existing knowledge sources provide an abstraction of the use of these entity types for representing a topic?
  • 16. PROPOSED APPROACH Framework for Topic Classification of Tweets. Pipeline: retrieve articles (DB, FB, DB-FB) and retrieve tweets (TW); concept enrichment; derive semantic features; build cross-source topic classifier; annotate tweets. Step 1, Datasets Collection: SPARQL query for all resources from a given topic (e.g. War).
  • 17. PROPOSED APPROACH Framework for Topic Classification of Tweets. Step 2, Datasets Enrichment: from tweets and articles’ abstracts, extract entities and link them to resources in DBpedia and Freebase.
  • 20. PROPOSED APPROACH Framework for Topic Classification of Tweets. Step 3, Semantic Features Derivation.
  • 21. PROPOSED APPROACH Framework for Topic Classification of Tweets. Step 4, build a topic classifier based on features derived from crossed sources.
  • 23. PROPOSED APPROACH Deriving Semantic Meta-Graphs <dbpedia:Barack_Obama, rdf:type, yago:PresidentOfTheUnitedStates> <dbpedia:Barack_Obama, dbo:birthPlace, dbpedia:Hawaii>
  • 25. PROPOSED APPROACH Definition 1 (Resource Meta-graph): a sequence of tuples $G := (R, P, C, Y)$ where $R$, $P$, $C$ are finite sets whose elements are resources, properties and classes; $Y \subseteq R \times P \times C$ is a ternary relation representing a hypergraph with ternary edges; and the hypergraph of $Y$ is a tripartite graph $H(Y) = \langle V, D \rangle$ whose vertices are $V = R \cup P \cup C$ and whose edges are $D = \{\{r, p, c\} \mid (r, p, c) \in Y\}$.
  • 26. PROPOSED APPROACH Resource Meta-graph. The meta-graph of entity e is the aggregation of all resources, properties and classes related to this entity. Projecting on properties (Obama): birthPlace, author, spouse. Projecting on classes (Obama): LivingPeople, PresidentOfTheUnitedStates, Person, Author.
  • 27. PROPOSED APPROACH Resource Meta-graph. The meta-graph of entity e is the aggregation of all resources, properties and classes related to this entity. Projecting on properties (Obama): birthPlace, author, spouse. Projecting on classes (Obama): LivingPeople, PresidentOfTheUnitedStates, Person, Author. How can we weight these graphs to reveal the semantic features that characterise Obama in the context of violence?
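The two projections on slides 26 and 27 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Edge` tuple, the function names, and the choice to pair every property of a resource with every one of its classes are hypothetical and are not taken from the slides, which do not say how the ternary relation Y is populated.

```python
from collections import namedtuple

# One ternary edge (r, p, c) of the relation Y in Definition 1.
Edge = namedtuple("Edge", ["resource", "prop", "cls"])


def build_meta_graph(resource, properties, classes):
    """Assumption: pair every property of the resource with every class."""
    return {Edge(resource, p, c) for p in properties for c in classes}


def project_on_properties(edges, resource):
    """Properties attached to the resource in the meta-graph."""
    return {e.prop for e in edges if e.resource == resource}


def project_on_classes(edges, resource):
    """Classes attached to the resource in the meta-graph."""
    return {e.cls for e in edges if e.resource == resource}


# Toy data taken from the Obama example on the slide.
Y = build_meta_graph(
    "Obama",
    properties=["birthPlace", "author", "spouse"],
    classes=["LivingPeople", "PresidentOfTheUnitedStates", "Person", "Author"],
)
print(sorted(project_on_properties(Y, "Obama")))
print(sorted(project_on_classes(Y, "Obama")))
```

Storing Y as a set of tuples keeps both projections as one-line set comprehensions; a real knowledge source would need an index rather than a full scan.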
  • 28. PROPOSED APPROACH Weighting Semantic Features. Specificity measures the relative importance of a property $p \in G(e)$ to a given class $c \in G(e)$ in a KS graph $G_{KS}$: $\mathrm{specificity}_{KS}(p, c) = \frac{N_p(R(c))}{N(R(c))}$
  • 29. PROPOSED APPROACH Weighting Semantic Features. Generality captures the specialisation of a property $p$ to a given class $c$ by computing the property’s frequency among other semantically related classes $R'(c)$: $\mathrm{generality}_{KS}(p, c) = \frac{N(R'(c))}{N_p(R'(c))}$, where $N(R'(c))$ is the number of resources whose type is either $c$ or a specialisation of $c$’s parent classes.
  • 30. PROPOSED APPROACH Weighting Semantic Features. $SG(p, c) = \mathrm{specificity}_{KS}(p, c) \times \mathrm{generality}_{KS}(p, c)$
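The three weights on slides 28 to 30 can be tried out on a toy knowledge source. The sketch below is a hedged reading of the formulas, not the authors' code: the `KS` dictionary layout, the `RELATED` set standing in for the sibling classes behind R'(c), and all function names are assumptions made for illustration.

```python
# Hypothetical toy KS: resource -> its classes and its property occurrences.
KS = {
    "Barack_Obama": {"types": {"President"}, "props": ["birthPlace", "spouse"]},
    "Hosni_Mubarak": {"types": {"President"}, "props": ["birthPlace"]},
    "Stephen_King": {"types": {"Author"}, "props": ["birthPlace", "author"]},
}
# Assumed sibling classes of "President" (specialisations of its parents).
RELATED = {"Author"}


def resources_of(ks, classes):
    """Resources typed with at least one of the given classes."""
    return [r for r in ks.values() if r["types"] & classes]


def count_prop(resources, p):
    """N_p: number of times property p appears across the resources."""
    return sum(r["props"].count(p) for r in resources)


def specificity(ks, p, c):
    """N_p(R(c)) / N(R(c))."""
    rc = resources_of(ks, {c})
    return count_prop(rc, p) / len(rc) if rc else 0.0


def generality(ks, p, c, related):
    """N(R'(c)) / N_p(R'(c)), with R'(c) covering c and its related classes."""
    rpc = resources_of(ks, {c} | related)
    n_p = count_prop(rpc, p)
    return len(rpc) / n_p if n_p else 0.0


def sg(ks, p, c, related):
    """Combined weight: specificity times generality."""
    return specificity(ks, p, c) * generality(ks, p, c, related)


for prop in ("birthPlace", "spouse"):
    print(prop, sg(KS, prop, "President", RELATED))
```

In this toy data `birthPlace` occurs on every resource, so its generality is low, while `spouse` is rarer among the related classes and ends up with the higher combined weight.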
  • 31. PROPOSED APPROACH Enhancing Feature Space with Semantic Features. Semantic Augmentation (A1). Class features: $F'_{A1+C} = F + F_c$. Property features: $F'_{A1+P} = F + F_p$. Class + property features: $F'_{A1+C+P} = F + F_c + F_p$.
  • 32. PROPOSED APPROACH Enhancing Feature Space with Semantic Features. Semantic Augmentation (A1). Class features: $F'_{A1+C} = F + F_c$. Property features: $F'_{A1+P} = F + F_p$. Class + property features: $F'_{A1+C+P} = F + F_c + F_p$. Example: $F$ = president, obama, televised, statement, hosni, mubarak, resignation, cnn, says, egypt; $F_{A1+P}$ = dbpedia:birth, dbpedia:state, …, dbpedia-owl:PopulatedPlace/populationDensity, …; $F_{A1+C}$ = PopulatedPlace, Office_holder, PresidentOfTheUnitedStates, Politician, …
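As a rough sketch of the A1 augmentation, the function below appends class and property features to a tweet's bag of words. The function name, the `C:` and `P:` prefixes, and the flags are hypothetical; the slides only state that the feature sets are combined.

```python
def augment_a1(tokens, entity_classes, entity_props,
               use_classes=True, use_props=True):
    """Return F' = F (+ class features) (+ property features)."""
    features = list(tokens)
    if use_classes:
        # Prefix keeps class names distinct from ordinary words.
        features += ["C:" + c for c in entity_classes]
    if use_props:
        features += ["P:" + p for p in entity_props]
    return features


# Values taken from the example on the slide.
F = ["president", "obama", "televised", "statement", "hosni",
     "mubarak", "resignation", "cnn", "says", "egypt"]
CLASSES = ["PopulatedPlace", "Office_holder",
           "PresidentOfTheUnitedStates", "Politician"]
PROPS = ["dbpedia:birth", "dbpedia:state"]

f_a1_c = augment_a1(F, CLASSES, PROPS, use_props=False)
f_a1_p = augment_a1(F, CLASSES, PROPS, use_classes=False)
f_a1_cp = augment_a1(F, CLASSES, PROPS)
print(len(F), len(f_a1_c), len(f_a1_p), len(f_a1_cp))
```

Prefixing is one possible design choice: it stops a class such as `Politician` from colliding with the same word appearing in the tweet text.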
  • 33. PROPOSED APPROACH Enhancing Feature Space with Semantic Features. Semantic Augmentation with Generalisation (A2). This augmentation exploits the subsumption relation among classes within the DBpedia or Freebase ontologies. In this case we consider the set of parent classes of c. Parent(c) features: $F'_{A2+C} = F + F_{parent(c)}$. Parent(c) + property features: $F'_{A2+C+P} = F + F_p + F_{parent(c)}$.
  • 34. PROPOSED APPROACH Enhancing Feature Space with Semantic Features. Semantic Augmentation with Generalisation (A2). This augmentation exploits the subsumption relation among classes within the DBpedia or Freebase ontologies. In this case we consider the set of parent classes of c. Parent(c) features: $F'_{A2+C} = F + F_{parent(c)}$. Parent(c) + property features: $F'_{A2+C+P} = F + F_p + F_{parent(c)}$. Example: $F$ = president, obama, televised, statement, hosni, mubarak, resignation, cnn, says, egypt; $F_{A2+parent(c)}$ = Place, Office_holder, President, Politician, …
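The generalisation step of A2 can be illustrated by swapping each class for its parent before adding it to the feature set. The `PARENT_OF` map below is a hypothetical two-entry stand-in for the DBpedia or Freebase class hierarchy, and keeping a class unchanged when no parent is known is an assumption chosen so that the output matches the example on the slide.

```python
# Hypothetical fragment of a class hierarchy (child -> parent).
PARENT_OF = {
    "PopulatedPlace": "Place",
    "PresidentOfTheUnitedStates": "President",
}


def augment_a2(tokens, entity_classes, parent_of, entity_props=()):
    """Return F' = F + parent-class features (+ property features)."""
    parents = []
    for c in entity_classes:
        parent = parent_of.get(c, c)  # assumption: keep c if no parent known
        if parent not in parents:     # several classes may share one parent
            parents.append(parent)
    return (list(tokens)
            + ["C:" + c for c in parents]
            + ["P:" + p for p in entity_props])


TOKENS = ["president", "obama", "egypt"]
A1_CLASSES = ["PopulatedPlace", "Office_holder",
              "PresidentOfTheUnitedStates", "Politician"]

f_a2_c = augment_a2(TOKENS, A1_CLASSES, PARENT_OF)
print(f_a2_c)
```

Passing `entity_props` as well gives the parent-plus-property variant described on the slide.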
  • 36. PROPOSED APPROACH Datasets. Twitter dataset [Abel et al., 2011] (TW): collected during two months starting in Nov 2010; topically annotated; uses tweets labelled as “War & Conflict” (War), “Law & Crime” (Cri), and “Disaster & Accident” (DisAcc); multilabelled dataset comprising 10,189 tweets. DBpedia (DB) and Freebase (FB) datasets: SPARQL-queried endpoints for all resources from categories and subcategories of skos:concept of War, Cri, DisAcc; DBpedia, 9,465 articles; Freebase, 16,915 articles.
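The slide does not show the query itself, so the following is only one plausible shape for collecting the resources under a topic category and its subcategories from DBpedia. It assumes the common DBpedia modelling in which resources point to categories via `dct:subject` and categories are nested via `skos:broader`; the template, the function name, the `LIMIT` parameter, and the category URI are illustrative. The code only builds the query string and does not contact any endpoint.

```python
# SPARQL 1.1 template; "skos:broader*" follows zero or more subcategory links.
QUERY_TEMPLATE = """\
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?resource WHERE {{
  ?resource dct:subject ?category .
  ?category skos:broader* <{category_uri}> .
}}
LIMIT {limit}
"""


def topic_query(category_uri, limit=1000):
    """Build a query for resources filed under a category or its subcategories."""
    return QUERY_TEMPLATE.format(category_uri=category_uri, limit=limit)


# Assumed category URI for the War topic.
war_query = topic_query("http://dbpedia.org/resource/Category:War", limit=10)
print(war_query)
```

An unbounded `skos:broader*` path can be expensive on a public endpoint, so a real run would probably cap the depth or page through the results.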
  • 38. PROPOSED APPROACH Experimental Setup A. 1. Use annotated tweets for training (TW). Baseline: Bag of Words (BoW), Bag of Entities (BoE), and Part of Speech tags (PoS). Enhance features using the DBpedia and Freebase graphs. 2. Train an SVM classifier on the TW corpus, trained and tested on an 80%-20% split over five independent runs. 3. Compute precision, recall, and F-measure.
  • 40. PROPOSED APPROACH Experimental Setup B. 1. Use labelled articles from DBpedia (DB) and Freebase (FB) for training. Baseline: Bag of Words (BoW), Bag of Entities (BoE), and Part of Speech tags (PoS). Enhance features using the DBpedia and Freebase graphs. 2. Train an SVM classifier on the DB, FB, DB+FB, and DB+FB+TW training corpora and test on TW, trained and tested on an 80%-20% split over five independent runs. 3. Compute precision, recall, and F-measure.
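The evaluation loop shared by both setups (an 80%-20% split repeated over five runs, then precision, recall and F-measure per topic) can be sketched without any classifier attached. Everything named here is an assumption for illustration: `split_80_20`, `prf`, the seeds, and the toy gold and predicted label sets, which are not results from the study.

```python
import random


def split_80_20(items, seed):
    """Shuffle with a fixed seed and cut into 80% train, 20% test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def prf(gold, predicted, label):
    """Precision, recall and F-measure for one topic in a multilabel setting."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if label in g and label in p)
    fp = sum(1 for g, p in pairs if label not in g and label in p)
    fn = sum(1 for g, p in pairs if label in g and label not in p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Five independent runs, each with its own seed (toy item ids 0..9).
runs = [split_80_20(range(10), seed) for seed in range(5)]

# Made-up label sets, only to exercise the metric.
gold = [{"War"}, {"War", "Cri"}, {"Cri"}, {"DisAcc"}]
pred = [{"War"}, {"Cri"}, {"War", "Cri"}, {"DisAcc"}]
war_scores = prf(gold, pred, "War")
print(war_scores)
```

Scoring each topic separately is one way to handle the multilabel tweets; how the per-topic scores were averaged is not stated on the slides.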
  • 41. PROPOSED APPROACH Results for Training on KS articles, and Testing on TW
  • 42. PROPOSED APPROACH Factors contributing to the performance of a KS graph for TC 1.  Topic-Class Entropy 2.  Entity-Class Entropy 3.  Topic-Class-Property Entropy
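The slides name the three entropy metrics without giving their formulas. Assuming they are Shannon entropies over a distribution of classes (or class-property pairs), a minimal helper looks like the sketch below; the function name and the toy class lists are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """Entropy in bits of the empirical distribution of the labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Toy topics: classes of the entities mentioned in each topic's documents.
focused_topic = ["MilitaryConflict"] * 4
diverse_topic = ["MilitaryConflict", "Person", "Place", "Organisation"]

print(shannon_entropy(focused_topic), shannon_entropy(diverse_topic))
```

Under this reading a topic whose entities all fall into one class scores zero, and the score grows as the classes spread out more evenly.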
  • 43. PROPOSED APPROACH Correlating Entropy metrics with the performance of the cross-source TC classifiers.
  • 44. PROPOSED APPROACH Correlating Entropy metrics with the performance of the cross-source TC classifiers. Indicates that the higher the number of ambiguous entities in a topic within a KS graph, the lower the performance of the TC.
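To make the correlation step concrete, the sketch below computes a Pearson coefficient between per-topic entropy values and classifier scores. The slides do not say which correlation measure was used, so Pearson is an assumption, and the numbers are invented placeholders rather than results from the study.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Placeholder values: entity-class entropy and F-measure for three topics.
entity_class_entropy = [1.0, 2.0, 3.0]
f_measure = [0.8, 0.7, 0.5]

r = pearson(entity_class_entropy, f_measure)
print(r)
```

A coefficient close to -1 on real data would match the reading on the slide that more ambiguous entities go with lower classification performance.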
  • 45. FINDINGS 1. KSs combined with Twitter data provide complementary information for TC of tweets, outperforming both the KS-only approaches and the approach using tweets only. 2. A KS’s performance on TC depends on the coverage of the entities within that KS. 3. When entities have low coverage in a KS, exploiting the mapping between corresponding KSs’ ontologies is beneficial.
  • 46. CONCLUSIONS •  Explored the task of topic classification of tweets •  Exploited information in KSs (e.g. DBpedia, Freebase) using semantic graphs for concepts and properties surrounding an entity. •  Presented the importance of considering graph structures in KSs for the supervised classification of tweets, by achieving significant improvement over various state-of-the-art approaches using both single KSs and Tweets only.
  • 47. CONTACT US A. Elizabeth Cano: http://people.kmi.open.ac.uk/cano/ ; Andrea Varga: http://sites.google.com/site/missandreavarga/ ; Matthew Rowe: http://lancs.ac.uk/staff/rowem/ ; Fabio Ciravegna: http://staffwww.dcs.shef.ac.uk/people/F.Ciravegna ; Yulan He: http://www1.aston.ac.uk/eas/staff/dr-yulan-he

Editor's Notes

  1. I will present work done in collaboration with the universities of Sheffield, Lancaster and Aston. This work was done as part of the Violence Detection project, which investigates different approaches for the detection of violence-related events emerging from social media streams.
  2. During the last two years we have witnessed the use of these services to express different emotions within society; these services have become a proxy of information which communicates the social perception of situations regarding, for example, terrorism, social crisis and racism. Therefore the real-time identification of the topics discussed in these channels could aid in different scenarios, including violence detection and emergency response situations.
  3. Our intuition indicates that in the first case, the role of Obama as President of the United States could be more indicative of the topic War and Conflict.
  5. These two tweets make reference to the same entity, “President Obama”. However, the context in which the entity is used is different: in the first case, the co-occurrence of Obama, Egypt and Mubarak could be more indicative of the War and Conflict topic, while in the second case the occurrence of President Obama and Michelle is less likely to indicate a war-and-conflict-related topic. So we wonder whether the graph structure of existing knowledge sources could help provide an abstraction of the use of these entity types for representing a topic.
  9. How can we weight these graphs so as to reveal which of these features characterise Obama in the context of violence?
  10. In order to capture the relative importance of each feature in a semantic meta-graph, we propose two different weighting strategies, based on the generality and specificity of a feature in a given meta-graph. The combined weight models the relative importance of a property p to a given class, together with the generality of the property in a KS’s graph, where N_p is the number of times property p appears in all resources of type c in the KS graph.
  12. Here parent(c) denotes the total number of unique parent classes derived from a KS graph.
  13. To evaluate the impact of enhancing the feature space with semantic features for the task of topic classification of tweets, we used a large corpus of tweets and two large-coverage KSs, DBpedia and Freebase. The Twitter dataset was derived previously by Abel et al. and comprises tweets collected during two months starting from November 2010. This dataset has been topically annotated.
  14. For each of the tweets and each of the articles we performed Lovins stemming and extracted entities using OpenCalais and Zemanta. Then, as described before, we built the semantic meta-graphs from the DBpedia and Freebase KSs. It is important to mention that the Twitter dataset consists of tweets which contain at least one entity.
  15. Topic-Class Entropy: low entropy (LE) indicates a focused topic, while high entropy (HE) indicates that the topic is more random in the subjects it discusses. Entity-Class Entropy: LE indicates that a topic is less ambiguous (i.e. its entities belong to fewer classes), while HE indicates high ambiguity at the level of the entities. Topic-Class-Property Entropy: LE indicates that a topic is dominated by few class-properties, while HE reveals high property diversity.
  16. The darker and closer to red a cell is, the more correlated the values are. This indicates that as the number of ambiguous entities in a topic increases, the performance of the TC decreases.