Entity Enrichment and Consolidation in ARCOMEM

Arcomem training enrichment_beginner


This presentation on data enrichment is part of the ARCOMEM training curriculum on archiving Social Media. To learn more about ARCOMEM training, contact us on Twitter via @arcomem.


  1. Entity Enrichment and Consolidation in ARCOMEM
     Elena Demidova (1), including slides by: Stefan Dietze (1), Diana Maynard (2), Thomas Risse (1), Wim Peters (2), Katerina Doka (3), Yannis Stavrakas (3)
     (1) L3S Research Center, Hannover, Germany
     (2) University of Sheffield, UK
     (3) IMIS, RC ATHENA, Athens, Greece
  2. The ARCOMEM approach
     • Make use of the Social Web
       – Huge source of user-generated content
       – Wide range of articulation methods, from simple "I like it" buttons to complete articles
       – Represents the diversity of public opinion
     • User activities are often triggered by
       – Events and related entities (e.g. sport events, celebrations, crises, news articles, persons, locations)
       – Topics (e.g. global warming, financial crisis, swine flu)
     => A semantic-aware and socially driven preservation model is a natural way to go
  3. The extraction components for text
     Aim: extraction of Entities, Topics, Events and Opinions (ETOEs) from
       – Web pages
       – the Social Web (Twitter, YouTube, Facebook, …)
     Challenges:
       – Entity recognition from degraded input sources (tweets, etc.)
       – Advancing state-of-the-art NLP and text mining
       – Dynamics detection: evolution of terms/entities
       – Semantic representation of Web objects and entities
       – Appropriate RDF schemas for ETOEs and Web objects
       – Exploiting (Linked Open) Web data to enrich extracted ETOEs
       – Entity classification (into events, locations, topics, etc.) and consolidation
  4. ETOE extraction with GATE: an example
     [Figure: GATE annotation example; highlighted label: "candidate multi-word term"]
  5. Data consolidation & integration problem
     Data extracted by different components, or during different processing cycles, is not aligned => consolidation, disambiguation and correlation are required.
     <Location>Greece</Location> <Person>Venizelos</Person> <Location>Griechenland</Location> <Organisation>Greek Parliament</Organisation> ?
  6. Data clustering & enrichment
     Enrichment of entities with related references to Linked Data, particularly reference datasets (DBpedia, Freebase, …)
     => use enrichments for correlation, clustering and consolidation
  7. Enrichment for clustering & correlation: example
     <Event>Trichet warns of systemic debt crisis</Event>
     <Person>Jean Claude Trichet</Person>
     <Organisation>ECB</Organisation>
  8. Enrichment for clustering & correlation: example
     <Enrichment>http://dbpedia.org/resource/Jean-Claude_Trichet</Enrichment>
     <Enrichment>http://dbpedia.org/resource/ECB</Enrichment>
     <Event>Trichet warns of systemic debt crisis</Event>
     <Person>Jean Claude Trichet</Person>
     <Organisation>ECB</Organisation>
  9. Enrichment for clustering & correlation: example
     <Enrichment>http://dbpedia.org/resource/Jean-Claude_Trichet</Enrichment>
       => dbpprop:office dbpedia:President_of_the_European_Central_Bank, dbpedia:Governor_of_the_Banque_de_France
       => dcterms:subject category:Living_people, category:Karlspreis_recipients, category:Alumni_of_the_École_Nationale_d'Administration, category:People_from_Lyon, …
     <Enrichment>http://dbpedia.org/resource/ECB</Enrichment>
     <Event>Trichet warns of systemic debt crisis</Event>
     <Person>Jean Claude Trichet</Person>
     <Organisation>ECB</Organisation>
  10. ARCOMEM entities and enrichments: graph
      Nodes: entities/events (blue), DBpedia enrichments (green), Freebase enrichments (orange)
      1013 clusters of correlated entities/events
  11. ARCOMEM entities and enrichments: graph
      Nodes: entities/events (blue), DBpedia enrichments (green), Freebase enrichments (orange)
      1013 clusters of correlated entities/events
      => cluster expansion by considering related enrichments
  12. Thank you
      Contact details: Dr. Elena Demidova, L3S Research Center, +49 511 762 17732, demidova@L3S.de, www.arcomem.eu
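Slide 5 shows the same country annotated as both "Greece" and "Griechenland". The following is a minimal Python sketch of consolidation, assuming a hypothetical alias table `ALIASES` that maps lower-cased surface forms to DBpedia resource URIs. A real pipeline would obtain such mappings from an entity-linking step, which the slides do not detail. Mentions that map to the same URI are grouped, and a mention with no mapping (here "Venizelos") is set aside for a later disambiguation step.

```python
from collections import defaultdict

# Hypothetical alias table (illustrative only): lower-cased surface form -> DBpedia URI.
ALIASES = {
    "greece": "http://dbpedia.org/resource/Greece",
    "griechenland": "http://dbpedia.org/resource/Greece",  # German label, same resource
    "greek parliament": "http://dbpedia.org/resource/Hellenic_Parliament",
}


def consolidate(mentions, aliases):
    """Group (type, label) mentions by canonical URI.

    Labels missing from the alias table are returned separately so that a
    later disambiguation step can handle them.
    """
    groups = defaultdict(list)
    unresolved = []
    for etype, label in mentions:
        uri = aliases.get(label.casefold())
        if uri is None:
            unresolved.append((etype, label))
        else:
            groups[uri].append((etype, label))
    return dict(groups), unresolved


# The four annotations shown on slide 5.
mentions = [
    ("Location", "Greece"),
    ("Person", "Venizelos"),
    ("Location", "Griechenland"),
    ("Organisation", "Greek Parliament"),
]
groups, unresolved = consolidate(mentions, ALIASES)
for uri, members in groups.items():
    print(uri, members)
print("unresolved:", unresolved)
```

Keeping unresolved mentions separate, rather than guessing a URI, leaves ambiguous names for a step that can use more context.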
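Slide 6 proposes using enrichments for correlation. One simple reading, sketched here as an assumption rather than ARCOMEM's actual procedure, is to index entities by their enrichment URIs and treat any two entities that share a URI as correlated. The entity identifiers are hypothetical, for example the same person extracted during two processing cycles.

```python
from collections import defaultdict
from itertools import combinations


def index_by_enrichment(enrichments):
    """Invert entity -> {URI} into URI -> {entity}."""
    index = defaultdict(set)
    for entity, uris in enrichments.items():
        for uri in uris:
            index[uri].add(entity)
    return index


def correlated_pairs(enrichments):
    """Return every unordered pair of entities sharing at least one enrichment URI."""
    pairs = set()
    for entities in index_by_enrichment(enrichments).values():
        for a, b in combinations(sorted(entities), 2):
            pairs.add((a, b))
    return pairs


# Hypothetical annotations from two processing cycles, enriched with DBpedia URIs.
enrichments = {
    "cycle1/Person/Trichet": {"http://dbpedia.org/resource/Jean-Claude_Trichet"},
    "cycle2/Person/Jean Claude Trichet": {"http://dbpedia.org/resource/Jean-Claude_Trichet"},
    "cycle2/Organisation/ECB": {"http://dbpedia.org/resource/ECB"},
}
print(correlated_pairs(enrichments))
```

The two differently spelled Trichet annotations become a correlated pair because they point to the same DBpedia resource, while the ECB annotation stays on its own.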
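The annotations on slides 7 to 9 are written as XML-like tags with no root element and no explicit link between an annotation and its enrichment. The following minimal sketch reads them with Python's standard `xml.etree.ElementTree`, assuming a hypothetical `<doc>` wrapper element. It does not reproduce how ARCOMEM actually serialises annotations; slide 3 only mentions RDF schemas for this.

```python
import xml.etree.ElementTree as ET

# The tags from slide 8, in the order they appear on the slide.
SNIPPET = """
<Enrichment>http://dbpedia.org/resource/Jean-Claude_Trichet</Enrichment>
<Enrichment>http://dbpedia.org/resource/ECB</Enrichment>
<Event>Trichet warns of systemic debt crisis</Event>
<Person>Jean Claude Trichet</Person>
<Organisation>ECB</Organisation>
"""


def parse_annotations(snippet):
    """Split tagged text into (type, text) annotations and enrichment URIs."""
    # Wrap in a hypothetical <doc> root so the fragment is well-formed XML.
    root = ET.fromstring(f"<doc>{snippet}</doc>")
    annotations, enrichments = [], []
    for element in root:
        text = (element.text or "").strip()
        if element.tag == "Enrichment":
            enrichments.append(text)
        else:
            annotations.append((element.tag, text))
    return annotations, enrichments


annotations, enrichments = parse_annotations(SNIPPET)
print(annotations)
print(enrichments)
```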
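Slide 9 shows that an enrichment brings in further facts, such as `dcterms:subject` categories. One common way to turn such facts into a correlation score is the Jaccard similarity of two entities' category sets. This is used here purely for illustration; the slides do not name a similarity measure. Trichet's set uses the four categories listed on the slide, and `other_person` is a hypothetical entity with invented categories.

```python
def jaccard(a, b):
    """|A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# dcterms:subject values listed on slide 9 (the slide's list is truncated).
trichet = {
    "category:Living_people",
    "category:Karlspreis_recipients",
    "category:Alumni_of_the_École_Nationale_d'Administration",
    "category:People_from_Lyon",
}
# Hypothetical second entity, invented for this example.
other_person = {
    "category:Living_people",
    "category:Karlspreis_recipients",
    "category:Hypothetical_category",
}

score = jaccard(trichet, other_person)
print(f"{score:.2f}")  # 2 shared / 5 distinct categories -> 0.40
```

A very generic category such as `category:Living_people` contributes as much as a specific one here. A practical variant might down-weight such categories, but that is a design choice beyond what the slides describe.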
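Slide 10 depicts entities and their DBpedia and Freebase enrichments as one graph and reports 1013 clusters of correlated entities/events. The slides do not say how clusters are formed. One plausible reading, sketched below as an assumption, treats a cluster as a connected component of the bipartite graph that links entities to enrichment URIs, found here by breadth-first search. The entity IDs and edges are hypothetical.

```python
from collections import defaultdict, deque


def connected_clusters(edges):
    """Cluster entities as connected components of an entity-enrichment graph.

    edges: iterable of (entity_id, enrichment_uri) pairs.
    Returns a list of sets of entity ids (enrichment nodes are not reported).
    """
    adjacency = defaultdict(set)
    for entity, uri in edges:
        # Tag node kinds so an entity id can never collide with a URI.
        adjacency[("entity", entity)].add(("uri", uri))
        adjacency[("uri", uri)].add(("entity", entity))

    seen = set()
    clusters = []
    for start in adjacency:
        if start in seen or start[0] != "entity":
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:  # breadth-first search from an unvisited entity
            node = queue.popleft()
            if node[0] == "entity":
                component.add(node[1])
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        clusters.append(component)
    return clusters


DBR = "http://dbpedia.org/resource/"  # DBpedia resource namespace
edges = [
    ("event1", DBR + "Jean-Claude_Trichet"),
    ("person1", DBR + "Jean-Claude_Trichet"),
    ("event1", DBR + "ECB"),
    ("org1", DBR + "ECB"),
    ("loc1", DBR + "Greece"),
    ("loc2", DBR + "Greece"),
]
for cluster in connected_clusters(edges):
    print(sorted(cluster))
```

The event shares one enrichment with the person and another with the organisation, so all three end up in one cluster even though the person and organisation share no URI directly.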
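Slide 11 adds cluster expansion by considering related enrichments. The following minimal sketch assumes that "related" means two enrichment URIs are linked by some property in the reference dataset. An example is the `dbpprop:office` link on slide 9 from Jean-Claude_Trichet to President_of_the_European_Central_Bank. Clusters whose enrichments are related are merged with a small union-find. The clusters, entity IDs and the `related` set are hypothetical inputs, not ARCOMEM data.

```python
from collections import defaultdict


def expand_clusters(clusters, enrichments, related):
    """Merge clusters whose members carry related enrichment URIs.

    clusters:    list of sets of entity ids
    enrichments: dict entity id -> set of enrichment URIs
    related:     set of (uri_a, uri_b) pairs, treated as undirected links
    """
    parent = list(range(len(clusters)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Which clusters use each enrichment URI?
    uri_to_clusters = defaultdict(set)
    for idx, cluster in enumerate(clusters):
        for entity in cluster:
            for uri in enrichments.get(entity, ()):
                uri_to_clusters[uri].add(idx)

    # Union every pair of clusters connected through a related URI pair.
    for uri_a, uri_b in related:
        for i in uri_to_clusters.get(uri_a, ()):
            for j in uri_to_clusters.get(uri_b, ()):
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_i] = root_j

    # Collect members under each root; input sets are not modified.
    merged = defaultdict(set)
    for idx, cluster in enumerate(clusters):
        merged[find(idx)] |= cluster
    return list(merged.values())


DBR = "http://dbpedia.org/resource/"  # DBpedia resource namespace
clusters = [{"person1"}, {"event7"}, {"loc1"}]
enrichments = {
    "person1": {DBR + "Jean-Claude_Trichet"},
    "event7": {DBR + "President_of_the_European_Central_Bank"},
    "loc1": {DBR + "Greece"},
}
# Link taken from slide 9: Trichet dbpprop:office President_of_the_European_Central_Bank.
related = {(DBR + "Jean-Claude_Trichet", DBR + "President_of_the_European_Central_Bank")}

for cluster in expand_clusters(clusters, enrichments, related):
    print(sorted(cluster))
```

With an empty `related` set the clusters come back unchanged. Which properties count as "related" is the key design decision, and the slides leave it open.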
