Arcomem training enrichment_beginner

This presentation on data enrichment is part of the ARCOMEM training curriculum. Feel free to roam around or contact us on Twitter via @arcomem to learn more about ARCOMEM training on archiving Social Media.

Published in: Education, Technology

Slide 1: Entity Enrichment and Consolidation in ARCOMEM
  Elena Demidova (1), including slides by: Stefan Dietze (1), Diana Maynard (2), Thomas Risse (1), Wim Peters (2), Katerina Doka (3), Yannis Stavrakas (3)
  (1) L3S Research Center, Hannover, Germany
  (2) University of Sheffield, UK
  (3) IMIS, RC ATHENA, Athens, Greece
Slide 2: The ARCOMEM approach
  • Make use of the Social Web
    – Huge source of user-generated content
    – Wide range of articulation methods, from simple "I like it" buttons to complete articles
    – Represents the diversity of opinions of the public
  • User activities are often triggered by
    – Events and related entities (e.g. sport events, celebrations, crises, news articles, persons, locations)
    – Topics (e.g. global warming, the financial crisis, swine flu)
  => A semantic-aware and socially driven preservation model is a natural way to go.
Slide 3: The extraction components for text
  Aim
  • Extraction of Entities, Topics, Events and Opinions (ETOEs) from
    – Web pages
    – the Social Web (Twitter, YouTube, Facebook, …)
  Challenges
  • Entity recognition from degraded input sources (tweets, etc.)
  • Advancing the state of the art in NLP and text mining
  • Dynamics detection: evolution of terms/entities
  • Semantic representation of Web objects and entities
  • Appropriate RDF schemas for ETOEs and Web objects
  • Exploiting (Linked Open) Web data to enrich extracted ETOEs
  • Entity classification (into events, locations, topics, etc.) and consolidation
Slide 4: ETOE extraction with GATE: an example
  [Figure: example of GATE extraction output; annotated label: "candidate multi-word term"]
Slide 5: Data consolidation & integration problem
  Data extracted by different components, or during different processing cycles, is not aligned => consolidation, disambiguation & correlation are required.
  Example: how do these annotations relate? ("Griechenland" is German for "Greece".)
    <Location>Greece</Location>
    <Person>Venizelos</Person>
    <Location>Griechenland</Location>
    <Organisation>Greek Parliament</Organisation>
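The annotation markup on this slide can be read into (type, text) records with Python's standard `xml.etree.ElementTree`. The sketch below wraps the fragment in a hypothetical `<doc>` root element so it parses, and shows that grouping by exact string match leaves four separate entities, which is the misalignment the slide describes. The helper names are illustrative and not part of ARCOMEM.

```python
import xml.etree.ElementTree as ET

# Annotation markup as shown on the slide, wrapped in a hypothetical <doc>
# root element so that ElementTree can parse the fragment.
MARKUP = """<doc>
<Location>Greece</Location>
<Person>Venizelos</Person>
<Location>Griechenland</Location>
<Organisation>Greek Parliament</Organisation>
</doc>"""


def parse_annotations(markup: str) -> list[tuple[str, str]]:
    """Return (annotation type, surface text) pairs, one per child element."""
    root = ET.fromstring(markup)
    return [(child.tag, (child.text or "").strip()) for child in root]


def naive_entities(annotations: list[tuple[str, str]]) -> set[tuple[str, str]]:
    """Group by exact (type, text) match, i.e. no consolidation at all."""
    return set(annotations)


annotations = parse_annotations(MARKUP)
print(annotations)
print(len(naive_entities(annotations)))  # 4: "Greece" and "Griechenland" stay apart
```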
Slide 6: Data clustering & enrichment
  • Enrichment of entities with related references to Linked Data, particularly reference datasets (DBpedia, Freebase, …)
  => Use the enrichments for correlation, clustering and consolidation.
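One simple way to use enrichments for consolidation is to merge any annotations that point to the same reference URI. The sketch below does this with a union-find (disjoint-set) structure over the slide 5 annotations. The enrichment URIs assigned to each annotation are hypothetical inputs, not actual ARCOMEM output, and the "share at least one URI" merging rule is an assumption for illustration rather than the project's documented method.

```python
# Hypothetical enrichment results for the slide 5 annotations. These URIs are
# illustrative assumptions, not output of the ARCOMEM pipeline.
ENRICHMENTS = {
    ("Location", "Greece"): {"http://dbpedia.org/resource/Greece"},
    ("Person", "Venizelos"): set(),  # left unenriched in this toy example
    ("Location", "Griechenland"): {"http://dbpedia.org/resource/Greece"},
    ("Organisation", "Greek Parliament"): {"http://dbpedia.org/resource/Hellenic_Parliament"},
}


class UnionFind:
    """Disjoint-set forest with path halving."""

    def __init__(self) -> None:
        self.parent: dict = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def cluster_by_enrichment(enrichments: dict) -> list[list]:
    """Merge annotations that share at least one enrichment URI."""
    uf = UnionFind()
    first_holder = {}  # enrichment URI -> first annotation seen with it
    for annotation, uris in enrichments.items():
        uf.find(annotation)  # register annotations without enrichments too
        for uri in uris:
            if uri in first_holder:
                uf.union(first_holder[uri], annotation)
            else:
                first_holder[uri] = annotation
    clusters: dict = {}
    for annotation in enrichments:
        clusters.setdefault(uf.find(annotation), []).append(annotation)
    return list(clusters.values())


for cluster in cluster_by_enrichment(ENRICHMENTS):
    print(cluster)
```

With these inputs, "Greece" and "Griechenland" end up in one cluster while the other two annotations stay separate. Union-find keeps the merging close to linear in the number of annotation/URI pairs, which matters when enrichments are collected over many crawl cycles.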
Slide 7: Enrichment for clustering & correlation: example
  <Event>Trichet warns of systemic debt crisis</Event>
  <Person>Jean Claude Trichet</Person>
  <Organisation>ECB</Organisation>
Slide 8: Enrichment for clustering & correlation: example (continued)
  <Event>Trichet warns of systemic debt crisis</Event>
  <Person>Jean Claude Trichet</Person>
    <Enrichment>http://dbpedia.org/resource/Jean-Claude_Trichet</Enrichment>
  <Organisation>ECB</Organisation>
    <Enrichment>http://dbpedia.org/resource/ECB</Enrichment>
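Linking the surface form "Jean Claude Trichet" to the URI local name `Jean-Claude_Trichet` already requires some normalization. A minimal sketch, assuming exact matching after case folding, accent stripping and removal of spaces, hyphens and underscores, against a small hard-coded candidate list (the two URIs shown on the slide). A real enrichment service would query the reference dataset and disambiguate using context; the function names here are hypothetical.

```python
import re
import unicodedata
from typing import Optional

DBPEDIA = "http://dbpedia.org/resource/"

# Hypothetical candidate list: the two enrichment URIs shown on the slide.
# A real system would look candidates up in the reference dataset instead.
CANDIDATE_URIS = [
    DBPEDIA + "Jean-Claude_Trichet",
    DBPEDIA + "ECB",
]


def normalize(label: str) -> str:
    """Case-fold, strip accents, and drop spaces, hyphens and underscores."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[\s_\-]+", "", no_accents.casefold())


def build_index(uris: list[str]) -> dict[str, str]:
    """Map the normalized local name of each URI to the full URI."""
    return {normalize(uri.rsplit("/", 1)[-1]): uri for uri in uris}


def enrich(label: str, index: dict[str, str]) -> Optional[str]:
    """Return the matching URI for a surface form, or None if nothing matches."""
    return index.get(normalize(label))


index = build_index(CANDIDATE_URIS)
for label in ["Jean Claude Trichet", "ECB", "Trichet warns of systemic debt crisis"]:
    print(label, "->", enrich(label, index))
```

The event label finds no match, which is consistent with the slide, where only the person and the organisation receive enrichments.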
Slide 9: Enrichment for clustering & correlation: example (continued)
  <Event>Trichet warns of systemic debt crisis</Event>
  <Person>Jean Claude Trichet</Person>
    <Enrichment>http://dbpedia.org/resource/Jean-Claude_Trichet</Enrichment>
      => dbpprop:office dbpedia:President_of_the_European_Central_Bank, dbpedia:Governor_of_the_Banque_de_France
      => dcterms:subject category:Living_people, category:Karlspreis_recipients, category:Alumni_of_the_École_Nationale_d'Administration, category:People_from_Lyon, …
  <Organisation>ECB</Organisation>
    <Enrichment>http://dbpedia.org/resource/ECB</Enrichment>
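The facts on this slide can be treated as (subject, predicate, object) triples, and the objects one hop away from an enrichment URI are candidate "related enrichments". The sketch below transcribes the visible triples, assuming the usual binding of the `dbpedia:` prefix to `http://dbpedia.org/resource/`; the slide's category list is truncated, so only the shown entries are included. A plain list of tuples stands in for a real triple store, and `related_enrichments` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Iterable, Optional

TRICHET = "dbpedia:Jean-Claude_Trichet"

# Triples transcribed from the slide (prefixed names as shown there). The
# slide's category list is truncated, so only the visible entries appear.
TRIPLES = [
    (TRICHET, "dbpprop:office", "dbpedia:President_of_the_European_Central_Bank"),
    (TRICHET, "dbpprop:office", "dbpedia:Governor_of_the_Banque_de_France"),
    (TRICHET, "dcterms:subject", "category:Living_people"),
    (TRICHET, "dcterms:subject", "category:Karlspreis_recipients"),
    (TRICHET, "dcterms:subject", "category:Alumni_of_the_École_Nationale_d'Administration"),
    (TRICHET, "dcterms:subject", "category:People_from_Lyon"),
]


def related_enrichments(
    subject: str,
    triples: Iterable[tuple[str, str, str]],
    predicates: Optional[set[str]] = None,
) -> dict[str, list[str]]:
    """Group the one-hop neighbours of `subject` by predicate, optionally filtered."""
    related: dict[str, list[str]] = defaultdict(list)
    for s, p, o in triples:
        if s == subject and (predicates is None or p in predicates):
            related[p].append(o)
    return dict(related)


print(related_enrichments(TRICHET, TRIPLES, {"dbpprop:office"}))
```

Restricting the predicates matters in practice: a broad category such as `category:Living_people` links to a very large number of resources and is a weak signal for correlation, whereas `dbpprop:office` connects Trichet directly to an ECB-related resource.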
Slide 10: ARCOMEM entities and enrichments: graph
  [Figure: graph of ARCOMEM entities and their enrichments]
  • Nodes: entities/events (blue), DBpedia enrichments (green), Freebase enrichments (orange)
  • 1013 clusters of correlated entities/events
Slide 11: ARCOMEM entities and enrichments: graph
  [Figure: graph of ARCOMEM entities and their enrichments]
  • Nodes: entities/events (blue), DBpedia enrichments (green), Freebase enrichments (orange)
  • 1013 clusters of correlated entities/events
  => Cluster expansion by considering related enrichments
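The clusters on these slides can be read as connected components of a graph whose nodes are entities and enrichments. The sketch below counts components on a small hypothetical graph with a breadth-first search, then adds the `dbpprop:office` link from slide 9 as a "related enrichment" edge, which merges two clusters. The mentions `Person:Trichet` and `Entity:ECB president` are invented for illustration, and this is not necessarily how the 1013 ARCOMEM clusters were computed.

```python
from collections import defaultdict, deque

# Hypothetical entity -> enrichment edges. "Person:Trichet" and
# "Entity:ECB president" are invented mentions used only for illustration.
ENTITY_EDGES = [
    ("Person:Jean Claude Trichet", "dbpedia:Jean-Claude_Trichet"),
    ("Person:Trichet", "dbpedia:Jean-Claude_Trichet"),
    ("Organisation:ECB", "dbpedia:ECB"),
    ("Entity:ECB president", "dbpedia:President_of_the_European_Central_Bank"),
]

# Enrichment -> related enrichment edges (the dbpprop:office link on slide 9).
RELATED_EDGES = [
    ("dbpedia:Jean-Claude_Trichet", "dbpedia:President_of_the_European_Central_Bank"),
]


def components(nodes, edges):
    """Connected components of an undirected graph via breadth-first search."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, result = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        component, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        result.append(component)
    return result


def entity_clusters(entity_edges, related_edges, expand):
    """Entity clusters; with expand=True, related enrichments also link clusters."""
    edges = list(entity_edges) + (list(related_edges) if expand else [])
    nodes = list(dict.fromkeys(node for edge in edges for node in edge))
    entities = {entity for entity, _ in entity_edges}
    clusters = [component & entities for component in components(nodes, edges)]
    return [cluster for cluster in clusters if cluster]


print(len(entity_clusters(ENTITY_EDGES, RELATED_EDGES, expand=False)))  # 3
print(len(entity_clusters(ENTITY_EDGES, RELATED_EDGES, expand=True)))   # 2
```

Without expansion, the two Trichet mentions share an enrichment and form one cluster; with expansion, the hypothetical "ECB president" mention joins them through the related enrichment, while the ECB organisation stays separate in this toy graph.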
Slide 12: Thank you
  Contact details: Dr. Elena Demidova, L3S Research Center
  +49 511 762 17732
  demidova@L3S.de
  www.arcomem.eu
