Arcomem training – Enrichment Beginner (update)

This presentation on data enrichment is part of the ARCOMEM training curriculum. Feel free to roam around or contact us on Twitter via @arcomem to learn more about ARCOMEM training on archiving Social Media.

    Presentation Transcript

    • Entity Enrichment and Clustering in ARCOMEM
      Elena Demidova (1), including slides by: Stefan Dietze (1), Diana Maynard (2), Thomas Risse (1), Wim Peters (2), Katerina Doka (3), Yannis Stavrakas (3)
      (1) L3S Research Center, Hannover, Germany
      (2) University of Sheffield, UK
      (3) IMIS, RC ATHENA, Athens, Greece
    • The ARCOMEM approach
      • Make use of the Social Web
        – Huge source of user-generated content
        – Wide range of articulation methods, from simple "I like it" buttons to complete articles
        – Represents the diversity of opinions of the public
      • User activities often triggered by
        – Events and related entities (e.g. Sport Events, Celebrations, Crises, News Articles, Persons, Locations)
        – Topics (e.g. Global Warming, Financial Crisis, Swine Flu)
      A semantic-aware and socially-driven preservation model is a natural way to go
      Slide 2
    • The extraction components for text
      Aim
        – Extraction of Entities, Topics, Events and Opinions (ETOEs) from Web Pages and the Social Web (Twitter, YouTube, Facebook, …)
      Challenges
        – Entity recognition from degraded input sources (tweets etc.)
        – Advancing state-of-the-art NLP and text mining
        – Dynamics detection: evolution of terms/entities
        – Semantic representation of Web objects and entities
        – Appropriate RDF schemas for ETOE and Web objects
        – Exploiting (Linked Open) Web data to enrich extracted ETOE
        – Entity classification (into events, locations, topics etc.) & consolidation
      Slide 3
    • ETOE extraction with GATE: an example
      [Figure label: candidate multi-word term]
      Slide 4
    • Data consolidation & integration problem
      Data extracted from different components or during different processing cycles is not aligned
      => consolidation, disambiguation & correlation required.
      <Location>Greece</Location>
      <Person>Venizelos</Person>
      <Location>Griechenland</Location>
      <Organisation>Greek Parliament</Organisation>
      ?
      Slide 5
    • Data enrichment & clustering Enrichment of entities with related references to Linked Data, particularly reference datasets (DBpedia, Freebase, …) => use enrichments for clustering/correlation/consolidation Slide 6
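The consolidation idea on this slide can be sketched in a few lines: once entities carry a reference to a Linked Data resource, surface forms that point to the same resource can be merged. This is only an illustrative sketch, not the ARCOMEM implementation. The `consolidate` function, the tuple layout, and the hand-assigned enrichment URIs are assumptions made for the example; entities without an enrichment are simply kept apart.

```python
from collections import defaultdict

GREECE = "http://dbpedia.org/resource/Greece"

# Hypothetical extraction output: (type, surface form, enrichment URI or None).
# The URIs are assigned by hand for illustration, not produced by ARCOMEM.
extracted = [
    ("Location", "Greece", GREECE),
    ("Person", "Venizelos", None),
    ("Location", "Griechenland", GREECE),
    ("Organisation", "Greek Parliament", None),
]

def consolidate(entities):
    """Group surface forms that share an enrichment URI.

    Entities without an enrichment get their own 'unresolved' key,
    so they are never merged with anything else.
    """
    groups = defaultdict(set)
    for etype, label, uri in entities:
        key = uri if uri is not None else f"unresolved:{etype}:{label}"
        groups[key].add(label)
    return {key: sorted(labels) for key, labels in groups.items()}

consolidated = consolidate(extracted)
for key, labels in consolidated.items():
    print(key, labels)
```

Here "Greece" and "Griechenland" end up under one key because both were given the same reference, while the unenriched entities stay separate.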
    • Enrichment for clustering & correlation: example <Person>Jean Claude Trichet</Person> <Organisation>ECB</Organisation> <Event>Trichet warns of systemic debt crisis</Event> Slide 7
    • Enrichment for clustering & correlation: example <Person>Jean Claude Trichet</Person> <Organisation>ECB</Organisation> <Event>Trichet warns of systemic debt crisis</Event> <Enrichment>http://dbpedia.org/resource/Jean-Claude_Trichet</Enrichment> <Enrichment>http://dbpedia.org/resource/ECB</Enrichment> Slide 8
    • Enrichment for clustering & correlation: example
      <Person>Jean Claude Trichet</Person>
      <Organisation>ECB</Organisation>
      <Event>Trichet warns of systemic debt crisis</Event>
      <Enrichment>http://dbpedia.org/resource/Jean-Claude_Trichet</Enrichment>
      <Enrichment>http://dbpedia.org/resource/ECB</Enrichment>
      => dbpprop:office
           dbpedia:President_of_the_European_Central_Bank
           dbpedia:Governor_of_the_Banque_de_France
      => dcterms:subject
           category:Living_people
           category:Karlspreis_recipients
           category:Alumni_of_the_École_Nationale_d'Administration
           category:People_from_Lyon
      Slide 9
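One simple way to turn such property values into a correlation signal is to compare the sets of values attached to two enriched entities. The sketch below uses Jaccard similarity for this; the slides do not say which measure ARCOMEM uses, so the measure, the `jaccard` helper, and the second entity ("Hypothetical Person B") with its values are all assumptions for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# Property values taken from the slide for Trichet (a subset), plus a
# made-up second entity that shares two of them.
features = {
    "Jean Claude Trichet": {
        "dbpedia:President_of_the_European_Central_Bank",
        "dbpedia:Governor_of_the_Banque_de_France",
        "category:Living_people",
        "category:People_from_Lyon",
    },
    "Hypothetical Person B": {
        "dbpedia:President_of_the_European_Central_Bank",
        "category:Living_people",
    },
}

score = jaccard(features["Jean Claude Trichet"], features["Hypothetical Person B"])
print(score)  # 2 shared values out of 4 distinct values
```

Entities whose score exceeds some chosen threshold could then be treated as correlated; picking that threshold is outside what the slides cover.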
    • ARCOMEM entities, enrichments & clusters
      Nodes: entities/events (blue), enrichments DBpedia (green), Freebase (orange)
      1013 clusters of correlated entities/events
      [Figure label: Cluster built around enrichment db:Market]
      Slide 10
    • ARCOMEM entities and enrichments - graph
      Nodes: entities/events (blue), enrichments DBpedia (green), Freebase (orange)
      1013 clusters of correlated entities/events
      => cluster expansion by considering related enrichments
      [Figure labels: Cluster expansion; Cluster built around enrichment db:Market]
      Slide 11
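The step from a cluster built around one enrichment to an expanded cluster can be sketched as follows. This is a minimal illustration under stated assumptions: the entity names, the `related` map between enrichments, and both functions are hypothetical, and how ARCOMEM actually determines related enrichments is not described in the slides.

```python
def build_cluster(seed, entity_enrichments):
    """Entities whose enrichments include the seed enrichment."""
    return {e for e, enr in entity_enrichments.items() if seed in enr}

def expand_cluster(seed, entity_enrichments, related):
    """Entities linked to the seed or to any enrichment related to it."""
    seeds = {seed} | related.get(seed, set())
    return {e for e, enr in entity_enrichments.items() if enr & seeds}

# Hypothetical entity -> enrichment assignments.
entity_enrichments = {
    "stock market": {"db:Market"},
    "bond market": {"db:Market"},
    "ECB": {"db:European_Central_Bank"},
    "Greece": {"db:Greece"},
}

# Hypothetical relatedness between enrichments (e.g. links in the reference dataset).
related = {"db:Market": {"db:European_Central_Bank"}}

base = build_cluster("db:Market", entity_enrichments)
expanded = expand_cluster("db:Market", entity_enrichments, related)
print(sorted(base))
print(sorted(expanded))
```

With this toy data the base cluster holds the two entities enriched with db:Market, and expansion additionally pulls in "ECB" through the related enrichment, while "Greece" stays outside.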
    • THANK YOU
      CONTACT DETAILS
      Dr. Elena Demidova
      L3S Research Center
      +49 511 762 17732
      demidova@L3S.de
      www.arcomem.eu