
Annotating streams of heterogeneous data for topic generation

by Postdoctoral Researcher on Feb 06, 2013


Talk given at the VU University Amsterdam, NL - February 6, 2013

Abstract: Since the advent of Linked Data, we have observed a dramatic increase in structured data sources published on the Web. They provide mainly entity-to-entity interconnections, resulting in a Web of Linked Entities, disambiguated through URIs, spanning structured and unstructured data. Several efforts have been made to exploit this mine of information to enhance text understanding by connecting pieces of text to real-world objects, i.e. entities, that are easily discoverable by intelligent agents. The result is a proliferation of different systems for annotating text with "Web" entities.

In this perspective, we have developed a framework for harmonizing access to such systems and their output. The NERD ontology [1] aligns the differences among the annotations and provides a definition for a set of axioms taken from the long-tail distribution of common classes among the extractors used. On top of the NERD ontology, we have developed NERD [2], which implements a combined logic that aims to minimize annotation error by taking the best, when possible, from these extractors. We have observed that well-known entity classes such as Person, Location, and Organization are well covered by these extractors, while Event is less well covered, mainly due to the lack of a definition of, and knowledge about, what events are. As a follow-up to the Eventmedia project [3], we are defining an event spotter that takes advantage of the large graph of event knowledge described in the Eventmedia dataset [4].
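To make the idea of aligning and combining extractor output concrete, here is a minimal Python sketch. It is not NERD's actual logic, which the abstract does not detail: the extractor names, the label mappings, and the majority-vote rule are all assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical alignment table: (extractor, native label) -> shared class.
# Extractor names and labels are illustrative, not NERD's real mappings.
ALIGNMENT = {
    ("extractor_a", "PER"): "Person",
    ("extractor_a", "LOC"): "Location",
    ("extractor_b", "person"): "Person",
    ("extractor_b", "city"): "Location",
    ("extractor_c", "Organisation"): "Organization",
    ("extractor_c", "People"): "Person",
}


def align(extractor, label):
    """Map an extractor-specific label to a shared class, or None if unmapped."""
    return ALIGNMENT.get((extractor, label))


def combine(annotations):
    """Pick, per surface form, the shared class most extractors agree on.

    annotations: iterable of (extractor, surface_form, native_label).
    Majority voting is an assumed stand-in for the combined logic;
    ties go to the class that was seen first.
    """
    votes = {}
    for extractor, surface, label in annotations:
        shared = align(extractor, label)
        if shared is not None:  # skip labels with no known alignment
            votes.setdefault(surface, Counter())[shared] += 1
    return {
        surface: counts.most_common(1)[0][0]
        for surface, counts in votes.items()
    }
```

The alignment table plays the role the abstract assigns to the ontology: once every label is expressed in shared classes, disagreements between extractors become comparable and can be resolved by any combination rule.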

Social platforms are also sources of structured and unstructured data. They constantly record streams of heterogeneous data about human activities, feelings, emotions, and conversations, opening a window to the world in real time. Making sense of these streams is extremely challenging. We are currently investigating the role of named entities as centroids for micropost topic generation, presenting the resulting topics through visual galleries.
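One simple reading of "named entities as centroids" is to group microposts under each entity they mention and treat each group as a candidate topic. The Python sketch below illustrates that reading only; the input shape, the function names, and the ranking by group size are assumptions, not the method presented in the talk.

```python
from collections import defaultdict


def group_by_entity(microposts):
    """Group micropost texts under each entity URI they mention.

    microposts: iterable of (text, set_of_entity_uris) pairs, assumed to
    come from an earlier annotation step.
    """
    topics = defaultdict(list)
    for text, entities in microposts:
        for uri in sorted(entities):  # sorted for deterministic order
            topics[uri].append(text)
    return dict(topics)


def rank_topics(topics):
    """Order candidate topics by number of microposts, largest first.

    Ties are broken alphabetically by entity URI.
    """
    return sorted(topics.items(), key=lambda item: (-len(item[1]), item[0]))
```

A micropost that mentions several entities lands in several groups, so the groups overlap; a micropost with no recognized entity contributes to no topic.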

[1] - http://nerd.eurecom.fr/ontology
[2] - http://nerd.eurecom.fr
[3] - http://eventmedia.eurecom.fr
[4] - http://eventmedia.eurecom.fr/sparql


