Annotating streams of heterogeneous data for topic generation
by Università degli Studi di Torino on Feb 06, 2013
Talk given at the VU University Amsterdam, NL - February 6, 2013
Abstract: Since the advent of Linked Data, we have observed a dramatic increase in the number of structured data sources published on the Web. They mainly provide entity-to-entity interconnections, resulting in a Web of Linked Entities, disambiguated through URIs, that spans structured and unstructured data. Several efforts have been made to exploit this mine of information to enhance text understanding by connecting pieces of text to real-world objects, i.e. entities, that are easily discoverable by intelligent agents. The result is a proliferation of different systems for annotating text with "Web" entities.
In this perspective, we have developed a framework that harmonizes access to such systems and their output. The NERD ontology aligns the differences among their annotations and defines a set of axioms drawn from the long-tail distribution of classes common to the extractors used. On top of the NERD ontology, we have built NERD, which implements a combined logic that aims to minimize annotation error by taking the best, when possible, from these extractors. We have observed that well-known entity classes such as Person, Location, and Organization are well covered by these extractors, while Event is covered less well, mainly because of a lack of definition and knowledge about what events are. As a follow-up to the Eventmedia project, we are defining an event spotter that takes advantage of the large graph of event knowledge described in the Eventmedia dataset.
Social platforms are also sources of structured and unstructured data. They constantly record streams of heterogeneous data about human activities, feelings, emotions, and conversations, opening a window onto the world in real time. Making sense of these streams is extremely challenging. We are currently investigating the role of named entities as centroids for micropost topic generation, presenting the resulting topics through visual galleries.
 - NERD ontology: http://nerd.eurecom.fr/ontology
 - NERD: http://nerd.eurecom.fr
 - Eventmedia project: http://eventmedia.eurecom.fr
 - Eventmedia dataset (SPARQL endpoint): http://eventmedia.eurecom.fr/sparql
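
As an illustration of the harmonization idea in the abstract, the following minimal sketch aligns extractor-specific type labels to a small set of shared classes and then combines the aligned annotations by majority vote. It is not NERD's actual combined logic, which the abstract does not detail. The extractor names, the label mapping, the "Thing" fallback class, and the functions `align` and `combine` are all hypothetical, and majority voting is just one simple way to take "the best" from several extractors.

```python
from collections import Counter

# Hypothetical alignment table: (extractor, extractor-specific label) -> shared class.
# A real alignment would come from an ontology, not a hard-coded dict.
TYPE_ALIGNMENT = {
    ("extractor_a", "PERSON"): "Person",
    ("extractor_a", "GPE"): "Location",
    ("extractor_b", "per"): "Person",
    ("extractor_b", "place"): "Location",
    ("extractor_c", "Human"): "Person",
    ("extractor_c", "City"): "Location",
}


def align(extractor, label):
    """Map an extractor-specific label to a shared class ("Thing" if unknown)."""
    return TYPE_ALIGNMENT.get((extractor, label), "Thing")


def combine(annotations):
    """Majority-vote the shared class for each annotated surface form.

    annotations: iterable of (extractor, surface_text, label) tuples.
    Returns a dict mapping surface_text to the most voted shared class.
    """
    votes = {}
    for extractor, surface, label in annotations:
        votes.setdefault(surface, Counter())[align(extractor, label)] += 1
    return {surface: counts.most_common(1)[0][0] for surface, counts in votes.items()}


if __name__ == "__main__":
    sample = [
        ("extractor_a", "Amsterdam", "GPE"),
        ("extractor_b", "Amsterdam", "place"),
        ("extractor_c", "Amsterdam", "Human"),  # a disagreeing extractor
        ("extractor_a", "Ada Lovelace", "PERSON"),
    ]
    print(combine(sample))
```

In this sample, two of the three hypothetical extractors agree that "Amsterdam" is a Location, so the disagreeing label is outvoted.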