News construction from microblogging posts using open data

Presentation of the report of the same name. The research was carried out for the Semantic Web course at the Universidad Simón Bolívar.

Published in: Technology, Education

  1. News construction from microblogging posts using linked open data
  2. Introduction: Access to information can be limited when traditional media outlets cannot cover events because of geographical limitations or censorship, for example during civil unrest, war or natural disasters. In this research we propose a method to automatically create searchable, semantically annotated news articles from tweets using the linked open data cloud.
  3. Motivation: “Everyone has the right to freedom of opinion and expression; this right includes freedom to hold opinions without interference and to seek, receive and impart information and ideas through any media and regardless of frontiers.” (Universal Declaration of Human Rights, Article 19)
  4. An example: A tweet is not a document; it will be unreachable in a few days and the information will be lost.
  5. An example: We want to create a news article from the tweet using the linked open data cloud, transforming the message into a document that can be retrieved and used later.
  6. What we want to do: Determine the 5 W's (and the H) of the post ● Who is it about? ● What happened? ● Where did it take place? ● When did it take place? ● Why did it happen? ● How did it happen? Use the linked open data cloud to expand each concept, person, organization, place or action described in the post.
  7. Our method, overview (diagram; labels as extracted): Tweet ID, Twitter API, tweet text, WordNet (local), word type recognition, noise removal, list of candidates, Wikipedia API, list of candidates with a known Wikipedia page, DBpedia endpoint, SPARQL query, semantic information, author information, Turtle file, Virtuoso endpoint.
  8. The rNews core news ontology
  9. Experimentation ● We selected 90 tweets directly from Twitter search on 3 subjects: the Brazilian riot during the 2014 World Cup, Barack Obama and Venezuela. ● We manually tagged each tweet (twice). ● We ran the automated approach and compared the results.
  10. Results: Expected terms: 413. Found: 433. Expected and found: 317. No added value: 63. Wrong: 53. Precision: 76.36%. Errors: 12.24%.
  11. Future work ● Use a federated query engine (ANAPSID) to provide more complete information on the subject. ● Design and implement an algorithm that retrieves all relevant information from the linked open data cloud. ● Use open data to resolve the disambiguation problem and minimize incorrectly suggested concepts.
  12. Conclusions: These results encourage us to further develop the method and the system, first to solve the disambiguation problems and then to create a more ambitious approach: a semantically annotated news stream based not only on tweets but also on other microblogging services, independent blogs and corporate media outlets, which can serve as a centralized semantic endpoint for data journalism.
  13. Thank you!
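
The method overview on slide 7 names the stages (noise removal, a list of candidates, a SPARQL query against DBpedia, a Turtle file) but not how they are implemented. The following is a minimal offline sketch of how such stages could fit together, not the authors' code. Everything in it is an assumption made for illustration: the stopword set and the capitalized-word heuristic stand in for the WordNet-based word type recognition, DBpedia resource IRIs are guessed from the term instead of being resolved through the Wikipedia API, `urn:tweet:` is a hypothetical identifier scheme, and the rNews namespace and the `rnews:Article`, `rnews:articleBody` and `rnews:about` names should be checked against the rNews specification before use.

```python
import re

# Hypothetical stopword set; the slides use a local WordNet for word type recognition.
STOPWORDS = {"a", "an", "the", "in", "on", "at", "of", "to", "is", "are", "and", "for", "with", "rt"}

# Assumed namespaces (verify against the rNews and DBpedia documentation).
RNEWS = "http://iptc.org/std/rNews/2011-10-07#"
DBPEDIA = "http://dbpedia.org/resource/"


def remove_noise(text):
    """Strip URLs and @mentions, unwrap #hashtags, and collapse whitespace."""
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"@\w+", " ", text)
    text = text.replace("#", " ")
    return re.sub(r"\s+", " ", text).strip()


def candidate_terms(text):
    """Return runs of capitalized, non-stopword words as naive candidate terms."""
    candidates, run = [], []
    for token in re.findall(r"[^\W\d_]+", text):  # letters only, Unicode-aware
        if token[0].isupper() and token.lower() not in STOPWORDS:
            run.append(token)
        else:
            if run:
                candidates.append(" ".join(run))
            run = []
    if run:
        candidates.append(" ".join(run))
    return candidates


def resource_iri(term):
    """Guess a DBpedia resource IRI from a term (assumption: spaces become underscores)."""
    return DBPEDIA + term.replace(" ", "_")


def dbpedia_query(term):
    """Build a SPARQL query asking DBpedia for the English label and abstract of a term."""
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT ?label ?abstract WHERE {\n"
        f"  <{resource_iri(term)}> rdfs:label ?label ;\n"
        "      dbo:abstract ?abstract .\n"
        '  FILTER (lang(?label) = "en" && lang(?abstract) = "en")\n'
        "}"
    )


def to_turtle(tweet_id, text, terms):
    """Serialize the tweet as a small Turtle document using assumed rNews terms."""
    body = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    lines = [
        f"@prefix rnews: <{RNEWS}> .",
        "",
        f"<urn:tweet:{tweet_id}> a rnews:Article ;",  # urn:tweet: is a made-up scheme
    ]
    if terms:
        lines.append(f'    rnews:articleBody "{body}" ;')
        about = " ,\n        ".join(f"<{resource_iri(t)}>" for t in terms)
        lines.append(f"    rnews:about {about} .")
    else:
        lines.append(f'    rnews:articleBody "{body}" .')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    tweet = "RT @user: Protests in Sao Paulo before the World Cup opening http://t.co/abc"
    clean = remove_noise(tweet)
    terms = candidate_terms(clean)
    print(terms)
    print(dbpedia_query(terms[0]))
    print(to_turtle("12345", clean, terms))
```

For the sample tweet the candidate list is `['Protests', 'Sao Paulo', 'World Cup']`. In a full pipeline each query string would be sent to a SPARQL endpoint and the resulting Turtle loaded into a triple store such as Virtuoso; both steps are left out here so the sketch runs without network access.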