News construction from microblogging posts using open data
Access to information can be limited when
traditional media outlets cannot cover
events because of geographical limitations
or censorship, for example during civil
unrest, war or natural disasters.
In this research we propose a method to
automatically create searchable,
semantically annotated news articles from
tweets using the cloud of linked open data.
“Everyone has the right to freedom of opinion and expression; this right includes
freedom to hold opinions without interference and to seek, receive and impart
information and ideas through any media and regardless of frontiers.”
A tweet is not a document: it
will be unreachable in a few days
and the information will be lost.
We want to create a
news article from the
tweet using the cloud of
linked open data, turning the
message into a
document that can be
retrieved and used later
What we want to do
Determine the 5 W's (and How) of the post
● Who is it about?
● What happened?
● Where did it take place?
● When did it take place?
● Why did it happen?
● How did it happen?
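The six questions above can be held in a small record, one per post. The sketch below is only an illustration: the class name `FiveWOneH`, its field types and the sample values are assumptions, not part of the described system.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FiveWOneH:
    """Answers extracted from one post; an empty or None value means 'not found yet'."""
    who: List[str] = field(default_factory=list)
    what: Optional[str] = None
    where: List[str] = field(default_factory=list)
    when: Optional[str] = None
    why: Optional[str] = None
    how: Optional[str] = None

    def missing(self) -> List[str]:
        """Return the names of the questions that are still unanswered."""
        return [name for name, value in vars(self).items() if not value]


# Hypothetical sample values, for illustration only.
record = FiveWOneH(who=["Barack Obama"], what="gave a speech", where=["Washington"])
print(record.missing())  # ['when', 'why', 'how']
```

Listing the unanswered questions makes it easy to see which parts of the article still need to be filled from other sources.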
Use the cloud of linked open data to expand each
concept, person, organization, place or action described
in the post
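One way to expand a term is to look up resources that carry it as a label. The sketch below only builds the SPARQL query text; it does not contact any endpoint. The function name `build_expansion_query`, the use of `rdfs:label` for matching and the default limit are assumptions made for illustration, not the method used in this work.

```python
def build_expansion_query(label: str, lang: str = "en", limit: int = 10) -> str:
    """Return a SPARQL query listing resources whose rdfs:label equals `label`."""
    if not lang.isalpha():
        raise ValueError("language tag must contain letters only")
    # Escape backslashes and double quotes so the label stays a valid string literal.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?resource ?type WHERE {\n"
        f'  ?resource rdfs:label "{escaped}"@{lang} .\n'
        "  OPTIONAL { ?resource a ?type }\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


query = build_expansion_query("Caracas")
print(query)
```

The resulting text could then be sent to any SPARQL endpoint; each returned resource becomes one candidate for the term.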
[Figure: method pipeline. Labels: word type recognition, list of candidates, Virtuoso endpoint, Turtle file]
Our method - overview
● We selected 90 tweets directly from the Twitter search
on 3 subjects: the Brazilian riot during the 2014 World
Cup, Barack Obama and Venezuela.
● Manually tag each tweet (twice)
● Run the automated approach and compare the results
● Use a federated engine (ANAPSID) to provide more
complete information on the subject.
● Design and implement an algorithm that retrieves all
relevant information from the linked open data cloud.
● Use open data to resolve the disambiguation problem and
minimize incorrectly suggested concepts.
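The disambiguation step in the list above could, for example, rank the candidates for a term by how many words their descriptions share with the post. This word-overlap heuristic is an assumption chosen for illustration, not the approach described here. The function names, the `http://example.org/` URIs and the descriptions are hypothetical.

```python
import re
from typing import Dict, Optional, Set


def tokenize(text: str) -> Set[str]:
    """Lower-case the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def pick_candidate(post: str, candidates: Dict[str, str]) -> Optional[str]:
    """Return the candidate URI whose description shares the most words with the post.

    Returns None when no candidate shares any word with the post.
    """
    post_words = tokenize(post)
    best_uri, best_score = None, 0
    for uri, description in candidates.items():
        score = len(post_words & tokenize(description))
        if score > best_score:
            best_uri, best_score = uri, score
    return best_uri


# Hypothetical post and candidate descriptions, for illustration only.
post = "Protests continue in Caracas after the election"
candidates = {
    "http://example.org/Caracas_city": "capital city of Venezuela site of protests and election rallies",
    "http://example.org/Caracas_band": "rock band formed in 1990",
}
print(pick_candidate(post, candidates))  # http://example.org/Caracas_city
```

A real system would probably remove stop words and weight rare terms more heavily, since common words such as "in" contribute to the overlap here.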
These results encourage us to further develop the method and
the system, first to solve the disambiguation problems and then
to create a more ambitious approach: a semantically annotated
news stream based not only on tweets but also on other
microblogging services, independent blogs and corporate media
outlets, which can serve as a centralized semantic endpoint
for data journalism.
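The comparison between manual tags and the automated approach can be quantified per tweet. The measure used in this work is not stated, so the sketch below assumes precision and recall over sets of tags. The tag values are hypothetical.

```python
from typing import Set, Tuple


def precision_recall(manual: Set[str], automated: Set[str]) -> Tuple[float, float]:
    """Compare automated tags against manual (reference) tags for one tweet."""
    true_positives = len(manual & automated)
    # Precision: share of suggested tags that are correct.
    precision = true_positives / len(automated) if automated else 0.0
    # Recall: share of reference tags that were found.
    recall = true_positives / len(manual) if manual else 0.0
    return precision, recall


# Hypothetical tags for a single tweet, for illustration only.
manual_tags = {"Venezuela", "Caracas", "protest"}
automated_tags = {"Venezuela", "protest", "Caracas (band)", "march"}
p, r = precision_recall(manual_tags, automated_tags)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Low precision in such a comparison would point to incorrectly suggested concepts, which is the disambiguation problem named above.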