Presentation of NewsReader as keynote for VOGIN-IP 2015. Can we handle the news? How computers read millions of news articles to extract what happened, when, where and who was involved over longer periods of time. News reading technology is developed for 4 languages (English, Dutch, Spanish and Italian), creating RDF from text.
2. Can we handle the news?
• Information broker LexisNexis:
• 1.5 million news articles on a single working day
• 30,000 different sources
3. How did the automotive industry change during the financial crisis?
• 6 million English articles in the LexisNexis archive published in the last 10 years
• In 2012: 2 million Google hits for “Volkswagen takeover”, not sorted by publication date
7. VOLUME OF CHANGE
[Figure: timeline from 1995 to 2015 with regions labelled Speculation, Past and New, annotated with two news excerpts]
On 16 September 2008, Porsche increased its shares by another 4.89%, in effect taking control of the company, with more than 35% of the voting rights.
6 Jan 2009 – Porsche has been on a quest to takeover VW for more than two years.
8. VOLUME OF CHANGE
[Figure: the same 1995 to 2015 timeline and news excerpts as on slide 7, with an added "Volume of entities" label]
• 1.3 million articles
• 205M mentions
• 27M entities
• How many changes?
10. NewsReader (ict316404)
• Reading technology to process massive streams of news from many different sources in 4 languages (English, Dutch, Spanish and Italian):
• Recording the changes in the world as they are told in the media over long periods of time → history recorder.
• What happened, where and when, who was involved.
• Who made what statement, where do sources agree and disagree: provenance!
23. Event coreference
• Instance-based event coreference:
• All event mentions with a similar lemma and the same time anchor
• Share at least one actor (possibly a DBpedia URI)
• Share at least one place (possibly a DBpedia URI)
• Aggregation of SEM instances from NAF mentions and the extraction of provenance layers through named graphs
[Figure: nodes labelled e, p, t and l connected by "similar" and "coref" links; buy/acquire and sell/sales mentions are compared on time, concept and participant]
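The matching criteria on this slide can be sketched in a few lines of Python. This is only an illustration, not the NewsReader implementation: the Mention class, its field names and the example values are hypothetical, "similar lemma" is simplified to an exact match, and clustering is a greedy single pass.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """One event mention (hypothetical structure, for illustration only)."""
    uri: str
    lemma: str
    time: str              # normalised time anchor
    actors: frozenset      # actor identifiers, e.g. DBpedia URIs
    places: frozenset      # place identifiers, e.g. DBpedia URIs


def corefer(a: Mention, b: Mention) -> bool:
    """Same lemma and time anchor, plus at least one shared actor and place."""
    return (
        a.lemma == b.lemma
        and a.time == b.time
        and bool(a.actors & b.actors)
        and bool(a.places & b.places)
    )


def cluster_events(mentions):
    """Greedily group mentions that corefer with any member of a cluster."""
    # Only mentions with the same lemma and time anchor can match,
    # so bucket on that pair first to avoid needless comparisons.
    buckets = defaultdict(list)
    for mention in mentions:
        buckets[(mention.lemma, mention.time)].append(mention)

    clusters = []
    for group in buckets.values():
        group_clusters = []
        for mention in group:
            for cluster in group_clusters:
                if any(corefer(mention, other) for other in cluster):
                    cluster.append(mention)
                    break
            else:
                group_clusters.append([mention])
        clusters.extend(group_clusters)
    return clusters


# Made-up example data; the identifiers are placeholders.
m1 = Mention("doc1#m1", "sell", "2008-09-16",
             frozenset({"dbr:Porsche"}), frozenset({"dbr:Germany"}))
m2 = Mention("doc2#m1", "sell", "2008-09-16",
             frozenset({"dbr:Porsche", "dbr:Volkswagen"}), frozenset({"dbr:Germany"}))
m3 = Mention("doc3#m1", "sell", "2009-01-06",
             frozenset({"dbr:Porsche"}), frozenset({"dbr:Germany"}))

clusters = cluster_events([m1, m2, m3])
for cluster in clusters:
    print([m.uri for m in cluster])
```

Each resulting cluster would correspond to one event instance, with its mentions kept as provenance. A fuller version would merge clusters that a later mention bridges and would use a softer lemma similarity.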
24. SEM in RDF-TriG format
ENTITY INSTANCE
<http://dbpedia.org/resource/PorscheSE>
    rdfs:label "Porsche" , "Porsche company" ;
    gaf:denotedBy
        <nwr:data/cars/2013/1/1/5760-PM51-JD34-P4RM.xml#char=98,104> ,
        <nwr:data/cars/2013/1/1/57K5-FKK1-DYBW-2534.xml#char=44934,44940> .
dbo:Agent,Company
dbr:Privately_held_company
schema.org/Organization
25. SEM in RDF-TriG format
EVENT INSTANCE
<nwr:data/cars/2013/1/1/5758-BPN1-F0J6-D2T2.xml#sellEvent>
    a fn:Commerce_sell , fn:Commerce_buy ;
    rdfs:label "sell" , "buy" ;
    gaf:denotedBy
        <nwr:data/cars/2013/1/1/5758-BPN1-F0J6-D2T2.xml#char=12,15> ,
        <nwr:data/cars/2013/1/1/5758-BPN1-F0J6-D2T2.xml#char=1352,1356> ,
        <nwr:data/cars/2013/1/1/5760-PM51-JD34-P4H7.xml#char=1536,1540> .
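The gaf:denotedBy values link an instance back to the text that mentions it. As a minimal sketch, assuming the fragment "#char=a,b" holds a start and an end character offset in the source document, such a mention URI can be split with the Python standard library. The MentionSpan type and parse_mention function are hypothetical names, not part of any NewsReader tool.

```python
import re
from typing import NamedTuple


class MentionSpan(NamedTuple):
    """Document identifier plus the two offsets from the URI fragment."""
    document: str
    start: int
    end: int


# Matches "<document>#char=<start>,<end>"; the meaning of the two
# numbers as character offsets is an assumption made for this sketch.
_CHAR_FRAGMENT = re.compile(r"^(?P<doc>[^#]+)#char=(?P<start>\d+),(?P<end>\d+)$")


def parse_mention(uri: str) -> MentionSpan:
    """Split a mention URI into its document part and its offsets."""
    match = _CHAR_FRAGMENT.match(uri.strip("<>"))
    if match is None:
        raise ValueError(f"not a char-offset mention URI: {uri!r}")
    return MentionSpan(match["doc"], int(match["start"]), int(match["end"]))


span = parse_mention("<nwr:data/cars/2013/1/1/5758-BPN1-F0J6-D2T2.xml#char=12,15>")
print(span.document, span.start, span.end)
```

With the offsets in hand, a viewer could jump from an event instance to the exact passage in the article that reports it.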
36. Perspective model
• Chrysler expects to sell 5,000 diesel Liberty SUVs, President Dieter Zetsche says at a DaimlerChrysler Innovation Symposium in New York.
• Manfred Bisschoff said that Chrysler will probably not sell 5,000 diesel Liberty SUVs.
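One way to picture the perspective model is as a record that ties a source to a statement together with that source's stance. The sketch below is an assumption-laden illustration: the Perspective class, its fields and the polarity and certainty labels are invented here and are not the project's actual vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Perspective:
    """A source's stance on a statement (hypothetical structure)."""
    source: str
    statement: str   # the event or claim being talked about
    polarity: str    # illustrative labels: "positive" or "negative"
    certainty: str   # illustrative labels, e.g. "expected", "probable"


def disagreements(perspectives):
    """Return (source, source, statement) where polarities conflict."""
    by_statement = {}
    for p in perspectives:
        by_statement.setdefault(p.statement, []).append(p)

    conflicts = []
    for group in by_statement.values():
        for i, a in enumerate(group):
            for b in group[i + 1:]:
                if a.source != b.source and a.polarity != b.polarity:
                    conflicts.append((a.source, b.source, a.statement))
    return conflicts


# The two statements from the slide, encoded by hand.
claim = "Chrysler sells 5,000 diesel Liberty SUVs"
perspectives = [
    Perspective("Dieter Zetsche", claim, "positive", "expected"),
    Perspective("Manfred Bisschoff", claim, "negative", "probable"),
]

conflicts = disagreements(perspectives)
print(conflicts)
```

Both sources talk about the same event, so only one event instance is needed; the disagreement lives in the perspective records attached to it.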
39. Conclusions
• Event-centric presentation of news
• Capturing the changes and not the amount of talk
• Perspective model to capture the relation between the source and the statements
• From text to RDF, from unstructured to structured
• Reasoning over long-term developments involving millions of participants
• Querying and interacting with the data through advanced visualisation techniques