Presented at symposium 'Social Media: Incubators of a renewed News Media Landscape?', 27 November 2015, Leuven Belgium. Presentation outlines projects PoliMedia & Newstrackers
Analyzing Published and Consumed Digital & Digitized News
1. Analyzing Published and Consumed
Digital & Digitized News
Martijn Kleppe
Vrije Universiteit Amsterdam
m.kleppe@vu.nl
www.martijnkleppe.nl
@martijnkleppe
Slides on Slideshare:
bit.ly/LeuvenKleppe
Social Media: Incubators of a renewed news media landscape
27 November 2015
Leuven
15. Link debates to news items
Intuition 1: the news item contains a topic and/or the name of a
politician and is published within a week after the debate.
Intuition 2: the more overlap in topics and named entities, the
more probable the link.
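Intuition 2 can be sketched as a simple overlap score between the terms (topics and named entities) of a debate and those of a news item. A minimal illustration, assuming both are available as sets; all names and example terms are hypothetical, and the actual PoliMedia linking algorithm is more involved than this:

```python
def link_score(debate_terms: set, news_terms: set) -> float:
    """Jaccard-style overlap between the topics/named entities of a
    debate and a news item; a higher score means a more probable link."""
    if not debate_terms or not news_terms:
        return 0.0
    return len(debate_terms & news_terms) / len(debate_terms | news_terms)

# Hypothetical example: a debate on informal care and two candidate articles
debate = {"mantelzorg", "zorgverzekering", "Schippers"}
article_a = {"mantelzorg", "Schippers", "kabinet"}
article_b = {"voetbal", "PSV"}

print(link_score(debate, article_a))  # 0.5  -> plausible link
print(link_score(debate, article_b))  # 0.0  -> no link
```

Candidates would first be restricted to items published within a week after the debate (Intuition 1), and only then ranked by this overlap score (Intuition 2).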
21. “Give me all fragments of
debates with over 60
related news items”
SELECT ?speech ?no_news_items {
  {
    SELECT ?speech (COUNT(?news) AS ?no_news_items)
    WHERE {
      ?speech <http://purl.org/linkedpolitics/nl/polivoc#coveredAt> ?news .
    }
    GROUP BY ?speech
  }
  FILTER (?no_news_items > 60)
}
SPARQL Endpoint
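For scripted access, a query like the one above can be generated for any threshold before being sent to the endpoint. A minimal sketch; the helper name is hypothetical, and the predicate URI is the one shown on the slide:

```python
def debates_with_min_coverage(threshold: int) -> str:
    """Build the slide's SPARQL query for debate fragments with more
    than `threshold` related news items (helper name is hypothetical)."""
    return f"""
SELECT ?speech ?no_news_items {{
  {{
    SELECT ?speech (COUNT(?news) AS ?no_news_items)
    WHERE {{
      ?speech <http://purl.org/linkedpolitics/nl/polivoc#coveredAt> ?news .
    }}
    GROUP BY ?speech
  }}
  FILTER (?no_news_items > {threshold})
}}
""".strip()

query = debates_with_min_coverage(60)
print(query)
```

The resulting string can then be posted to the PoliMedia SPARQL endpoint with any HTTP or SPARQL client library.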
22. • Yeah! It works (but no television)
• Not perfect
• But still ok (recall: 62%; precision: 80%)
• It is open for everyone: www.polimedia.nl
• + via a SPARQL endpoint
• People actually use it
Results
23. NRC Handelsblad, Ewoud Sander, Voor al haar mantelzorgen, 14 April 2014
“Another digital source
I often use is PoliMedia.nl”
Yeah! An article in
NRC HANDELSBLAD!
24. • Yeah! It works (but no television)
• Not perfect
• But still ok (recall: 62%; precision: 80%)
• It is open for everyone: www.polimedia.nl
• + via a SPARQL endpoint
• People actually use it
• We want more: social media, television, recent data
Results
35. What? How?
What genres of news websites
do news users consume 24/7?
For what do news users
consume these websites 24/7?
How does the consumption of news websites
fit in their everyday surfing behavior?
37. The Newstracker
• Collects web activities
• Of specified & authenticated users
• Via a custom-built system
• That collects & cleans web activities
• Extracts textual & visual content of news websites
• And stores all of this as one dataset
38. The Newstracker
Web activities are a lot of data…
And monitoring everything is quite privacy-intrusive…
So selection and structure are needed, via:
• A whitelist of 4,000 websites
• Labels indicating the genre of each website
• Subgenres for News and Information websites
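The whitelist-plus-labels design can be sketched as a lookup from host name to genre and subgenre, discarding anything not on the list. A minimal illustration; the whitelist excerpt and label names are invented, not the actual Newstracker list:

```python
from urllib.parse import urlparse

# Hypothetical excerpt of the ~4,000-site whitelist with genre labels;
# the real Newstracker list and label set are not shown in this deck.
WHITELIST = {
    "nu.nl": ("News", "General News"),
    "vi.nl": ("News", "Sport"),
    "lindanieuws.nl": ("News", "Lifestyle"),
    "google.nl": ("Search", None),
}

def label_visit(url: str):
    """Return (genre, subgenre) for a whitelisted URL, else None
    (the visit is then discarded, limiting privacy intrusion)."""
    host = urlparse(url).netloc.removeprefix("www.")
    return WHITELIST.get(host)

print(label_visit("http://www.vi.nl/home.htm"))        # ('News', 'Sport')
print(label_visit("http://www.example.org/whatever"))  # None
```

Only labeled visits enter the dataset, which is how selection keeps the monitoring both manageable and less privacy-intrusive.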
46. How?

               Via Homepage   Via Referral
TOTAL              59%            41%
General News       64%            36%
Lifestyle          49%            51%
Remarkable         48%            52%
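The homepage-vs-referral split above can be computed from a click log once each visit is labeled with the site's genre and how the user arrived. A minimal sketch with invented example records; the function and field names are hypothetical:

```python
from collections import Counter

# Hypothetical click records: (site genre, arrived_via), where arrived_via
# is "homepage" or "referral" (link, search, social media).
visits = [
    ("General News", "homepage"), ("General News", "homepage"),
    ("General News", "referral"),
    ("Lifestyle", "homepage"), ("Lifestyle", "referral"),
]

def entry_shares(visits):
    """Per-genre share of visits that start at the homepage vs. arrive
    via a referral."""
    per_genre = {}
    for genre, via in visits:
        per_genre.setdefault(genre, Counter())[via] += 1
    return {
        genre: {via: n / sum(c.values()) for via, n in c.items()}
        for genre, c in per_genre.items()
    }

print(entry_shares(visits)["Lifestyle"])  # {'homepage': 0.5, 'referral': 0.5}
```

Run over the full Newstracker log, this kind of aggregation yields the table above.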
“Lindanieuws.nl is more entertainment. Sometimes I
really think ‘this makes no sense’, but it is fun to read.
It’s more entertainment than real news, the way I consume it.”
Lean-back
Snacking
48. “Fashion is my hobby”.
• Visits the same websites every day
• In the same order
• Starts at the homepage
Lean-forward
monitoring
50. Date Time URL
26-4-2015 16:53:05 http://www.vi.nl/home.htm
26-4-2015 16:54:02 http://www.vi.nl/nieuws/promes-maakt-opnieuw-het-verschil-voor-spartak.htm
26-4-2015 17:00:20 http://www.soccernews.nl/news/313971/Kramer_wil_PSV-aanvallers_verslaan:_Ik_sta_er_dichtbij
26-4-2015 17:01:51 http://www.google.nl/
26-4-2015 17:02:01 http://en.wikipedia.org/wiki/Michiel_Kramer
26-4-2015 17:02:15 http://en.wikipedia.org/wiki/Mike_van_Duinen
26-4-2015 17:02:23 http://en.wikipedia.org/wiki/Gervane_Kastaneer
26-4-2015 17:03:00 http://nl.wikipedia.org/wiki/Wilmer_Kousemaker
26-4-2015 17:03:09 http://en.wikipedia.org/wiki/Wilmer_Kousemaker
26-4-2015 17:04:15 http://nl.wikipedia.org/wiki/Benny_Kerstens
26-4-2015 17:04:39 http://nl.wikipedia.org/wiki/Aykut_Demir
“Via VI.nl I get to the Wikipedia entry of, for example,
Wesley Sneijder, and then I look at a teammate of
Sneijder and think ‘Hey! You have been playing at that
club for years’, and then I click further.”
Lean-forward
monitoring
Serendipitous
consumption
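The log excerpt above can be split into browsing sessions by looking at pauses between clicks, which helps separate the lean-forward monitoring visit from the serendipitous Wikipedia chain. A minimal sketch assuming a simple time-gap heuristic; the five-minute threshold is invented:

```python
from datetime import datetime, timedelta

def sessions(log, gap=timedelta(minutes=5)):
    """Split a time-ordered (timestamp, url) log into browsing sessions:
    a new session starts whenever the pause exceeds `gap`."""
    result, current, last = [], [], None
    for ts, url in log:
        if last is not None and ts - last > gap:
            result.append(current)
            current = []
        current.append(url)
        last = ts
    if current:
        result.append(current)
    return result

# Excerpt of the log above: VI.nl, a six-minute pause, then onward clicking
log = [
    (datetime(2015, 4, 26, 16, 53, 5), "http://www.vi.nl/home.htm"),
    (datetime(2015, 4, 26, 16, 54, 2), "http://www.vi.nl/nieuws/..."),
    (datetime(2015, 4, 26, 17, 0, 20), "http://www.soccernews.nl/..."),
    (datetime(2015, 4, 26, 17, 2, 1), "http://en.wikipedia.org/wiki/Michiel_Kramer"),
]
print([len(s) for s in sessions(log)])  # [2, 2]
```

Within a session, the sequence of domains (news site → search → Wikipedia → Wikipedia…) is what reveals the serendipitous chain.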
52. Conclusion
News consumption happens 24/7
BUT…
which website,
when,
and in which order
=
Personal interest plays an
essential role in what
users consider to be
news,
and it determines the
pattern of everyday
news consumption
55. What’s next-2?
Different user groups:
• Different age groups
• Regional News
• Tech news
• Other countries
Requires:
• Updated website whitelist
• Updated scraping templates
56. What’s next-3?
What role do form and content play?
26-4-2015 16:52:59 user28 http://www.bbc.co.uk/sport/0/football/32470569
26-4-2015 16:53:02 user28 http://www.bbc.com/sport/0/football/32470569
26-4-2015 16:53:05 user28 http://www.vi.nl/home.htm
26-4-2015 16:54:02 user28 http://www.vi.nl/nieuws/promes-maakt-opnieuw-het-verschil-voor-spartak.htm
26-4-2015 17:00:20 user28 http://www.soccernews.nl/news/313971/Kramer_wil_PSV-aanvallers_verslaan:_Ik_sta_er_dichtbij
News
+Sport
Next step:
• Automated
Content Analysis
of text on
topic + style
• Visual Content
Analysis
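The planned automated content analysis of topic and style could start from something as simple as keyword matching plus surface cues. A toy sketch; the keyword lists, the punctuation-based style heuristic, and the threshold are invented for illustration only:

```python
# Toy topic lexicons; a real system would use trained classifiers.
TOPIC_KEYWORDS = {
    "sport": {"psv", "goal", "match", "football"},
    "politics": {"parliament", "minister", "debate"},
}

def tag_topic(text: str) -> str:
    """Pick the topic whose keyword list overlaps the text most."""
    words = set(text.lower().split())
    scores = {t: len(words & kw) for t, kw in TOPIC_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"

def tag_style(text: str) -> str:
    """Crude style cue: loose/chatty items tend to use more
    exclamation marks and questions than factual ones."""
    return "loose" if text.count("!") + text.count("?") > 1 else "factual"

item = "What a goal! PSV win the match. Who saw that coming?"
print(tag_topic(item), tag_style(item))  # sport loose
```

The visual content analysis step (who is pictured, what is the topic of the image) would require separate image-recognition tooling.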
57. Acknowledgements
The New News Consumer
www.news-use.com
Marco Otte, Hildebrand Bijleveld, Leonie Durlinger, Stefan Heijdra,
Irene Costera Meijer, Marcel Broersma, Tim Groot Kormelink, Chris Peters, Joelle Swart, Anna van Cauwenberge
59. Questions?
Martijn Kleppe
Vrije Universiteit Amsterdam
m.kleppe@vu.nl
@martijnkleppe
www.martijnkleppe.nl
www.polimedia.nl
www.news-use.com
Slides on Slideshare:
bit.ly/LeuvenKleppe
Social Media: Incubators of a renewed news media landscape
27 November 2015
Leuven
Editor's Notes
I am a media scholar/historian and a typical research question I have is this: how do media cover debates in the Dutch Parliament?
Back in the old days (let's say five years ago) I had to go to several places to find my resources.
For example to the National Library of the Netherlands/KB in The Hague where I could read the analog minutes of the Dutch Parliament
And there I had to find the old newspapers and go over them manually
And the same goes for the radio bulletins, which are great sources since, as you can see, they contain handwriting. But the horrible thing was that everything had to be done manually.
That changed when this great stuff got digitized. I can now look up recent newspapers in the LexisNexis database, for example. A great database, but for me there is one big downside: it contains neither images (in which I am particularly interested) nor whole pages, only single articles.
Another great database I can now use is the Academia database of Sound and Vision. I can search through the metadata of programmes from home or the office and watch the broadcast. But the downside is that I do not know which programmes are in there, and moreover, this system and search engine are completely different from those of LexisNexis. So I do have digitised materials, but there is still a lot of work for me, since I need to understand how these different databases work.
And this is where PoliMedia comes in. With PoliMedia we have built a portal in which you can search through the digital minutes of the Dutch Parliament on any keyword or person. But PoliMedia is linked to media databases such as the digitised newspaper collection of the KB, Television broadcasts of Sound and Vision and Radio Bulletins at the KB.
What PoliMedia does is basically the following. After you perform a query, it searches for topics and/or names in all the newspapers published within a week after a debate, and then calculates the most probable link by looking at the overlap in topics and named entities.
And that looks like this. We made an open website which you can all visit via www.polimedia.nl
You can type your query into the Google-like search box.
On the results page you will see all debates in which the query is found. On the left you can filter your results (which is something we built as well) and on the right you see the magic of PoliMedia: it automatically shows how many and which media items containing coverage of this particular debate were retrieved.
After you click on a result you will see the whole debate with your query highlighted, and on the right side the links to the relevant media items.
After clicking on one of those, you get to the newspaper item as it is stored at the National Library, in their own interface, like this one for the newspapers.
Behind PoliMedia there is a database in which all our links are available in RDF. You can also search through this data without using PoliMedia.nl, via a SPARQL endpoint. You are then more flexible and can ask more complex questions, such as: "Give me all fragments of debates with over 60 related news items".
Recall is the ratio between the number of relevant documents found and the total number of relevant documents that exist. The latter is a predefined "wish list", often called the "ground truth" or "gold standard".
Precision is the ratio between the number of relevant results (documents, hits) and the total number of results returned by the system.
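As a quick illustration of these definitions, here are hypothetical counts that reproduce the slide's figures (recall 62%, precision 80%); the numbers are invented for illustration, not the actual PoliMedia evaluation counts:

```python
def recall(relevant_found: int, relevant_total: int) -> float:
    """Share of all relevant documents (the gold standard) that were found."""
    return relevant_found / relevant_total

def precision(relevant_found: int, returned_total: int) -> float:
    """Share of returned results that are actually relevant."""
    return relevant_found / returned_total

# Invented counts: 310 links returned, 248 of them correct,
# out of 400 correct links in the gold standard.
print(recall(248, 400))     # 0.62
print(precision(248, 310))  # 0.8
```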
Now, there are already quite some tools to monitor what people do on your website.
Everyone who owns a website probably knows Google Analytics which gives very good insights into the clicks on your website.
A similar tool that a lot of publishers use is Chartbeat.
And with Chartbeat you can actually make these kind of dashboards.
This is the so called Big Board of NRC Handelsblad, a leading Dutch newspaper that made this dashboard open to the public.
But in the newsroom these dashboards are constantly shown on screens giving the editors realtime information on what their website visitors are currently reading.
We thus see a difference between the type of news website and how people end up at it.
BUT: it is tempting to draw bold conclusions, yet this does not mean everyone does it like that. This is where our qualitative analyses come in.
Arrives via Facebook!
Arrives via the homepage!
In short, we see a 24/7 pattern, but which website is visited, when, and how is determined not by the type of website but by the individual users, who all differ.
What we already have:
The URLs
We know this is a news website and we made subcategories for the News category, so that is actually already added to the file.
We have scraped the textual and visual content of the websites.
But the difficult part comes now: what do the text and image say?
And that is what we are currently working on, by deploying an automated content analysis of the text on both the topic (soccer, ADO Den Haag) and the style: how is the news item written? In a factual manner, in a loose manner, etc.
Plus we want to analyse the image: what does it tell us? Who is pictured? What is the topic?