Readability and Linguistic Subjectivity of News
Ilias Flaounas
University of Bristol
February 22, 2011
I. Flaounas (University of Bristol) February 22, 2011 1 / 21
Our research area
Traditional Research on Media
few outlets per study (< 10)
limited numbers of news items (a few hundred in the best cases)
restricted time periods (a few days)
news items from a single country’s media
manual annotation (‘coding’)
commercial databases and their constraints
hypothesis-driven research
Research Focus
In our research we undertake a large-scale mainstream news-media textual
content analysis using automated techniques.
‘Mainstream news-media’ since we do not focus on modern
online-only news spreading means such as blogs or Twitter.
‘Textual’ since we use only the textual information of news rather
than analysing e.g. images, videos, or speech.
‘Large-scale’ since we analyse hundreds of outlets, typically for
extended periods of time, involving millions of news items.
‘Automated’ in the sense that the analysis is performed by applying
Artificial Intelligence techniques rather than using human ‘coders’.
In our research data management is also a challenge!
NOAM: News Outlets Analysis & Monitoring system
I. Flaounas, O. Ali, M. Turchi, T. Snowsill, F. Nicart, T. De Bie, N. Cristianini: “NOAM: News Outlets Analysis and Monitoring
System”, SIGMOD, Accepted for publication, 2011.
Current Status
Our corpus in numbers:
> 1300 multilingual news sources
> 3000 news feeds
133 countries
22 languages
> 3 years of continuous monitoring
40K news items per day
> 30M news items in total
Support Vector Machines as Topic Taggers
We trained 15 topic taggers on 5 years of data from:
◮ Reuters
◮ NY Times
Typical text preprocessing: Stemming, stop-words removal, TF-IDF
Two-class SVMs
Cosine similarity
Empirical tuning of C parameter per tagger
We set the decision threshold to maximise the F0.5-Score on the test set
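
As an illustration of the last step, the sketch below picks the decision threshold that maximises F0.5 over a set of scored examples. It is a minimal pure-Python sketch, not the code used for the taggers: the function names `f_beta`, `precision_recall` and `best_threshold`, the toy scores and labels, and the choice of candidate thresholds (every observed score) are all assumptions made for the example.

```python
def f_beta(precision, recall, beta=0.5):
    """F-beta score; beta < 1 weights precision more heavily than recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def precision_recall(scores, labels, threshold):
    """Precision and recall when predicting positive for score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def best_threshold(scores, labels, beta=0.5):
    """Try each observed score as a threshold; return the best (threshold, F-beta)."""
    best = (None, -1.0)
    for t in sorted(set(scores)):
        p, r = precision_recall(scores, labels, t)
        f = f_beta(p, r, beta)
        if f > best[1]:
            best = (t, f)
    return best


# Hypothetical SVM decision values and gold labels (1 = on-topic).
scores = [-1.2, -0.4, 0.1, 0.3, 0.8, 1.5]
labels = [0, 0, 1, 0, 1, 1]
threshold, score = best_threshold(scores, labels)
```

On this toy data F0.5 selects a stricter threshold than F1 would, which reflects the extra weight it gives to precision over recall.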
SVM Taggers
Trained on Reuters & NY Times corpora
Topic              F0.5-Score   F0.5 Std.Dev.   Precision   Recall
CRIME              78.92        1.51            82.93       66.59
DISASTERS          83.4         3.7             87.69       70.34
ELECTIONS          70.32        8.74            78.99       49.32
FASHION            83.88        18.61           94.61       71.27
INFLATION-PRICES   77.01        3.19            81.45       63.38
MARKETS            92.02        0.32            94.09       84.63
PETROLEUM          70.67        2.78            75.14       58.73
SCIENCE            73.63        5.17            83.72       50.62
SPORTS             97.78        0.5             98.31       95.75
WEATHER            71.43        3.68            82.91       46.84
ART                81.67        1.34            84.9        71.38
BUSINESS           81.16        1.19            86.23       65.87
ENVIRONMENT        64.29        4.26            73.48       43.7
POLITICS           73.81        2.29            76.65       64.81
RELIGION           74.95        4.21            83.57       53.59
The experiment
The goal is to measure two writing style properties of news:
Linguistic Subjectivity
Readability
over different topics and outlets.
Corpus for the Experiment
10 months of monitoring (Jan. 1st, 2010 – Oct. 31st, 2010)
498 English-language media
99 different countries
2.5M articles that appeared in the ‘Main’ feed
Articles Annotation
We annotated 926,411 articles, with 1,037,359 tags, an average of 1.12
tags per article.
Topic              Articles
ART                42896
BUSINESS           126494
CRIME              277626
DISASTERS          83828
ELECTIONS          28656
ENVIRONMENT        16103
FASHION            1284
INFLATION-PRICES   2331
MARKETS            24319
PETROLEUM          21236
POLITICS           201776
RELIGION           34441
SCIENCE            10076
SPORTS             141665
WEATHER            8505
Total              1037359
Linguistic Subjectivity
We measure the number of sentimental adjectives over the total
number of adjectives per article.
We detect adjectives using the Stanford POS tagger.
We measure sentiment using SentiWordNet.
We characterize an adjective as sentimental if either its positive or
negative sentiment score is > 0.25.
10K items per topic randomly selected
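
The measure above can be sketched as follows. This is a minimal illustration under stated assumptions: the tokens arrive already POS-tagged with Penn Treebank tags (adjectives are JJ, JJR, JJS), and a small hand-made dictionary `TOY_LEXICON` of (positive, negative) scores stands in for SentiWordNet. The scores in it are invented for the example, and the slides do not say how scores are aggregated across the senses of a word.

```python
ADJECTIVE_TAGS = {"JJ", "JJR", "JJS"}  # Penn Treebank adjective tags
THRESHOLD = 0.25                       # from the slide: score must be > 0.25

# Hypothetical (positive, negative) scores; NOT real SentiWordNet values.
TOY_LEXICON = {
    "terrible": (0.0, 0.75),
    "brilliant": (0.8, 0.0),
    "annual": (0.0, 0.0),
    "local": (0.0, 0.1),
}


def is_sentimental(adjective, lexicon=TOY_LEXICON, threshold=THRESHOLD):
    """True if the positive or the negative score exceeds the threshold."""
    pos, neg = lexicon.get(adjective.lower(), (0.0, 0.0))
    return pos > threshold or neg > threshold


def linguistic_subjectivity(tagged_tokens, lexicon=TOY_LEXICON):
    """Fraction of an article's adjectives that are sentimental."""
    adjectives = [w for w, tag in tagged_tokens if tag in ADJECTIVE_TAGS]
    if not adjectives:
        return 0.0  # assumption: an article without adjectives scores 0
    sentimental = sum(1 for w in adjectives if is_sentimental(w, lexicon))
    return sentimental / len(adjectives)


# Hypothetical pre-tagged article: 4 adjectives, 2 of them sentimental.
article = [
    ("The", "DT"), ("annual", "JJ"), ("festival", "NN"), ("was", "VBD"),
    ("brilliant", "JJ"), ("despite", "IN"), ("terrible", "JJ"),
    ("local", "JJ"), ("weather", "NN"),
]
ls = linguistic_subjectivity(article)
```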
Validation of Linguistic Subjectivity?
This is a challenge due to the lack of a gold standard.
But we found that:
Editorials and Opinion articles are more linguistically subjective
compared to average.
◮ 5766 Ed/Op articles from 57 different sources.
◮ LS mean value 26.15% (std. dev. of 0.29%)
◮ Articles in Main-feed have mean LS of 19.45% (std. dev. 0.22%).
Popular articles are more linguistically subjective compared to average
(based on 108,516 popular articles in the period of study)
UK tabloids are more linguistically subjective compared to
broadsheets.
Linguistic Subjectivity of Topics
[Figure: bar chart of Linguistic Subjectivity per topic; axis from 0 to 25. Bars in plotted order: POLITICS, ELECTIONS, BUSINESS, SCIENCE, ENVIRONMENT, RELIGION, PETROLEUM, RANDOM, SPORTS, PRICES, MOST POP., WEATHER, MARKETS, ART, CRIME, DISASTERS, FASHION.]
Readability
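We measure readability based on the Flesch Reading Ease Test:
FRET(article) = 206.835 − (1.015 · ASL) − (84.6 · ASW)
where ASL is the average sentence length in words and ASW is the average number of syllables per word.
Scores typically range from 0 to 100.
The higher the FRET, the easier the text is to read.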
10K items per topic randomly selected
As validation, we checked the readability of CBBC Newsround. It was the
most readable set of articles, with a mean score of 62.50.
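
The formula can be sketched in a few lines of Python. This is a rough illustration rather than the implementation used in the study: sentence splitting and syllable counting rely on simple heuristics (`count_syllables` counts runs of consecutive vowels), which are assumptions of this example and will miscount some English words.

```python
import re


def count_syllables(word):
    """Heuristic: one syllable per run of consecutive vowels, at least one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def fret_from_stats(asl, asw):
    """Flesch Reading Ease from average sentence length and syllables per word."""
    return 206.835 - 1.015 * asl - 84.6 * asw


def fret(text):
    """Flesch Reading Ease of a text, using naive sentence and word splitting."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    asl = len(words) / len(sentences)                          # words per sentence
    asw = sum(count_syllables(w) for w in words) / len(words)  # syllables per word
    return fret_from_stats(asl, asw)


easy = fret("The cat sat on the mat. It was a good day.")
hard = fret("Institutional investors anticipated considerable macroeconomic volatility.")
```

The formula itself is not clamped, so very short, simple sentences can score above 100 and dense text can score below 0.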
Readability of Topics
[Figure: bar chart of Readability per topic; axis from 0 to 50. Bars in plotted order: POLITICS, ENVIRONMENT, PRICES, SCIENCE, BUSINESS, ELECTIONS, RELIGION, PETROLEUM, CRIME, RANDOM, MARKETS, MOST POP., DISASTERS, WEATHER, FASHION, ART, SPORTS.]
Readability vs. Linguistic Subjectivity on Topics
[Figure: scatter plot of the 15 topics. x-axis: Linguistic Subjectivity (14 to 28); y-axis: Readability (36 to 50). Points: ART, BUSINESS, ENVIRONMENT, POLITICS, RELIGION, CRIME, DISASTERS, ELECTIONS, FASHION, MARKETS, PETROLEUM, PRICES, SCIENCE, SPORTS, WEATHER.]
Outlets
We compare the Readability and Linguistic Subjectivity of:
8 US newspapers
8 UK newspapers (4 tabloids / 4 broadsheets)

Newspaper                 Articles
Chicago Tribune           5477
Daily News                2212
Los Angeles Times         6696
New York Post             32033
NY Times                  11508
The Wall Street Journal   12300
The Washington Post       7228
USA Today                 6208
Daily Mail                24326
Daily Mirror              7731
Daily Star                8946
Daily Telegraph           22682
Independent               43557
The Guardian              15393
The Sun                   9048
The Times                 2957
Linguistic Subjectivity of Outlets
[Figure: bar chart of Linguistic Subjectivity per outlet; axis from 0 to 30. Bars in plotted order: The Wall Str Journal, The Washington Post, USA Today, The Times, Los Angeles Times, NY Times, Daily Telegraph, The Guardian, Chicago Tribune, Daily Star, New York Post, Independent, Daily Mail, Daily News, Daily Mirror, The Sun.]
Readability of Outlets
[Figure: bar chart of Readability per outlet; axis from 0 to 60. Bars in plotted order: The Guardian, USA Today, Daily Mail, Daily Star, The Washington Post, Los Angeles Times, The Wall Str Journal, Daily News, Daily Telegraph, New York Post, NY Times, The Times, Chicago Tribune, Independent, Daily Mirror, The Sun.]
Readability vs. Linguistic Subjectivity on Outlets
[Figure: scatter plot of the 16 outlets. x-axis: Linguistic Subjectivity (15 to 30); y-axis: Readability (30 to 60). Points: Chicago Tribune, Daily Mail, Daily Mirror, Daily News, Daily Star, Daily Telegraph, Independent, Los Angeles Times, New York Post, NY Times, The Guardian, The Sun, The Times, The Wall Street Journal, The Washington Post, USA Today.]
More info and results at: http://mediapatterns.enm.bris.ac.uk
Thank you!

More Related Content

Similar to Readability and Linguistic Subjectivity of News

EU REAWATCH: research and innovation policy analysis
EU REAWATCH: research and innovation policy analysisEU REAWATCH: research and innovation policy analysis
EU REAWATCH: research and innovation policy analysisPer Koch
 
Citation metrics across disciplines - Google Scholar, Scopus, and the Web of ...
Citation metrics across disciplines - Google Scholar, Scopus, and the Web of ...Citation metrics across disciplines - Google Scholar, Scopus, and the Web of ...
Citation metrics across disciplines - Google Scholar, Scopus, and the Web of ...Anne-Wil Harzing
 
The changing journals landscape
The changing journals landscapeThe changing journals landscape
The changing journals landscapeLaura Czerniewicz
 
Different forms of expertise in democratising technological cultures.
Different forms of expertise in democratising technological cultures.Different forms of expertise in democratising technological cultures.
Different forms of expertise in democratising technological cultures.Fondazione Giannino Bassetti
 
References, authors, journals and scientific disciplines underlying the susta...
References, authors, journals and scientific disciplines underlying the susta...References, authors, journals and scientific disciplines underlying the susta...
References, authors, journals and scientific disciplines underlying the susta...Nuno Quental
 
Open access for researchers, policy makers and research managers
Open access  for researchers, policy makers and research managersOpen access  for researchers, policy makers and research managers
Open access for researchers, policy makers and research managersIryna Kuchma
 
OpenAIRE-COAR conference 2014: Re-imagining the role of institutional reposit...
OpenAIRE-COAR conference 2014: Re-imagining the role of institutional reposit...OpenAIRE-COAR conference 2014: Re-imagining the role of institutional reposit...
OpenAIRE-COAR conference 2014: Re-imagining the role of institutional reposit...OpenAIRE
 
Re-imagining the role of Institutional Repository in Open Scholarship
Re-imagining the role of Institutional Repository in Open ScholarshipRe-imagining the role of Institutional Repository in Open Scholarship
Re-imagining the role of Institutional Repository in Open ScholarshipLeslie Chan
 
The Changing Journal Landscape
The Changing Journal Landscape The Changing Journal Landscape
The Changing Journal Landscape Eve Gray
 
Health Sciences Annual Lively Lunch, handouts by Ramune Kubilius, Galter Heal...
Health Sciences Annual Lively Lunch, handouts by Ramune Kubilius, Galter Heal...Health Sciences Annual Lively Lunch, handouts by Ramune Kubilius, Galter Heal...
Health Sciences Annual Lively Lunch, handouts by Ramune Kubilius, Galter Heal...Charleston Conference
 
Open peer review : Introductuion
Open peer review : Introductuion Open peer review : Introductuion
Open peer review : Introductuion OpenAccessBelgium
 
MC Security Studies Dissertation - Final Submission
MC Security Studies Dissertation - Final SubmissionMC Security Studies Dissertation - Final Submission
MC Security Studies Dissertation - Final SubmissionMarek Cimpl
 
Institutional electronic repositories: a mandate for all researchers
Institutional electronic repositories: a mandate for all researchersInstitutional electronic repositories: a mandate for all researchers
Institutional electronic repositories: a mandate for all researcherscalsi
 

Similar to Readability and Linguistic Subjectivity of News (20)

Georgia ppt 06 08
Georgia ppt 06 08Georgia ppt 06 08
Georgia ppt 06 08
 
EU REAWATCH: research and innovation policy analysis
EU REAWATCH: research and innovation policy analysisEU REAWATCH: research and innovation policy analysis
EU REAWATCH: research and innovation policy analysis
 
Citation metrics across disciplines - Google Scholar, Scopus, and the Web of ...
Citation metrics across disciplines - Google Scholar, Scopus, and the Web of ...Citation metrics across disciplines - Google Scholar, Scopus, and the Web of ...
Citation metrics across disciplines - Google Scholar, Scopus, and the Web of ...
 
The changing journals landscape
The changing journals landscapeThe changing journals landscape
The changing journals landscape
 
Different forms of expertise in democratising technological cultures.
Different forms of expertise in democratising technological cultures.Different forms of expertise in democratising technological cultures.
Different forms of expertise in democratising technological cultures.
 
References, authors, journals and scientific disciplines underlying the susta...
References, authors, journals and scientific disciplines underlying the susta...References, authors, journals and scientific disciplines underlying the susta...
References, authors, journals and scientific disciplines underlying the susta...
 
ArticoloInglese
ArticoloIngleseArticoloInglese
ArticoloInglese
 
Open access for researchers, policy makers and research managers
Open access  for researchers, policy makers and research managersOpen access  for researchers, policy makers and research managers
Open access for researchers, policy makers and research managers
 
OpenAIRE-COAR conference 2014: Re-imagining the role of institutional reposit...
OpenAIRE-COAR conference 2014: Re-imagining the role of institutional reposit...OpenAIRE-COAR conference 2014: Re-imagining the role of institutional reposit...
OpenAIRE-COAR conference 2014: Re-imagining the role of institutional reposit...
 
Re-imagining the role of Institutional Repository in Open Scholarship
Re-imagining the role of Institutional Repository in Open ScholarshipRe-imagining the role of Institutional Repository in Open Scholarship
Re-imagining the role of Institutional Repository in Open Scholarship
 
The Changing Journal Landscape
The Changing Journal Landscape The Changing Journal Landscape
The Changing Journal Landscape
 
Health Sciences Annual Lively Lunch, handouts by Ramune Kubilius, Galter Heal...
Health Sciences Annual Lively Lunch, handouts by Ramune Kubilius, Galter Heal...Health Sciences Annual Lively Lunch, handouts by Ramune Kubilius, Galter Heal...
Health Sciences Annual Lively Lunch, handouts by Ramune Kubilius, Galter Heal...
 
Open Science What? Why? For What? How?
Open Science What? Why? For What? How?Open Science What? Why? For What? How?
Open Science What? Why? For What? How?
 
2nd Thematic Conference on Knowledge Commons - Call for papers
2nd Thematic Conference on Knowledge Commons - Call for papers2nd Thematic Conference on Knowledge Commons - Call for papers
2nd Thematic Conference on Knowledge Commons - Call for papers
 
Final report on Ethnocentrism
Final report on EthnocentrismFinal report on Ethnocentrism
Final report on Ethnocentrism
 
Maccallum
MaccallumMaccallum
Maccallum
 
Open peer review : Introductuion
Open peer review : Introductuion Open peer review : Introductuion
Open peer review : Introductuion
 
MC Security Studies Dissertation - Final Submission
MC Security Studies Dissertation - Final SubmissionMC Security Studies Dissertation - Final Submission
MC Security Studies Dissertation - Final Submission
 
Institutional electronic repositories: a mandate for all researchers
Institutional electronic repositories: a mandate for all researchersInstitutional electronic repositories: a mandate for all researchers
Institutional electronic repositories: a mandate for all researchers
 
Scientometrics
Scientometrics Scientometrics
Scientometrics
 

More from Ilias Flaounas

Improving experimentation velocity via Multi-Armed Bandits
Improving experimentation velocity via Multi-Armed BanditsImproving experimentation velocity via Multi-Armed Bandits
Improving experimentation velocity via Multi-Armed BanditsIlias Flaounas
 
Multi-Armed Bandits:
 Intro, examples and tricks
Multi-Armed Bandits:
 Intro, examples and tricksMulti-Armed Bandits:
 Intro, examples and tricks
Multi-Armed Bandits:
 Intro, examples and tricksIlias Flaounas
 
The story of the Product Growth team at Atlassian
The story of the Product Growth team at AtlassianThe story of the Product Growth team at Atlassian
The story of the Product Growth team at AtlassianIlias Flaounas
 
Detecting macro-patterns in the EU media sphere
Detecting macro-patterns in the EU media sphereDetecting macro-patterns in the EU media sphere
Detecting macro-patterns in the EU media sphereIlias Flaounas
 
Celebrity Watch: Browsing News Content by Exploiting Social Intelligence
Celebrity Watch: Browsing News Content by Exploiting Social IntelligenceCelebrity Watch: Browsing News Content by Exploiting Social Intelligence
Celebrity Watch: Browsing News Content by Exploiting Social IntelligenceIlias Flaounas
 
ECML/PKDD 2009: Found in translation
ECML/PKDD 2009: Found in translationECML/PKDD 2009: Found in translation
ECML/PKDD 2009: Found in translationIlias Flaounas
 
Inference and validation of networks
Inference and validation of networksInference and validation of networks
Inference and validation of networksIlias Flaounas
 

More from Ilias Flaounas (8)

Improving experimentation velocity via Multi-Armed Bandits
Improving experimentation velocity via Multi-Armed BanditsImproving experimentation velocity via Multi-Armed Bandits
Improving experimentation velocity via Multi-Armed Bandits
 
Multi-Armed Bandits:
 Intro, examples and tricks
Multi-Armed Bandits:
 Intro, examples and tricksMulti-Armed Bandits:
 Intro, examples and tricks
Multi-Armed Bandits:
 Intro, examples and tricks
 
The story of the Product Growth team at Atlassian
The story of the Product Growth team at AtlassianThe story of the Product Growth team at Atlassian
The story of the Product Growth team at Atlassian
 
On Storing Big Data
On Storing Big DataOn Storing Big Data
On Storing Big Data
 
Detecting macro-patterns in the EU media sphere
Detecting macro-patterns in the EU media sphereDetecting macro-patterns in the EU media sphere
Detecting macro-patterns in the EU media sphere
 
Celebrity Watch: Browsing News Content by Exploiting Social Intelligence
Celebrity Watch: Browsing News Content by Exploiting Social IntelligenceCelebrity Watch: Browsing News Content by Exploiting Social Intelligence
Celebrity Watch: Browsing News Content by Exploiting Social Intelligence
 
ECML/PKDD 2009: Found in translation
ECML/PKDD 2009: Found in translationECML/PKDD 2009: Found in translation
ECML/PKDD 2009: Found in translation
 
Inference and validation of networks
Inference and validation of networksInference and validation of networks
Inference and validation of networks
 

Recently uploaded

Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
ELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptxELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxolyaivanovalion
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz1
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxolyaivanovalion
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...amitlee9823
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangaloreamitlee9823
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 

Recently uploaded (20)

Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
ELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptxELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptx
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 

Readability and Linguistic Subjectivity of News

  • 1. Readability and Linguistic Subjectivity of News Ilias Flaounas University of Bristol February 22, 2011 I. Flaounas (University of Bristol) February 22, 2011 1 / 21
  • 2. Our research area I. Flaounas (University of Bristol) February 22, 2011 2 / 21
  • 3. Traditional Research on Media few outlets per study (< 10) limited numbers of news-items (few hundreds in best cases) restricted time periods (few days) news-items from a single country’s media manual annotation (‘coding’) commercial databases – and their constrains hypothesis driven research I. Flaounas (University of Bristol) February 22, 2011 3 / 21
  • 4. Research Focus In our research we undertake a large-scale mainstream news-media textual content analysis using automated techniques. I. Flaounas (University of Bristol) February 22, 2011 4 / 21
  • 5. Research Focus In our research we undertake a large-scale mainstream news-media textual content analysis using automated techniques. ‘Mainstream news-media’ since we do not focus on modern online-only news spreading means such as blogs or Twitter. I. Flaounas (University of Bristol) February 22, 2011 4 / 21
  • 6. Research Focus In our research we undertake a large-scale mainstream news-media textual content analysis using automated techniques. ‘Mainstream news-media’ since we do not focus on modern online-only news spreading means such as blogs or Twitter. ‘Textual’ since we use only the textual information of news rather than analysing e.g. images, videos, or speech. I. Flaounas (University of Bristol) February 22, 2011 4 / 21
  • 7. Research Focus In our research we undertake a large-scale mainstream news-media textual content analysis using automated techniques. ‘Mainstream news-media’ since we do not focus on modern online-only news spreading means such as blogs or Twitter. ‘Textual’ since we use only the textual information of news rather than analysing e.g. images, videos, or speech. ‘Large-scale’ since we analyse hundreds of outlets, typically for extended periods of time, involving millions of news items. I. Flaounas (University of Bristol) February 22, 2011 4 / 21
  • 8. Research Focus In our research we undertake a large-scale mainstream news-media textual content analysis using automated techniques. ‘Mainstream news-media’ since we do not focus on modern online-only news spreading means such as blogs or Twitter. ‘Textual’ since we use only the textual information of news rather than analysing e.g. images, videos, or speech. ‘Large-scale’ since we analyse hundreds of outlets, typically for extended periods of time, involving millions of news items. ‘Automated’ in the sense that the analysis is performed by applying Artificial Intelligence techniques rather than using human ‘coders’. In our research data management is also a challenge! I. Flaounas (University of Bristol) February 22, 2011 4 / 21
  • 9. NOAM: News Outlets Analysis & Monitoring system I. Flaounas, O. Ali, M. Turchi, T. Snowsill, F. Nicart, T. De Bie, N. Cristianini: “NOAM: News Outlets Analysis and Monitoring System”, SIGMOD, Accepted for publication, 2011.
  • 10. Current Status Our corpus in numbers: > 1300 multilingual news sources; > 3000 news feeds; 133 countries; 22 languages; > 3 years of continuous monitoring; 40K news items per day; > 30M news items in total.
  • 11. Support Vector Machines as Topic Taggers We trained 15 topic taggers on 5 years of data from Reuters and the NY Times. Typical text preprocessing: stemming, stop-word removal, TF-IDF. Two-class SVMs. Cosine similarity. Empirical tuning of the C parameter per tagger. We set the decision threshold to obtain the maximum F0.5-score on the test set.
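
The threshold-setting step on this slide can be illustrated with a minimal sketch. It assumes each tagger produces a real-valued decision score per article and that true labels are available for the evaluation set; the function names and the toy data are hypothetical and are not taken from the original system.

```python
def f_beta(precision, recall, beta=0.5):
    """F-beta score; beta < 1 weights precision more heavily than recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def precision_recall(scores, labels, threshold):
    """Precision and recall when items scoring >= threshold receive the tag."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y)
    fp = sum(1 for s, y in pairs if s >= threshold and not y)
    fn = sum(1 for s, y in pairs if s < threshold and y)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def best_threshold(scores, labels, beta=0.5):
    """Try every observed score as a threshold; return (threshold, F-beta)."""
    best = (None, -1.0)
    for t in sorted(set(scores)):
        p, r = precision_recall(scores, labels, t)
        f = f_beta(p, r, beta)
        if f > best[1]:
            best = (t, f)
    return best


# Hypothetical decision values and true labels for a single tagger.
scores = [-1.2, -0.4, 0.1, 0.3, 0.8, 1.5]
labels = [False, False, True, False, True, True]
threshold, score = best_threshold(scores, labels)
```

With beta = 0.5 the score rewards precision more than recall, which matches the pattern in the reported results, where precision is higher than recall for every tagger.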
  • 12. SVM Taggers Trained on Reuters & NY Times corpora
      Topic             F0.5-Score  F0.5 Std.Dev.  Precision  Recall
      CRIME                  78.92           1.51      82.93   66.59
      DISASTERS               83.4            3.7      87.69   70.34
      ELECTIONS              70.32           8.74      78.99   49.32
      FASHION                83.88          18.61      94.61   71.27
      INFLATION-PRICES       77.01           3.19      81.45   63.38
      MARKETS                92.02           0.32      94.09   84.63
      PETROLEUM              70.67           2.78      75.14   58.73
      SCIENCE                73.63           5.17      83.72   50.62
      SPORTS                 97.78            0.5      98.31   95.75
      WEATHER                71.43           3.68      82.91   46.84
      ART                    81.67           1.34       84.9   71.38
      BUSINESS               81.16           1.19      86.23   65.87
      ENVIRONMENT            64.29           4.26      73.48    43.7
      POLITICS               73.81           2.29      76.65   64.81
      RELIGION               74.95           4.21      83.57   53.59
  • 13. The experiment The goal is to measure two writing-style properties of news, Linguistic Subjectivity and Readability, over different topics and outlets. Corpus for the experiment: 10 months of monitoring (Jan. 1st, 2010 – Oct. 31st, 2010); 498 English-language media; 99 different countries; 2.5M articles that appeared in the ‘Main’ feed.
  • 14. Articles Annotation We annotated 926,411 articles with 1,037,359 tags, an average of 1.12 tags per article.
      Topic             Articles
      ART                  42896
      BUSINESS            126494
      CRIME               277626
      DISASTERS            83828
      ELECTIONS            28656
      ENVIRONMENT          16103
      FASHION               1284
      INFLATION-PRICES      2331
      MARKETS              24319
      PETROLEUM            21236
      POLITICS            201776
      RELIGION             34441
      SCIENCE              10076
      SPORTS              141665
      WEATHER               8505
      Total              1037359
  • 15. Linguistic Subjectivity We measure the number of sentimental adjectives over the total number of adjectives per article. We detect adjectives using the Stanford POS tagger. We measure sentiment using SentiWordNet. We characterise an adjective as sentimental if either its positive or its negative sentiment score is > 0.25. 10K items per topic, randomly selected.
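
The ratio defined on this slide can be sketched as follows. This is only an illustration under stated assumptions: the input is already POS-tagged with Penn Treebank tags (adjectives are `JJ`, `JJR`, `JJS`), and `lexicon` is a hypothetical word-level lookup standing in for SentiWordNet, whose scores are actually assigned per synset. The scores in the example are invented for illustration.

```python
def linguistic_subjectivity(tagged_tokens, lexicon, cutoff=0.25):
    """Share of adjectives whose positive or negative score exceeds cutoff.

    tagged_tokens: iterable of (word, pos_tag) pairs with Penn Treebank tags.
    lexicon: dict mapping word -> (positive_score, negative_score); a
             hypothetical stand-in for a SentiWordNet lookup.
    Returns None when the article contains no adjectives.
    """
    adjectives = [w.lower() for w, tag in tagged_tokens if tag.startswith("JJ")]
    if not adjectives:
        return None
    sentimental = 0
    for adj in adjectives:
        pos, neg = lexicon.get(adj, (0.0, 0.0))  # unknown words count as neutral
        if pos > cutoff or neg > cutoff:
            sentimental += 1
    return sentimental / len(adjectives)


# Invented scores, for illustration only.
lexicon = {
    "terrible": (0.0, 0.625),
    "fierce": (0.125, 0.5),
    "northern": (0.0, 0.0),
    "calm": (0.25, 0.0),  # exactly 0.25 does not exceed the cutoff
}
tagged = [
    ("The", "DT"), ("terrible", "JJ"), ("storm", "NN"), ("hit", "VBD"),
    ("the", "DT"), ("northern", "JJ"), ("coast", "NN"), ("with", "IN"),
    ("fierce", "JJ"), ("winds", "NNS"), ("on", "IN"), ("a", "DT"),
    ("calm", "JJ"), ("day", "NN"),
]
ls = linguistic_subjectivity(tagged, lexicon)
```

Here two of the four adjectives pass the cutoff, so `ls` is 0.5. How adjectives missing from the lexicon were handled in the study is not stated on the slide; treating them as neutral is an assumption of this sketch.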
  • 19. Validation of Linguistic Subjectivity? This is a challenge due to the lack of a gold standard. But we found that: (1) Editorials and Opinion articles are more linguistically subjective than average: 5766 Ed/Op articles from 57 different sources have a mean LS of 26.15% (std. dev. 0.29%), while articles in the Main feed have a mean LS of 19.45% (std. dev. 0.22%). (2) Popular articles are more linguistically subjective than average (based on 108,516 popular articles in the period of study). (3) UK tabloids are more linguistically subjective than broadsheets.
  • 20. Linguistic Subjectivity of Topics [Figure: bar chart of linguistic subjectivity per topic. Bars, in the order they appear on the chart: POLITICS, ELECTIONS, BUSINESS, SCIENCE, ENVIRONMENT, RELIGION, PETROLEUM, RANDOM, SPORTS, PRICES, MOST POP., WEATHER, MARKETS, ART, CRIME, DISASTERS, FASHION.]
  • 21. Readability We measure readability with the Flesch Reading Ease Test: FRET(article) = 206.835 − 1.015 · ASL − 84.6 · ASW, where ASL is the average sentence length in words and ASW is the average number of syllables per word. Scores typically range from 0 to 100; the higher the FRET, the easier the text is to read. 10K items per topic, randomly selected. As validation we checked the readability of CBBC Newsround: it was the most readable set of articles, with a mean score of 62.50.
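
The formula on this slide can be sketched in a few lines. The formula itself is the standard Flesch Reading Ease; the sentence splitter and the vowel-group syllable counter below are rough heuristics chosen for this illustration and are assumptions, not the tools used in the study.

```python
import re


def count_syllables(word):
    """Rough heuristic: count groups of consecutive vowels, minimum one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(asl, asw):
    """Flesch Reading Ease from average sentence length (words per sentence)
    and average syllables per word."""
    return 206.835 - 1.015 * asl - 84.6 * asw


def score_text(text):
    """Estimate ASL and ASW with naive splitting, then apply the formula."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    asl = len(words) / len(sentences)
    asw = sum(count_syllables(w) for w in words) / len(words)
    return flesch_reading_ease(asl, asw)


example = score_text("The cat sat. The dog ran.")
```

For this toy input (two three-word sentences of one-syllable words) the score is about 119.19, which shows that very simple text can exceed 100 even though scores for ordinary prose usually fall between 0 and 100.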
  • 22. Readability of Topics [Figure: bar chart of readability per topic. Bars, in the order they appear on the chart: POLITICS, ENVIRONMENT, PRICES, SCIENCE, BUSINESS, ELECTIONS, RELIGION, PETROLEUM, CRIME, RANDOM, MARKETS, MOST POP., DISASTERS, WEATHER, FASHION, ART, SPORTS.]
  • 23. Readability vs. Linguistic Subjectivity on Topics [Figure: scatter plot of the 15 topics; axes: Linguistic Subjectivity and Readability.]
  • 24. Outlets We compare the Readability and Linguistic Subjectivity of 8 US newspapers and 8 UK newspapers (4 tabloids / 4 broadsheets).
      US newspaper             Articles    UK newspaper     Articles
      Chicago Tribune              5477    Daily Mail          24326
      Daily News                   2212    Daily Mirror         7731
      Los Angeles Times            6696    Daily Star           8946
      New York Post               32033    Daily Telegraph     22682
      NY Times                    11508    Independent         43557
      The Wall Street Journal     12300    The Guardian        15393
      The Washington Post          7228    The Sun              9048
      USA Today                    6208    The Times            2957
  • 25. Linguistic Subjectivity of Outlets [Figure: bar chart of linguistic subjectivity per outlet. Bars, in the order they appear on the chart: The Wall Street Journal, The Washington Post, USA Today, The Times, Los Angeles Times, NY Times, Daily Telegraph, The Guardian, Chicago Tribune, Daily Star, New York Post, Independent, Daily Mail, Daily News, Daily Mirror, The Sun.]
  • 26. Readability of Outlets [Figure: bar chart of readability per outlet. Bars, in the order they appear on the chart: The Guardian, USA Today, Daily Mail, Daily Star, The Washington Post, Los Angeles Times, The Wall Street Journal, Daily News, Daily Telegraph, New York Post, NY Times, The Times, Chicago Tribune, Independent, Daily Mirror, The Sun.]
  • 27. Readability vs. Linguistic Subjectivity on Outlets [Figure: scatter plot of the 16 newspapers; axes: Linguistic Subjectivity and Readability.]
  • 28. More info and results at: http://mediapatterns.enm.bris.ac.uk Thank you!