Picking the NYT Picks:
Editorial Criteria and Automation
in the Curation of Online News Comments
Nicholas Diakopoulos
University of Maryland, College Park – College of Journalism
@ndiakopoulos | nickdiakopoulos.com | nad@umd.edu
“NYT Picks is the most popular comment queue. We
spend a lot of time tweaking that and getting that
right.”
What are criteria for selection?
How can we augment moderator capability to consider more comments?
Criteria from Literature
Negative / Exclusion
Personal attacks, profanity, abusive
behavior
Positive / Inclusion Internal Coherence
Thoughtfulness
Brevity / Length
Relevance
Fairness / Diversity
Novelty
Argument Quality
Criticality
Emotionality
Entertainment Value
Readability
Personal Experience
Crowdsourcing
Argument Quality
Criticality
Emotionality
Entertainment Value
Readability
Personal Experience
Internal Coherence
Thoughtfulness
Brevity / Length
Relevance
Fairness / Diversity
Novelty
RQ1: Do “NYT Picks” comments reflect positive
editorial criteria identified in literature?
Automation
Argument Quality
Criticality
Emotionality
Entertainment Value
Readability
Personal Experience
Internal Coherence
Thoughtfulness
Brevity / Length
Relevance
Fairness / Diversity
Novelty
RQ2: Can algorithmic approaches to assessing criteria
be developed?
Automated scores point towards scalable
opportunities for moderation and UX…
But automation also raises questions about
over-generalization across contexts, and
algorithmic transparency
Questions?
Contact
Nick Diakopoulos
University of Maryland, College of Journalism
Twitter: @ndiakopoulos
Email: nad@umd.edu
Web: http://www.nickdiakopoulos.com
More Info
N. Diakopoulos. The Editor’s Eye: Curation and Comment Relevance on the New York Times. Proc. CSCW, March 2015.


Editor's Notes

  1. On September 11, 2013, Vladimir Putin published an op-ed in the NYT. Among other things, he questioned American exceptionalism, and if there’s one thing you shouldn’t do in America, it’s that. He was prodding the American public. In response, comments flooded in, 6,367 of them in fact. Of those, 4,447 were published along with the piece.
  2. How could you possibly organize thousands of comments and find the interesting or insightful ones? Like other commenting systems, users can vote up a comment by recommending it. Comments are sorted oldest first, or they can be filtered by their recommendation scores. The published comments included 85 that were deemed NYT Picks, which garner a little badge and reflect the “most interesting and thoughtful” comments. What makes this most impressive is that each of those comments was read by a human moderator, a trained journalist, at the NYT before being published. That is, the NYT practices pre-moderation, in contrast to many other publications, which only look at comments after they’re published.
  3. In fact they’re read by a team led by Bassey Etim, the community manager at the NYT. Together with his team of 13 moderators, he reads almost every comment before it’s posted to the site. Part of that job is choosing the NYT Picks comments: “NYT Picks is the most popular comment queue. We spend a lot of time tweaking that and getting that right.” As a baseline they’re looking for about 5 Picks per 100 comments. Outside of blogs they do about 22 queues a day, but they’d like to open comments on more articles. So how could we help them scale up? Talk about the potential benefits of selecting comments: it signals norms and expectations for behavior, creating a beneficial feedback loop.
  4. Positive criteria considered in the literature come from studies of letters to the editor, online comments for print publications, and on-air radio comments. Readability covers style, clarity, adherence to standard grammar, and the degree to which a comment is well-articulated. Stress that operationalizing these criteria is hard and there are many challenges for future work.
  5. The focus of this work is initially on crowdsourcing ratings for 9 of these dimensions, excluding relevance, fairness, and novelty since they are much more difficult to measure using crowdsourcing, and a previous paper looked at relevance explicitly. The crowdsourcing approach collected human ratings of 8 of the 9 criteria (because length is trivial to measure by counting words): 500 comments, 250 each of NYT Picks and non-Picks, rated on a scale from 1 to 5 on Amazon Mechanical Turk, with 3 independent ratings of each comment. Participation was restricted to workers with a reliable and substantial history, located in the US or Canada. We collected 1,500 ratings from 89 different workers. We measured Krippendorff’s alpha, a measure of interrater reliability, and found slight to moderate agreement among the 3 raters for everything except entertainment value (so people couldn’t agree on what was funny).
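The interrater reliability check described in this note can be sketched in a few lines of Python. This is a minimal interval-scale Krippendorff’s alpha over a list of rated units, assuming no missing ratings; it illustrates the idea rather than reproducing the study’s exact computation (which may have treated the 1–5 scale as ordinal).

```python
from collections import Counter
from itertools import permutations

def krippendorff_alpha_interval(units):
    """Interval-scale Krippendorff's alpha via the coincidence matrix.

    units: one list of numeric ratings per rated comment,
    e.g. [[4, 5, 4], [1, 2, 1], ...]. Assumes no missing ratings
    and at least some variation across the observed values.
    """
    # Coincidence matrix: each ordered pair of ratings within a unit
    # contributes 1 / (m - 1), where m is the number of raters there.
    o = Counter()
    n = 0
    for unit in units:
        m = len(unit)
        if m < 2:
            continue  # a lone rating is not pairable
        n += m
        for a, b in permutations(unit, 2):
            o[(a, b)] += 1 / (m - 1)
    marginals = Counter()
    for (a, _), w in o.items():
        marginals[a] += w
    # Observed vs. expected disagreement with the squared-difference metric
    d_o = sum(w * (a - b) ** 2 for (a, b), w in o.items()) / n
    d_e = sum(marginals[a] * marginals[b] * (a - b) ** 2
              for a in marginals for b in marginals) / (n * (n - 1))
    return 1 - d_o / d_e
```

Unanimous raters yield alpha = 1; systematic disagreement drives it toward, and below, zero.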
  6. Eventually we would like to compute scores for all of these criteria automatically, but for now we do three of them. Readability is the reading level according to the SMOG index, an index that measures the usage of more complex words. There was a high correlation between the SMOG index and the crowdsourced ratings of readability. Personal experience is based on detecting the proportion of words from LIWC dictionaries that reflect first-person pronouns as well as family and friend relationships; comment tokens are stemmed to match the dictionary.
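The two automated scores in this note can be approximated as follows. The syllable counter is a rough vowel-group heuristic, and `FIRST_PERSON` is a small hypothetical stand-in for the LIWC pronoun/family/friends dictionaries (no stemming here), so treat this as a sketch of the idea, not the study’s implementation.

```python
import math
import re

# Hypothetical stand-in for the LIWC first-person / family / friends
# dictionaries used in the study (the real lists are much larger).
FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our", "ours",
                "mom", "dad", "family", "friend", "son", "daughter"}

def count_syllables(word):
    # Rough heuristic: count groups of consecutive vowels.
    # A real system would use a pronunciation dictionary.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def smog_grade(text):
    """SMOG grade: 1.0430 * sqrt(30 * polysyllables / sentences) + 3.1291."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[a-zA-Z']+", text)
    polysyllables = sum(1 for w in words if count_syllables(w) >= 3)
    return 1.0430 * math.sqrt(30 * polysyllables / sentences) + 3.1291

def personal_experience_score(text):
    """Proportion of tokens drawn from the first-person/relationship list."""
    words = re.findall(r"[a-zA-Z']+", text.lower())
    return sum(w in FIRST_PERSON for w in words) / len(words) if words else 0.0
```

A comment with no polysyllabic words bottoms out at the SMOG constant 3.1291, and the personal experience score is simply the matched-token fraction.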
  7. So I found a statistically significant difference for all criteria except entertainment value and emotionality, and emotionality was marginally significant at p = 0.08. Several of these criteria also correlated fairly well, such as thoughtfulness with readability, and argument quality with thoughtfulness. Future work might look at scaling up the data collection and applying dimensionality reduction techniques.
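The Picks vs. non-Picks comparison on each criterion can be illustrated with a generic two-sample permutation test on mean ratings; the note does not specify the study’s actual test statistic, so this is only a sketch of how such a difference might be checked.

```python
import random

def permutation_test(a, b, n_iter=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the fraction of random relabelings whose absolute mean
    difference is at least as extreme as the observed one, i.e. an
    approximate p-value.
    """
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= observed:
            extreme += 1
    return extreme / n_iter
```

With clearly separated rating distributions (say, Picks rated mostly 4–5 on thoughtfulness and non-Picks mostly 1–2) the p-value comes out far below 0.05; near-identical distributions push it toward 1.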
  8. All statistically significant at p = 0.05 or lower.
  9. Editorial selections (NYT Picks) do reflect many of the editorial criteria articulated in the literature, showing a continuity of professional criteria into the online space (except brevity). Online spaces don’t have the same space constraints, and we found NYT editors preferred longer comments for Picks, which raises the question of how well this serves users from their perspective. The scores we computed, in particular the personal experience score, could have some really nice applications for amplifying the value of comments for moderators as well as reporters. In some follow-up work we’ve shown this to comment moderators, and they’re excited about the possibilities. Automation could also enable new end-user experiences, where users adapt their own view of the comments based on automatically computed scores along journalistically interesting lines.
  10. Over-generalization: different communities or topics (e.g., sports) require different treatment, so algorithmic solutions can’t be one-size-fits-all. Is it always better to highlight a highly readable comment, and when does that come into tension with diversity or fairness of perspectives? Do Picks affect community or individual behavior?
  11. Mention the CommentIQ project at UMD, funded by the Knight Foundation. We’re going to be hiring a fellow or fellows, so if you’re interested in joining the lab, please come speak to me. We work on everything from data visualization to algorithmic accountability and transparency, as well as data mining of things like online comments. If you want to combine data and computing with design, in the context of journalism, please come talk.