Finding News Curators in Twitter


Users interact with online news in many ways, one of them being sharing content through online social networking sites such as Twitter. There is a small but important group of users that devote a substantial amount of effort and care to this activity. These users monitor a large variety of sources on a topic or around a story, carefully select interesting material on this topic, and disseminate it to an interested audience ranging from thousands to millions. These users are news curators, and are the main subject of study of this paper. We adopt the perspective of a journalist or news editor who wants to discover news curators among the audience engaged with a news site.
We look at the users who shared a news story on Twitter and attempt to identify news curators who may provide more information related to that story. In this paper we describe how to find this specific class of curators, which we refer to as news story curators. We compute a set of features for each user and demonstrate that these features can be used to automatically find relevant curators among the audience of two large news organizations.

This presentation is part of the SNOW workshop of the World Wide Web Conference, held in Rio de Janeiro in May 2013.

Published in: Technology, News & Politics


  • 1. Finding News Curators in Twitter
    Janette Lehmann, Carlos Castillo, Mounia Lalmas, Ethan Zuckerman
  • 2. Outline
      - Motivation
      - Types of curators
      - Labeling news story curators
      - Automatically finding news story curators
      - Conclusion and future work
    Photo credit (first slide): Hobvias Sudoneighm (CC-BY).
  • 3. Motivation
      - Twitter has become a powerful tool for the aggregation and consumption of time-sensitive content in general and news in particular.
      - Journalists use online social media platforms (Twitter, Facebook and others) and blogs to elicit other story angles or verify stories they are working on.
    To what extent can the community of engaged readers, those who share news articles in social media, contribute to the journalistic process?
    What kinds of roles do people play when sharing news?
    We want to detect users that provide further relevant information to a news story. We call them news story curators.
  • 4. Example
    Al Jazeera English news article about the civil war in Syria: “Syria allows UN to step up food aid” [16 Jan 2013]
    Users that posted the article in Twitter:
      User               #Followers   Is tweeting about
      @RevolutionSyria       88,122   Syria
      @KenanFreeSyria        13,388   Syria
      @UP_food                  703   Food
      @KeriJSmith             8,838   Breaking news/top stories
      @BreakingNews       5,662,866   Breaking news/top stories
    Whom would you follow to find out more about the civil war in Syria?
  • 5. Types of news story curators
    Topic-unfocused
      - Human: topic-unfocused curator. Disseminates news articles about diverse topics, usually breaking news/top stories → @KeriJSmith
      - Automatic: news aggregators. Collect news articles (e.g. from RSS feeds) and automatically post the corresponding headlines and URLs → @BreakingNews
    Topic-focused
      - Human: topic-focused curator. Collects interesting information with a specific focus, usually a geographic region or a topic → @KenanFreeSyria
      - Automatic: topic-focused aggregators. Automatically disseminate news with a topical focus → @UP_food, @RevolutionSyria
  • 6. Types of news story curators
    Topic-unfocused (human and automatic): these curators are probably less valuable or not valuable.
    Topic-focused: valuable curators for a specific story
      - Human: topic-focused curator. Collects interesting information with a specific focus, usually a geographic region or a topic → @KenanFreeSyria
      - Automatic: topic-focused aggregators. Automatically disseminate news with a topical focus → @UP_food, @RevolutionSyria
  • 7. Data sets
    Step 1: Selection of news articles
      - News articles published in early 2013 from
          - BBC World Service [BBC]: 75 articles
          - Al Jazeera English [AJE]: 155 articles
      - Stories: Obama's inauguration, Mali conflict, pollution in Beijing, etc.
    Step 2: News crowd detection
      - All users who tweeted the article within the first 6 hours after publication
    Step 3: User characteristics
      - Extraction of data from each user in the news crowd (e.g. further tweets, profile information)
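Step 2 above can be sketched in a few lines. This is only an illustration of the 6-hour rule stated on the slide: the `(user, tweeted_at)` record layout, the function name `news_crowd`, and the sample users are assumptions, not the authors' pipeline.

```python
from datetime import datetime, timedelta

# The 6-hour window comes from the slide; everything else here is hypothetical.
CROWD_WINDOW = timedelta(hours=6)


def news_crowd(published_at, tweets, window=CROWD_WINDOW):
    """Return the set of users who tweeted an article within `window` of publication.

    `tweets` is an iterable of (user, tweeted_at) pairs for a single article.
    """
    deadline = published_at + window
    return {
        user
        for user, tweeted_at in tweets
        if published_at <= tweeted_at <= deadline
    }


# Made-up example data for one article.
published = datetime(2013, 1, 16, 9, 0)
tweets = [
    ("alice", datetime(2013, 1, 16, 9, 5)),
    ("bob", datetime(2013, 1, 16, 14, 59)),
    ("carol", datetime(2013, 1, 16, 15, 1)),  # one minute after the window closes
]
crowd = news_crowd(published, tweets)
```

Whether the boundary at exactly six hours is included is a choice made here for the sketch; the slide does not say.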
  • 8. Labeling News Story Curators
    Photo credit: Thomas Leuthard (CC BY).
  • 9. Labeling tasks
    Data
      - Sample of 20 news articles
      - For each news article, a sample of 10 users who posted the article
      - We showed three assessors:
          - The title of the news article and a sample of tweets of the user
          - Profile description and the number of followers of the user
    Labeling questions
    Q1) Please indicate whether the user is interested in or an expert on the topic of the article story:
      - Yes: Most of her/his tweets relate to the topic of the story (e.g. the article is about the conflict in Syria, she/he is often tweeting about the conflict in Syria).
      - Maybe: Many of her/his tweets relate to the topic of the story or she/he is interested in a related topic (e.g. the article is about the conflict in Syria, she/he is tweeting about armed conflicts or the Arabic world).
      - No: She/he is not tweeting about the topic of the story.
      - Unknown: Based on the information of the user it was not possible to label her/him.
    Q2) Please indicate whether the user is a human or generates tweets automatically:
      - Human: The user has conversations and personal comments in his or her tweets. The text of tweets that have URLs (e.g. to news articles) seems self-written and contains the user's own opinions.
      - Maybe automatic: The Twitter user has characteristics of an automatic profile, but she/he could be human as well.
      - Automatic: The tweet stream of the user looks automatically generated. The tweets contain only headlines and URLs of news articles.
      - Unknown: Based on the information of the user it was not possible to label her/him as human or automatic.
  • 10. Resulting training set
                Interested? (topic-focused)    Human or automatic?          Interested + human
                n      yes     no              n      human   automatic
      AJE       63     21%     79%             71     55%     45%           13%
      BBC       58     3%      54%             54     35%     65%           1.8%
    Many users are topic-unfocused and automatic.
    We considered only users for whom at least two annotators provided a decisive label (Yes or No, Human or Automatic).
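The filtering rule in the note on the slide above (at least two of the three annotators gave a decisive label) could be applied as in the following sketch. The slide does not say how disagreements between decisive labels were resolved, so the majority vote and the choice to drop ties are assumptions made here; the function name `aggregate` is hypothetical.

```python
from collections import Counter


def aggregate(labels, decisive):
    """Return the agreed decisive label for one user, or None to discard the user.

    labels   -- the labels given by the assessors, e.g. ["Yes", "Maybe", "Yes"]
    decisive -- the set of labels that count as decisive, e.g. {"Yes", "No"}
    """
    votes = [label for label in labels if label in decisive]
    if len(votes) < 2:
        return None  # fewer than two decisive labels: user is not considered
    ranked = Counter(votes).most_common()
    top_label, top_count = ranked[0]
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None  # assumption: a tie between decisive labels is dropped
    return top_label


Q1_DECISIVE = {"Yes", "No"}
Q2_DECISIVE = {"Human", "Automatic"}
```

For example, `aggregate(["Yes", "Yes", "Maybe"], Q1_DECISIVE)` keeps the user with label `"Yes"`, while `aggregate(["Yes", "Maybe", "Unknown"], Q1_DECISIVE)` discards the user.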
  • 11. Automatically finding News Story Curators
    Photo credit: Mads Iversen (CC BY-NC-SA).
  • 12. Features
    Visibility
      - Number of followers
      - Number of Twitter lists with user
    Tweeting activity
      - Number of tweets per day
      - Fraction of tweets that contain a re-tweet mark "RT", a URL, a user mention or a hashtag
    Topic focus
      - Number of crowds the user belongs to
      - Number of distinct article sections of the crowds (e.g. sports, business) the user belongs to
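The tweeting-activity features listed above can be approximated from raw tweet text. The sketch below is an assumption about how such fractions might be computed: the regular expressions, feature names, and sample tweets are made up for illustration and are not taken from the authors' system.

```python
import re

# Hypothetical patterns; a production system would more likely use tweet metadata.
PATTERNS = {
    "frac_retweet": re.compile(r"\bRT\b"),
    "frac_url": re.compile(r"https?://\S+"),
    "frac_mention": re.compile(r"@\w+"),
    "frac_hashtag": re.compile(r"#\w+"),
}


def activity_features(tweets, days_observed):
    """Compute tweets per day and the fraction of tweets matching each pattern."""
    n = len(tweets)
    features = {"tweets_per_day": n / days_observed if days_observed else 0.0}
    for name, pattern in PATTERNS.items():
        hits = sum(1 for text in tweets if pattern.search(text))
        features[name] = hits / n if n else 0.0
    return features


# Made-up tweets for one user, observed over two days.
sample = [
    "RT @newsdesk: Mali conflict update http://example.com/a",
    "Good morning!",
    "Reading about #Syria http://example.com/b",
    "@friend thanks",
]
features = activity_features(sample, days_observed=2)
```

With this sample, half of the tweets contain a URL and half contain a mention, while a quarter carry a retweet mark or a hashtag.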
  • 13. Simple models
    Preselection: the user must have
      - At least 1,000 followers
      - Posted an article that is estimated related to the original article [1]
    Method: feature selection (one feature) + random forest algorithm
    UserIsHuman
      - Model: UserFracURL >= 0.85 → automatic, otherwise human
      - Evaluation (human class): Precision/Recall: 0.85, AUC: 0.81
    UserIsInterestedInStory
      - Model: UserSectionsQ >= 0.9 → not-interested, otherwise interested
      - Evaluation (interested class): Precision: 0.48 / Recall: 0.93, AUC: 0.83
    [1] J. Lehmann, C. Castillo, M. Lalmas, and E. Zuckerman. Transient news crowds in social media. In ICWSM, 2013.
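The preselection and the two single-feature rules on the slide above translate directly into threshold checks. In this sketch the thresholds (1,000 followers, 0.85, 0.9) come from the slide, while the `UserFeatures` container, its field names, and the `classify` function are hypothetical. The slide does not define how UserSectionsQ is computed, so it is treated as an already-computed score.

```python
from dataclasses import dataclass
from typing import Optional

MIN_FOLLOWERS = 1000        # preselection threshold from the slide
URL_THRESHOLD = 0.85        # UserFracURL cut-off from the slide
SECTIONS_THRESHOLD = 0.9    # UserSectionsQ cut-off from the slide


@dataclass
class UserFeatures:
    followers: int
    posted_related_article: bool  # outcome of the relatedness estimate in [1]
    frac_url: float               # UserFracURL
    sections_q: float             # UserSectionsQ (definition not given on the slide)


def classify(user: UserFeatures) -> Optional[dict]:
    """Apply the preselection, then both threshold rules; None if preselection fails."""
    if user.followers < MIN_FOLLOWERS or not user.posted_related_article:
        return None
    return {
        "human_or_automatic": "automatic" if user.frac_url >= URL_THRESHOLD else "human",
        "interest": "not-interested" if user.sections_q >= SECTIONS_THRESHOLD else "interested",
    }


# Made-up users for illustration.
curator = UserFeatures(followers=5000, posted_related_article=True, frac_url=0.4, sections_q=0.2)
aggregator = UserFeatures(followers=90000, posted_related_article=True, frac_url=0.97, sections_q=0.95)
small_account = UserFeatures(followers=300, posted_related_article=True, frac_url=0.4, sections_q=0.2)
```

Here `classify(curator)` labels the user as human and interested, `classify(aggregator)` as automatic and not-interested, and `classify(small_account)` returns `None` because the account fails the follower threshold.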
  • 14. Complex models
                        Precision   Recall   AUC
      Automatic         0.88        0.84     0.93
      Human             0.82        0.86     0.93
      Interested        0.95        0.92     0.90
      Not-interested    0.53        0.67     0.90
    Notes
      - Random forest with information-gain-based feature selection
      - Random forest with asymmetric misclassification costs: false negatives (classifying an interested user as not interested) were considered 5 times more costly than false positives
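To see what a 5:1 cost ratio implies, the sketch below applies the standard minimum-expected-cost decision rule to a predicted probability. This is a swapped-in illustration, not the authors' training procedure: the slide does not say how the costs were fed into the random forest, and the function names are hypothetical.

```python
COST_FALSE_NEGATIVE = 5.0  # interested user classified as not interested (from the slide)
COST_FALSE_POSITIVE = 1.0  # not-interested user classified as interested


def decide(p_interested, cost_fn=COST_FALSE_NEGATIVE, cost_fp=COST_FALSE_POSITIVE):
    """Pick the label with the lower expected misclassification cost."""
    expected_cost_if_rejected = cost_fn * p_interested        # risk of a false negative
    expected_cost_if_accepted = cost_fp * (1.0 - p_interested)  # risk of a false positive
    if expected_cost_if_rejected > expected_cost_if_accepted:
        return "interested"
    return "not-interested"


def decision_threshold(cost_fn, cost_fp):
    """Probability above which predicting 'interested' is the cheaper choice."""
    return cost_fp / (cost_fp + cost_fn)
```

Under these assumptions the threshold drops from 1/2 to 1/6, so a user is labeled interested even at a fairly low predicted probability. That trades precision for recall on the interested class.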
  • 15. Precision-oriented evaluation
    We compared our method with two baseline approaches
      - Users with the largest number of followers [FOLLOWER-APPROACH]
      - Users with the largest number of stories detected as related to the original one [STORY-APPROACH]
    Data
      - Sample of 20 news articles that had at least one curator, detected using the complex model with a confidence value >= 0.75
      - We extracted for each article the same number of possible curators using the other two approaches
      - We asked three assessors to evaluate the results (question Q1, UserIsInterestedInStory)
      - About 210 labels for 70 units were collected
    Results (true positive/false positive)
      - FOLLOWER-APPROACH: 2/18 = 11%
      - STORY-APPROACH: 5/20 = 25%
      - OUR APPROACH: 6/16 = 38%
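The percentages reported above are consistent with dividing the first number of each pair by the second and rounding to a whole percent. The short check below only reproduces that arithmetic; it makes no claim about how the two counts were defined in the study.

```python
# Count pairs exactly as printed on the slide.
results = {
    "FOLLOWER-APPROACH": (2, 18),
    "STORY-APPROACH": (5, 20),
    "OUR APPROACH": (6, 16),
}


def as_percent(numerator, denominator):
    """Ratio as a whole percentage, rounding halves up."""
    return int(100 * numerator / denominator + 0.5)


percentages = {name: as_percent(a, b) for name, (a, b) in results.items()}
```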
  • 16. Conclusion and future work
    We were able to detect and model news story curators, who could play (and may already play) an important role in the news ecosystem, not only for news readers but also for journalists and editors.
      - A large amount of activity on Twitter is automatic, and some of these news aggregators can be considered good curators
      - In most cases the user's attention shifts away quickly: posting a link does not necessarily reflect a long-standing interest in the subject of the link
    Future work
      - Adding other (Twitter) variables to the system that capture, for instance, interestingness and serendipity
      - Application to other news providers
      - Analysis of the functionality of popular news aggregators, which are comparable to RSS feeds
  • 17. Questions and Discussion…
      - Janette Lehmann, Universitat Pompeu Fabra, jnt.lehmann@gmail.com
      - Carlos Castillo, Qatar Computing Research Institute, chato@acm.org
      - Mounia Lalmas, Yahoo! Labs, mounia@acm.org
      - Ethan Zuckerman, MIT Center for Civic
    Photo credits: Hobvias Sudoneighm (CC BY), Thomas Leuthard (CC BY), Mads Iversen (CC BY-NC-SA), Wayne Large (CC BY-ND)