Janette Lehmann, Carlos Castillo, Mounia Lalmas, Ethan Zuckerman
Finding News Curators in Twitter
Outline
• Motivation
• Types of curators
• Labeling news story curators
• Automatically finding news story curators
• Conclusion and future work
Photo credit (first slide): Hobvias Sudoneighm (CC-BY).
Motivation
• Twitter has become a powerful tool for aggregating and consuming time-sensitive content in general, and news in particular.
• Journalists use online social media platforms (Twitter, Facebook and others) and blogs to elicit other story angles or to verify stories they are working on.
To what extent can the community of engaged readers (those who share news articles on social media) contribute to the journalistic process?
What kinds of roles do people play when sharing news?
We want to detect users who provide further relevant information on a news story. We call them news story curators.
Example
Al Jazeera English news article about the civil war in Syria:
“Syria allows UN to step up food aid” [16 Jan 2013]
Users who posted the article on Twitter:

User               #Followers   Is tweeting about
@RevolutionSyria       88,122   Syria
@KenanFreeSyria        13,388   Syria
@UP_food                  703   Food
@KeriJSmith             8,838   Breaking news/top stories
@BreakingNews       5,662,866   Breaking news/top stories

Whom would you follow to find out more about the civil war in Syria?
Types of news story curators
Topic-unfocused
• Human: topic-unfocused curator. Disseminates news articles about diverse topics, usually breaking news/top stories. → @KeriJSmith
• Automatic: news aggregator. Collects news articles (e.g. from RSS feeds) and automatically posts their headlines and URLs. → @BreakingNews
Topic-focused
• Human: topic-focused curator. Collects interesting information with a specific focus, usually a geographic region or a topic. → @KenanFreeSyria
• Automatic: topic-focused aggregator. Automatically disseminates news with a topical focus. → @UP_food, @RevolutionSyria
Types of news story curators
• Topic-unfocused (human or automatic): these curators are probably less valuable, or not valuable, for a specific story.
• Topic-focused (human or automatic): valuable curators for a specific story.
   ◦ Topic-focused curator (human) → @KenanFreeSyria
   ◦ Topic-focused aggregator (automatic) → @UP_food, @RevolutionSyria
Data sets
Step 1: Selection of news articles
• News articles published in early 2013 from
   ◦ BBC World Service [BBC]: 75 articles
   ◦ Al Jazeera English [AJE]: 155 articles
• Stories: Obama's inauguration, Mali conflict, pollution in Beijing, etc.
Step 2: News crowd detection
• All users who tweeted the article within the first 6 hours after publication
Step 3: User characteristics
• Extraction of data for each user in the news crowd (e.g. further tweets, profile information)
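As a minimal sketch of Step 2, the news crowd of an article can be collected by filtering tweets on a time window. The record layout `(user, article_url, posted_at)`, the function name `news_crowd`, and the inclusive window boundaries are assumptions made for illustration; only the 6-hour window comes from the slide.

```python
from datetime import datetime, timedelta

# The 6-hour window is taken from the slide; everything else is hypothetical.
CROWD_WINDOW = timedelta(hours=6)


def news_crowd(tweets, article_url, published_at, window=CROWD_WINDOW):
    """Return the set of users who tweeted `article_url` within `window` of publication.

    `tweets` is an iterable of (user, article_url, posted_at) tuples
    (an assumed layout, not the authors' data format).
    """
    deadline = published_at + window
    return {
        user
        for user, url, posted_at in tweets
        if url == article_url and published_at <= posted_at <= deadline
    }


published = datetime(2013, 1, 16, 9, 0)
tweets = [
    ("alice", "http://example.org/a", datetime(2013, 1, 16, 9, 30)),  # inside the window
    ("bob", "http://example.org/a", datetime(2013, 1, 16, 16, 0)),    # 7 hours later
    ("carol", "http://example.org/b", datetime(2013, 1, 16, 10, 0)),  # different article
]
crowd = news_crowd(tweets, "http://example.org/a", published)
```

Here only `alice` ends up in the crowd: `bob` tweeted too late and `carol` tweeted a different article.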
Labeling
News Story Curators
Photo credit: Thomas Leuthard (CC BY).
Labeling tasks
Data
• Sample of 20 news articles
• For each news article, a sample of 10 users who posted the article
• We showed three assessors:
   ◦ The title of the news article and a sample of the user's tweets
   ◦ The user's profile description and number of followers
Labeling questions
Q1) Please indicate whether the user is interested in, or an expert on, the topic of the article's story:
Yes: Most of her/his tweets relate to the topic of the story (e.g. the article is about the conflict in Syria, and she/he often tweets about the conflict in Syria).
Maybe: Many of her/his tweets relate to the topic of the story, or she/he is interested in a related topic (e.g. the article is about the conflict in Syria, and she/he tweets about armed conflicts or the Arab world).
No: She/he is not tweeting about the topic of the story.
Unknown: Based on the information available about the user, it was not possible to label her/him.
Q2) Please indicate whether the user is a human or generates tweets automatically:
Human: The user has conversations and personal comments in her/his tweets. The text of tweets that have URLs (e.g. to news articles) seems self-written and contains the user's own opinions.
Maybe automatic: The Twitter user has characteristics of an automatic profile, but she/he could be human as well.
Automatic: The user's tweet stream looks automatically generated. The tweets contain only headlines and URLs of news articles.
Unknown: Based on the information available about the user, it was not possible to label her/him as human or automatic.
Resulting training set
       Interested? (topic-focused)    Human or automatic?          Interested
         n     yes     no               n     human   automatic    + human
AJE     63     21%     79%             71     55%     45%          13%
BBC     58      3%     54%             54     35%     65%          1.8%

Many users are topic-unfocused and automatic.
We considered only users for whom at least two annotators provided a decisive label (Yes or No, Human or Automatic).
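The filtering rule above can be sketched as a small aggregation function over the three assessors' labels. The slides only state that at least two annotators had to give a decisive label; taking the majority among decisive labels and dropping ties are assumptions of this sketch, and the function name `aggregate` is hypothetical.

```python
from collections import Counter


def aggregate(labels, decisive):
    """Return the majority decisive label, or None if the user is dropped.

    `labels` holds one label per assessor; `decisive` is the set of labels
    that count as decisive (e.g. {"yes", "no"} for Q1).
    """
    votes = Counter(label for label in labels if label in decisive)
    if sum(votes.values()) < 2:
        return None  # fewer than two decisive labels: user is not considered
    (top, top_count), *rest = votes.most_common()
    if rest and rest[0][1] == top_count:
        return None  # assumption of this sketch: ties are dropped
    return top


q1 = {"yes", "no"}
q2 = {"human", "automatic"}
kept = aggregate(["yes", "yes", "maybe"], q1)         # two decisive, both "yes"
dropped = aggregate(["yes", "maybe", "unknown"], q1)  # only one decisive label
```

With this rule, a user labeled "yes", "no" and "maybe" is also dropped, because the two decisive labels disagree.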
Automatically
finding
News Story Curators
Photo credit: Mads Iversen (CC BY-NC-SA).
Features
Visibility
• Number of followers
• Number of Twitter lists that include the user
Tweeting activity
• Number of tweets per day
• Fraction of tweets that contain a retweet mark ("RT"), a URL, a user mention or a hashtag
Topic focus
• Number of crowds the user belongs to
• Number of distinct article sections (e.g. sports, business) of the crowds the user belongs to
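The tweeting-activity fractions listed above could be computed roughly as follows. This is a sketch under simple assumptions: tweets are plain strings, and the regular expressions are rough approximations of what counts as a retweet mark, URL, mention or hashtag, not the authors' extraction rules. The name `activity_fractions` is hypothetical.

```python
import re

# Rough, illustrative patterns (assumptions, not the authors' definitions).
PATTERNS = {
    "rt": re.compile(r"\bRT\b"),
    "url": re.compile(r"https?://\S+"),
    "mention": re.compile(r"@\w+"),
    "hashtag": re.compile(r"#\w+"),
}


def activity_fractions(tweets):
    """Return, per marker type, the fraction of tweets containing that marker."""
    n = len(tweets)
    if n == 0:
        return {name: 0.0 for name in PATTERNS}
    return {
        name: sum(1 for text in tweets if pattern.search(text)) / n
        for name, pattern in PATTERNS.items()
    }


sample = [
    "RT @bbc: Mali update http://t.co/x",
    "Reading about #Syria",
    "good morning",
    "Headline http://t.co/y",
]
fractions = activity_fractions(sample)
```

For the four sample tweets, half contain a URL and a quarter each contain a retweet mark, a mention and a hashtag.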
Simple models
Preselection
The user must
• Have at least 1,000 followers
• Have posted an article that is estimated to be related to the original article [1]
UserIsHuman
• Model: UserFracURL >= 0.85 → automatic, otherwise human
• Evaluation (human class): Prec/Rec: 0.85, AUC: 0.81
UserIsInterestedInStory
• Model: UserSectionsQ >= 0.9 → not-interested, otherwise interested
• Evaluation (interested class): Prec: 0.48 / Rec: 0.93, AUC: 0.83
[1] J. Lehmann, C. Castillo, M. Lalmas, and E. Zuckerman. Transient news crowds in social media. In ICWSM, 2013.
feature (one) selection + random forest algorithm
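The two simple models and the preselection step amount to threshold rules, which can be sketched as below. The thresholds (1,000 followers, 0.85, 0.9) are from the slide; the function names are hypothetical, and `UserFracURL` and `UserSectionsQ` are treated as precomputed numeric features because the slides do not spell out how they are calculated.

```python
def passes_preselection(followers, posted_related_article):
    """Preselection: at least 1,000 followers and a posted related article."""
    return followers >= 1000 and posted_related_article


def user_is_human(user_frac_url):
    """UserIsHuman: UserFracURL >= 0.85 means automatic, otherwise human."""
    return user_frac_url < 0.85


def user_is_interested(user_sections_q):
    """UserIsInterestedInStory: UserSectionsQ >= 0.9 means not interested."""
    return user_sections_q < 0.9


# Hypothetical user: 8,838 followers, 40% of tweets contain a URL,
# and a UserSectionsQ value of 0.95.
candidate_ok = passes_preselection(8838, True)
human = user_is_human(0.40)
interested = user_is_interested(0.95)
```

In this made-up case the user passes preselection and is classified as human but not interested in the story.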
Complex models
                 Precision   Recall   AUC
Automatic             0.88     0.84   0.93
Human                 0.82     0.86   0.93
Interested            0.95     0.92   0.90
Not-interested        0.53     0.67   0.90

• Automatic/Human: random forest with information-gain-based feature selection
• Interested/Not-interested: random forest with asymmetric misclassification costs; false negatives (classifying an interested user as not interested) were considered 5 times more costly than false positives
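The slides do not say how the asymmetric costs were built into the random forest. One common way to illustrate the effect, used here as a stand-in and not as the authors' implementation, is to shift the decision threshold on a predicted probability so that the expected misclassification cost is minimized. With false negatives costing 5 times as much as false positives, a user is labeled interested once the estimated probability reaches 1/(1+5) = 1/6. All names below are hypothetical.

```python
def cost_sensitive_threshold(cost_fn, cost_fp):
    """Probability above which predicting 'interested' has the lower expected cost."""
    return cost_fp / (cost_fp + cost_fn)


def classify(p_interested, cost_fn=5.0, cost_fp=1.0):
    """Pick the label with the lower expected cost.

    Expected cost of predicting 'not-interested': p * cost_fn (a false negative).
    Expected cost of predicting 'interested': (1 - p) * cost_fp (a false positive).
    """
    if p_interested * cost_fn >= (1.0 - p_interested) * cost_fp:
        return "interested"
    return "not-interested"


threshold = cost_sensitive_threshold(cost_fn=5.0, cost_fp=1.0)
label_low = classify(0.10)   # expected costs 0.5 vs 0.9
label_mid = classify(0.20)   # expected costs 1.0 vs 0.8
```

The shifted threshold trades precision on the not-interested class for recall on the interested class, which matches the direction of the numbers in the table above.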
Precision-oriented evaluation
We compared our method with two baseline approaches:
• Users with the largest number of followers [FOLLOWER-APPROACH]
• Users with the largest number of stories detected as related to the original one [STORY-APPROACH]
Data
• Sample of 20 news articles that had at least one curator, detected using the complex model with a confidence value >= 0.75
• For each article, we extracted the same number of possible curators using the other two approaches
• We asked three assessors to evaluate the results (question Q1, UserIsInterestedInStory)
• About 210 labels for 70 units were collected
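The evaluation setup can be sketched for a single article as follows: keep the users the complex model flags with confidence >= 0.75, then draw the same number of users from each baseline ranking. The dictionary fields, the user names and the helper names are all hypothetical; only the 0.75 cut-off and the two ranking criteria come from the slide.

```python
def select_curators(candidates, min_confidence=0.75):
    """Users flagged as curators by the model at the given confidence level."""
    return [c for c in candidates if c["confidence"] >= min_confidence]


def top_k(candidates, field, k):
    """The k users with the largest value of `field` (baseline ranking)."""
    return sorted(candidates, key=lambda c: c[field], reverse=True)[:k]


# Made-up news crowd for one article.
candidates = [
    {"user": "a", "followers": 5000, "related_stories": 1, "confidence": 0.80},
    {"user": "b", "followers": 90000, "related_stories": 2, "confidence": 0.30},
    {"user": "c", "followers": 1200, "related_stories": 7, "confidence": 0.76},
    {"user": "d", "followers": 3000, "related_stories": 4, "confidence": 0.10},
]

ours = select_curators(candidates)
k = len(ours)  # baselines return the same number of users
by_followers = top_k(candidates, "followers", k)       # FOLLOWER-APPROACH
by_stories = top_k(candidates, "related_stories", k)   # STORY-APPROACH
```

Drawing the same number of users from each method keeps the comparison fair when assessors later label the results.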
Results
true positive/false positive
FOLLOWER-APPROACH: 2/18 = 11%
STORY-APPROACH: 5/20 = 25%
OUR APPROACH: 6/16 = 38%
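The percentages above are plain quotients of the two counts reported per approach, and can be rechecked with a few lines. The sketch only reproduces the division as printed on the slide and does not assume any other definition of the score; the variable names are hypothetical.

```python
# Count pairs exactly as printed on the slide.
results = {
    "FOLLOWER-APPROACH": (2, 18),
    "STORY-APPROACH": (5, 20),
    "OUR APPROACH": (6, 16),
}

# Quotient of the two counts for each approach.
ratios = {name: a / b for name, (a, b) in results.items()}

# How much higher our approach scores than the stronger baseline.
gain_over_story = ratios["OUR APPROACH"] / ratios["STORY-APPROACH"]
```

The quotients come out at roughly 0.11, 0.25 and 0.375, so the proposed approach scores 1.5 times as high as the stronger of the two baselines.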
Conclusion and future work
We were able to detect and model news story curators, who could play (and perhaps already play) an important role in the news ecosystem, not only for news readers but also for journalists and editors.
• A large amount of activity on Twitter is automatic, and some of these news aggregators can be considered good curators
• In most cases the user's attention shifts away quickly: posting a link does not necessarily reflect a long-standing interest in the subject of the link
Future work
• Adding other (Twitter) variables to the system that capture, for instance, interestingness and serendipity
• Application to other news providers
• Analysis of the functionality of popular news aggregators, which are comparable to RSS feeds
Questions and Discussion…
Janette Lehmann
Universitat Pompeu Fabra
jnt.lehmann@gmail.com
Carlos Castillo
Qatar Computing Research
Institute
chato@acm.org
Mounia Lalmas
Yahoo! Labs
mounia@acm.org
Ethan Zuckerman
MIT Center for Civic Media
ethanz@media.mit.edu
Photo credit: Wayne Large (CC BY-ND).
Photo credits: Hobvias Sudoneighm (CC BY), Thomas Leuthard (CC BY), Mads Iversen (CC BY-NC-SA), Wayne Large (CC BY-ND)

More Related Content

Similar to Finding News Curators in Twitter: Automatic Detection of Topic-Focused Curators

Chapter 6: Social Media Metrics and Analytics
Chapter 6: Social Media Metrics and AnalyticsChapter 6: Social Media Metrics and Analytics
Chapter 6: Social Media Metrics and AnalyticsZakey Peterson
 
Chapter 6 Presentation
Chapter 6 Presentation Chapter 6 Presentation
Chapter 6 Presentation sancheev1
 
Twitter analytics for sports bloggers
Twitter analytics for sports bloggersTwitter analytics for sports bloggers
Twitter analytics for sports bloggersAmanda Sturgill
 
SENTIMENT ANALYSIS OF TWITTER DATA
SENTIMENT ANALYSIS OF TWITTER DATASENTIMENT ANALYSIS OF TWITTER DATA
SENTIMENT ANALYSIS OF TWITTER DATAanargha gangadharan
 
REAL TIME SENTIMENT ANALYSIS OF TWITTER DATA
REAL TIME SENTIMENT ANALYSIS OF TWITTER DATAREAL TIME SENTIMENT ANALYSIS OF TWITTER DATA
REAL TIME SENTIMENT ANALYSIS OF TWITTER DATAMary Lis Joseph
 
SENTIMENT ANALYSIS OF TWITTER DATA
SENTIMENT ANALYSIS OF TWITTER DATASENTIMENT ANALYSIS OF TWITTER DATA
SENTIMENT ANALYSIS OF TWITTER DATAParvathy Devaraj
 
Altmetric: Getting Started with Article-Level Metrics
Altmetric: Getting Started with Article-Level MetricsAltmetric: Getting Started with Article-Level Metrics
Altmetric: Getting Started with Article-Level MetricsAltmetric
 
Social Media for Marketing: An Analysis of Digg.com Engagement and User Behavior
Social Media for Marketing: An Analysis of Digg.com Engagement and User BehaviorSocial Media for Marketing: An Analysis of Digg.com Engagement and User Behavior
Social Media for Marketing: An Analysis of Digg.com Engagement and User BehaviorTyler Pace
 
Social media / professional use of Twitter
Social media / professional use of TwitterSocial media / professional use of Twitter
Social media / professional use of TwitterMindy McAdams
 
Liminality and Communitas in Social Media: The Case of Twitter
Liminality and Communitas in Social Media: The Case of TwitterLiminality and Communitas in Social Media: The Case of Twitter
Liminality and Communitas in Social Media: The Case of TwitterJana Herwig
 
Twitter as a personalizable information service ii
Twitter as a personalizable information service iiTwitter as a personalizable information service ii
Twitter as a personalizable information service iiKan-Han (John) Lu
 
Academic social club
Academic social clubAcademic social club
Academic social clubscharrlibrary
 
IRJET- Effective Countering of Communal Hatred During Disaster Events in Soci...
IRJET- Effective Countering of Communal Hatred During Disaster Events in Soci...IRJET- Effective Countering of Communal Hatred During Disaster Events in Soci...
IRJET- Effective Countering of Communal Hatred During Disaster Events in Soci...IRJET Journal
 
OTOinsights "An Analysis of Digg.com Engagement and User Behavior"
OTOinsights "An Analysis of Digg.com Engagement and User Behavior"OTOinsights "An Analysis of Digg.com Engagement and User Behavior"
OTOinsights "An Analysis of Digg.com Engagement and User Behavior"One to One
 
Best practice - digital and social media
Best practice - digital and social mediaBest practice - digital and social media
Best practice - digital and social mediatechUK
 
Predicting the future with social media (Twitter y Box Office)
Predicting the future with social media (Twitter y Box Office)Predicting the future with social media (Twitter y Box Office)
Predicting the future with social media (Twitter y Box Office)Gonzalo Martín
 

Similar to Finding News Curators in Twitter: Automatic Detection of Topic-Focused Curators (20)

Chapter 6: Social Media Metrics and Analytics
Chapter 6: Social Media Metrics and AnalyticsChapter 6: Social Media Metrics and Analytics
Chapter 6: Social Media Metrics and Analytics
 
Chapter 6 Presentation
Chapter 6 Presentation Chapter 6 Presentation
Chapter 6 Presentation
 
Chapter 6
Chapter 6Chapter 6
Chapter 6
 
Twitter analytics for sports bloggers
Twitter analytics for sports bloggersTwitter analytics for sports bloggers
Twitter analytics for sports bloggers
 
SENTIMENT ANALYSIS OF TWITTER DATA
SENTIMENT ANALYSIS OF TWITTER DATASENTIMENT ANALYSIS OF TWITTER DATA
SENTIMENT ANALYSIS OF TWITTER DATA
 
REAL TIME SENTIMENT ANALYSIS OF TWITTER DATA
REAL TIME SENTIMENT ANALYSIS OF TWITTER DATAREAL TIME SENTIMENT ANALYSIS OF TWITTER DATA
REAL TIME SENTIMENT ANALYSIS OF TWITTER DATA
 
SENTIMENT ANALYSIS OF TWITTER DATA
SENTIMENT ANALYSIS OF TWITTER DATASENTIMENT ANALYSIS OF TWITTER DATA
SENTIMENT ANALYSIS OF TWITTER DATA
 
Altmetric: Getting Started with Article-Level Metrics
Altmetric: Getting Started with Article-Level MetricsAltmetric: Getting Started with Article-Level Metrics
Altmetric: Getting Started with Article-Level Metrics
 
Social Media on a (Time) Budget from 2010 WTPA Convention
Social Media on a (Time) Budget from 2010 WTPA ConventionSocial Media on a (Time) Budget from 2010 WTPA Convention
Social Media on a (Time) Budget from 2010 WTPA Convention
 
Cca presentation
Cca presentationCca presentation
Cca presentation
 
Social Media for Marketing: An Analysis of Digg.com Engagement and User Behavior
Social Media for Marketing: An Analysis of Digg.com Engagement and User BehaviorSocial Media for Marketing: An Analysis of Digg.com Engagement and User Behavior
Social Media for Marketing: An Analysis of Digg.com Engagement and User Behavior
 
Social media / professional use of Twitter
Social media / professional use of TwitterSocial media / professional use of Twitter
Social media / professional use of Twitter
 
Liminality and Communitas in Social Media: The Case of Twitter
Liminality and Communitas in Social Media: The Case of TwitterLiminality and Communitas in Social Media: The Case of Twitter
Liminality and Communitas in Social Media: The Case of Twitter
 
Twitter as a personalizable information service ii
Twitter as a personalizable information service iiTwitter as a personalizable information service ii
Twitter as a personalizable information service ii
 
Academic Social Club
Academic Social ClubAcademic Social Club
Academic Social Club
 
Academic social club
Academic social clubAcademic social club
Academic social club
 
IRJET- Effective Countering of Communal Hatred During Disaster Events in Soci...
IRJET- Effective Countering of Communal Hatred During Disaster Events in Soci...IRJET- Effective Countering of Communal Hatred During Disaster Events in Soci...
IRJET- Effective Countering of Communal Hatred During Disaster Events in Soci...
 
OTOinsights "An Analysis of Digg.com Engagement and User Behavior"
OTOinsights "An Analysis of Digg.com Engagement and User Behavior"OTOinsights "An Analysis of Digg.com Engagement and User Behavior"
OTOinsights "An Analysis of Digg.com Engagement and User Behavior"
 
Best practice - digital and social media
Best practice - digital and social mediaBest practice - digital and social media
Best practice - digital and social media
 
Predicting the future with social media (Twitter y Box Office)
Predicting the future with social media (Twitter y Box Office)Predicting the future with social media (Twitter y Box Office)
Predicting the future with social media (Twitter y Box Office)
 

More from Janette Lehmann

From Site to Inter-site User Engagement
From Site to Inter-site User EngagementFrom Site to Inter-site User Engagement
From Site to Inter-site User EngagementJanette Lehmann
 
Networked user engagement
Networked user engagementNetworked user engagement
Networked user engagementJanette Lehmann
 
Dynamical Classes of Collective Attention in Twitter
Dynamical Classes of Collective Attention in TwitterDynamical Classes of Collective Attention in Twitter
Dynamical Classes of Collective Attention in TwitterJanette Lehmann
 
User Engagement - A scientific Challenge
User Engagement - A scientific ChallengeUser Engagement - A scientific Challenge
User Engagement - A scientific ChallengeJanette Lehmann
 
Reading Preference and Behavior on Wikipedia
Reading Preference and Behavior on WikipediaReading Preference and Behavior on Wikipedia
Reading Preference and Behavior on WikipediaJanette Lehmann
 
Temporal Variations in Networked User Engagement
Temporal Variations in Networked User EngagementTemporal Variations in Networked User Engagement
Temporal Variations in Networked User EngagementJanette Lehmann
 
Models of user engagement
Models of user engagementModels of user engagement
Models of user engagementJanette Lehmann
 
From site to networked engagement (Keynote)
From site to networked engagement (Keynote)From site to networked engagement (Keynote)
From site to networked engagement (Keynote)Janette Lehmann
 

More from Janette Lehmann (8)

From Site to Inter-site User Engagement
From Site to Inter-site User EngagementFrom Site to Inter-site User Engagement
From Site to Inter-site User Engagement
 
Networked user engagement
Networked user engagementNetworked user engagement
Networked user engagement
 
Dynamical Classes of Collective Attention in Twitter
Dynamical Classes of Collective Attention in TwitterDynamical Classes of Collective Attention in Twitter
Dynamical Classes of Collective Attention in Twitter
 
User Engagement - A scientific Challenge
User Engagement - A scientific ChallengeUser Engagement - A scientific Challenge
User Engagement - A scientific Challenge
 
Reading Preference and Behavior on Wikipedia
Reading Preference and Behavior on WikipediaReading Preference and Behavior on Wikipedia
Reading Preference and Behavior on Wikipedia
 
Temporal Variations in Networked User Engagement
Temporal Variations in Networked User EngagementTemporal Variations in Networked User Engagement
Temporal Variations in Networked User Engagement
 
Models of user engagement
Models of user engagementModels of user engagement
Models of user engagement
 
From site to networked engagement (Keynote)
From site to networked engagement (Keynote)From site to networked engagement (Keynote)
From site to networked engagement (Keynote)
 

Recently uploaded

A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersNicole Novielli
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...AliaaTarek5
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 

Recently uploaded (20)

A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
A Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software DevelopersA Journey Into the Emotions of Software Developers
A Journey Into the Emotions of Software Developers
 
2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
(How to Program) Paul Deitel, Harvey Deitel-Java How to Program, Early Object...
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 

Finding News Curators in Twitter: Automatic Detection of Topic-Focused Curators

  • 1. Janette Lehmann, Carlos Castillo, Mounia Lalmas, Ethan Zuckerman Finding News Curators in Twitter
  • 2. Outline ¨  Motivation ¨  Types of curators ¨  Labeling news story curators ¨  Automatically finding news story curators ¨  Conclusion and future work 2 Photo credit (first slide): Hobvias Sudoneighm (CC-BY).
  • 3. Motivation ¨  Twitter has become a powerful tool for the aggregation and consumption of time- sensitive content in general and news in particular. ¨  Journalists use online social media platforms (Twitter, Facebook and others) and blogs to elicit other story angles or verify stories they are working on. To what extend the community of engaged readers - those who share news articles in social media – can contribute to the journalistic process? What kind of roles people play when sharing news? We want to detect users that provide further relevant information to a news story. We call them news story curators. 3
  • 4. Example Al Jazeera English news article about the civil war in Syria “Syria allows UN to step up food aid” [16 Jan 2013] Users that posted the article in Twitter Whom would you follow to find out more about the civil war in Syria? 4 #Followers Is tweeting about @RevolutionSyria 88,122 Syria @KenanFreeSyria 13,388 Syria @UP_food 703 Food @KeriJSmith 8,838 Breaking news/top stories @BreakingNews 5,662,866 Breaking news/top stories
  • 5. Types of news story curators Human Automatic Topic- unfocused Topic-unfocused curator Disseminating news articles about diverse topics, usually breaking news/top stories à @KeriJSmith News aggregators Collecting news articles (e.g. from RSS feeds) and automatically post their corresponding headlines and URLs à @BreakingNews Topic- focused Topic-focused curator Collecting interesting information with a specific focus, usually a geographic region or a topic à @KenanFreeSyria Topic-focused aggregators Disseminating automatically news with topical focus à @UP_food, @RevolutionSyria 5
  • 6. Types of news story curators Human Automatic Topic- unfocused Topic- focused Topic-focused curator Collecting interesting information with a specific focus, usually a geographic region or a topic à @KenanFreeSyria Topic-focused aggregators Disseminating automatically news with topical focus à @UP_food, @RevolutionSyria Valuable curators for a specific story These curators are probably less or not valuable 6
  • 7. Data sets Step 1: Selection of news articles ¨  News articles published in early 2013 from ¤  BBC World Service [BBC] 75 articles ¤  Al Jazeera English [AJE] 155 articles ¨  Stories: Obama's inauguration, Mali conflict, Pollution in Beijing, etc. Step 2 : News crowd detection ¨  All users who tweeted the article within the first 6 hours after publication Step 3: User characteristics ¨  Extraction of data from each user in the news crowd (e.g. further tweets, profile information) 7
  • 9. Labeling tasks Data ¨  Sample of 20 news articles ¨  For each news article, a sample of 10 users who posted the article ¨  We shown to three assessors: ¤  The title of the news article and a sample of tweets of the user ¤  Profile description and the number of followers of the user Labeling-Questions 9 Q1) Please indicate whether the user is interested or an expert of the topic of the article story: Yes: Most of her/his tweets relate to the topic of the story (e.g. the article is about the conflict in Syria, she/he is often tweeting about the conflict in Syria). Maybe: Many of her/his tweets relate to the topic of the story or she/he is interested in a related topic (e.g. the article is about the conflict in Syria, she/he is tweeting about armed conflicts or the Arabic world). No: She/he is not tweeting about the topic of the story. Unknown: Based on the information of the user it was not possible to label her/him. Q2) Please indicate whether the user is a human or generates tweets automatically: Human: The user has conversations and personal comments in his or her tweets. The text of tweets that have URLs (e.g. to news articles) seems self-written and contain user own opinions. Maybe automatic: The Twitter user has characteristics of an automatic profile, but she/he could be human as well. Automatic: The tweet stream of the user looks automatically generated. The tweets contain only headlines and URLs of news articles. Unknown: Based on the information of the user it was not possible to label her/him as human or automatic.
  • 10. Resulting training set
                 Interested? (topic-focused)    Human or automatic?           Interested + human
                 n     yes    no                n     human   automatic
      AJE        63    21%    79%               71    55%     45%             13%
      BBC        58    3%     54%               54    35%     65%             1.8%
      Many users are topic-unfocused and automatic.
      We considered only users for which at least two annotators provided a decisive label (Yes or No, Human or Automatic).
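The filtering rule for the training set can be illustrated with a small aggregation function. This is a sketch under stated assumptions: the slide only says that at least two annotators must give a decisive label. Taking the majority among decisive labels and dropping ties are choices made here for illustration, and `aggregate` is a hypothetical name.

```python
from collections import Counter

def aggregate(labels, decisive):
    """Combine the labels of the assessors for one user.

    Returns the majority decisive label, or None when fewer than two
    assessors gave a decisive label. Ties are also dropped (assumption:
    the slide does not say how disagreements were resolved).
    """
    votes = Counter(label for label in labels if label in decisive)
    if sum(votes.values()) < 2:
        return None
    (top, top_count), *rest = votes.most_common()
    if rest and rest[0][1] == top_count:
        return None
    return top

Q1_DECISIVE = {"Yes", "No"}
Q2_DECISIVE = {"Human", "Automatic"}

kept = aggregate(["Yes", "Yes", "Maybe"], Q1_DECISIVE)        # two decisive labels agree
dropped = aggregate(["Yes", "Maybe", "Unknown"], Q1_DECISIVE)  # only one decisive label
```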
  • 12. Features
      Visibility
        - Number of followers
        - Number of Twitter lists that include the user
      Tweeting activity
        - Number of tweets per day
        - Fraction of tweets that contain a re-tweet mark "RT", a URL, a user mention or a hashtag
      Topic focus
        - Number of crowds the user belongs to
        - Number of distinct article sections (e.g. sports, business) of the crowds the user belongs to
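The tweeting-activity fractions could be computed along these lines. This is a rough sketch, not the authors' feature extractor: the regular expressions are simplified approximations of how re-tweet marks, URLs, mentions and hashtags appear in tweet text, and `tweet_fractions` is a hypothetical helper name.

```python
import re

# Simplified patterns (assumptions; real tweet parsing has more edge cases).
RT_RE = re.compile(r"\bRT\b")
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"(?<!\w)@\w+")
HASHTAG_RE = re.compile(r"(?<!\w)#\w+")

def tweet_fractions(tweets):
    """Fraction of a user's tweets containing each kind of marker."""
    patterns = {
        "retweet": RT_RE,
        "url": URL_RE,
        "mention": MENTION_RE,
        "hashtag": HASHTAG_RE,
    }
    n = len(tweets)
    if n == 0:
        return {name: 0.0 for name in patterns}
    return {
        name: sum(1 for text in tweets if pattern.search(text)) / n
        for name, pattern in patterns.items()
    }

# Toy tweets, invented for illustration only.
sample = [
    "RT @bbc: Mali update http://t.co/x",
    "Lunch time!",
    "Reading about #Syria http://t.co/y",
    "@bob thanks",
]
fractions = tweet_fractions(sample)
```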
  • 13. Simple models
      Method: selection of a single feature + random forest algorithm
      UserIsHuman
        - Model: UserFracURL >= 0.85 → automatic, otherwise human
        - Evaluation (Human class): Prec/Rec: 0.85, AUC: 0.81
      UserIsInterestedInStory
        - Model: UserSectionsQ >= 0.9 → not-interested, otherwise interested
        - Evaluation (Interested class): Prec: 0.48 / Rec: 0.93, AUC: 0.83
      Preselection: the user must have
        - At least 1,000 followers
        - Posted an article that is estimated to be related to the original article [1]
      [1] J. Lehmann, C. Castillo, M. Lalmas, and E. Zuckerman. Transient news crowds in social media. In ICWSM, 2013.
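The two single-feature rules and the preselection filter are simple enough to write out directly. The thresholds come from the slide; everything else is an assumption for illustration: the `User` record and its field names are hypothetical, and `sections_q` stands in for UserSectionsQ, whose exact definition is not given here.

```python
from dataclasses import dataclass

@dataclass
class User:
    followers: int
    frac_url: float               # UserFracURL: fraction of tweets containing a URL
    sections_q: float             # UserSectionsQ: meaning assumed, treated as a number
    posted_related_article: bool  # output of the related-article estimate in [1]

def is_human(user):
    # Slide rule: UserFracURL >= 0.85 means automatic, otherwise human.
    return user.frac_url < 0.85

def is_interested(user):
    # Slide rule: UserSectionsQ >= 0.9 means not-interested, otherwise interested.
    return user.sections_q < 0.9

def passes_preselection(user):
    # Slide rule: at least 1,000 followers and a related article posted.
    return user.followers >= 1000 and user.posted_related_article

# Toy users, invented for illustration only.
aggregator = User(followers=5_000_000, frac_url=0.99, sections_q=0.95,
                  posted_related_article=True)
focused_human = User(followers=13_000, frac_url=0.40, sections_q=0.20,
                     posted_related_article=True)
```

Keeping the three predicates separate mirrors the slide, where the human/automatic and interested/not-interested decisions are independent models applied after preselection.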
  • 14. Complex models
                        Precision   Recall   AUC
      Automatic         0.88        0.84     0.93
      Human             0.82        0.86     0.93
      Interested        0.95        0.92     0.90
      Not-interested    0.53        0.67     0.90
      Methods
        - Random forest with information-gain-based feature selection
        - Random forest with asymmetric misclassification costs: false negatives (classifying an interested user as not interested) were considered 5 times more costly than false positives
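To see what a 5:1 cost ratio implies, consider the generic minimum-expected-cost decision rule. This is a textbook illustration of cost-sensitive classification, not a description of how the authors trained their random forest. The function names are hypothetical, and `p_interested` is assumed to be a calibrated class probability.

```python
COST_FALSE_NEGATIVE = 5.0  # interested user classified as not interested (from the slide)
COST_FALSE_POSITIVE = 1.0  # not-interested user classified as interested

def decision_threshold(cost_fn=COST_FALSE_NEGATIVE, cost_fp=COST_FALSE_POSITIVE):
    """Probability above which predicting 'interested' has the lower expected cost.

    Predicting 'not-interested' costs p * cost_fn in expectation;
    predicting 'interested' costs (1 - p) * cost_fp.
    The two are equal at p = cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)

def decide(p_interested, threshold=None):
    if threshold is None:
        threshold = decision_threshold()
    return "interested" if p_interested >= threshold else "not-interested"

threshold = decision_threshold()  # 1/6 with a 5:1 cost ratio
```

With these costs, a user is labeled interested as soon as the estimated probability reaches one sixth, which favors recall on the interested class at the expense of precision on the not-interested class.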
  • 15. Precision-oriented evaluation
      We compared our method with two baseline approaches:
        - Users with the largest number of followers [FOLLOWER-APPROACH]
        - Users with the largest number of stories detected as related to the original one [STORY-APPROACH]
      Data
        - Sample of 20 news articles that had at least one curator, detected using the complex model with a confidence value >= 0.75
        - For each article, we extracted the same number of possible curators using the other two approaches
        - We asked three assessors to evaluate the results (question Q1, UserIsInterestedInStory)
        - About 210 labels for 70 units were collected
      Results (true positive / false positive)
        - FOLLOWER-APPROACH: 2/18 = 11%
        - STORY-APPROACH: 5/20 = 25%
        - OUR APPROACH: 6/16 = 38%
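The reported percentages are ratios of true positives to false positives. If the two numbers are read as counts (an assumption, though the totals add up to 67 units, close to the roughly 70 reported), the conventional precision TP / (TP + FP) can be derived as well. The sketch below computes both; the helper names are hypothetical.

```python
# (true positives, false positives) per approach, as listed on the slide.
results = {
    "FOLLOWER-APPROACH": (2, 18),
    "STORY-APPROACH": (5, 20),
    "OUR APPROACH": (6, 16),
}

def tp_fp_ratio(tp, fp):
    """Ratio reported on the slide: true positives per false positive."""
    return tp / fp

def precision(tp, fp):
    """Conventional precision: share of returned users that are true positives."""
    return tp / (tp + fp)

for name, (tp, fp) in results.items():
    print(f"{name}: TP/FP = {tp_fp_ratio(tp, fp):.1%}, "
          f"precision = {precision(tp, fp):.1%}")
```

Under either measure the ordering of the three approaches is the same.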
  • 16. Conclusion and future work
      We were able to detect and model news story curators, who could play, and perhaps already play, an important role in the news ecosystem, not only for news readers but also for journalists and editors.
        - A large amount of activity on Twitter is automatic, and some of these news aggregators can be considered good curators
        - Users' attention mostly shifts away quickly: posting a link does not necessarily reflect a long-standing interest in the subject of the link
      Future work
        - Adding other (Twitter) variables to the system that capture, for instance, interestingness and serendipity
        - Application to other news providers
        - Analysis of the functionality of popular news aggregators, which are comparable to RSS feeds
  • 17. Questions and Discussion…
      Janette Lehmann, Universitat Pompeu Fabra, jnt.lehmann@gmail.com
      Carlos Castillo, Qatar Computing Research Institute, chato@acm.org
      Mounia Lalmas, Yahoo! Labs, mounia@acm.org
      Ethan Zuckerman, MIT Center for Civic Media, ethanz@media.mit.edu
      Photo credits: Hobvias Sudoneighm (CC BY), Thomas Leuthard (CC BY), Mads Iversen (CC BY-NC-SA), Wayne Large (CC BY-ND)