Characterizing the Life Cycle of Online News Stories Using Social Media Reactions

Carlos Castillo, Mohammed El-Haddad, Jürgen Pfeffer and Matt Stempeck: Characterizing the Life Cycle of Online News Stories Using Social Media Reactions. In CSCW. Baltimore, USA. February 2014.


  1. Characterizing the Life Cycle of Online News Stories Using Social Media Reactions. Carlos Castillo, Mohammed El-Haddad, Matt Stempeck, Jürgen Pfeffer. Twitter: @ChaToX
  2. Carlos Castillo – @chatox http://www.chato.cl/research/ Outline: • Determining classes of news articles • Predicting traffic using social media
  3. Usage analysis in online news • Aikat (1998) – Short dwell times, higher traffic on weekdays, lower traffic on weekends, bursty traffic • Crane and Sornette (2008), Yang and Leskovec (2011), Lehmann et al. (2012) – Behavioral classes of attention online
  4. Analysis of social media responses • SocialFlow whitepaper (Lotan, Gaffney, and Meyer 2011) – Al Jazeera, BBC News, CNN, The Economist, Fox News and The New York Times • Hu et al. (2011) – Tweets during a speech by the US president
  5. Predictive Web Analytics (references)
  6. Data collection • Three weeks in October 2012 • “Beacon” embedded in Al Jazeera pages – Real-time data processing – Apache S4 application for online processing – Cassandra (NoSQL database) for storage • ≈ 3M visits • ≈ 200K social media reactions
  7. Summary of dataset
  8. News, examples: • US state of Maryland abolishes death penalty (May 2nd, 2013) • Hundreds arrested in China over 'fake' meat (May 3rd, 2013). In-Depth, examples: • Spirits of Japan shrine haunt Asian relations (May 2nd, 2013) • Interactive: Powering the Gulf (May 2nd, 2013)
  9. Tag clouds extracted from titles of articles: News (322), In-Depth (139)
  10. Average News profile
  11. Average In-Depth profile
  12. In-Depth items have a slower growth
  13. In-Depth items have a longer shelf-life
  14. In-Depth items are shared on Facebook; News items are shared on Twitter
  15. Typical visitation profiles (12 hours): Decreasing (78%), Steady (9%), Increasing (3%), Rebounding (10%)
  16. Examples. Decreasing (78%): ● Almost all breaking news ● Sometimes delayed due to timezone differences, e.g. Hurricane Sandy. Steady or Increasing (12%): ● Ongoing news: Obama/Romney, Worker strikes in SA, Syrian unrest ● Articles updated with supporting content. Rebounding (10%): ● Articles picked up by external sources or social media (typically single source of traffic) ● Background articles to new developments
  17. Prediction of visits • Short-term traffic is to a large extent correlated with long-term traffic • Social media signals are correlated with traffic and shelf-life: more reactions → more traffic; more discussion → longer shelf-life • Can we predict traffic over 7 days after observing the first 30 minutes?
  18. Predicting traffic and shelf-life online has a long history • Predicting long-term behavior and half-life from short-term observations – Observations = comments, visits, votes, … – Behavior = total comments, total visits, … – 10+ papers specifically on web traffic • Bit.ly (2011, 2012) – Studies half-life per topic and platform
  19. Results (traffic predictions)
  20. Results (traffic predictions): Extrapolate visits. News articles are more predictable than In-Depth articles
  21. Results (traffic predictions): Improved predictions using social media variables
  22. Selected variables, traffic prediction
  23. Results (shelf-life prediction): Larger improvements for In-Depth articles. Still, this is a 12-hour error in predicting a quantity that averages 48-72 hours
  24. http://fast.qcri.org/
  25. What did we learn? • Decrease; Stay or Increase; Rebound – Roughly 80:10:10 ratio • News vs In-Depth: different behavior • Social media signals are useful to understand and predict visits
  26. Invitation: ECML/PKDD Discovery Challenge 2014 • Open competition on predictive Web Analytics • Data provided by Chartbeat Inc.
  27. Thank you! Carlos Castillo · chato@acm.org http://www.chato.cl/research/
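
The question on slide 17 (predicting long-term traffic from the first 30 minutes) and the "extrapolate visits" result can be illustrated with a minimal sketch. It assumes a log-linear relationship between early and final visit counts, fitted by ordinary least squares. This model, the function names `fit_log_linear` and `predict_final`, and the numbers are illustrative assumptions, not the method or data from the slides.

```python
import math


def fit_log_linear(early_visits, final_visits):
    """Least-squares fit of log(final) = a + b * log(early).

    Assumption for illustration only: the slides do not state the model.
    All counts must be positive.
    """
    xs = [math.log(v) for v in early_visits]
    ys = [math.log(v) for v in final_visits]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict_final(early, a, b):
    """Extrapolate the final visit count from an early visit count."""
    return math.exp(a + b * math.log(early))


# Made-up training data: visits after 30 minutes vs. visits after 7 days.
early = [10, 20, 40, 80]
final = [100, 200, 400, 800]

a, b = fit_log_linear(early, final)
print(predict_final(30, a, b))
```

Adding social media variables (for example, counts of early reactions) would turn this into a multivariate regression; that extension is not shown here.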

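Shelf-life, which the slides report as longer for In-Depth items, can be made concrete with a small sketch. It assumes shelf-life is the number of hours until an article has received a given fraction of its total visits; the 0.9 default, the function name `shelf_life_hours`, and the hourly counts are hypothetical and chosen only for illustration.

```python
def shelf_life_hours(hourly_visits, fraction=0.9):
    """Hours until cumulative visits reach `fraction` of the total.

    Assumed definition for illustration; the slides do not define
    shelf-life precisely. Returns None if there are no visits.
    """
    total = sum(hourly_visits)
    if total == 0:
        return None
    running = 0
    for hour, visits in enumerate(hourly_visits, start=1):
        running += visits
        if running >= fraction * total:
            return hour
    return len(hourly_visits)


# Made-up hourly visit counts: a fast-decaying item and a slow-decaying item.
news_item = [50, 30, 12, 5, 2, 1]
in_depth_item = [12, 11, 11, 11, 11, 11, 11, 11, 6, 5]

print(shelf_life_hours(news_item))
print(shelf_life_hours(in_depth_item))
```

With these invented counts the slowly decaying item reaches the threshold later, which is the qualitative pattern the slides describe for In-Depth articles.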