Geetu Ambwani, Principal Data Scientist, Huffington Post at MLconf NYC - 4/15/16

Data Science in the Newsroom

  1. Data Science in the Newsroom. Geetu Ambwani, Principal Data Scientist, geetu.ambwani@huffingtonpost.com
  2. What is the Huffington Post?
      ● Founded: May 2005
      ● Ranking among digital-only news websites: 1
      ● Cross-platform monthly unique visitors: over 187 million
      ● Number of articles per day: over 500
      ● Number of international editions: 15
      ● Bloggers: over 100,000
  3. News Industry - Trends: HuffPost has consistently been an innovator in the digital publishing space. Massive blogging network: more than 100K bloggers across the globe.
  4. News Industry - Trends: HuffPost has consistently been an innovator in the digital publishing space. Google Site Rank.
  5. News Industry - Trends: HuffPost has consistently been an innovator in the digital publishing space. Biggest social publisher.
  6. News Industry - Challenges
  7. How Can Data Help?
  8. Ad campaigns, International editions, Social media promotion, Editors, User experience, Blog moderators, Reporters, HuffPost Studio
  9. Content Lifecycle: Distribution, Creation, Consumption
  10. Content Creation: How Can Data Help?
      ● Tools to help surface and discover trends in different parts of the web
      ● Content enhancement with multimedia based on semantic matching (images, slideshows, videos)
      ● Optimizing headlines and images (RobinHood platform)
  11. Content Gap: Production Versus Consumption
  12. Content Consumption: How Can Data Help? Know Your Audience
      ● User cohorts:
          ○ Readers arriving from social traffic and front-page clickers consume different content
          ○ Desktop vs. mobile consumption
      ● Recommendations/personalization
      ● Can we use data to inform product design and interface?
          ○ Rearrange share buttons based on traffic origin (Facebook vs. Pinterest)
  13. Content Lifecycle: Distribution, Creation, Consumption
  14. Content Distribution: Can Data Help?
      ● People's attention is increasingly concentrated on social streams
          ○ More traffic reaches publishers from social than from any other source
      ● Are distributed platforms the new home page?
          ○ Facebook Instant Articles, Apple News, Snapchat Discover, Google AMP
          ○ Messenger bots
      ● You need to be where your audience is:
          ○ Identify the content mix that is maximally engaging on an external platform
          ○ Can we use data to seed these distribution networks? (Facebook HuffPost Pages, Snapchat Discover)
  15. Content Distribution: Can Data Help?
      ● HuffPost produces 1000 articles a day - which of these do we promote?
      ● Article page views (PVs) follow a very skewed distribution of success
          ○ Only 1% of our articles get more than 100k PVs
      ● Content performs differently on different networks.
      ● Can we predict in advance which articles will get traction, so that we can:
          ■ Optimally seed multiple distribution channels (Facebook HP Pages, Snapchat Discover)
          ■ Target premium/high-value ads to maximize revenue
          ■ Populate recommendation widgets
  16. Content Distribution: Can Data Help?
      Challenges
      ● Histogram of traffic distribution is highly skewed.
      ● The very act of promoting something causes a bump in traffic.
      ● Data normalization - how long do we want to wait before predicting?
      ● Very imbalanced data set
      Our Approach
      ● Random Forest classifier.
      ● Multiple success criteria
      ● Historical examples of (+) and (-) articles. Downsampling.
      ● Different normalization thresholds
      ● Feature engineering: traffic growth ratios; initial organic social traffic per minute; distinct referrers
  17. Slackbot for the social promotion team
      ● 20% lift in PVs per predicted article
  18. ● 20% lift in PVs per predicted article
  19. Conclusion
      A data-driven newsroom today means
      ● More than just keeping track of clicks and shares
      ● Using predictive analytics to drive product and content placement
      Machine learning will be a key driver for success with the advent of distributed content
  20. Thanks! MachineLearning@HuffPost
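
A minimal sketch of the feature engineering and downsampling steps listed on slide 16, under stated assumptions. The `EarlyTraffic` fields, the half-window definition of the growth ratio, and the 1:1 class balance are all hypothetical choices made for illustration; the talk does not specify any of them.

```python
import random
from dataclasses import dataclass


@dataclass
class EarlyTraffic:
    """Hypothetical snapshot of an article's first minutes after publication."""
    minutes_observed: int     # length of the observation window
    pvs_first_half: int       # page views in the first half of the window
    pvs_second_half: int      # page views in the second half of the window
    organic_social_pvs: int   # page views from unpromoted social referrals
    referrers: list           # referrer domains seen in the window


def extract_features(t):
    """Build one value per feature family named on slide 16."""
    # Growth ratio: second half of the window relative to the first half.
    # Adding 1 to both sides avoids division by zero (an assumed smoothing).
    growth_ratio = (t.pvs_second_half + 1) / (t.pvs_first_half + 1)
    social_per_minute = t.organic_social_pvs / max(t.minutes_observed, 1)
    distinct_referrers = len(set(t.referrers))
    return [growth_ratio, social_per_minute, distinct_referrers]


def downsample(examples, labels, seed=0):
    """Randomly drop majority-class examples until both classes are equal in size."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [examples[i] for i in kept], [labels[i] for i in kept]


# Made-up numbers: 30 minutes observed, traffic tripling in the second half.
snapshot = EarlyTraffic(30, 40, 120, 60, ["facebook.com", "twitter.com", "facebook.com"])
features = extract_features(snapshot)

# Made-up labels mirroring the "only 1% succeed" imbalance: 1 positive, 99 negatives.
X = list(range(100))
y = [1] + [0] * 99
X_balanced, y_balanced = downsample(X, y)
```

The balanced examples could then be passed to a random forest implementation such as scikit-learn's `RandomForestClassifier`; the talk does not say which library was used.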
