This document summarizes Devansh Dhutia's presentation about content delivery at USA Today Network. It discusses how the network draws about 1.1 billion monthly views on content created by roughly 3,000 journalists, and how both active and passive search are used to surface that content. It also outlines initiatives like My Topics for personalized alerts, News Near Me for local content, and different types of backfill content used to supplement news coverage. The presentation concludes by discussing lessons learned and opportunities to improve the distributed platform and content pipeline.
How Does the USA Today Network Provide Its Readers With Meaningful Content? - Devansh Dhutia, USA Today Network
1. Delivering Meaningful Content at USA Today Network
Devansh Dhutia
Manager, Development – USA Today Network
Montreal
October 15-18
2. Agenda
• About USA Today Network
• Where are the readers?
• My Topics
• News Near Me
• Content Backfill
– Types of backfill
• Distributed Platform
– Time series data
– Rights Management
• Lessons Learned
• Enter Content Pipeline
• Where to next?
• Q&A
5. About USA Today Network
~ 1.1 billion monthly views
~ 3000 journalists creating content
< 1% of end users use active search
> 75% of pages leverage passive search
100% of authors leverage active/passive search to package & curate content
~ 25 million monthly syndication requests
8. My Topics
• Automated Push Alerts
• Customized headline feed
• Increased reader engagement
• Native Apps only
“This is just one of the first ways we’re making personalized consumer experiences a priority”
Jason Jedlinski, VP Product Management
22. Lessons Learned
• Writing data the way you want to read it is fast
– Challenge: keeping the various views consistent
– Challenge: changes to related data require large reprocessing
– Challenge: lack of a strict schema makes changes unpredictable
– Challenge: business logic is spread across multiple tiers
• Denormalization simplifies queries
– Challenge: simple data changes require re-indexing large chunks of content
– Challenge: denormalization makes your index look different from your data
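The re-indexing cost of denormalization can be illustrated with a minimal Python sketch. The document shapes, ids, and the `rename_author` helper are assumptions made for illustration; they are not taken from the talk. Because every article embeds a copy of its author, a one-field change to the author fans out to every document that carries the copy.

```python
# Hypothetical denormalized index: each article embeds a copy of its author.
index = {
    "a1": {"title": "City council vote", "author": {"id": "u1", "name": "J. Smith"}},
    "a2": {"title": "School budget", "author": {"id": "u1", "name": "J. Smith"}},
    "a3": {"title": "Game recap", "author": {"id": "u2", "name": "R. Lee"}},
}


def rename_author(index, author_id, new_name):
    """Rewrite every document that embeds the author; return the ids touched."""
    touched = []
    for doc_id, doc in index.items():
        if doc["author"]["id"] == author_id:
            doc["author"]["name"] = new_name
            touched.append(doc_id)
    return touched


reindexed = rename_author(index, "u1", "Jordan Smith")
print(reindexed)  # ['a1', 'a2']
```

With three documents the fan-out is trivial; the point is that the number of rewrites grows with the number of articles per author rather than staying at one.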
23. Lessons Learned (cont.)
• Expose the raw search engine’s power to users
– Challenge: most users don’t care to craft Solr-specific queries
– Challenge: new use cases can go to production without query review
– Challenge: query sprawl
24. Enter Content Pipeline
• Write the data once
– Benefit: single view to maintain
– Benefit: all consumers work off a single model
– Benefit: business logic pushed to production tier
• Normalize models in storage and in index
– Benefit: related model updates do not require reprocessing
– Benefit: retrieve only the data you care about
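A minimal Python sketch of the normalized approach, under assumed record shapes: articles hold only an `author_id`, and the author is resolved at read time. The store layout and the `rename_author` and `read_article` helpers are hypothetical and only illustrate the two benefits listed for normalization.

```python
# Hypothetical normalized stores: articles reference authors by id.
authors = {"u1": {"name": "J. Smith"}, "u2": {"name": "R. Lee"}}
articles = {
    "a1": {"title": "City council vote", "author_id": "u1"},
    "a2": {"title": "School budget", "author_id": "u1"},
}


def rename_author(author_id, new_name):
    """Update the single author record; no article is rewritten."""
    authors[author_id]["name"] = new_name
    return 1  # number of records written


def read_article(article_id, fields):
    """Resolve references at read time and return only the requested fields."""
    article = articles[article_id]
    view = {
        "title": article["title"],
        "author_name": authors[article["author_id"]]["name"],
    }
    return {name: view[name] for name in fields}


writes = rename_author("u1", "Jordan Smith")
print(writes)                                # 1
print(read_article("a2", ["author_name"]))   # {'author_name': 'Jordan Smith'}
```

The trade-off is that reads now do a lookup per reference, which is the cost exchanged for single-record writes.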
25. Enter Content Pipeline (cont.)
• GraphQL: customers choose what they want
– Benefit: customers have an à la carte selection of data to query
– Benefit: all data access becomes uniform
– Benefit: API engineers can understand what data is actually used and what isn’t
• Abstract the search index nuances away from the user
– Benefit: search becomes another GraphQL query
– Benefit: new searches are reviewed
– Benefit: relevance engineering can happen independently from application development
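The à la carte selection and the usage-tracking benefit can be sketched in plain Python. This is a toy projection over dictionaries, not a GraphQL library and not the network's schema; the field names, the `project` function, and the `field_usage` counter are all assumptions for illustration.

```python
from collections import Counter

# Counts how often each field path is resolved, so unused fields stand out.
field_usage = Counter()


def project(data, selection, path=""):
    """Return only the selected fields; nested selections recurse."""
    result = {}
    for field, sub in selection.items():
        field_path = f"{path}{field}"
        field_usage[field_path] += 1
        value = data[field]
        if sub is None:                      # leaf field: copy the value
            result[field] = value
        elif isinstance(value, list):        # list of objects: project each item
            result[field] = [project(item, sub, field_path + ".") for item in value]
        else:                                # nested object
            result[field] = project(value, sub, field_path + ".")
    return result


# Hypothetical article record.
article = {
    "id": "a1",
    "headline": "City council vote",
    "body": "Full text of the story",
    "author": {"name": "J. Smith", "email": "js@example.com"},
    "tags": [{"name": "politics", "score": 0.9}],
}

# Roughly the shape of the query: { headline author { name } tags { name } }
selection = {"headline": None, "author": {"name": None}, "tags": {"name": None}}
selected = project(article, selection)
print(selected)
# {'headline': 'City council vote', 'author': {'name': 'J. Smith'}, 'tags': [{'name': 'politics'}]}
```

After this call `field_usage` has entries for `headline`, `author.name`, and `tags.name` but none for `body`, which is the kind of signal that tells API engineers which data is actually used.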
26. Where to next?
• Feedback loops from the distributed platform
• Solve the “hard” personalization problem
• Switch more of our customers to the GraphQL-based content pipeline
• Faster “on-the-fly” access management
Microservice polling Solr
Near real time
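A polling service for push alerts might look roughly like the Python sketch below. It assumes documents carry a numeric `published` value and a list of `topics`, and that readers subscribe to topic sets; `fetch_since` is an in-memory stand-in for the search query, so no real Solr API is shown. None of these names come from the talk.

```python
def poll_once(fetch_since, cursor, subscriptions):
    """Fetch documents newer than the cursor; return (alerts, new_cursor)."""
    alerts = []
    new_cursor = cursor
    for doc in fetch_since(cursor):
        new_cursor = max(new_cursor, doc["published"])
        for user, topics in subscriptions.items():
            if topics & set(doc["topics"]):   # reader follows at least one topic
                alerts.append((user, doc["headline"]))
    return alerts, new_cursor


# In-memory stand-in for the index; a real service would query the search engine.
DOCS = [
    {"headline": "Council approves budget", "topics": ["politics"], "published": 100},
    {"headline": "Playoff preview", "topics": ["sports"], "published": 105},
]


def fetch_since(cursor):
    return [d for d in DOCS if d["published"] > cursor]


SUBSCRIPTIONS = {"alice": {"politics"}, "bob": {"sports", "weather"}}

alerts, cursor = poll_once(fetch_since, 0, SUBSCRIPTIONS)
print(alerts)  # [('alice', 'Council approves budget'), ('bob', 'Playoff preview')]

# A second poll with the advanced cursor finds nothing new.
alerts_again, cursor_again = poll_once(fetch_since, cursor, SUBSCRIPTIONS)
print(alerts_again)  # []
```

A real service would call `poll_once` in a loop with a short sleep; the polling interval is what bounds how near to real time the alerts arrive.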
High adoption rate – 8% lift in page-view (PV) depth & high lift on return frequency
Increased local market app downloads through cross-pollination of content
Two-step detection: find the closest market, then fetch content from that market
Widening capability for news near you
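The two-step lookup can be sketched in Python under stated assumptions: markets are represented by a single latitude/longitude point, distance is great-circle (haversine), and content is a per-market list. The market names, coordinates, and headlines are hypothetical placeholders, and the talk does not say which distance measure is actually used.

```python
import math

# Hypothetical market centers as (latitude, longitude) in degrees.
MARKETS = {
    "indianapolis": (39.77, -86.16),
    "phoenix": (33.45, -112.07),
    "detroit": (42.33, -83.05),
}

# Hypothetical per-market content.
CONTENT = {
    "indianapolis": ["Downtown road closures this weekend"],
    "phoenix": ["Heat advisory extended"],
    "detroit": ["Auto show opens Friday"],
}


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def closest_market(location):
    """Step 1: pick the market whose center is nearest to the reader."""
    return min(MARKETS, key=lambda name: haversine_km(location, MARKETS[name]))


def news_near_me(location):
    """Step 2: return content from the detected market."""
    market = closest_market(location)
    return market, CONTENT.get(market, [])


# A reader located near Chicago resolves to the nearest listed market.
market, stories = news_near_me((41.88, -87.63))
print(market)  # indianapolis
```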
Unlike retail, whose content lifecycle continues, news is short-lived
AN – 14.5M users / mo
Significant amount of content used by various partners
DRM
Pre-tagging content – reindex everything on changes
Time series sharding
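Time series sharding can be sketched as routing each document to a shard named after its publish month, so that queries over recent news only touch recent shards. Monthly granularity and the `content_YYYY_MM` naming are assumptions for illustration; the notes do not state how the shards are actually cut.

```python
from datetime import datetime


def shard_for(published: datetime) -> str:
    """Name of the (hypothetical) monthly shard that stores a document."""
    return f"content_{published:%Y_%m}"


def shards_for_range(start: datetime, end: datetime) -> list:
    """List the monthly shards a query over [start, end] has to touch."""
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"content_{year:04d}_{month:02d}")
        month += 1
        if month == 13:          # roll over into the next year
            year, month = year + 1, 1
    return names


print(shard_for(datetime(2018, 10, 15)))  # content_2018_10
print(shards_for_range(datetime(2018, 11, 20), datetime(2019, 1, 5)))
# ['content_2018_11', 'content_2018_12', 'content_2019_01']
```

Because news is short-lived, most reads fall in the newest shard or two, and older shards can be queried only when a request explicitly reaches back in time.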