14. I bring you NEWS!!!
Web server Web server Web server
App server App server App server
Memcached (20Gb)
Oracle
CMS Data feeds
15. Why RDBMS?
5 years ago, fewer alternatives
Understand operations procedures
Can easily recruit DBAs / devs
Developer/ops tools
Business critical system: a safe choice
Web server Web server Web server
App server App server App server
Memcached
Oracle
CMS Data feeds
27. Mutualisation of journalism
Mutualised news!
No longer only broadcasting content
User engagement & contribution:
journalism
data
software
Data curation / linked data
Support engaged developers with data and APIs
29. Platform strategy
Out: Release our data to the world via APIs
In: Rapidly build new functionality outside the core
Write: Ingest, store & present arbitrary data
31. Content API
Delivered using Apache Solr
Document oriented search engine
Loose schema:
records, fields, facets
Fields can be multi-value
Supports dynamic field generation
Can apply multiple facets in queries faster than RDBMS
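As a rough illustration of the loose schema described above, a Solr document is a flat record of fields, some of which hold several values. The sketch below builds such a record as JSON. The field names are hypothetical, and the `_s` / `_ss` suffixes assume a schema that declares matching dynamic-field patterns; this is not the schema used in the talk.

```python
import json


def make_document(doc_id, headline, tags, extra):
    """Build one flat, Solr-style record (all field names are hypothetical)."""
    doc = {
        "id": doc_id,
        "headline": headline,
        "tag": list(tags),  # a multi-value field: one record, many values
    }
    # Dynamic fields: the schema matches them by name pattern instead of
    # declaring each one. Assumption: "*_s" is a single string and
    # "*_ss" is a multi-valued string.
    for name, value in extra.items():
        if isinstance(value, (list, tuple)):
            doc[name + "_ss"] = list(value)
        else:
            doc[name + "_s"] = str(value)
    return doc


doc = make_document(
    "example-1",
    "Placeholder headline",
    ["politics", "data"],
    {"section": "news", "contributor": ["A. Writer", "B. Writer"]},
)
payload = json.dumps([doc], indent=2)  # a JSON array of documents
print(payload)
```

New attributes can be added to `extra` without touching the schema, which is the practical meaning of "dynamic field generation" in this sketch.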
36. Can perform complex queries, including full text search
Can filter results with facets (WHERE clause)
ANYTHING can be a facet. Very powerful.
On our dataset most queries are of a similar cost
Scales very well horizontally
Handles millions of documents
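The points above (full text search, filtering with facets, anything as a facet) map onto Solr's standard request parameters `q`, `fq`, `facet` and `facet.field`. The sketch below only assembles a request URL and contacts no server. The host, core name and field names are assumptions made for illustration, and filter values containing spaces or special characters would need quoting that this sketch does not do.

```python
from urllib.parse import urlencode, parse_qs


def build_facet_query(base_url, text, filters, facet_fields, rows=10):
    """Return a Solr select URL combining text search, filters and facets."""
    params = [
        ("q", text),        # full text query
        ("rows", str(rows)),
        ("wt", "json"),
        ("facet", "true"),  # ask Solr to return facet counts
    ]
    # Each fq narrows the result set, much like a WHERE clause.
    params += [("fq", f"{field}:{value}") for field, value in filters]
    # Any indexed field can be requested as a facet.
    params += [("facet.field", field) for field in facet_fields]
    return f"{base_url}/select?{urlencode(params)}"


url = build_facet_query(
    "http://localhost:8983/solr/content",  # hypothetical host and core
    "election",
    filters=[("section", "politics"), ("tag", "data")],
    facet_fields=["tag", "contributor"],
)
print(url)

# Decode the query string again to inspect what would be sent.
query = parse_qs(url.split("?", 1)[1])
print(query["fq"])
```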
37. Mutualised news!
No transactions
Excellent for certain types of queries
Not truly general purpose
Schema design very important
Search index not really persistence
38. [Diagram] Core: Web servers, App server, Memcached (20Gb), rdbms, CMS. M/Q. Api: multiple Solr instances (Cloud, EC2).
39. API
Currently powering iPad app
Site components
External applications
Editors' tools
More to follow
41. Application framework
Simple REST/HTTP framework allows lightweight development
Applications proxied for performance
Apps generally hosted in the cloud, hot deployment into production
No RDBMS provided for storage
Can develop in news timeline
44. Some useful characteristics
• Scale down as well as up
• Support rapid production-ready prototyping: turn projects around in hours or days
• Handle massive traffic spikes
45. Desktop analysis
• Leaked BNP membership list
• Load postcodes to constituencies mapping into Redis
• Generate heatmaps by looking up all 12,000 postcodes
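The lookup described on this slide can be sketched as a key-value load followed by one `get` per postcode. To stay runnable without a server, the sketch uses a small in-memory stand-in that exposes only `get` and `set`; a real Redis client could be swapped in (with redis-py, values come back as bytes unless `decode_responses=True` is set). The key prefix, the normalisation rule and the sample rows are placeholders, not real postcodes or data from the talk.

```python
from collections import Counter


class DictStore:
    """In-memory stand-in exposing the get/set subset of a Redis client."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)  # None when the key is missing


def normalise(postcode):
    """Strip spaces and upper-case so 'aa1 1aa' and 'AA11AA' share a key."""
    return postcode.replace(" ", "").upper()


def load_mapping(store, rows):
    """Load (postcode, constituency) pairs into the store."""
    for postcode, constituency in rows:
        store.set("postcode:" + normalise(postcode), constituency)


def count_by_constituency(store, postcodes):
    """Look up each postcode and tally hits per constituency."""
    counts = Counter()
    for postcode in postcodes:
        constituency = store.get("postcode:" + normalise(postcode))
        if constituency is not None:  # skip postcodes missing from the mapping
            counts[constituency] += 1
    return counts


store = DictStore()
# Placeholder mapping rows, invented for this sketch.
load_mapping(store, [("AA1 1AA", "Constituency A"), ("BB2 2BB", "Constituency B")])
counts = count_by_constituency(store, ["aa1 1aa", "AA11AA", "BB2 2BB", "ZZ9 9ZZ"])
print(counts.most_common())
```

The per-constituency counts are the input a heatmap needs; the rendering itself is out of scope here.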
51. Zeitgeist stores pre-calculated results in BigTable
• Data comes in from stats system, comments system and OneRiot real-time search API
• AppEngine cron tasks populate task queues
• Task queues recalculate hotness levels
• “Live” BigTable queries are simple SELECT / SORT
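The split between background recalculation and cheap live reads can be sketched as follows. The scoring formula, its weights and the signal names are invented for illustration, since the slides do not give the real Zeitgeist calculation, and a plain dictionary stands in for the stored results.

```python
import math


def hotness(views, comments, mentions):
    """Hypothetical score; the real formula is not given in the slides."""
    return (
        math.log1p(views)
        + 2 * math.log1p(comments)
        + 3 * math.log1p(mentions)
    )


def recalculate(signals, results):
    """Background task: fold raw signals into one stored score per article."""
    for article_id, s in signals.items():
        results[article_id] = hotness(s["views"], s["comments"], s["mentions"])


def hottest(results, limit):
    """Live query: no arithmetic at read time, only a sort over stored scores."""
    return sorted(results, key=results.get, reverse=True)[:limit]


# Placeholder signals, invented for this sketch.
signals = {
    "a": {"views": 100, "comments": 0, "mentions": 0},
    "b": {"views": 10, "comments": 10, "mentions": 10},
    "c": {"views": 0, "comments": 0, "mentions": 0},
}
results = {}
recalculate(signals, results)  # would run from a task queue
print(hottest(results, 2))     # would run on each page request
```

The expensive work sits in `recalculate`, so the read path stays a simple select and sort however complicated the scoring becomes.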
52. Live debate poll
• Over a million votes cast in an hour
• Stretched limits of BigTable / AppEngine
• Sharded counter pattern to handle writes
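The sharded counter pattern spreads writes for one logical counter across several records so that concurrent writers rarely contend on the same one; a read sums the shards. The sketch below shows the idea in memory only. On App Engine each shard would be a separate datastore entity updated in its own transaction, and the shard count of 20 is an arbitrary assumption.

```python
import random


class ShardedCounter:
    """One logical counter stored as several independently written shards."""

    def __init__(self, num_shards=20, rng=None):
        self._shards = [0] * num_shards
        self._rng = rng or random.Random()

    def increment(self, amount=1):
        # Pick a shard at random so simultaneous writers rarely collide.
        index = self._rng.randrange(len(self._shards))
        self._shards[index] += amount

    def total(self):
        # Reads are costlier than writes: every shard has to be summed.
        return sum(self._shards)


# One counter per poll option (option names are placeholders).
poll = {"yes": ShardedCounter(), "no": ShardedCounter()}
for i in range(1000):
    poll["yes" if i % 4 else "no"].increment()

print(poll["yes"].total(), poll["no"].total())
```

More shards raise write throughput at the cost of slower reads, which is why totals are often cached instead of being summed on every request.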
56. • Datablog was launched with no development involvement at all - it’s a blog, and a bunch of Google Docs Spreadsheets
• Retrieve data as CSV, XLS, JSON, Atom...
• “Make a copy” and run your own analysis