How does the Guardian website scale?
With millions of page views per month, we need to think about scaling to an extreme level. But, being Agile, we did it as we went.
2. The Guardian - Some Figures
ABCe Audited (Dec 2009)
Unique Users - 36.9m per month, 1.8m per day
Page Impressions - 259m per month, 9.2m per day
Log file analysis
37m requests per day, 1.1bn requests per month - not
including images / static files
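As a rough sanity check on the log figures above, the daily total can be turned into an average request rate. This is a back-of-envelope sketch only: it assumes requests are spread evenly across the day, which real traffic is not, so actual peaks will be well above this average.

```python
# Back-of-envelope conversion of the log-file figure into an average rate.
REQUESTS_PER_DAY = 37_000_000        # from the log file analysis above
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(requests_per_day: int) -> float:
    """Average requests per second, assuming a perfectly even spread."""
    return requests_per_day / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"{average_rate(REQUESTS_PER_DAY):.0f} req/s on average")
```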
11. Phase 3
Memcached pages
More reduction in Appserver load
Must handle customisation outside of cache
Memcached for pages is a filter
Page customisation is a higher filter
Time-based decache only
Decache only on direct page edit
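The layering described above can be sketched roughly as follows. This is an illustration, not the Guardian's code: an in-memory dict stands in for memcached, and the names `PageCache`, `cached_page`, `customised_page` and the `{{greeting}}` placeholder are all hypothetical. The sketch shows both decache styles mentioned (expiry by time, and explicit removal on a direct edit) and keeps per-user customisation outside the cached page.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class PageCache:
    """In-memory stand-in for memcached: rendered pages with a time-to-live."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.time) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}

    def get(self, path: str) -> Optional[str]:
        entry = self._store.get(path)
        if entry is None:
            return None
        expires_at, html = entry
        if self._clock() >= expires_at:
            del self._store[path]        # time-based decache
            return None
        return html

    def set(self, path: str, html: str) -> None:
        self._store[path] = (self._clock() + self._ttl, html)

    def decache(self, path: str) -> None:
        """Explicit decache, e.g. when the page is edited directly."""
        self._store.pop(path, None)


def cached_page(cache: PageCache, path: str,
                render: Callable[[str], str]) -> str:
    """Inner filter: serve the generic page from cache, render on a miss."""
    html = cache.get(path)
    if html is None:
        html = render(path)              # the expensive app-server work
        cache.set(path, html)
    return html


def customised_page(cache: PageCache, path: str,
                    render: Callable[[str], str],
                    user_name: Optional[str]) -> str:
    """Outer filter: per-user customisation applied after the cache lookup."""
    html = cached_page(cache, path, render)
    greeting = f"Hello, {user_name}" if user_name else "Sign in"
    return html.replace("{{greeting}}", greeting)
```

Because the customisation happens after the cache lookup, every user shares one cached copy of the page and the app server renders it only on a miss.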
12. Getting a Scaling Solution
The problem isn't technical
It's all about the process
Agile doesn't scale well!
Onsite customer doesn't care about scaling
Dedicated 10% team to look at "platform" issues
Still Agile, Customer is Operations Team & Architects
(backend and frontend)
13. Scaling small apps rapidly
On Thursday 15th April 2010 there was a historic UK event - a
televised national debate.
14. Poll Charts
Always sounds simple:
"Let people viewing the page vote at anytime whether they like
or dislike what the party leader is saying. Oh, and let's show it
with a real-time graph"
Bad words here
anytime
real-time graph
17. The poll itself
Python
Google App Engine
An in-house, in-platform cache
18. The Naive Implementation
class IncrLibDemRequest:
    def get(self):
        # Unprotected read-modify-write on a single shared entity
        Poll.get().libdems += 1
Why?
Google App Engine has transaction locks; simultaneous
threads can't atomically increment a counter (duh)
If you wrap it in a txn, all threads are serialised.
You just turned Google's massively parallel data center
into a very expensive file-backed DB
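The lost-update problem behind this can be shown with a deterministic toy example. It is a sketch under simplifying assumptions: a plain dict stands in for the single Poll entity, and the interleaving of two requests is written out by hand rather than produced by real threads.

```python
# Hypothetical stand-in for the single Poll entity in the datastore.
datastore = {"libdems": 0}


def read() -> int:
    return datastore["libdems"]


def write(value: int) -> None:
    datastore["libdems"] = value


def interleaved_votes() -> int:
    """Two requests overlap: both read the counter before either writes."""
    datastore["libdems"] = 0
    seen_by_a = read()
    seen_by_b = read()
    write(seen_by_a + 1)
    write(seen_by_b + 1)     # overwrites request A's update
    return datastore["libdems"]


def serialised_votes() -> int:
    """Two requests run strictly one after the other, as a transaction forces."""
    datastore["libdems"] = 0
    for _ in range(2):
        write(read() + 1)
    return datastore["libdems"]
```

The interleaved version records one vote where two were cast. The serialised version is correct, but only because each request waits for the previous one to finish.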
19. Our Implementation (Phase 1)
Sharded counters are the way to go
Follow the article at code.google.com/appengine on
sharded counters
Gives parallel counters
But beware...
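The idea of a sharded counter can be sketched as below. This is a minimal illustration rather than the code from the App Engine article: a dict stands in for the datastore, and the class name `ShardedCounter` and its parameters are assumptions. Each increment touches one randomly chosen shard, so concurrent writers rarely contend on the same entity; reading the total means summing every shard.

```python
import random
from typing import Dict, Optional, Tuple


class ShardedCounter:
    """Counter split across several independent shards."""

    def __init__(self, name: str, num_shards: int = 20,
                 rng: Optional[random.Random] = None) -> None:
        self.name = name
        self.num_shards = num_shards
        self._rng = rng or random.Random()
        # Stand-in for datastore entities, keyed by (counter name, shard index).
        self._shards: Dict[Tuple[str, int], int] = {}

    def increment(self) -> None:
        shard = self._rng.randrange(self.num_shards)
        key = (self.name, shard)
        # In a real datastore this one-shard update would be its own
        # small transaction, contending only with writers on the same shard.
        self._shards[key] = self._shards.get(key, 0) + 1

    def total(self) -> int:
        """Sum all shards; reads cost more so that writes can run in parallel."""
        return sum(self._shards.get((self.name, i), 0)
                   for i in range(self.num_shards))
```

More shards mean less write contention but a more expensive read, which is the trade-off behind choosing the shard count.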
22. Some interesting notes
Average of around 100-120 req/s
Peaked at 400 req/s
Total of nearly 1,000,000 requests
Surprisingly little cheating
Only 2000 requests
But...
24. Our Implementation (2)
Increase shards by factor of 10?
Completely eliminates transaction failures
Each request still takes 200 ms
The cost is the datastore write
Replace datastore with memcache?
Different architecture
Vote does a memcache atomic increment/decrement
Results are read from memcache
Cron job, once a minute, reads from memcache and
writes to datastore
Requests now take 20 ms
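The second architecture can be sketched roughly like this. It is an assumption-laden illustration, not the production code: `FakeMemcache` is an in-memory stand-in that mimics memcache-style atomic `incr`/`decr` (treating a missing key as zero, which is a simplification), a dict stands in for the datastore, and the function names are hypothetical.

```python
import threading
from typing import Dict, Iterable


class FakeMemcache:
    """Thread-safe in-memory stand-in with memcache-style incr/decr."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._data: Dict[str, int] = {}

    def incr(self, key: str, delta: int = 1) -> int:
        with self._lock:
            self._data[key] = self._data.get(key, 0) + delta
            return self._data[key]

    def decr(self, key: str, delta: int = 1) -> int:
        with self._lock:
            # Like memcached, a counter never drops below zero.
            self._data[key] = max(0, self._data.get(key, 0) - delta)
            return self._data[key]

    def get(self, key: str) -> int:
        with self._lock:
            return self._data.get(key, 0)


def vote(cache: FakeMemcache, party: str, like: bool) -> int:
    """Request handler body: one atomic cache operation, no datastore write."""
    return cache.incr(party) if like else cache.decr(party)


def results(cache: FakeMemcache, parties: Iterable[str]) -> Dict[str, int]:
    """Results are served straight from the cache."""
    return {party: cache.get(party) for party in parties}


def flush_to_datastore(cache: FakeMemcache, datastore: Dict[str, int],
                       parties: Iterable[str]) -> None:
    """What the once-a-minute cron job would do: copy counts to durable storage."""
    for party in parties:
        datastore[party] = cache.get(party)
```

The trade-off in this design is durability: if the cache is flushed or evicted, any votes cast since the last cron run are gone.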