Scaling the Guardian


Michael Brunton-Spall (@bruntonspall)

michael.brunton-spall@guardian.co.uk
The Guardian - Some Figures

 ABCe Audited (Dec 2009)
    Unique Users - 36.9m per month, 1.8m per day
    Page Impressions - 259m per month, 9.2m per day
 Log file analysis
    37m requests per day, 1.1bn requests per month - not
    including images / static files
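
As a rough feel for what the log figures mean per second, the arithmetic can be sketched as below. Only the two request counts come from the slide; the even spread over a day is a simplifying assumption, and real traffic will peak well above the average.

```python
# Figures quoted on the slide (log file analysis, excluding images / static files).
REQUESTS_PER_DAY = 37_000_000
REQUESTS_PER_MONTH = 1_100_000_000
SECONDS_PER_DAY = 24 * 60 * 60

# Average request rate, ASSUMING load is spread evenly over the day.
average_req_per_sec = REQUESTS_PER_DAY / SECONDS_PER_DAY

# How many "average days" the monthly figure corresponds to.
implied_days = REQUESTS_PER_MONTH / REQUESTS_PER_DAY

print(round(average_req_per_sec))  # about 428 requests per second
print(round(implied_days, 1))      # about 29.7 days, so the two figures agree
```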
Initial Architecture
Scaling Problems

 In-memory cache is an order of magnitude too small at 500MB
Even Worse!




 Cache is local to appserver
   Adding an App Server makes the problem worse
Our Solution



            Memcached!
    or more accurately, a distributed cache
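
Why a per-appserver cache gets worse as servers are added, and why a shared cache does not, can be sketched with a toy model. Everything here is an assumption for illustration: plain dicts stand in for the caches, cache size limits are ignored, and the function name and workload are made up. It only counts how often the database has to be hit.

```python
def count_db_loads(num_servers, num_keys, repeats, shared):
    """Count database loads for a workload spread across app servers."""
    shared_cache = {}                                   # one cache for all servers
    local_caches = [{} for _ in range(num_servers)]     # one cache per server
    db_loads = 0
    for repeat in range(repeats):
        for key in range(num_keys):
            # Rotate so every key is eventually served by every app server.
            server = (key + repeat) % num_servers
            cache = shared_cache if shared else local_caches[server]
            if key not in cache:
                db_loads += 1                           # cache miss: go to the database
                cache[key] = "content %d" % key
    return db_loads


# 100 distinct items, each requested 20 times.
print(count_db_loads(2, 100, 20, shared=False))  # 200: each server warms its own copy
print(count_db_loads(4, 100, 20, shared=False))  # 400: doubling servers doubles the misses
print(count_db_loads(4, 100, 20, shared=True))   # 100: one load per item, however many servers
```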
Phase 1

 Memcache object cache
   Massive reduction in number of DB calls




    No significant drop in DB Load
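
An object cache of this kind usually follows a look-aside pattern: try the cache, fall back to the database, then store the result. A minimal sketch, assuming a dict as a stand-in for memcached and hypothetical names (FakeDatabase, ObjectCache, the "article:" key prefix) that are not from the talk:

```python
class FakeDatabase:
    """Stand-in for the real database; counts how often it is called."""

    def __init__(self, rows):
        self.rows = rows
        self.calls = 0

    def load_by_id(self, object_id):
        self.calls += 1
        return self.rows[object_id]


class ObjectCache:
    """Look-aside cache for single objects fetched by id."""

    def __init__(self, database):
        self.database = database
        self.store = {}  # stand-in for a memcached client

    def get(self, object_id):
        key = "article:%d" % object_id
        if key in self.store:
            return self.store[key]                    # hit: no database call
        value = self.database.load_by_id(object_id)   # miss: one database call
        self.store[key] = value
        return value


db = FakeDatabase({1: "Election live blog", 2: "Budget analysis"})
cache = ObjectCache(db)
for _ in range(5):
    cache.get(1)
    cache.get(2)
print(db.calls)  # 2 database calls for 10 lookups
```

The call count drops sharply, which matches the first bullet. Single-object lookups tend to be cheap for the database anyway, which may be why the load barely moved.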
Phase 2

 Memcached query cache
   Massive reduction in DB Load
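
A query cache stores whole result sets, keyed by the query text and its parameters. The sketch below is an assumption about how such a key could be built, not the Guardian's code: the dict stands in for memcached, and query_cache_key and QueryCache are hypothetical names. Hashing keeps the key short, which matters because memcached limits keys to 250 bytes.

```python
import hashlib
import json


def query_cache_key(sql, params):
    """Build a short, stable cache key from the query text and parameters."""
    payload = json.dumps([sql, params], sort_keys=True)
    return "query:" + hashlib.sha1(payload.encode("utf-8")).hexdigest()


class QueryCache:
    """Look-aside cache for whole query results."""

    def __init__(self, run_query):
        self.run_query = run_query   # function that really hits the database
        self.store = {}              # stand-in for a memcached client
        self.executed = 0            # how many queries reached the database

    def fetch(self, sql, params):
        key = query_cache_key(sql, params)
        if key not in self.store:
            self.executed += 1
            self.store[key] = self.run_query(sql, params)
        return self.store[key]


def fake_run_query(sql, params):
    # Pretend this is an expensive query returning rows.
    return [("headline for", params[0])]


queries = QueryCache(fake_run_query)
sql = "SELECT headline FROM article WHERE section = %s"
queries.fetch(sql, ["politics"])
queries.fetch(sql, ["politics"])   # served from cache
queries.fetch(sql, ["sport"])      # different parameters, different key
print(queries.executed)            # 2
```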
Phase 3

 Memcached pages
   More reduction in Appserver load
   Must handle customisation outside of cache
   Memcached for pages is a filter
   Page customisation is a higher filter
   Time-based decache only
   Decache only on direct page edit
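
One way to read the filter arrangement: an inner filter serves the shared page from cache until it expires or is explicitly decached, and an outer filter applies per-user customisation to whatever comes back. The sketch below assumes that reading. The dict stands in for memcached, and PageCacheFilter, customise, handle_request and the "{{greeting}}" placeholder are all made up for illustration.

```python
import time


class PageCacheFilter:
    """Inner filter: caches the uncustomised page for a fixed time."""

    def __init__(self, render_page, ttl_seconds, clock=time.time):
        self.render_page = render_page
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self.store = {}  # path -> (expires_at, html); stand-in for memcached

    def get(self, path):
        entry = self.store.get(path)
        now = self.clock()
        if entry is not None and entry[0] > now:
            return entry[1]                       # still fresh: skip rendering
        html = self.render_page(path)             # expired or missing: render once
        self.store[path] = (now + self.ttl_seconds, html)
        return html

    def decache(self, path):
        """Called when an editor directly edits this page."""
        self.store.pop(path, None)


def customise(html, username):
    """Outer filter: per-user changes happen outside the cache."""
    return html.replace("{{greeting}}", "Hello, %s" % username)


def handle_request(page_cache, path, username):
    return customise(page_cache.get(path), username)
```

Because the cached copy never contains user-specific markup, one entry can serve every visitor.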
Getting a Scaling Solution

  The problem isn't technical
  It's all about the process
  Agile doesn't scale well!
       Onsite customer doesn't care about scaling
       Dedicated 10% team to look at "platform" issues
       Still Agile, Customer is Operations Team & Architects
       (backend and frontend)
Scaling small apps rapidly

On Thursday 15th April 2010 there was a historic UK event - a
televised national debate.
Poll Charts

Always sounds simple:

"Let people viewing the page vote at anytime whether they like
or dislike what the party leader is saying. Oh, and let's show it
with a real-time graph"

Bad words here
   anytime
   real-time graph
Our coverage looked like this...
The poll itself

  Python
  Google App Engine
  An in-house, in-platform cache
The Naive Implementation

class IncrLibDemRequest:
   def get(self):
     Poll.get().libdems += 1


Why?
  Google App Engine has transaction locks; simultaneous
  threads can't atomically increment a counter (duh)
  If you wrap it in a txn, all threads are serialised.
      You just turned Google's massively parallel data center
      into a very expensive file-backed db
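
The failure without a transaction is a lost update: two requests read the same value and both write back "value plus one". A deterministic sketch of that interleaving, using a plain dict as a stand-in for the stored poll (this is not App Engine code):

```python
# Stand-in for the stored poll entity; not the App Engine datastore.
poll = {"libdems": 0}

# Two requests arrive together and both read before either writes.
seen_by_request_a = poll["libdems"]
seen_by_request_b = poll["libdems"]

# Each writes back "what I read, plus one".
poll["libdems"] = seen_by_request_a + 1
poll["libdems"] = seen_by_request_b + 1

votes_cast = 2
votes_recorded = poll["libdems"]
print(votes_cast, votes_recorded)  # 2 1: one vote has been lost
```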
Our Implementation (Phase 1)

Sharded counters are the way to go
   Follow the article at code.google.com/appengine on
   sharded counters
   Gives parallel counters
   But beware....
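
The idea behind a sharded counter is to spread writes over several independent counters and add them up on read. The sketch below is an in-memory illustration only, assuming a dict in place of datastore entities and a hypothetical ShardedCounter class; the App Engine article describes the real datastore version.

```python
import random


class ShardedCounter:
    """Counter split across several shards so writers rarely collide."""

    def __init__(self, name, num_shards, rng=None):
        self.name = name
        self.num_shards = num_shards
        self.rng = rng or random.Random()
        self.shards = {}  # shard key -> count; stand-in for datastore entities

    def increment(self):
        # Pick a shard at random; in the real thing each shard update
        # would be its own small transaction.
        index = self.rng.randrange(self.num_shards)
        key = "%s-%d" % (self.name, index)
        self.shards[key] = self.shards.get(key, 0) + 1

    def total(self):
        # Reads pay the price: every shard has to be summed.
        return sum(self.shards.values())


libdems = ShardedCounter("libdems", num_shards=20, rng=random.Random(0))
for _ in range(1000):
    libdems.increment()
print(libdems.total())  # 1000
```

More shards means fewer writers fighting over the same shard, at the cost of a more expensive read.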
Our Results and Numbers
Some interesting notes

 Average of around 100-120 req/s
 Peaked at 400 req/s
 Total of nearly 1,000,000 requests
 Surprisingly little cheating
    Only 2000 requests

  But...
Request Duration




 Between 1 sec and 8 seconds!
 Causes
    Thread contention
    Not enough shards
Our Implementation (2)

 Increase shards by factor of 10?
     Completely eliminates transaction failures
     Each request still takes 200ms
     The cost is the datastore write
  Replace datastore with memcache?
    Different architecture
        vote does a memcache atomic
        increment/decrement
        results are read from memcache
        cronjob runs once a minute, reads from memcache and
        writes to datastore
    requests now take 20ms
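
The memcache architecture can be sketched as three small pieces: a vote that only touches an atomic counter, a results read from the same counter, and a periodic job that copies the counts to durable storage. Everything below is a stand-in written for illustration: FakeMemcache, record_vote, flush_to_datastore and the "votes:" key prefix are assumptions, not the App Engine API.

```python
import threading


class FakeMemcache:
    """In-process stand-in for a cache with atomic increment."""

    def __init__(self):
        self._values = {}
        self._lock = threading.Lock()

    def incr(self, key, delta=1):
        # The lock makes read-modify-write atomic, as a cache server would.
        with self._lock:
            self._values[key] = self._values.get(key, 0) + delta
            return self._values[key]

    def get(self, key):
        with self._lock:
            return self._values.get(key, 0)


def record_vote(cache, party, liked):
    """Vote handler: one atomic counter update, no datastore write."""
    cache.incr("votes:%s" % party, 1 if liked else -1)


def flush_to_datastore(cache, datastore, parties):
    """Body of the once-a-minute cron job: copy counts to durable storage."""
    for party in parties:
        datastore[party] = cache.get("votes:%s" % party)


cache = FakeMemcache()
for _ in range(3):
    record_vote(cache, "libdems", liked=True)
record_vote(cache, "libdems", liked=False)
datastore = {}
flush_to_datastore(cache, datastore, ["libdems"])
print(datastore)  # {'libdems': 2}
```

Two caveats worth checking against the real service: memcached counters are unsigned and a decrement stops at zero, so a signed score like this one may need separate like and dislike counters; and anything evicted from the cache between cron runs is lost.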
The Results?
Some notes
  Total of around 2,727,000 requests
  Average of around 454 req/s
  Peaked at 750 req/s
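
A quick consistency check on these three numbers, as plain arithmetic. The figures are from the slide; the interpretation (how long a window the average was taken over) is only what the numbers imply, not something the talk states.

```python
TOTAL_REQUESTS = 2_727_000
AVERAGE_REQ_PER_SEC = 454
PEAK_REQ_PER_SEC = 750

# Window over which the quoted average would produce the quoted total.
implied_seconds = TOTAL_REQUESTS / AVERAGE_REQ_PER_SEC
implied_minutes = implied_seconds / 60

# How far the peak sits above the average.
peak_to_average = PEAK_REQ_PER_SEC / AVERAGE_REQ_PER_SEC

print(round(implied_minutes))      # about 100 minutes
print(round(peak_to_average, 2))   # peak is about 1.65x the average
```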
Requests per Second




But...
Request Duration




 Average 1.2s at first
 A live deploy brought it down to 300ms
Any Questions?




Michael Brunton-Spall (@bruntonspall)

michael.brunton-spall@guardian.co.uk
