Yet Another Rails Scaling Presentation
Ruby on Rails Meetup 
May 10, 2007 
Jared Friedman (jared@scribd.com) and 
Tikhon Bernstam (tikhon@scribd.com)
Should you bother with scaling?
• Well, it depends
• But if you’re launching a startup, probably
• The best way to launch a startup these days is to get it on TechCrunch, Digg, Reddit, etc.
• You don’t get as much time to grow organically as you used to
• You only get one launch; you don’t want your site to fall over
The Predecessors
• Other great places to look for info on this:
• poocs.net, “The Adventures of Scaling Rails”
     http://poocs.net/2006/3/13/the-adventures-of-scaling-stage-1
• Stefan Kaes, “Performance Rails”
     http://railsexpress.de/blog/files/slides/rubyenrails2006.pdf
• RobotCoop blog and gems
     http://www.robotcoop.com/articles/2006/10/10/the-software-and-hardware-that-runs-our-sites
• O’Reilly book “High Performance MySQL”
     • It’s not Rails, but it’s really useful
Big Picture
• This presentation concentrates on what’s different from previous writings; it is not a comprehensive overview
• Available at http://www.scribd.com/blog
Who we are
• Scribd.com
• Like “YouTube for documents”
• Launched in March 2007
• Handles ~1M requests per day
Key Points
• General architecture
• Use fragment caching!
• Rolling your own traffic analytics and some SQL tips
Current Scribd architecture
• 1 Web Server
• 3 Database Servers
• 3 Document conversion servers
• Test and backup machines
• Amazon S3
Server Hardware
• Dual, dual-core Woodcrests at 3GHz
• 16GB of memory
• Four 15K RPM SCSI hard drives in a RAID 10
• We learned: disk speed is important
• Don't skimp; you’re not Google, and it's easier to scale up than out
• Softlayer is a great dedicated hosting company
Various software details
• CentOS
• Apache/Mongrel
• Memcached, RobotCoop’s memcache-client
• Stefan Kaes’ SQLSessionStore
     • Best way to store persistent sessions
• Monit, Capistrano
• Postfix
Fragment Caching
• "We don’t use any page or fragment caching." - robotcoop
• "Play with fragment caching ... no improvement, changes were reverted at a later time." - poocs.net
• Well, maybe it's application specific
• Scribd uses fragment caching extensively, with an enormous performance improvement
(screenshot)
How to Use Fragment Caching
• Ignore all but the most frequently accessed pages
• Look for pieces of the page that don't change on every page view and are expensive to compute
• Just wrap them in a
     <% cache('keyname') do %>
         …
     <% end %>
• Do timing tests before and afterwards; backtrack unless you see significant performance gains
• We see > 10X
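A before-and-after timing test can be as simple as the sketch below. It is plain Ruby rather than Rails: the Hash stands in for the fragment store, and render_expensive_fragment and cached_fragment are hypothetical names made up for illustration, not Rails helpers.

```ruby
require 'benchmark'

# Stand-in for an expensive piece of a page (hypothetical; a real fragment
# would run database queries and render a partial).
def render_expensive_fragment
  (1..200_000).inject(0) { |sum, n| sum + n * n }.to_s
end

# Plain Hash used as a stand-in fragment store (an assumption for this
# sketch; Rails would use its configured fragment store instead).
FRAGMENT_STORE = {}

# Return the stored fragment if present, otherwise compute and store it.
def cached_fragment(key)
  FRAGMENT_STORE[key] ||= yield
end

# Time 20 renders without the cache.
uncached = Benchmark.realtime do
  20.times { render_expensive_fragment }
end

# Time 20 renders with the cache; only the first one does the work.
cached = Benchmark.realtime do
  20.times { cached_fragment('keyname') { render_expensive_fragment } }
end

puts format('uncached: %.4fs  cached: %.4fs', uncached, cached)
```

If the two numbers are close, the fragment is probably not worth caching and the change can be backed out.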
Expiring fragments, 1. Time based
• You should really use memcached for storing fragments
    • Better performance
    • Easier to scale to multiple servers
    • Most important: allows time-based expiration
• Use plugin http://agilewebdevelopment.com/plugins/memcache_fragments_with_time_expiry
• Dead easy:
     <% cache 'keyname', :expire => 10.minutes do %>
           ...
     <% end %>
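What time-based expiry buys you can be sketched in plain Ruby. The ExpiringStore class below is a hypothetical in-memory stand-in for memcached, written only to show the behavior; it is not the plugin's implementation.

```ruby
# Minimal in-memory store with per-entry expiry, mimicking how memcached
# drops an entry once its time-to-live has passed. Illustrative only.
class ExpiringStore
  # clock is injectable so the expiry logic can be exercised without waiting.
  def initialize(clock = -> { Time.now })
    @clock = clock
    @entries = {}
  end

  # Return the cached value for key, recomputing it with the block when the
  # entry is missing or older than expire_in seconds.
  def fetch(key, expire_in)
    entry = @entries[key]
    now = @clock.call
    if entry.nil? || now >= entry[:expires_at]
      entry = { value: yield, expires_at: now + expire_in }
      @entries[key] = entry
    end
    entry[:value]
  end
end

store = ExpiringStore.new
# 10 * 60 seconds plays the role of :expire => 10.minutes above.
html = store.fetch('keyname', 10 * 60) { '<ul>expensive list</ul>' }
puts html
```

The page never has to know when the underlying data changed; the worst case is that it shows data up to ten minutes old.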
Expiring fragments, 2. Manually
• No need to serve stale data
• Just use:
     Cache.delete("fragment:/partials/whatever")
• Clear fragments whenever data changes
• Again, easier with memcached
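One way to "clear fragments whenever data changes" is to put the delete right next to the write. The sketch below is plain Ruby with made-up names: FakeCache stands in for the memcached client, and Document and update_title are hypothetical, not Scribd's code.

```ruby
# Stand-in for the memcached client; only the calls used here (assumption).
class FakeCache
  def initialize
    @data = {}
  end

  def set(key, value)
    @data[key] = value
  end

  def get(key)
    @data[key]
  end

  def delete(key)
    @data.delete(key)
  end
end

FRAGMENT_CACHE = FakeCache.new

# Hypothetical model: every write clears the fragment that displays it.
class Document
  FRAGMENT_KEY = 'fragment:/partials/whatever'

  attr_reader :title

  def initialize(title)
    @title = title
  end

  def update_title(new_title)
    @title = new_title
    # Drop the stale fragment; the next page view re-renders and re-caches it.
    FRAGMENT_CACHE.delete(FRAGMENT_KEY)
  end
end

FRAGMENT_CACHE.set(Document::FRAGMENT_KEY, '<h1>Old title</h1>')
doc = Document.new('Old title')
doc.update_title('New title')
puts FRAGMENT_CACHE.get(Document::FRAGMENT_KEY).inspect  # prints nil
```

In a Rails app the same delete would more likely live in a model callback or a cache sweeper, so that no write path can forget it.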
Traffic Analytics
• Google Analytics is nice, but there are a lot of reasons to roll your own traffic analytics too
    • Can be much more powerful
    • You can write SQL to answer arbitrary questions
    • Can expose to users
Scribd’s analytics 
(screenshots)
Building traffic analytics, part 1
• create_table "page_views" do |t|
       t.column "user_id", :integer
       t.column "request_url", :string, :limit => 200
       t.column "session", :string, :limit => 32
       t.column "ip_address", :string, :limit => 16
       t.column "referer", :string, :limit => 200
       t.column "user_agent", :string, :limit => 200
       t.column "created_at", :timestamp
  end
• Add a whole bunch of indexes, depending on your queries
Building traffic analytics, part 2
• Create a PageView on every request
• We used a hand-built SQL query to take out the ActiveRecord overhead on this
• Might try MySQL’s “insert delayed”
• Analytics queries are usually hand-coded SQL
• Use “explain select” to make sure MySQL is using the indexes you expect
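A hand-built insert for the page_views table might look roughly like the sketch below. It is not Scribd's actual query. The helper names are hypothetical, and quote_value is a deliberately simple stand-in: in a Rails app, use ActiveRecord::Base.connection.quote for quoting and pass the resulting string to connection.execute.

```ruby
# Columns of the page_views table, minus created_at, which the statement
# fills in itself with NOW().
PAGE_VIEW_COLUMNS = %w[user_id request_url session ip_address referer user_agent]

# Illustrative quoting only (assumption): a real app should rely on the
# database connection's own quote method instead of this stand-in.
def quote_value(value)
  case value
  when nil     then 'NULL'
  when Integer then value.to_s
  else
    # Escape backslashes first, then single quotes. The block form of gsub
    # inserts the replacement literally.
    "'" + value.to_s.gsub("\\") { "\\\\" }.gsub("'") { "\\'" } + "'"
  end
end

# Build one INSERT statement for a page view. With delayed = true the
# statement uses MySQL's INSERT DELAYED variant mentioned above.
def page_view_insert_sql(attrs, delayed = false)
  values = PAGE_VIEW_COLUMNS.map { |col| quote_value(attrs[col]) }
  verb = delayed ? 'INSERT DELAYED' : 'INSERT'
  "#{verb} INTO page_views (#{PAGE_VIEW_COLUMNS.join(', ')}, created_at) " \
    "VALUES (#{values.join(', ')}, NOW())"
end

puts page_view_insert_sql(
  'user_id'     => 7,
  'request_url' => '/doc/1',
  'session'     => 'abc',
  'ip_address'  => '10.0.0.1',
  'referer'     => nil,
  'user_agent'  => 'Mozilla/5.0'
)
```

Building the string directly skips creating and validating an ActiveRecord object on every request, which is where the overhead comes from.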
Building Traffic Analytics, part 3
• Scales pretty well
• BUT analytics queries are expensive and can clog up the main DB server
• Our solution:
    • use two DB servers in a master/slave setup
    • move all the analytics queries to the slave
Rails with multiple databases, part 1
• "At this point in time there’s no facility in Rails to talk to more than one database at a time." - Alex Payne, Twitter developer
• Well that's true
• But setting things up yourself is about 10 lines of code.
• There are now also two great plugins for doing this:
     Magic multi-connections: http://magicmodels.rubyforge.org/magic_multi_connections/
     Acts as readonlyable: http://rubyforge.org/frs/?group_id=3451
Rails with multiple databases, part 2
• At Scribd we use this to send pre-defined expensive queries to a slave
• This can be very important for dealing with lock contention issues
• You could also do automatic load balancing, but synchronization becomes more complicated (read a SQL book, not a Rails issue)
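The "pre-defined expensive queries go to the slave" idea can be sketched in plain Ruby as below. FakeConnection and QueryRouter are hypothetical names for illustration; in a Rails app the two connections would come from ActiveRecord instead of these stand-ins.

```ruby
# Stand-in for a database connection; records which server ran each query.
class FakeConnection
  attr_reader :name, :log

  def initialize(name)
    @name = name
    @log = []
  end

  def execute(sql)
    @log << sql
    "#{name} ran: #{sql}"
  end
end

# Routes by explicit choice of method, not by inspecting the SQL, so only
# queries the developer has marked as expensive reads leave the master.
class QueryRouter
  def initialize(master, slave)
    @master = master
    @slave = slave
  end

  # Pre-defined expensive reads (analytics, reports) go to the slave.
  def run_report(sql)
    @slave.execute(sql)
  end

  # Everything else, including all writes, stays on the master.
  def run(sql)
    @master.execute(sql)
  end
end

master = FakeConnection.new('master')
slave = FakeConnection.new('slave1')
router = QueryRouter.new(master, slave)

puts router.run_report('select count(*) from page_views')
puts router.run("update documents set title = 'x' where id = 1")
```

Keeping the choice explicit avoids the synchronization problem mentioned above: nothing that needs fresh data is ever read from a lagging slave by accident.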
Rails with multiple databases, code
• In database.yml
     slave1:
       adapter: mysql     # assumed; the original slide omits the adapter line
       host: 18.48.43.29  # your slave’s IP
       database: production
       username: root
       password: pass
• Define a model slave1.rb
     # Abstract base class: it has no table of its own and exists only to
     # hold the connection to the slave.
     class Slave1 < ActiveRecord::Base
       self.abstract_class = true
       establish_connection :slave1
     end
• When you need to run a query on the slave, just do
     Slave1.connection.execute("select * from some_table")
Shameless Self-Promotion
• Scribd.com: VC-backed and hiring
• Just 3 people so far! >10 by end of year.
• Awesome salary/equity combination
• If you’re reading this, you’re probably the right kind of person
• Building the world's largest open document library
• Email: hackers@scribd.com

Scaling Scribd