Billions of Hits:
                         Scaling Twitter
                         John Adams
                         Twitter Operations




Wednesday, May 5, 2010
John Adams                        @netik
                         •   Early Twitter employee (mid-2008)

                         •   Lead engineer: Outward Facing Services
                             (Apache, Unicorn, SMTP), Auth, Security

                         •   Keynote Speaker: O’Reilly Velocity 2009, 2010

                         •   Previous companies: Inktomi, Apple, c|net




Growth.

752%
                         2008 Growth
  source: comscore.com - (based only on www traffic, not API)


1358%
                         2009 Growth
  source: comscore.com - (based only on www traffic, not API)


12th
                         most popular
 source: alexa.com (global ranking)


55M
                         Tweets per day
                         (640 TPS average, 1,000 TPS peak)
  source: twitter.com internal
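A quick arithmetic check on the figures above, using nothing beyond the slide's own numbers: a day has 86,400 seconds, so 55M tweets/day averages out to roughly 640 TPS, and the 1,000 TPS peak is about 1.56x that average, which is the headroom capacity planning has to cover.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def average_tps(tweets_per_day: int) -> float:
    """Average tweets per second, spread evenly over a full day."""
    return tweets_per_day / SECONDS_PER_DAY

def peak_to_average(peak_tps: float, avg_tps: float) -> float:
    """How much headroom the peak needs relative to the daily average."""
    return peak_tps / avg_tps

avg = average_tps(55_000_000)
print(f"average: {avg:.0f} TPS")                          # 637, which the slide rounds to 640
print(f"peak/average: {peak_to_average(1000, 640):.2f}")  # 1.56
```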


600M
                         Searches/Day
  source: twitter.com internal


Web vs. API
                         [Chart: share of traffic, Web 25%, API 75%]
Operations
                         •   What do we do?

                             •   Site Availability

                             •   Capacity Planning (metrics-driven)

                             •   Configuration Management

                             •   Security

                             •   Much more than basic Sysadmin


What have we done?
                         •   Improved response time, reduced latency

                         •   Fewer errors during deploys (Unicorn!)

                         •   Faster performance

                         •   Lower MTTD (Mean Time to Detect)

                         •   Lower MTTR (Mean Time to Recovery)

                         •   We are advocates for developers

Operations Mantra
                         •   Find Weakest Point
                             (Metrics + Logs + Science = Analysis)

                         •   Take Corrective Action
                             (Process)

                         •   Move to Next Weakest Point
                             (Repeatability)

Make an attack plan.
                 Symptom          Bottleneck   Vector          Solution
                 Congestion       Network      HTTP Latency    More LB’s
                 Timeline Delay   Storage      Update Delay    Better algorithm
                 Status Growth    Database     Delays          Flock, Cassandra
                 Updates          Algorithm    Latency         Algorithms



Finding Weakness
                         •   Metrics + Graphs

                             •   Individual metrics are irrelevant

                             •   We aggregate metrics to find knowledge

                         •   Logs

                         •   SCIENCE!



Monitoring
                         •   Twitter graphs and reports critical metrics in
                             as near real time as possible

                         •   If you build tools against our API, you should
                             too.

                             •   RRD, other Time-Series DB solutions

                             •   Ganglia + custom gmetric scripts

                         •   dev.twitter.com - API availability
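For the "custom gmetric scripts" bullet, a minimal sketch of publishing one custom metric to Ganglia by shelling out to the `gmetric` CLI with its `--name`, `--value`, `--type`, and `--units` options. The metric name, its value, and the dry-run fallback are hypothetical illustrations, not Twitter's actual scripts.

```python
import shutil
import subprocess
from typing import List, Union

def gmetric_command(name: str, value: Union[int, float, str],
                    value_type: str = "uint32", units: str = "") -> List[str]:
    """Build the gmetric argv for publishing one metric to Ganglia."""
    return [
        "gmetric",
        f"--name={name}",
        f"--value={value}",
        f"--type={value_type}",
        f"--units={units}",
    ]

def publish(name, value, value_type="uint32", units=""):
    """Send the metric via gmetric, or print the command if Ganglia isn't installed."""
    cmd = gmetric_command(name, value, value_type, units)
    if shutil.which("gmetric") is None:
        print("dry run:", " ".join(cmd))
        return
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    # Hypothetical metric: HTTP 503s seen in the last minute.
    publish("http_503_last_minute", 12, units="errors")
```

Run from cron every minute, a script like this turns a log-derived count into a Ganglia time series with no extra infrastructure.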

Analyze
                         •   Turn data into information

                             •   Where is the code base going?

                             •   Are things worse than they were?

                                 •   Understand the impact of the last software
                                     deploy

                                 •   Run check scripts during and after deploys

                         •   Capacity Planning, not Fire Fighting!

Data Analysis
                         •   Instrumenting the world pays off.

                         •   “Data analysis, visualization, and other
                             techniques for seeing patterns in data are
                             going to be an increasingly valuable skill set.
                             Employers take notice!”
                                   “Web Squared: Web 2.0 Five Years On”, Tim O’Reilly, Web 2.0 Summit, 2009




A New World for Admins
                         •   You’re not just a sysadmin anymore

                         •   Analytics - Graph what you can

                         •   Math, Analysis, Prediction, Linear Regression

                         •   For everyone, not just big sites.

                         •   You can do fantastic things.



Forecasting                        Curve-fitting for capacity planning
                                        (R, fityk, Mathematica, CurveFit)

                         [Figure: status_id growth over time with a fitted curve (r² = 0.99),
                          projected against the signed 32-bit int and unsigned 32-bit int
                          limits (the "Twitpocalypse")]
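Curve-fitting here means fitting a trend to a counter such as `status_id` and solving for when it crosses a hard limit. A minimal sketch, assuming a straight-line least-squares fit for simplicity; real growth curves are usually polynomial or exponential, which is what tools like R or fityk are for. The sample data is hypothetical.

```python
from typing import Sequence, Tuple

SIGNED_INT32_MAX = 2**31 - 1
UNSIGNED_INT32_MAX = 2**32 - 1

def linear_fit(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least squares for y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, b, 1 - ss_res / ss_tot

def crossing_point(limit: float, a: float, b: float) -> float:
    """Solve a + b*x = limit: the x at which the fitted trend hits the limit."""
    return (limit - a) / b

if __name__ == "__main__":
    # Hypothetical samples: (day number, highest status_id issued that day).
    days = [0, 30, 60, 90, 120]
    ids = [1_500_000_000, 1_560_000_000, 1_625_000_000, 1_690_000_000, 1_760_000_000]
    a, b, r2 = linear_fit(days, ids)
    print(f"r^2 = {r2:.3f}")
    print(f"signed 32-bit limit reached around day {crossing_point(SIGNED_INT32_MAX, a, b):.0f}")
```

The r² value says how well the chosen curve explains the history; it says nothing about whether growth will keep that shape, so forecasts like this should be refit as new data arrives.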




Internal Dashboard




External API Dashboard




                         http://dev.twitter.com/status
What’s a Robot?
                         •   Actual error in the Rails stack (HTTP 500)

                         •   Uncaught Exception

                         •   Code problem, or failure / nil result

                         •   Increases our exception count

                         •   Shows up in Reports

                         •   We’re on it!

What’s a Whale?
                         •   HTTP Error 502, 503

                         •   Twitter has a hard and fast five-second timeout

                         •   We’d rather fail fast than block on requests

                         •   Death of a long-running query (mkill)

                         •   Timeout
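`mkill` itself isn't shown in the slides. A minimal sketch of the idea, assuming rows shaped like MySQL's `SHOW PROCESSLIST` output (`Id`, `User`, `Command`, `Time`, `Info`): pick out statements that have run past a limit and emit `KILL QUERY <id>` for each. The 5-second default mirrors the five-second request timeout above; the protected-user list is hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

@dataclass
class ProcessRow:
    """The SHOW PROCESSLIST columns this sketch needs."""
    id: int
    user: str
    command: str          # e.g. "Query", "Sleep", "Binlog Dump"
    time: int             # seconds spent in the current state
    info: Optional[str]   # statement text, if any

def kill_statements(rows: Iterable[ProcessRow], max_seconds: int = 5,
                    protected_users: Iterable[str] = ("replication",)) -> List[str]:
    """Return a KILL QUERY statement for each query running longer than max_seconds."""
    protected = set(protected_users)
    stmts = []
    for row in rows:
        # Only active statements are candidates; idle ("Sleep") connections are left alone.
        if row.command != "Query" or row.user in protected:
            continue
        if row.time > max_seconds:
            # KILL QUERY stops the statement but keeps the client connection open.
            stmts.append(f"KILL QUERY {row.id}")
    return stmts
```

Killing only the statement (rather than the whole connection with `KILL <id>`) is a design choice here: the caller gets an error it can turn into a fast failure, which matches the "fail fast rather than block" goal.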



Whale Watcher
     •      Simple shell script

           •      MASSIVE WIN by @ronpepsi

     •      Whale = HTTP 503 (timeout)

     •      Robot = HTTP 500 (error)

     •      Examines last 60 seconds of
            aggregated daemon / www logs

     •      “Whales per Second” > W_threshold

           •      Thar be whales! Call in ops.
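The original Whale Watcher is a shell script; this is a Python sketch of the same logic, assuming a hypothetical log format of `<unix_timestamp> <http_status> <path>` per line. It counts 503s (whales) and 500s (robots) in the trailing 60 seconds and compares whales per second against a threshold you pick.

```python
from typing import Iterable, Tuple

WINDOW_SECONDS = 60

def error_rates(lines: Iterable[str], now: float,
                window: int = WINDOW_SECONDS) -> Tuple[float, float]:
    """Return (whales_per_sec, robots_per_sec) over the trailing window.

    Assumes each log line looks like "<unix_timestamp> <http_status> <path>".
    """
    whales = robots = 0
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        try:
            ts = float(parts[0])
        except ValueError:
            continue
        if not (now - window <= ts <= now):
            continue  # outside the last `window` seconds
        if parts[1] == "503":
            whales += 1   # whale: request timed out
        elif parts[1] == "500":
            robots += 1   # robot: uncaught exception
    return whales / window, robots / window

def thar_be_whales(lines: Iterable[str], now: float, threshold: float) -> bool:
    """True when whales per second exceed the paging threshold."""
    whales_per_sec, _ = error_rates(lines, now)
    return whales_per_sec > threshold
```

Usage, with a hypothetical log path: `thar_be_whales(open("www.log"), time.time(), threshold=1.0)`. Rates rather than raw counts keep the threshold meaningful if the window length changes.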




Deploy Watcher
                  Sample window: 300.0 seconds

                  First start time:
                  Mon Apr 5 15:30:00 2010 (Mon Apr   5 08:30:00 PDT 2010)
                  Second start time:
                  Tue Apr 6 02:09:40 2010 (Mon Apr   5 19:09:40 PDT 2010)

                  PRODUCTION APACHE: ALL OK
                  PRODUCTION OTHER: ALL OK
                  WEB0049 CANARY APACHE: ALL OK
                  WEB0049 CANARY BACKEND SERVICES: ALL OK
                  DAEMON0031 CANARY BACKEND SERVICES: ALL OK
                  DAEMON0031 CANARY OTHER: ALL OK
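A sketch of the comparison behind a report like the one above, under stated assumptions: error counts have already been aggregated per host group for two equal sample windows (300 s here, one before the deploy and one after), and a group is flagged only when errors both grow past a ratio and exceed a floor. The error counts, `tolerance`, and `min_errors` are hypothetical.

```python
from typing import Dict

def compare_windows(before: Dict[str, int], after: Dict[str, int],
                    tolerance: float = 1.5, min_errors: int = 10) -> Dict[str, str]:
    """Compare per-group error counts from two equal-length sample windows.

    A group is flagged when its post-deploy errors reach min_errors AND
    exceed the pre-deploy count by more than `tolerance` times.
    """
    report = {}
    for group in sorted(set(before) | set(after)):
        old, new = before.get(group, 0), after.get(group, 0)
        regressed = new >= min_errors and new > old * tolerance
        report[group] = "REGRESSION" if regressed else "ALL OK"
    return report

if __name__ == "__main__":
    # Hypothetical error counts for two 300-second windows.
    before = {"PRODUCTION APACHE": 40, "WEB0049 CANARY APACHE": 2}
    after = {"PRODUCTION APACHE": 42, "WEB0049 CANARY APACHE": 35}
    for group, status in compare_windows(before, after).items():
        print(f"{group}: {status}")
```

The `min_errors` floor keeps a canary that goes from 1 error to 3 from paging anyone, while a jump from 2 to 35 on a single canary host is caught before the code reaches the whole fleet.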



Feature “Darkmode”
                         •   Specific site controls to enable and disable
                             computationally expensive or I/O-heavy site functions

                         •   The “Emergency Stop” button

                         •   Changes logged and reported to all teams

                         •   Around 60 switches we can throw

                         •   Static / Read-only mode
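A minimal in-process sketch of darkmode-style switches: named flags that code checks before doing expensive work, with every change logged. The class, method, and switch names are hypothetical. Per the memcached slide, flags kept only in an evictable cache can vanish, so a real store for them should be durable.

```python
import logging
from datetime import datetime, timezone
from typing import Iterable

log = logging.getLogger("darkmode")

class Switchboard:
    """Named on/off switches for expensive site features (hypothetical sketch)."""

    def __init__(self, names: Iterable[str]):
        self._enabled = {name: True for name in names}

    def is_enabled(self, name: str) -> bool:
        # Unknown names count as enabled, so a typo can't silently dark a feature.
        return self._enabled.get(name, True)

    def set(self, name: str, enabled: bool, operator: str) -> None:
        if name not in self._enabled:
            raise KeyError(f"unknown switch: {name}")
        self._enabled[name] = enabled
        # Log every change so all teams can see who flipped what, and when.
        log.info("%s set %s=%s at %s", operator, name, enabled,
                 datetime.now(timezone.utc).isoformat())

    def emergency_stop(self, operator: str) -> None:
        """The 'Emergency Stop' button: turn every switch off at once."""
        for name in list(self._enabled):
            self.set(name, False, operator)

if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    switches = Switchboard(["search", "tweet_posting", "follower_counts"])
    switches.set("search", False, operator="ops-oncall")
    if not switches.is_enabled("search"):
        print("search is temporarily unavailable")
```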


request flow
                                   Load Balancers

                                 Apache mod_proxy

                                   Rails (Unicorn)

                         Flock      memcached        Kestrel

                                 MySQL      Cassandra

                                     Daemons
unicorn
                         •   A single-socket Rails application server

                             •   Workers pull work, vs. Apache pushing work.

                         •   Zero Downtime Deploys (!)

                             •   Controlled, shuffled transfer to new code

                         •   Less memory, 30% less CPU

                         •   Shift from mod_proxy_balancer to plain mod_proxy (ProxyPass)

                             •   HAProxy and Nginx weren’t any better. Really.
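Unicorn's pull model comes from pre-forked worker processes all blocking in `accept()` on one shared listening socket, so a request only ever goes to a worker that is idle. The sketch below is an analogy rather than Unicorn itself: threads stand in for worker processes and a `queue.Queue` stands in for the socket's accept queue.

```python
import queue
import threading
from typing import Dict, Iterable, List

def run_pull_workers(requests: Iterable[int], num_workers: int = 4) -> Dict[str, List[int]]:
    """Simulate pull-based dispatch: each idle worker takes the next request itself.

    Returns a mapping of worker name -> requests that worker handled.
    """
    backlog = queue.Queue()
    for req in requests:
        backlog.put(req)

    handled: Dict[str, List[int]] = {f"worker-{i}": [] for i in range(num_workers)}

    def worker(name: str) -> None:
        while True:
            try:
                req = backlog.get_nowait()  # pull the next request; nothing pushes to us
            except queue.Empty:
                return  # backlog drained
            handled[name].append(req)  # stand-in for actually serving the request

    threads = [threading.Thread(target=worker, args=(name,)) for name in handled]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return handled
```

A busy worker simply stops pulling, so no request queues up behind a slow one; with push-based balancing, the balancer has to guess which backend is free and can guess wrong.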


Rails
                         •   Mostly only for front-end and mobile

                         •   Back end mostly Java, Scala, and Pure Ruby

                         •   Not to blame for our issues. Analysis found:

                             •   Caching + Cache invalidation problems

                             •   Bad queries generated by ActiveRecord, resulting in
                                 slow queries against the db

                             •   Queue Latency

                             •   Replication Lag

memcached
                         •   memcached isn’t perfect.

                             •   Memcached SEGVs hurt us early on.

                         •   Evictions make the cache unreliable for
                             important configuration data
                             (loss of darkmode flags, for example)

                         •   Network Memory Bus isn’t infinite

                         •   Segmented into pools for better performance

Loony
                         •   Central machine database (MySQL)

                             •   Python, Django, Paramiko SSH

                                 •   Paramiko - Twitter OSS (@robey)

                             •   Ties into LDAP groups

                         •   When the data center sends us email, machine
                             definitions are built in real time


Murder
                         •   @lg rocks!

                         •   BitTorrent-based replication for deploys

                         •   ~30-60 seconds to update >1k machines

                         •   P2P - Legal, valid, Awesome.




Kestrel
                   •     @robey

                   •     Works like memcache (same protocol)

                   •     SET = enqueue | GET = dequeue

                   •     No strict ordering of jobs

                   •     No shared state between servers

                   •     Written in Scala.
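Because Kestrel speaks the memcache text protocol, enqueue and dequeue are just `set` and `get` on a queue name. A sketch of the wire format, limited to encoding requests and parsing the `get` reply; networking is left out, so no host or port is assumed. The queue name and payload are hypothetical.

```python
from typing import Optional

def encode_enqueue(queue_name: str, payload: bytes) -> bytes:
    """memcache 'set' on a Kestrel queue name = enqueue one item."""
    header = f"set {queue_name} 0 0 {len(payload)}\r\n".encode()
    return header + payload + b"\r\n"

def encode_dequeue(queue_name: str) -> bytes:
    """memcache 'get' on a Kestrel queue name = dequeue one item."""
    return f"get {queue_name}\r\n".encode()

def parse_get_response(resp: bytes) -> Optional[bytes]:
    """Return the item payload, or None when the queue was empty (bare 'END')."""
    if resp == b"END\r\n":
        return None
    header, rest = resp.split(b"\r\n", 1)
    fields = header.split()
    if len(fields) != 4 or fields[0] != b"VALUE":
        raise ValueError("malformed get response header")
    n = int(fields[3])  # VALUE <key> <flags> <bytes>
    payload = rest[:n]
    if rest[n:] != b"\r\nEND\r\n":
        raise ValueError("malformed get response body")
    return payload
```

Reusing the memcache protocol means any existing memcache client library can talk to the queue, which is the practical payoff of the "works like memcache" bullet.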

Asynchronous Requests
                         •   Inbound traffic consumes a unicorn worker

                         •   Outbound traffic consumes a unicorn worker

                         •   The request pipeline should not be used to
                             handle 3rd party communications or
                             back-end work.

                         •   Reroute traffic to daemons



Daemons
                         •   Daemons touch every tweet

                         •   Many different daemon types at Twitter

                         •   Old way: One daemon per type (Rails)

                             •   New way: Fewer Daemons (Pure Ruby)

                         •   Daemon Slayer - A Multi Daemon that could
                             do many different jobs, all at once.
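A sketch of the "one daemon, many job types" idea: handlers registered by job type and a single process that dispatches each job it dequeues. The `MultiDaemon` class and the job types are hypothetical; in the architecture above, jobs would arrive from a queue such as Kestrel rather than from a direct call.

```python
from typing import Callable, Dict

Handler = Callable[[dict], str]

class MultiDaemon:
    """One process that handles many job types (hypothetical sketch)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, job_type: str) -> Callable[[Handler], Handler]:
        """Decorator that maps a job type to its handler function."""
        def decorator(fn: Handler) -> Handler:
            self._handlers[job_type] = fn
            return fn
        return decorator

    def process(self, job: dict) -> str:
        handler = self._handlers.get(job["type"])
        if handler is None:
            raise ValueError(f"no handler for job type {job['type']!r}")
        return handler(job)

daemon = MultiDaemon()

@daemon.register("deliver_sms")
def deliver_sms(job: dict) -> str:
    return f"sms to {job['user']}"

@daemon.register("update_search_index")
def update_search_index(job: dict) -> str:
    return f"indexed tweet {job['tweet_id']}"
```

One process loading its code once and serving several job types uses far less memory than one Rails process per type, which is the trade the slide describes.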


Disk is the new Tape.
                         •   Social Networking application profile has
                             many O(n^y) operations.

                         •   Page requests have to happen in < 500 ms or
                             users start to notice. Goal: 250-300 ms

                         •   Web 2.0 isn’t possible without lots of RAM

                         •   SSDs? What to do?
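Worked arithmetic for why disk can't sit in the request path, assuming about 10 ms per random read on a 2010-era spinning disk (an assumption, not a figure from the slides): serial seeks eat the 250-300 ms page budget almost immediately, so any fan-out has to be served from RAM.

```python
SEEK_MS = 10.0  # assumption: random-read latency of a 2010-era 7,200 rpm disk

def serial_disk_time_ms(random_reads: int, seek_ms: float = SEEK_MS) -> float:
    """Time spent if each read needs its own disk seek, one after another."""
    return random_reads * seek_ms

def fits_budget(random_reads: int, budget_ms: float = 300.0) -> bool:
    """Whether that many serial seeks fit inside the page-latency goal."""
    return serial_disk_time_ms(random_reads) <= budget_ms

# A timeline assembled from 100 followed users' recent tweets, one seek each:
print(serial_disk_time_ms(100))  # 1000.0 ms, far past the 250-300 ms goal
print(fits_budget(30))           # True: only ~30 uncached seeks fit at all
```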



Caching
                         •   We’re the real-time web, but there are lots of caching
                             opportunities. You should cache what you get from us.

                         •   Most caching strategies rely on long TTLs (>60 s)

                         •   Separate memcache pools for different data types to
                             prevent eviction

                         •   Optimized the Ruby gem to use libmemcached + FNV hash
                             instead of pure Ruby + MD5

                         •   Twitter now largest contributor to libmemcached
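A sketch of the two ideas above together: a cheap FNV hash (much less work per key than MD5) choosing a server inside a per-data-type pool. The slide doesn't say which FNV variant was used; 32-bit FNV-1a is shown for illustration. Pool names and hosts are hypothetical, and simple modulo placement is used for brevity, although libmemcached also offers consistent-hashing distributions.

```python
from typing import Dict, List

FNV32_OFFSET_BASIS = 0x811C9DC5
FNV32_PRIME = 0x01000193

def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: XOR each byte into the hash, then multiply by the FNV prime."""
    h = FNV32_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV32_PRIME) & 0xFFFFFFFF  # keep the hash within 32 bits
    return h

# Hypothetical pools, one per data type, so a flood of one kind of
# object can't evict another kind.
POOLS: Dict[str, List[str]] = {
    "timeline": ["cache-t1:11211", "cache-t2:11211"],
    "user": ["cache-u1:11211", "cache-u2:11211", "cache-u3:11211"],
}

def server_for(data_type: str, key: str) -> str:
    """Pick the server for a key within its data type's pool."""
    servers = POOLS[data_type]
    return servers[fnv1a_32(key.encode()) % len(servers)]
```

With modulo placement, adding a server remaps most keys; consistent hashing limits that churn, which matters once pools grow.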


MySQL
                         •   Sharding large volumes of data is hard

                         •   Replication delay and cache eviction produce
                             inconsistent results to the end user.

                         •   Locks create resource contention for popular
                             data




MySQL Challenges
                         •   Replication Delay

                             •   Single threaded. Slow.

                         •   Social Networking not good for RDBMS

                             •   N x N relationships and social graph / tree
                                 traversal

                             •   Disk issues (FS Choice, noatime, scheduling
                                 algorithm)

Relational Databases
                         not a Panacea
                         •   Good for:

                             •   Users, Relational Data, Transactions

                         •   Bad for:

                             •   Queues. Polling operations. Social Graph.

                         •   You don’t need ACID for everything.



Database Replication
                         •   Major issues around users and statuses tables

                         •   Multiple functional masters (FRP, FWP)

                         •   Make sure your code reads from the read DBs (replicas) and
                             writes to the write DB (master). Reading from master = slow death

                             •   Monitor the DB. Find slow / poorly designed
                                 queries

                         •   Kill long running queries before they kill you
                             (mkill)
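A minimal sketch of the "reading from master = slow death" rule: writes go to the master, reads rotate across replicas. Host names are hypothetical, and classifying by the leading SQL verb is a simplification; reads that must see a just-committed write (or `SELECT ... FOR UPDATE`) still belong on the master because of replication delay.

```python
import itertools
from typing import List

class ReplicatedDB:
    """Send writes to the master and spread reads round-robin over replicas."""

    WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP"}

    def __init__(self, master: str, replicas: List[str]):
        if not replicas:
            raise ValueError("need at least one replica to keep reads off the master")
        self.master = master
        self._replicas = itertools.cycle(replicas)

    def target_for(self, sql: str) -> str:
        """Return the host that should run this statement."""
        words = sql.split(None, 1)
        if not words:
            raise ValueError("empty statement")
        if words[0].upper() in self.WRITE_VERBS:
            return self.master
        return next(self._replicas)
```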

Flock
          •      Scalable Social Graph Store

          •      Sharding via Gizzard

          •      MySQL backends (many)

          •      13 billion edges,
                 100K reads/second

          •      Recently Open Sourced

          [Diagram: Flock on top of Gizzard, which shards data across many MySQL servers]
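Gizzard's core idea is a forwarding table that maps ranges of a key space to shards. A sketch in that spirit, not Gizzard's actual API, assuming non-negative integer keys (such as a hashed user id) and hypothetical shard names.

```python
import bisect
from typing import Iterable, Tuple

class ForwardingTable:
    """Map contiguous ranges of a non-negative key space to shards.

    Each entry is (range_start, shard); a key belongs to the entry with the
    largest range_start that is <= the key.
    """

    def __init__(self, entries: Iterable[Tuple[int, str]]):
        ordered = sorted(entries)
        if not ordered or ordered[0][0] != 0:
            raise ValueError("the first range must start at 0")
        self._starts = [start for start, _ in ordered]
        self._shards = [shard for _, shard in ordered]

    def shard_for(self, key: int) -> str:
        if key < 0:
            raise ValueError("keys must be non-negative")
        # bisect_right finds the first start > key; the entry before it owns the key.
        return self._shards[bisect.bisect_right(self._starts, key) - 1]
```

Because placement lives in a table rather than in a hash formula, moving a hot range to a new MySQL server means copying its data and editing one entry.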




Cassandra
                         •   Originally written by Facebook

                         •   Distributed Data Store

                         •   @rk’s changes to Cassandra Open Sourced

                         •   Currently double-writing into it

                         •   Transitioning to 100% soon.



Lessons Learned
                         •   Instrument everything. Start graphing early.

                         •   Cache as much as possible

                         •   Start working on scaling early.

                         •   Don’t rely on memcache, and don’t rely on the
                             database

                         •   Don’t use mongrel. Use Unicorn.


Join Us!
               @jointheflock




Q&A
Thanks!
                         •   @jointheflock

                         •   http://twitter.com/jobs

                         •   Download our work

                             •   http://twitter.com/about/opensource




Neo4j
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
Antonios Katsarakis
 
ScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking ReplicationScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking Replication
ScyllaDB
 
Principle of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptxPrinciple of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptx
BibashShahi
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
operationspcvita
 
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving
 
"What does it really mean for your system to be available, or how to define w...
"What does it really mean for your system to be available, or how to define w..."What does it really mean for your system to be available, or how to define w...
"What does it really mean for your system to be available, or how to define w...
Fwdays
 
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
AlexanderRichford
 
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving
 
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdfLee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
leebarnesutopia
 

Recently uploaded (20)

[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
[OReilly Superstream] Occupy the Space: A grassroots guide to engineering (an...
 
Discover the Unseen: Tailored Recommendation of Unwatched Content
Discover the Unseen: Tailored Recommendation of Unwatched ContentDiscover the Unseen: Tailored Recommendation of Unwatched Content
Discover the Unseen: Tailored Recommendation of Unwatched Content
 
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectorsConnector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
Connector Corner: Seamlessly power UiPath Apps, GenAI with prebuilt connectors
 
Mutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented ChatbotsMutation Testing for Task-Oriented Chatbots
Mutation Testing for Task-Oriented Chatbots
 
GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)GNSS spoofing via SDR (Criptored Talks 2024)
GNSS spoofing via SDR (Criptored Talks 2024)
 
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
LF Energy Webinar: Carbon Data Specifications: Mechanisms to Improve Data Acc...
 
Session 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdfSession 1 - Intro to Robotic Process Automation.pdf
Session 1 - Intro to Robotic Process Automation.pdf
 
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptxPRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
PRODUCT LISTING OPTIMIZATION PRESENTATION.pptx
 
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
"Scaling RAG Applications to serve millions of users",  Kevin Goedecke"Scaling RAG Applications to serve millions of users",  Kevin Goedecke
"Scaling RAG Applications to serve millions of users", Kevin Goedecke
 
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
"NATO Hackathon Winner: AI-Powered Drug Search",  Taras Kloba"NATO Hackathon Winner: AI-Powered Drug Search",  Taras Kloba
"NATO Hackathon Winner: AI-Powered Drug Search", Taras Kloba
 
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansBiomedical Knowledge Graphs for Data Scientists and Bioinformaticians
Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
 
ScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking ReplicationScyllaDB Tablets: Rethinking Replication
ScyllaDB Tablets: Rethinking Replication
 
Principle of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptxPrinciple of conventional tomography-Bibash Shahi ppt..pptx
Principle of conventional tomography-Bibash Shahi ppt..pptx
 
The Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptxThe Microsoft 365 Migration Tutorial For Beginner.pptx
The Microsoft 365 Migration Tutorial For Beginner.pptx
 
Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving | Nameplate Manufacturing Process - 2024
 
"What does it really mean for your system to be available, or how to define w...
"What does it really mean for your system to be available, or how to define w..."What does it really mean for your system to be available, or how to define w...
"What does it really mean for your system to be available, or how to define w...
 
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
QR Secure: A Hybrid Approach Using Machine Learning and Security Validation F...
 
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance PanelsNorthern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
Northern Engraving | Modern Metal Trim, Nameplates and Appliance Panels
 
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdfLee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
Lee Barnes - Path to Becoming an Effective Test Automation Engineer.pdf
 

Billions of hits: Scaling Twitter (Web 2.0 Expo, SF)

  • 1. Billions of Hits: Scaling Twitter John Adams Twitter Operations Wednesday, May 5, 2010
  • 2. John Adams @netik • Early Twitter employee (mid-2008) • Lead engineer: Outward Facing Services (Apache, Unicorn, SMTP), Auth, Security • Keynote Speaker: O’Reilly Velocity 2009, 2010 • Previous companies: Inktomi, Apple, c|net
  • 5. 752% growth in 2008 (source: comscore.com, based only on www traffic, not API)
  • 6. 1358% growth in 2009 (source: comscore.com, based only on www traffic, not API)
  • 7. 12th most popular (source: alexa.com, global ranking)
  • 8. 55M tweets per day (640 TPS average, 1,000 TPS peak; TPS = tweets per second) (source: twitter.com internal)
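The average rate follows from the daily total: 55 million tweets spread over the 86,400 seconds in a day. The Python snippet below checks that arithmetic. The 1,000 TPS peak comes from the slide and is not derived here.

```python
# Average tweets per second implied by a daily total.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_tps(tweets_per_day: int) -> float:
    """Return the mean tweets-per-second rate for a given daily volume."""
    return tweets_per_day / SECONDS_PER_DAY


avg = average_tps(55_000_000)
print(f"{avg:.1f} TPS")  # 636.6, which the slide rounds to 640
print(f"peak/average ratio: {1000 / avg:.2f}")
```

Capacity has to be planned for the peak rather than the mean. Here the peak is roughly one and a half times the average.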
  • 9. 600M searches per day (source: twitter.com internal)
  • 10. Web vs. API
  • 11. Traffic split: Web 25%, API 75%
  • 12. Operations • What do we do? • Site Availability • Capacity Planning (metrics-driven) • Configuration Management • Security • Much more than basic Sysadmin
  • 13. What have we done? • Improved response time, reduced latency • Fewer errors during deploys (Unicorn!) • Faster performance • Lower MTTD (Mean Time to Detect) • Lower MTTR (Mean Time to Recovery) • We are advocates for developers
  • 14. Operations Mantra • Find Weakest Point (Metrics + Logs + Science = Analysis) • Take Corrective Action (Process) • Move to Next Weakest Point (Repeatability)
  • 15. Make an attack plan.
      Symptom | Bottleneck | Vector | Solution
      HTTP Latency | Congestion | Network | More LBs
      Timeline Delay | Storage | Update Delay | Better algorithm
      Status Growth | Database | Delays | Flock, Cassandra
      Updates | Algorithm | Latency | Algorithms
  • 17. Finding Weakness • Metrics + Graphs • Individual metrics are irrelevant • We aggregate metrics to find knowledge • Logs • SCIENCE!
  • 18. Monitoring • Twitter graphs and reports critical metrics in as close to real time as possible • If you build tools against our API, you should too. • RRD, other Time-Series DB solutions • Ganglia + custom gmetric scripts • dev.twitter.com - API availability
  • 19. Analyze • Turn data into information • Where is the code base going? • Are things worse than they were? • Understand the impact of the last software deploy • Run check scripts during and after deploys • Capacity Planning, not Fire Fighting!
  • 20. Data Analysis • Instrumenting the world pays off. • “Data analysis, visualization, and other techniques for seeing patterns in data are going to be an increasingly valuable skill set. Employers take notice!” “Web Squared: Web 2.0 Five Years On”, Tim O’Reilly, Web 2.0 Summit, 2009
  • 21. A New World for Admins • You’re not just a sysadmin anymore • Analytics - Graph what you can • Math, Analysis, Prediction, Linear Regression • For everyone, not just big sites. • You can do fantastic things.
  • 22. Forecasting: curve-fitting for capacity planning (R, fityk, Mathematica, CurveFit) [Chart: status_id growth fitted with r² = 0.99, marking the “Twitpocalypse” points where status_id overflows a signed 32-bit int and an unsigned 32-bit int]
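The slide names tools rather than a model. The sketch below assumes a straight-line trend is good enough and uses made-up sample points (SAMPLES is hypothetical, not Twitter data). Ordinary least squares then gives a slope and an r² like the slide's. It also projects the day on which status_id passes the signed 32-bit limit of 2^31 - 1 and the unsigned limit of 2^32 - 1.

```python
# Hypothetical samples: (day number, highest status_id seen that day).
SAMPLES = [
    (0, 1_900_000_000),
    (10, 1_960_000_000),
    (20, 2_020_000_000),
    (30, 2_080_000_000),
]
INT32_MAX = 2**31 - 1   # largest signed 32-bit integer
UINT32_MAX = 2**32 - 1  # largest unsigned 32-bit integer


def linear_fit(points):
    """Ordinary least squares fit y = slope * x + intercept; also returns r^2."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in points)
    ss_tot = sum((y - mean_y) ** 2 for _, y in points)
    return slope, intercept, 1 - ss_res / ss_tot


def day_reaching(limit, slope, intercept):
    """Solve slope * day + intercept = limit for day."""
    return (limit - intercept) / slope


slope, intercept, r2 = linear_fit(SAMPLES)
print(f"r^2 = {r2:.2f}")
print(f"signed 32-bit overflow around day {day_reaching(INT32_MAX, slope, intercept):.0f}")
print(f"unsigned 32-bit overflow around day {day_reaching(UINT32_MAX, slope, intercept):.0f}")
```

Real growth curves are rarely linear. Tools like R or fityk let you try exponential or polynomial fits and compare r² before you trust a projection.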
  • 24. External API Dashboard http://dev.twitter.com/status
  • 25. What’s a Robot? • Actual error in the Rails stack (HTTP 500) • Uncaught Exception • Code problem, or failure / nil result • Increases our exception count • Shows up in Reports • We’re on it!
  • 26. What’s a Whale? • HTTP Error 502, 503 • Twitter has a hard and fast five-second timeout • We’d rather fail fast than block on requests • Death of a long-running query (mkill) • Timeout
  • 27. Whale Watcher • Simple shell script • MASSIVE WIN by @ronpepsi • Whale = HTTP 503 (timeout) • Robot = HTTP 500 (error) • Examines last 60 seconds of aggregated daemon / www logs • “Whales per Second” > W_threshold • Thar be whales! Call in ops.
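Whale Watcher itself was a shell script. The Python sketch below shows the same idea under a few stated assumptions:

- Each aggregated log line is taken to be "<unix_timestamp> <http_status>", which is a hypothetical format.
- Statuses 502 and 503 count as whales, following slide 26, and 500 counts as a robot.
- WHALE_THRESHOLD is a made-up alert level standing in for W_threshold.

```python
from collections import Counter

WINDOW_SECONDS = 60              # look-back window from the slide
WHALE_STATUSES = {"502", "503"}  # "whales" (slide 26)
ROBOT_STATUS = "500"             # "robots" (slide 25)
WHALE_THRESHOLD = 1.0            # hypothetical whales-per-second alert level


def count_statuses(lines, now):
    """Count whales and robots among log lines from the last WINDOW_SECONDS.

    Each line is assumed to look like "<unix_timestamp> <http_status>".
    """
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip malformed lines
        try:
            ts = float(parts[0])
        except ValueError:
            continue  # skip lines without a numeric timestamp
        if not (now - WINDOW_SECONDS <= ts <= now):
            continue  # outside the window
        if parts[1] in WHALE_STATUSES:
            counts["whales"] += 1
        elif parts[1] == ROBOT_STATUS:
            counts["robots"] += 1
    return counts


def should_page_ops(lines, now):
    """True when whales per second over the window exceed the threshold."""
    whales_per_second = count_statuses(lines, now)["whales"] / WINDOW_SECONDS
    return whales_per_second > WHALE_THRESHOLD
```

In use, you would feed it the tail of the aggregated logs once a minute and page ops when should_page_ops returns True.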
  • 28. Deploy Watcher (sample output):
      Sample window: 300.0 seconds
      First start time: Mon Apr 5 15:30:00 2010 (Mon Apr 5 08:30:00 PDT 2010)
      Second start time: Tue Apr 6 02:09:40 2010 (Mon Apr 5 19:09:40 PDT 2010)
      PRODUCTION APACHE: ALL OK
      PRODUCTION OTHER: ALL OK
      WEB0049 CANARY APACHE: ALL OK
      WEB0049 CANARY BACKEND SERVICES: ALL OK
      DAEMON0031 CANARY BACKEND SERVICES: ALL OK
      DAEMON0031 CANARY OTHER: ALL OK
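The sample output appears to compare two 300-second windows, a baseline and a later one, across production and canary host groups. The talk does not say how Deploy Watcher decides what is "OK". This minimal sketch assumes per-group error counts have already been extracted from the logs and applies a hypothetical 1.5x tolerance. The counts in the usage lines are invented.

```python
TOLERANCE = 1.5  # hypothetical: flag a group if errors grow by more than 50%


def compare_windows(before, after, tolerance=TOLERANCE):
    """Compare per-group error counts from two equal-length sample windows.

    `before` and `after` map a host group name to the number of errors seen
    in that window. Returns one report line per group, sorted by name.
    """
    report = []
    for group in sorted(set(before) | set(after)):
        old, new = before.get(group, 0), after.get(group, 0)
        # When the baseline is zero, still allow a single stray error.
        limit = max(old * tolerance, 1)
        status = "ALL OK" if new <= limit else f"REGRESSION ({old} -> {new})"
        report.append(f"{group}: {status}")
    return report


# Hypothetical counts for two of the groups shown in the sample output.
baseline = {"PRODUCTION APACHE": 40, "WEB0049 CANARY APACHE": 2}
after_deploy = {"PRODUCTION APACHE": 38, "WEB0049 CANARY APACHE": 2}
for line in compare_windows(baseline, after_deploy):
    print(line)
```

Comparing canary hosts against the rest of production lets a bad deploy show up on one or two machines before it reaches the whole fleet.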
  • 29. Feature “Darkmode” • Specific site controls to enable and disable computationally or IO-heavy site functions • The “Emergency Stop” button • Changes logged and reported to all teams • Around 60 switches we can throw • Static / Read-only mode
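Below is a minimal sketch of named feature switches with logged changes. The class, switch names and log format are hypothetical. Slide 33 notes that memcached evictions once wiped darkmode flags. A real deployment would therefore keep switch state somewhere durable, not only in process memory as this sketch does.

```python
import logging
from datetime import datetime, timezone

log = logging.getLogger("darkmode")


class Darkmode:
    """Named on/off switches for expensive site features (hypothetical sketch)."""

    def __init__(self, names):
        # Every feature starts enabled; True means the feature is on.
        self._state = {name: True for name in names}

    def is_enabled(self, name):
        return self._state[name]

    def set(self, name, enabled, who):
        if name not in self._state:
            raise KeyError(f"unknown switch: {name}")
        old = self._state[name]
        self._state[name] = enabled
        # Log every change so all teams can see what was flipped, when, and by whom.
        log.warning("%s darkmode %s: %s -> %s by %s",
                    datetime.now(timezone.utc).isoformat(), name, old, enabled, who)

    def read_only(self, who):
        """Crude stand-in for a static/read-only mode: switch everything off."""
        for name in self._state:
            self.set(name, False, who)


switches = Darkmode(["search", "trends"])
switches.set("search", False, who="ops")
if not switches.is_enabled("search"):
    print("search is dark; serve a lightweight page instead")
```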
  • 30. Request flow: Load Balancers → Apache (mod_proxy) → Rails (Unicorn) → back end: Flock, memcached, Kestrel, MySQL, Cassandra, Daemons
  • 31. Unicorn • A single-socket Rails application server • Workers pull work, instead of Apache pushing work to them • Zero Downtime Deploys (!) • Controlled, shuffled transfer to new code • Less memory, 30% less CPU • Shift from mod_proxy_balancer to mod_proxy_pass • HAProxy and Nginx weren’t any better. Really.
  • 32. Rails • Mostly only for front-end and mobile • Back end mostly Java, Scala, and Pure Ruby • Not to blame for our issues. Analysis found: • Caching + Cache invalidation problems • Bad queries generated by ActiveRecord, resulting in slow queries against the db • Queue Latency • Replication Lag
  • 33. memcached • memcached isn’t perfect. • Memcached SEGVs hurt us early on. • Evictions make the cache unreliable for important configuration data (loss of darkmode flags, for example) • Network Memory Bus isn’t infinite • Segmented into pools for better performance
  • 34. Loony • Central machine database (MySQL) • Python, Django, Paramiko SSH • Paramiko: Twitter OSS (@robey) • Ties into LDAP groups • When the data center sends us email, machine definitions are built in real time
  • 35. Murder • @lg rocks! • BitTorrent-based replication for deploys • ~30-60 seconds to update >1k machines • P2P: Legal, valid, Awesome.
  • 36. Kestrel • @robey • Works like memcache (same protocol) • SET = enqueue | GET = dequeue • No strict ordering of jobs • No shared state between servers • Written in Scala.
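Because Kestrel speaks the memcache protocol, any memcache client can enqueue with set and dequeue with get. The model below is not Kestrel, which is a Scala server. It is a hypothetical Python sketch that handles just those two text-protocol commands to show the semantics, and it always returns flags 0. Each instance keeps its own queues, so a client spreading jobs across several servers gets no global ordering, as the slide says.

```python
from collections import defaultdict, deque


class TinyQueueServer:
    """Kestrel-style queues behind two memcache text-protocol commands.

    "set <queue> <flags> <exptime> <bytes>" enqueues; "get <queue>" dequeues.
    A hypothetical model for illustration, not Kestrel's implementation.
    """

    def __init__(self):
        self.queues = defaultdict(deque)

    def handle(self, request: bytes) -> bytes:
        header, _, body = request.partition(b"\r\n")
        parts = header.split()
        if not parts:
            return b"ERROR\r\n"
        cmd = parts[0].lower()
        if cmd == b"set" and len(parts) == 5 and parts[4].isdigit():
            name, nbytes = parts[1], int(parts[4])
            data = body[:nbytes]
            if len(data) != nbytes or body[nbytes:nbytes + 2] != b"\r\n":
                return b"CLIENT_ERROR bad data chunk\r\n"
            self.queues[name].append(data)  # SET = enqueue
            return b"STORED\r\n"
        if cmd == b"get" and len(parts) == 2:
            q = self.queues[parts[1]]
            if not q:
                return b"END\r\n"  # an empty queue looks like a cache miss
            item = q.popleft()  # GET = dequeue (FIFO within this one server)
            return b"VALUE %s 0 %d\r\n%s\r\nEND\r\n" % (parts[1], len(item), item)
        return b"ERROR\r\n"


server = TinyQueueServer()
server.handle(b"set jobs 0 0 5\r\nhello\r\n")
server.handle(b"set jobs 0 0 5\r\nworld\r\n")
print(server.handle(b"get jobs\r\n"))
```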
  • 37. Asynchronous Requests • Inbound traffic consumes a unicorn worker • Outbound traffic consumes a unicorn worker • The request pipeline should not be used to handle 3rd party communications or back-end work. • Reroute traffic to daemons
  • 38. Daemons • Daemons touch every tweet • Many different daemon types at Twitter • Old way: One daemon per type (Rails) • New way: Fewer Daemons (Pure Ruby) • Daemon Slayer - A Multi Daemon that could do many different jobs, all at once.
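Here is a sketch of the "one daemon, many job types" idea. A single worker loop pulls (job_type, payload) pairs from a queue and dispatches each through a handler table. The job types and handlers are hypothetical. The loop drains the queue and returns so it can be tested, whereas a real daemon would block on its queue (for example a Kestrel get) and run indefinitely.

```python
import queue


def deliver_sms(payload):
    return f"sms:{payload}"


def index_for_search(payload):
    return f"search:{payload}"


# Hypothetical job types; the talk does not list Twitter's real ones.
HANDLERS = {
    "sms": deliver_sms,
    "search_index": index_for_search,
}


def run_multi_daemon(jobs, handlers=HANDLERS):
    """Drain (job_type, payload) pairs from `jobs`, dispatching each by type.

    Unknown job types are collected instead of crashing the worker.
    """
    results, unknown = [], []
    while True:
        try:
            job_type, payload = jobs.get_nowait()
        except queue.Empty:
            break  # a long-running daemon would block here instead
        handler = handlers.get(job_type)
        if handler is None:
            unknown.append(job_type)
        else:
            results.append(handler(payload))
    return results, unknown


jobs = queue.Queue()
jobs.put(("sms", "hello"))
jobs.put(("search_index", "tweet:1"))
jobs.put(("unknown_job", None))
results, unknown = run_multi_daemon(jobs)
print(results, unknown)
```

One process with a dispatch table replaces a fleet of single-purpose daemons. That is where the memory savings on the slide would come from.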
  • 39. Disk is the new Tape. • Social Networking application profile has many O(n^y) operations. • Page requests have to happen in < 500 ms or users start to notice. Goal: 250-300 ms • Web 2.0 isn’t possible without lots of RAM • SSDs? What to do?
  • 40. Caching • We’re the real-time web, but there’s still lots of caching opportunity. You should cache what you get from us. • Most caching strategies rely on long TTLs (>60 s) • Separate memcache pools for different data types to prevent eviction • Optimized Ruby gem: libmemcached + FNV hash instead of pure Ruby + MD5 • Twitter now largest contributor to libmemcached
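The slide credits libmemcached with FNV hashing. The code below implements 32-bit FNV-1a, one FNV variant, in Python. It adds per-data-type pools so that churn in one type cannot evict another type's entries. The pool names and hosts are hypothetical, and plain modulo selection is used for brevity. libmemcached also offers consistent (ketama) distribution, which remaps far fewer keys when a server is added or removed.

```python
FNV32_OFFSET = 0x811C9DC5  # FNV-1a 32-bit offset basis
FNV32_PRIME = 0x01000193   # FNV 32-bit prime (16777619)


def fnv1a_32(data: bytes) -> int:
    """32-bit FNV-1a: XOR each byte in, then multiply by the FNV prime."""
    h = FNV32_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV32_PRIME) & 0xFFFFFFFF  # keep the hash to 32 bits
    return h


# Hypothetical pools: one server list per data type.
POOLS = {
    "timeline": ["cache-t1:11211", "cache-t2:11211"],
    "user": ["cache-u1:11211", "cache-u2:11211", "cache-u3:11211"],
}


def server_for(data_type: str, key: str) -> str:
    """Pick a server in the data type's pool by FNV-1a hash modulo pool size."""
    pool = POOLS[data_type]
    return pool[fnv1a_32(key.encode()) % len(pool)]


print(server_for("user", "user:12"))
```

FNV needs only one XOR and one multiply per byte, which makes it much cheaper than MD5 for choosing a server on every cache call.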
  • 41. MySQL • Sharding large volumes of data is hard • Replication delay and cache eviction produce inconsistent results to the end user. • Locks create resource contention for popular data
  • 42. MySQL Challenges • Replication Delay • Single threaded. Slow. • Social Networking not good for RDBMS • N x N relationships and social graph / tree traversal • Disk issues (FS Choice, noatime, scheduling algorithm)
  • 43. Relational Databases not a Panacea • Good for: • Users, Relational Data, Transactions • Bad: • Queues. Polling operations. Social Graph. • You don’t need ACID for everything.
  • 44. Database Replication • Major issues around users and statuses tables • Multiple functional masters (FRP, FWP) • Make sure your code reads and writes to the right DBs. Reading from master = slow death • Monitor the DB. Find slow / poorly designed queries • Kill long-running queries before they kill you (mkill)
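mkill is Twitter's own tool, and the talk does not describe its internals. The sketch below implements a similar policy, assuming rows shaped like the output of MySQL's SHOW FULL PROCESSLIST. It picks statements that have run longer than a threshold and emits KILL QUERY for each one. KILL QUERY aborts the running statement but keeps the connection open. The five-second threshold and the protected_users list are hypothetical.

```python
from typing import NamedTuple, Optional


class ProcessRow(NamedTuple):
    """A few of the columns returned by SHOW FULL PROCESSLIST."""
    id: int
    user: str
    command: str          # e.g. "Query", "Sleep", "Binlog Dump"
    time: int             # seconds in the current state
    info: Optional[str]   # statement text, if any


MAX_QUERY_SECONDS = 5  # hypothetical; mirrors the five-second request timeout


def kill_statements(rows, max_seconds=MAX_QUERY_SECONDS, protected_users=("repl",)):
    """Build KILL QUERY statements for queries running longer than max_seconds.

    Only rows whose command is "Query" qualify, so idle connections are
    skipped; protected_users is a hypothetical allow-list.
    """
    return [
        f"KILL QUERY {row.id}"
        for row in rows
        if row.command == "Query"
        and row.time > max_seconds
        and row.user not in protected_users
    ]


rows = [
    ProcessRow(1, "web", "Query", 12, "SELECT ..."),
    ProcessRow(2, "web", "Sleep", 300, None),
    ProcessRow(3, "web", "Query", 1, "SELECT 1"),
    ProcessRow(4, "repl", "Query", 50, None),
]
for stmt in kill_statements(rows):
    print(stmt)
```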
  • 45. Flock • Scalable Social Graph Store • Sharding via Gizzard • MySQL backend (many.) • 13 billion edges, 100K reads/second • Recently Open Sourced [Diagram: Flock → Gizzard → several MySQL servers]
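Gizzard routes each id through a forwarding table that maps id ranges to shards. The sketch below models that lookup with bisect over sorted range starts. The table contents and shard names are hypothetical. The sketch uses raw ids, whereas a production setup may hash them first to spread load. Keying on the source user's id keeps all of one user's edges on one shard.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Forwarding:
    base_id: int  # first id (inclusive) routed to this shard
    shard: str    # hypothetical MySQL shard name


# Hypothetical forwarding table; must stay sorted by base_id.
FORWARDINGS = [
    Forwarding(0, "mysql_a"),
    Forwarding(1_000_000, "mysql_b"),
    Forwarding(5_000_000, "mysql_c"),
]
_BASE_IDS = [f.base_id for f in FORWARDINGS]


def shard_for(source_id: int) -> str:
    """Return the shard of the last forwarding whose base_id <= source_id."""
    i = bisect.bisect_right(_BASE_IDS, source_id) - 1
    if i < 0:
        raise ValueError(f"no forwarding covers id {source_id}")
    return FORWARDINGS[i].shard


# All edges whose source is user 1_234_567 live on the same shard.
print(shard_for(1_234_567))  # mysql_b
```

With a range table, moving data means editing one forwarding entry rather than rehashing every key.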
  • 46. Cassandra • Originally written by Facebook • Distributed Data Store • @rk’s changes to Cassandra Open Sourced • Currently double-writing into it • Transitioning to 100% soon.
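Double-writing lets a new store fill up with live data while the old one stays authoritative. Below is a hypothetical sketch of that pattern; the class and method names are illustrative, not Twitter's code. Writes go to both stores, and a failure in the new store is logged but never fails the request. Reads switch over with a flag once the new store is trusted.

```python
import logging

log = logging.getLogger("migration")


class DoubleWriter:
    """Write to both stores; serve reads from whichever is authoritative.

    `primary` and `secondary` are any objects with get(key) and put(key, value)
    methods (a hypothetical interface).
    """

    def __init__(self, primary, secondary, read_from_secondary=False):
        self.primary = primary
        self.secondary = secondary
        self.read_from_secondary = read_from_secondary

    def put(self, key, value):
        self.primary.put(key, value)  # the old store stays the source of truth
        try:
            self.secondary.put(key, value)  # shadow write into the new store
        except Exception:
            # Never let the migration target take the request down.
            log.exception("secondary write failed for %r", key)

    def get(self, key):
        store = self.secondary if self.read_from_secondary else self.primary
        return store.get(key)


class DictStore:
    """Trivial in-memory store for trying the sketch out."""

    def __init__(self):
        self.data = {}

    def put(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)


old_store, new_store = DictStore(), DictStore()
writer = DoubleWriter(old_store, new_store)
writer.put("status:1", "hello")
writer.read_from_secondary = True  # flip reads once the new store is trusted
print(writer.get("status:1"))
```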
  • 47. Lessons Learned • Instrument everything. Start graphing early. • Cache as much as possible • Start working on scaling early. • Don’t rely on memcache, and don’t rely on the database • Don’t use Mongrel. Use Unicorn.
  • 48. Join Us! @jointheflock
  • 50. Thanks! • @jointheflock • http://twitter.com/jobs • Download our work • http://twitter.com/about/opensource