How to Build a SaaS App With Twitter-like Throughput on Just 9 Servers

Velocity Conference 2011 presentation by New Relic CEO Lew Cirne. New Relic’s multitenant SaaS web application monitoring service collects and persists over 90,000 metrics every second on a sustained basis, while still delivering an average page load time of 1.5 seconds. In this presentation Lew Cirne discusses how good architecture and good tools can help you handle an extremely large amount of data while still providing extremely fast service. He shows how New Relic scales to support customer growth, how it monitors its own system, and which traps to look out for.

Published in: Technology

  1. How to build an app with Twitter-like throughput on just 9 servers... Lew Cirne, Founder & CEO - New Relic
  2. I’m Lew Cirne @sweetlew
  3. What our app does • APM as a Service • In-app agent instrumentation (BCI, etc) • 150,000+ app processes monitored, globally (10K customers) • Each process reports a few hundred metrics per minute • 5 Languages (Ruby, Java, PHP, .NET, Python)
  4. Each day we collect 20 billion measurements, from 150,000 application processes, for over 10,000 customers.
  5. Each day we collect 20 billion measurements, from 150,000 application processes, for over 10,000 customers. All on 9 servers.
  6. We capture “Timeslices” • Response Time, 4 hours from 11:04 to 15:04 • Count: 1242 • Avg: 337 ms • Min: 0.63 ms • Max: 95669 ms • Std Dev: 782 • Each one is about 250 bytes; a single tweet is about the same size
  7. Timeslice insertion rate: 100K/second • >7 billion rows per day • Twitter peak insertion rate: 8K rows per second • 9 servers handle all data collection
  8. Collecting is one thing... • We provide realtime monitoring • One minute granularity • Data is almost always stale • Each user/account has different data • Page caching and other easy solutions don’t work for us.
  9. Our most popular page... Average Full Page Load Time: 2.4 Sec
  10. Our most popular page... Average Full Page Load Time: 2.4 Sec
  11. Main App Software stack • User Interface: Rails 2.3 • Data Collectors & REST API: Servlets on Jetty • Data Store: MySQL, sharded by accounts
  12. Simplified architecture... • Customer’s environment (HTTP) • 9 Collector / Aggregator / DBs: sustained 100K insertion rate per second • 24 Core Intel Nehalem • 48 GB RAM • SAS attached RAID 5 • No virtualization (either cloud or datacenter) • 2 Web App Servers: 12 Core Intel Nehalem, 48 GB RAM
  13. Even more data! On May 17, we launched Real User Monitoring • Using Episodes to measure browser load time of every page view • Browser reports data to our ‘Beacon’ servers • Monitoring >1 Billion page views per week • Doubled our total inbound HTTP requests in a MONTH
  14. Beacon Architecture • Real User Browsers: billions of metrics from across the globe • RUM Beacons (Servlets): capture and enqueue (in-memory) • Asynchronously aggregate and forward Timeslices to our Collectors • Response time 0.15 ms • Currently at EC2 • Over 1 billion user sessions measured for performance in first month.
  15. Challenges • Data Purging • Determining what to pre-aggregate • Large Accounts • MySQL Optimization and Tuning • I/O performance - (virtualized to dedicated) ...
  16. 5 Lessons Learned
  17. 1. Keep it simple
  18. 2. Less is more
  19. 3. Trendy != Reliable
  20. 4. Plan for scale
  21. 5. Use the right technology for a given task • New Relic • Episodes • Java • Ruby • Nginx • Jetty • Rails
  22. See New Relic Monitor New Relic at our booth
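
Slide 6 lists the fields of a timeslice (count, average, min, max, standard deviation). The sketch below shows one way such a summary could be stored so that two timeslices can be combined without revisiting raw samples. It is an illustration only: the class name, the field names, and the choice to keep a running sum and sum of squares are assumptions, not New Relic's actual schema.

```python
import math
from dataclasses import dataclass


@dataclass
class Timeslice:
    """Summary of one metric over a time window (hypothetical layout)."""
    count: int = 0
    total_ms: float = 0.0         # running sum of observed values
    sum_of_squares: float = 0.0   # running sum of squared values
    min_ms: float = math.inf
    max_ms: float = -math.inf

    def record(self, value_ms: float) -> None:
        """Fold a single measurement into the summary."""
        self.count += 1
        self.total_ms += value_ms
        self.sum_of_squares += value_ms * value_ms
        self.min_ms = min(self.min_ms, value_ms)
        self.max_ms = max(self.max_ms, value_ms)

    def merge(self, other: "Timeslice") -> "Timeslice":
        """Combine two summaries, e.g. two adjacent one-minute windows."""
        return Timeslice(
            count=self.count + other.count,
            total_ms=self.total_ms + other.total_ms,
            sum_of_squares=self.sum_of_squares + other.sum_of_squares,
            min_ms=min(self.min_ms, other.min_ms),
            max_ms=max(self.max_ms, other.max_ms),
        )

    @property
    def average_ms(self) -> float:
        return self.total_ms / self.count if self.count else 0.0

    @property
    def std_dev_ms(self) -> float:
        """Population standard deviation derived from the running sums."""
        if self.count == 0:
            return 0.0
        variance = self.sum_of_squares / self.count - self.average_ms ** 2
        return math.sqrt(max(variance, 0.0))  # guard against float rounding


first = Timeslice()
for value in (1.0, 2.0, 3.0):
    first.record(value)

second = Timeslice()
for value in (4.0, 5.0):
    second.record(value)

combined = first.merge(second)
```

Keeping sums rather than the finished average and standard deviation is what makes the merge exact; an average of averages would not be.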
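
The throughput figures on slides 6, 7 and 12 can be checked with simple arithmetic. The sketch below converts a sustained per-second insertion rate into rows per day and, using the roughly 250 bytes per timeslice from slide 6, into raw bytes per day. The function names are made up for this example, and the byte figure ignores indexes, purging, and pre-aggregation.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def rows_per_day(rate_per_second: int) -> int:
    """Rows inserted in a day at a constant per-second rate."""
    return rate_per_second * SECONDS_PER_DAY


def raw_bytes_per_day(rate_per_second: int, row_bytes: int = 250) -> int:
    """Raw payload volume per day, assuming a fixed row size."""
    return rows_per_day(rate_per_second) * row_bytes


sustained_100k = rows_per_day(100_000)   # rate quoted on slides 7 and 12
sustained_90k = rows_per_day(90_000)     # rate quoted in the description
twitter_peak = rows_per_day(8_000)       # Twitter peak rate from slide 7
daily_bytes = raw_bytes_per_day(100_000)

print(f"{sustained_100k:,} rows/day at 100K/s")
print(f"{sustained_90k:,} rows/day at 90K/s")
print(f"{twitter_peak:,} rows/day at 8K/s")
print(f"{daily_bytes / 1e12:.2f} TB/day of raw timeslices at 100K/s")
```

Both the 90K/s and the 100K/s figures land above the ">7 billion rows per day" stated on slide 7.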
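
Slide 11 says the MySQL data store is sharded by accounts. The slides do not say how accounts map to shards, so the following is a minimal sketch under stated assumptions: a plain modulo on a numeric account ID, and one shard per collector/DB server (nine, per slide 12). A real deployment might instead use a lookup table so that large accounts can be moved. All names here are hypothetical.

```python
from collections import defaultdict

SHARD_COUNT = 9  # assumption: one shard per collector / aggregator / DB server


def shard_for_account(account_id: int, shard_count: int = SHARD_COUNT) -> int:
    """Map an account to a shard index (assumed modulo scheme)."""
    if account_id < 0:
        raise ValueError("account_id must be non-negative")
    return account_id % shard_count


def group_by_shard(rows, shard_count: int = SHARD_COUNT):
    """Batch (account_id, payload) rows so each shard gets one bulk insert."""
    batches = defaultdict(list)
    for account_id, payload in rows:
        shard = shard_for_account(account_id, shard_count)
        batches[shard].append((account_id, payload))
    return dict(batches)


incoming = [(1, "cpu"), (10, "response_time"), (2, "memory")]
batches = group_by_shard(incoming)
```

Because every row for an account lands on the same shard, a per-account dashboard query only ever touches one database.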
