from disaster to stability
      the scaling challenges of my.opera.com
                       Surge 2010 – Version 3
[chart, repeated on the slides for 1999, 2001, 2004, 2007 and 2009: Servers and kUsers per year; 5,500 is the largest label]
the current beta
the situation
    2007
crashes every day

too many connections!!!

NFS volume of doom

Team?
monitoring
many improvements since then

➔   Efficient filesystem cache

➔   "Dogpile effect" AKA stampeding AKA ...

➔   Persistent db + memcached connections

➔   Soft counters

➔   Profiling, profiling, …
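
The "dogpile effect" named in the list above occurs when a popular cache entry expires and many requests try to rebuild it at the same moment. The slides do not show how My Opera handled it, so the following is only a minimal Python sketch of one common mitigation: a per-key lock, so that a single caller recomputes the value while the others wait and reuse it. The `StampedeCache` class and its method names are hypothetical and not taken from the My Opera code base.

```python
import threading
import time


class StampedeCache:
    """In-process cache where only one thread recomputes an expired key."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._values = {}              # key -> (value, expiry timestamp)
        self._locks = {}               # key -> lock guarding recomputation
        self._guard = threading.Lock()

    def _lock_for(self, key):
        # Create the per-key lock exactly once.
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())

    def get(self, key, compute):
        entry = self._values.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]            # fresh hit, no locking needed
        with self._lock_for(key):
            # Re-check: another thread may have refilled the entry
            # while this one was waiting for the lock.
            entry = self._values.get(key)
            if entry is not None and entry[1] > time.monotonic():
                return entry[0]
            value = compute()          # the expensive part, run once
            self._values[key] = (value, time.monotonic() + self.ttl)
            return value
```

With a shared cache such as memcached the same idea usually needs a distributed lock or a "stale while recomputing" flag; this sketch covers the single-process case only.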
code profiling
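[DML] time=1237308152, user=,
url=/tinh_yeu_cua_anh_b88/blog/index.dml/tag/...,
name=XWA::User, variable=active, type=module,
elapsed=0.068473, host=my.opera.com
[DML] time=1237308152, user=, url=/community/,
name=XWA::User, variable=, type=module,
elapsed=0.015935, host=my.opera.com
[DML] ...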
top time-intensive modules
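XWA::User::Sidebar           2024.919s   (27.2%, 0.28 s/call)
XWA::User                    1778.445s   (23.9%, 0.09 s/call)
XWA::User::Journal           1121.224s   (15.1%, 0.24 s/call)
XWA::User::Album              321.522s   ( 4.3%, 0.17 s/call)
XWA::User::Journal::Search    223.477s   ( 3.0%, 20.32 s/call)
XWA::User::Comments           188.011s   ( 2.5%, 0.05 s/call)
XWA::Skins                    180.486s   ( 2.4%, 0.49 s/call)
XWA::User::JournalArchive     159.525s   ( 2.1%, 4.43 s/call)
XWA::User::Posts              146.644s   ( 2.0%, 0.45 s/call)
XWA::User::Picture            141.324s   ( 1.9%, 0.10 s/call)
XWA::Albums                    93.740s   ( 1.3%, 2.04 s/call)
XWA::Journals                  92.390s   ( 1.2%, 2.37 s/call)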
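
A per-module report like the one above can be produced by summing the `elapsed` field of the `[DML]` log lines by `name`. The slides do not show the tool that was actually used, so this is a hedged Python sketch: the function names are hypothetical, and the parser assumes the comma-separated `key=value` layout seen in the sample lines (a URL containing ", " would confuse it).

```python
from collections import defaultdict


def parse_dml_line(line):
    """Turn '[DML] k=v, k=v, ...' into a dict; return None for other lines."""
    prefix = "[DML] "
    if not line.startswith(prefix):
        return None
    fields = {}
    for part in line[len(prefix):].split(", "):
        key, _, value = part.partition("=")
        fields[key.strip()] = value.strip()
    return fields


def time_per_module(lines):
    """Return [(module, total_seconds, calls)], slowest module first."""
    totals = defaultdict(lambda: [0.0, 0])
    for line in lines:
        fields = parse_dml_line(line)
        if not fields or fields.get("type") != "module":
            continue
        try:
            name = fields["name"]
            elapsed = float(fields["elapsed"])
        except (KeyError, ValueError):
            continue                   # skip malformed entries
        totals[name][0] += elapsed
        totals[name][1] += 1
    rows = [(name, total, calls) for name, (total, calls) in totals.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

Dividing each total by its call count gives the "s/call" column, and dividing by the sum of all totals gives the percentage column.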
many improvements since then

➔   YSlow?

➔   The Expires header is your friend!

➔   Hot MyISAM tables converted to InnoDB

➔   MySQL Master/Master setup

➔   Jet Profiler
jet profiler
3
scalability
1. avatars
Avatars - 2007

            75%
        /<user-name>/avatar.pl

/<user-name>/avatar.pl?xscale=8192 (!)
Avatars               wtf!?


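# Connects to the master DB and prints the avatar blob for $user
my $sql = DBConnect('master');

my %user = $sql->get(
  "SELECT a.blob, a.filename
   FROM avatars a, users u
   WHERE u.user=? AND u.id=a.user",
  $user);

$req->print( $user{'blob'} );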
Avatars - reloaded
 ➔   Export to balanced fs (5 formats)

 ➔   Zero SQL queries

 ➔   Storage subsystem

 ➔   static.myopera.com was born
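resources
                  (user uploads, binary blobs, ...)

             Pools
      or single servers

URLs
http://static.myopera.com/pool1/avatars/a4/754/a1b2c3d4e5f6.../<userid>_o.png
http://static.myopera.com/pool1/avatars/a4/754/a1b2c3d4e5f6.../<userid>_t.jpg
http://static.myopera.com/pool1/avatars/a4/754/a1b2c3d4e5f6.../<userid>_m.jpg
http://static.myopera.com/pool1/avatars/a4/754/a1b2c3d4e5f6.../<userid>_l.jpg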
+                   x
➔   Load             ➔   HTTP::DAV

➔   Flexibility      ➔   Precomp URLs

➔   Static scales!
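
"Precomp URLs" in the list above means the application has to build the static URL itself instead of querying the database. The slides do not say how the path components are derived, so the following Python sketch only illustrates the idea. The MD5-based directory fan-out is an assumption made up for this example; only the host name, the `pool1/avatars` prefix and the `_o`, `_t`, `_m`, `_l` suffixes come from the slides.

```python
import hashlib

STATIC_BASE = "http://static.myopera.com"   # host name from the slides
# Size suffix -> file extension, as seen in the slide URLs.
SIZES = {"o": "png", "t": "jpg", "m": "jpg", "l": "jpg"}


def avatar_url(user_id, size, pool="pool1"):
    """Build a static avatar URL without touching the database.

    ASSUMPTION: the directory layout below (derived from the MD5 of the
    user id) is hypothetical; the real derivation is not in the slides.
    """
    if size not in SIZES:
        raise ValueError(f"unknown avatar size: {size!r}")
    digest = hashlib.md5(str(user_id).encode("utf-8")).hexdigest()
    return (f"{STATIC_BASE}/{pool}/avatars/{digest[:2]}/{digest[2:5]}/"
            f"{digest}/{user_id}_{size}.{SIZES[size]}")
```

Because the URL is a pure function of the user id and the size, any page can link to an avatar with zero SQL queries, which is the point of the "reloaded" design.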
2. varnish
Varnish
Most popular RSS feeds

My Opera frontpage

Opera Mini approval

Datacenter emergencies
Varnish
Most popular RSS feeds

➔   /desktopteam/blog/

➔   Friends, Groups API

➔   No cookies (remove req.http.cookie)
Varnish
My Opera frontpage

➔   Danger, Will Robinson!

➔   Mangle cookies

➔   Accept-Language headers
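
Caching a front page that depends on the visitor's language only works if the many possible `Accept-Language` values are reduced to a small set before they become part of the cache key. In Varnish this logic would live in VCL; the Python sketch below only illustrates the normalization step. The list of supported languages and both function names are hypothetical, and how cookies were mangled is not shown in the slides, so the key here covers path and language only.

```python
SUPPORTED_LANGUAGES = ("en", "no", "it")   # hypothetical list for the example
DEFAULT_LANGUAGE = "en"


def normalize_accept_language(header):
    """Reduce an Accept-Language header to one supported language code."""
    candidates = []
    for position, item in enumerate((header or "").split(",")):
        tag, _, params = item.strip().partition(";")
        quality = 1.0
        name, _, value = params.strip().partition("=")
        if name.strip() == "q":
            try:
                quality = float(value)
            except ValueError:
                quality = 0.0          # unparsable weight: ignore the entry
        primary = tag.strip().lower().split("-")[0]
        if primary in SUPPORTED_LANGUAGES and quality > 0:
            # Highest quality wins; ties go to the earlier entry.
            candidates.append((-quality, position, primary))
    return min(candidates)[2] if candidates else DEFAULT_LANGUAGE


def cache_key(path, headers):
    """Cache key that varies by path and normalized language only."""
    language = normalize_accept_language(headers.get("Accept-Language"))
    return f"{path}|lang={language}"
```

With this in place, `en-GB,en;q=0.9` and `en-US` map to the same cached copy instead of two separate ones.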
Varnish
Opera Mini 5.0 approval

➔   Global coverage

➔   Traffic surge (5x peak, 2x over 24h)
IT NEEDS
            TO BE OUT
            TOMORROW
                 !!!




 THERE
WILL BE A
  PRESS
RELEASE !
Varnish
Opera Mini 5.0 approval

➔   Global coverage

➔   Traffic surge (5x peak, 2x over 24h)

➔   No problems!
Opera Mini “countup” traffic
   Submitted to Apple Store: March 23rd
   Approved: April 12th
Varnish
Datacenter emergencies
Datacenter emergencies



files.myopera.com




            DC1




                    User Files Storage SAN
Datacenter emergencies



files.myopera.com                       DC2



                LVS + Varnish servers


       DC1

                    User Files Storage SAN
~ 1Gbit/s!   Varnish
+                 x
➔   Load              ➔   Chainsaw!

➔   Flexibility       ➔   Purging

➔   Instant scaling
3. geodns
geodns
+                     x
➔   Prototype 1 week   ➔ Accuracy


➔   Geo-scaling        ➔   No DC feedback

➔   Redundant          ➔   Monitoring
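
The core of a geodns setup is a lookup from the client's location to the datacenter whose addresses should be returned. The slides give no implementation details, so this is a Python sketch under stated assumptions: the datacenter names, the country mapping and the addresses (taken from the RFC 5737 documentation ranges) are all made up, and the optional `healthy` argument stands in for the datacenter feedback that the slide lists as missing.

```python
# Hypothetical datacenters; addresses come from documentation-only ranges.
DATACENTERS = {
    "dc1": ["192.0.2.10", "192.0.2.11"],
    "dc2": ["198.51.100.10"],
}
# Made-up country -> datacenter mapping for the example.
COUNTRY_TO_DC = {"NO": "dc1", "SE": "dc1", "US": "dc2", "CA": "dc2"}
DEFAULT_DC = "dc1"


def resolve(country_code, healthy=None):
    """Return the A records to serve for a client in the given country.

    `healthy` is an optional collection of datacenter names known to be up.
    When it is omitted, every datacenter is assumed to be healthy.
    """
    healthy = set(DATACENTERS) if healthy is None else set(healthy)
    preferred = COUNTRY_TO_DC.get((country_code or "").upper(), DEFAULT_DC)
    # Try the preferred datacenter first, then the others in name order.
    for name in [preferred] + sorted(DATACENTERS):
        if name in healthy:
            return list(DATACENTERS[name])
    # Nothing is marked healthy: serve the default rather than no answer.
    return list(DATACENTERS[DEFAULT_DC])
```

Determining the country from the resolver's IP address (for example with a GeoIP database) is outside this sketch, and it is where the "Accuracy" drawback comes from.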
Next steps
➔   Search (Solr?)

➔   Batch activity feed

➔   Real connection pooling

➔   … and on ...
Remember!
➔   Team spirit is important

➔   Another level of indirection...

➔   Keep it simple

➔   Keep a log
the heroes
http://my.opera.com/devblog/about/
http://my.opera.com/devblog/
any questions?
                 ?
handout download:

  http://tinyurl.com/surge2010-cosimo



thanks!
Surge 2010 - from disaster to stability - scaling my.opera.com
  1. from disaster to stability: the scaling challenges of my.opera.com. Surge 2010 – Version 3
  2. 1999 [chart: Servers and kUsers per year, 2000-2010; data labels 5,500, 2,500, 1,640, 887, 257, 205, 430, 1, 10, 50]
  3. 2001 [same chart]
  4. 2004 [same chart]
  5. 2007 [same chart]
  6. 2009 [same chart]
  7. the current beta
  8. the situation 2007
  9. crashes every day; too many connections!!!; NFS volume of doom; Team?
  10. monitoring
  11. many improvements since then ➔ Efficient filesystem cache ➔ "Dogpile effect" AKA stampeding AKA ... ➔ Persistent db + memcached connections ➔ Soft counters ➔ Profiling, profiling, …
  12. code profiling [DML] time=1237308152, user=, url=/tinh_yeu_cua_anh_b88/blog/index.dml/tag/..., name=XWA::User, variable=active, type=module, elapsed=0.068473, host=my.opera.com [DML] time=1237308152, user=, url=/community/, name=XWA::User, variable=, type=module, elapsed=0.015935, host=my.opera.com [DML] ...
  13. top time-intensive modules XWA::User::Sidebar 2024.919s (27.2%, 0.28 s/call) XWA::User 1778.445s (23.9%, 0.09 s/call) XWA::User::Journal 1121.224s (15.1%, 0.24 s/call) XWA::User::Album 321.522s (4.3%, 0.17 s/call) XWA::User::Journal::Search 223.477s (3.0%, 20.32 s/call) XWA::User::Comments 188.011s (2.5%, 0.05 s/call) XWA::Skins 180.486s (2.4%, 0.49 s/call) XWA::User::JournalArchive 159.525s (2.1%, 4.43 s/call) XWA::User::Posts 146.644s (2.0%, 0.45 s/call) XWA::User::Picture 141.324s (1.9%, 0.10 s/call) XWA::Albums 93.740s (1.3%, 2.04 s/call) XWA::Journals 92.390s (1.2%, 2.37 s/call)
  14. many improvements since then ➔ YSlow? ➔ The Expires header is your friend! ➔ Hot MyISAM tables converted to InnoDB ➔ MySQL Master/Master setup ➔ Jet Profiler
  15. jet profiler
  16. 3 scalability
  17. 1. avatars
  18. Avatars - 2007 75% /<user-name>/avatar.pl /<user-name>/avatar.pl?xscale=8192 (!)
  19. Avatars wtf!? my $sql = DBConnect('master'); my %user = $sql->get( "SELECT a.blob, a.filename FROM avatars a, users u WHERE u.user=? AND u.id=a.user", $user); $req->print( $user{'blob'} );
  20. Avatars - reloaded ➔ Export to balanced fs (5 formats) ➔ Zero SQL queries ➔ Storage subsystem ➔ static.myopera.com was born
  21. resources (user uploads, binary blobs, ...) Pools or single servers URLs http://static.myopera.com/pool1/avatars/a4/754/a1b2c3d4e5f6.../<userid>_o.png http://static.myopera.com/pool1/avatars/a4/754/a1b2c3d4e5f6.../<userid>_t.jpg http://static.myopera.com/pool1/avatars/a4/754/a1b2c3d4e5f6.../<userid>_m.jpg http://static.myopera.com/pool1/avatars/a4/754/a1b2c3d4e5f6.../<userid>_l.jpg
  22. + ➔ Load ➔ Flexibility ➔ Static scales! x ➔ HTTP::DAV ➔ Precomp URLs
  23. 2. varnish
  24. Varnish: Most popular RSS feeds; My Opera frontpage; Opera Mini approval; Datacenter emergencies
  25. Varnish Most popular RSS feeds ➔ /desktopteam/blog/ ➔ Friends, Groups API ➔ No cookies (remove req.http.cookie)
  26. Varnish My Opera frontpage ➔ Danger, Will Robinson! ➔ Mangle cookies ➔ Accept-Language headers
  27. Varnish Opera Mini 5.0 approval ➔ Global coverage ➔ Traffic surge (5x peak, 2x over 24h)
  28. IT NEEDS TO BE OUT TOMORROW !!! THERE WILL BE A PRESS RELEASE !
  29. Varnish Opera Mini 5.0 approval ➔ Global coverage ➔ Traffic surge (5x peak, 2x over 24h) ➔ No problems!
  30. Opera Mini “countup” traffic: submitted to Apple Store March 23rd, approved April 12th
  31. Varnish Datacenter emergencies
  32. Datacenter emergencies files.myopera.com DC1 User Files Storage SAN
  33. Datacenter emergencies files.myopera.com DC2 LVS + Varnish servers DC1 User Files Storage SAN
  34. ~ 1Gbit/s! Varnish
  35. + ➔ Load ➔ Flexibility ➔ Instant scaling x ➔ Chainsaw! ➔ Purging
  36. 3. geodns
  37. geodns
  38. + ➔ Prototype 1 week ➔ Geo-scaling ➔ Redundant x ➔ Accuracy ➔ No DC feedback ➔ Monitoring
  39. Next steps ➔ Search (Solr?) ➔ Batch activity feed ➔ Real connection pooling ➔ … and on ...
  40. Remember! ➔ Team spirit is important ➔ Another level of indirection... ➔ Keep it simple ➔ Keep a log
  41. the heroes http://my.opera.com/devblog/about/ http://my.opera.com/devblog/
  42. any questions? ?
  43. handout download: http://tinyurl.com/surge2010-cosimo thanks!