From Instability to Resilience: The Story of a Web Site

Synchronized video and slides, plus MP3 and slide downloads, are available at http://bit.ly/So63P6.

Richard Campbell shares his experiences evolving a web site from ordinary to resilient, the triage process, the quick-and-dirty solutions as well as the work to bring the site to true resiliency. Filmed at qconlondon.com.

Richard Campbell has spent more than 30 years playing around with microcomputers. Along the way he's done virtually every job you can imagine in the industry, from manufacturing to programming to consulting, training and writing. Richard is a Microsoft Regional Director, an ASP.NET MVP and a serial entrepreneur. He is also a partner in PWOP Productions, which produces a number of popular podcasts.

Published in: Technology

  1. 1. InfoQ.com: News & Community Site • 750,000 unique visitors/month • Published in 4 languages (English, Chinese, Japanese and Brazilian Portuguese) • Post content from our QCon conferences • News 15-20 / week • Articles 3-4 / week • Presentations (videos) 12-15 / week • Interviews 2-3 / week • Books 1 / month • Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/resiliency-website
  2. 2. Presented at QCon London www.qconlondon.com Purpose of QCon - to empower software development by facilitating the spread of knowledge and innovation Strategy - practitioner-driven conference designed for YOU: influencers of change and innovation in your teams - speakers and topics driving the evolution and innovation - connecting and catalyzing the influencers and innovators Highlights - attended by more than 12,000 delegates since 2007 - held in 9 cities worldwide
  3. 3. From Instability to Resilience: The Story of a Web Site • Richard Campbell • @richcampbell
  4. 4. Richard Campbell • Background – First laid hands on a microcomputer in 1977, it’s been all downhill from there – Spent the last fifteen years helping companies scale software on a variety of platforms • Currently – Architect and Consultant – Organizer of DevIntersection – Rabid Podcaster
  5. 5. Podcasts • For .NET Developers: first published 2002, two shows a week, 955 episodes in the archive • For IT Pros: first published 2007, once a week, 357 episodes in the archive • For Tablet Developers: first published 2011, once a week, 126 episodes so far
  6. 6. What is Resilience?
  7. 7. Resilience Definition
  8. 8. The Web Site Story • Vertical market e-commerce system • More than 200,000 discrete items • Considered mature (version 3!) • Busier than ever, but making less money • Increasing tech support calls and complaints
  9. 9. Diagnostic Instrumentation • Performance Monitor – CPU utilization – Requests/Sec – Requests Queued – .NET Heap Allocated
  10. 10. Initial Diagnosis • CPU utilization high, not pinned • 20-30 Requests/sec, steady • Requests Queued jumps occasionally • .NET Heap climbs to 800MB, then dumps – Memory leak? • Worker process recycling every 20 minutes
  11. 11. Taking Action • Speed of response vs. comprehensive response • Must have measurements before and after • Addressing a memory leak with more memory!
  12. 12. First Actions & Results • Switch to 64 Bit OS (but compile to 32 bit) • Added 4GB of Memory (for a total of 8GB) • Memory leak continued – Just took longer – Dump occurred at 2GB (maximum pool size) – Worker process recycling every two hours • Complaint level drops dramatically
  13. 13. Character of the Web Site • Accuracy • Reliability • Scalability • Performance • Where does Resilience fit?
  14. 14. Better Living through Hardware • Adding more web servers – Spreading failure around • Moving to the Cloud – Paying for failure by the hour
  15. 15. Increasing Understanding • What is causing this memory leak? – Using .NET Memory Profiling to analyze • Most memory consumed by cache <foreshadow> • Noticed large blocks of static memory (session objects) </foreshadow>
  16. 16. When Cache Runs Amok • Caching added to improve performance • Cache objects being created around searches • What are the chances of that cache object ever being used again? • How do you expire a cache object?
  17. 17. Instrumenting Cache • Wrapped cache objects in a dictionary abstraction • Kept counters for re-use • Two hours of run time (one worker process recycle) results in 95% of cache objects never reused
  18. 18. Fixing Cache • Don’t allow free form search cache • Don’t let the users populate cache at all • Treat the real problem
  19. 19. Valuing Resiliency • Nothing like failure to help determine value • Owning your ROI • Spending CapEx to reduce OpEx
  20. 20. Adding More Resiliency • Bend further – Multiple servers provide redundancy • Elasticity – Can we expand or shrink based on need? • Better feedback – How do we know when we’re bent, when to expand and when to shrink?
  21. 21. Clusters of Joy • Successful failover is hardware, software & configuration together • If it hasn’t been tested, it doesn’t work • Routine failover is your friend!
  22. 22. The Plague of State • Getting session out of the web server • The battle of performance over scalability • As little state as possible (and no less)
  23. 23. Instrumenting Production • The only source of truth • How can you build software that can test in production without impacting the user? • Instrumentation driving features
  24. 24. Building Resiliency • Resiliency is a journey, not a destination • Know the character of your system • Work from facts, not conjecture to move the system forward
  25. 25. Watch the video with slide synchronization on InfoQ.com! http://www.infoq.com/presentations/resiliency-website
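
Slide 17 describes wrapping cache objects in a dictionary abstraction and keeping counters for re-use. The sketch below illustrates that idea in Python under stated assumptions: the site in the talk was built on .NET and its actual code is not shown in the slides, so the class name `InstrumentedCache`, its methods, and the sample search keys are all hypothetical.

```python
from collections import Counter


class InstrumentedCache:
    """Dictionary-backed cache that counts how often each entry is re-used.

    Hypothetical illustration only; not the code from the talk.
    """

    def __init__(self):
        self._items = {}
        self._hits = Counter()  # key -> number of reads served from the cache

    def get_or_add(self, key, factory):
        """Return the cached value for key, building it with factory on a miss."""
        if key in self._items:
            self._hits[key] += 1
            return self._items[key]
        value = factory()
        self._items[key] = value
        self._hits[key] = 0  # created, but not yet re-used
        return value

    def never_reused_ratio(self):
        """Fraction of cached entries that were created and never read again."""
        if not self._items:
            return 0.0
        unused = sum(1 for key in self._items if self._hits[key] == 0)
        return unused / len(self._items)


# Simulate free-form searches populating the cache (made-up queries).
cache = InstrumentedCache()
for query in ["red widget", "blue widget", "red widget", "green gadget"]:
    cache.get_or_add(("search", query), lambda q=query: f"results for {q}")

print(f"{cache.never_reused_ratio():.0%} of cache entries were never reused")
```

A counter like this is what turns "the cache feels wasteful" into a measurement. A high never-reused ratio for a class of keys, such as the free-form searches on slides 16 to 18, suggests those keys are costing memory without saving work.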
