Building Large Scale Services - LISA 2013

581 views

Published on

Yahoo! Service Engineers (SE) specialize in bridging the gap between system administration and development. SEs are tasked with delivering a reliable, consistent quality service through the use of best practices. They must understand network, OS, hardware, and customer use cases; and dive deep into the application internals.

In this talk, Jennifer will describe her journey with the Sherpa service at Yahoo! and lessons learned about building a reliable, consistent, and high-quality service from scratch.

The key takeaway from this talk will be to educate practitioners on successful strategies and pitfalls when building out a service.

Published in: Technology, Business
  • Be the first to comment

Building Large Scale Services - LISA 2013

  1. 1. B u i l d i n g   L a r g e   S c a l e   S e r v i c e s   PRESENTED  BY  Jennifer  Davis⎪  November  8,  2013  
  2. 2. Twitter: @sigje Email: sigje@yahoo.com
  3. 3. SysAdmin Controls all the things 3   11/11/13  
  4. 4. Shared Dependencies 4   11/11/13  
  5. 5. The Reality…   5   11/11/13  
  6. 6. The Dream… 6   11/11/13  
  7. 7. How?
  8. 8. Define Core Principles § Common     ›  CollaboraGon  across  teams,  companies,  industry,   define  standards   ›  Incident,  Problem,  Change,  Config,  Release   management   § DisGnct   ›  Specifics  to  an  applicaGon  or  service   ›  Availability,  Service,  Business  ConGnuity,  Capacity     8   11/11/13  
  9. 9. Kill the Myths § Stupid  User     9   11/11/13  
  10. 10. Kill the Myths   § Stupid  User   § System  Admin  ==  Operator     10   11/11/13  
  11. 11. CHEF nosql TCP/ IP security 11   perl Failing unix Gracefully operability mysql bash puppet ruby SKILLS 11/11/13  
  12. 12. 12   11/11/13  
  13. 13. Kill the Myths   § Stupid  User   § System  Admin  ==  Operator   § Words  have  a  common  universal  implicit  meaning       13   11/11/13  
  14. 14. 14   11/11/13  
  15. 15. Learn to Modulate your Message       15   11/11/13  
  16. 16. Team Manager 16   Customer 11/11/13  
  17. 17. Team § People  working  towards  common  goal.   § Different  roles.     § Different  views.   § Same  objecGves.   17   11/11/13  
  18. 18.  Image  Credit:  Kyle  LaGno   18   11/11/13  
  19. 19. Team Sugges/on:  Don’t  talk  about  the  “devs”  request,   talk  about  Elaine’s  request.     19   11/11/13  
  20. 20. Team Sugges/on:  Don’t  talk  about  the  “devs”  request,   talk  about  Elaine’s  request.     Sugges/on:  Verify  that  your  team  has  the  same   vision.   20   11/11/13  
  21. 21. Understand the vision. §  Are  there  other  opGons,  open  source  or  not  within  the  company?   §  Are  there  other  opGons  outside  the  company?   §  Is  EVERYONE  on  the  same  page  about  what  the  service  is?   21   11/11/13  
  22. 22. Vision Statement §  Clear  statement  about  the  problem  that  the  service  is  solving.   ›  DirecGon   ›  IdenGty  management   ›  Team  cohesion   New  product?  Be  part  of  creaGng  that  vision!   22   11/11/13  
  23. 23. Sherpa’s Vision ..  Distributed  replicated  eventually  consistent  key  value  store  that  had  a  focus   on  scalability  ..     23   11/11/13  
  24. 24. My Job §  §  §  §  §  §  24   Examine  soaware   Define  risk   Communicate  cost  of  risks     MiGgate  risks   IdenGfy  events   Manage  events   11/11/13  
  25. 25. Fragile Platforms are Bad. 25   11/11/13  
  26. 26. Change is inevitable §  Products  pivot  based  on  needs.   §  Requirements  change  and  evolve.   §  Know  core  issues.   26   11/11/13  
  27. 27. Know Core Issues §  Limit  the  scope  of  focus.     27   11/11/13  
  28. 28. Know Core Issues §  Limit  the  scope  of  focus.   §  Focus  on  the  biggest  prioriGes.     28   11/11/13  
  29. 29. Know Core Issues §  Limit  the  scope  of  focus.   §  Focus  on  the  biggest  prioriGes.   ›  Understand  Development  Methodology:  Waterfall,  Scrum,  ?     29   11/11/13  
  30. 30. Know Core Issues §  Limit  the  scope  of  focus.   §  Focus  on  the  biggest  prioriGes.   ›  Understand  Development  Methodology:  Waterfall,  Scrum,  ?   ›  IdenGfy  the  key  “Gme”  elements.     30   11/11/13  
  31. 31. Know Core Issues §  Limit  the  scope  of  focus.   §  Focus  on  the  biggest  prioriGes.   ›  Understand  Development  Methodology:  Waterfall,  Scrum,  ?   ›  IdenGfy  the  key  “Gme”  elements.   ›  Talk  to  them.  IdenGfy  their  key  terms.  “Enhancements”,  “Defects”     31   11/11/13  
  32. 32. Know Core Issues §  Limit  the  scope  of  focus.   §  Focus  on  the  biggest  prioriGes.   ›  Understand  Development  Methodology:  Waterfall,  Scrum,  ?   ›  IdenGfy  the  key  “Gme”  elements.   ›  Talk  to  them.  IdenGfy  their  key  terms.  “Enhancements”,  “Defects”   ›  Establish  the  “Top”  list.       32   11/11/13  
  33. 33. Create checklists §  §  §  §  Not  because  people  are  dumb.   Not  only  because  of  automaGon.   When  things  break,  knowing  what  needs  focus.   During  normal  maintenance,  can  idenGfy  “not  OK”.   ›  33   Audit  checklists  for  deployment  through  staging  environment.   11/11/13  
  34. 34. Know Outputs §  §  §  §  34   IdenGfy  components.   Well  defined  protocols  between  components.   Expected  Inputs.   Expected  Outputs.   11/11/13  
  35. 35. 35   11/11/13  
  36. 36. 36   11/11/13  
  37. 37. 37   11/11/13  
  38. 38. 38   11/11/13  
  39. 39. 39   11/11/13  
  40. 40. Know State Transitions Explicitly. §  When  component  is  installed  but  not  ready   40   11/11/13  
  41. 41. Know State Transitions Explicitly. §  When  component  is  installed  but  not  ready   §  When  the  colo  is  going  away   §  Go  through  What  If  Scenarios.   ›  41   Document  them.   11/11/13  
  42. 42. Know choke points explicitly. §  Memory   §  Disk   §  Bandwidth   Now  and  in  6  months.   JIT?   42   11/11/13  
  43. 43. Failure will happen. §  §  §  §  43   There  are  no  0  failure  systems.    “Give  me  the  brain”  documentaGon  so  that  anyone  can  be  the  brain.   Repeatable/Reliable  failure  handling.   Run  fire  drills.  Really.     11/11/13  
  44. 44. 44   11/11/13  
  45. 45. System Administration is Gardening. §  No  guarantee  of  resources.   §  Only  guarantee  is  change.   45   11/11/13  
  46. 46. System Administration is Gardening. §  Nurture  relaGonships.   ›  ›  Be  trusGng  and  trustworthy.   ›  46   Be  authenGc.   Have  integrity.   11/11/13  
  47. 47. Success At Scale is Collaboration & Cooperation across Teams.
  48. 48. Decreasing Value 48   11/11/13  
  49. 49. # of Support Engineers 8 6 # of Support Engineers 4 2 0 49   Jan Apr Jul Oct 11/11/13  
  50. 50. # of Support Engineers 6 5 4 # of Support Engineers 3 2 1 0 50   Jan Apr Jul Oct 11/11/13  
  51. 51. 51   11/11/13  
  52. 52. Documentation is not the cure. § DocumentaGon  doesn’t  guarantee   understanding.   ›  OperaGons  Sandbox  Environment   § Don’t  spend  Gme  at  the  end  documenGng.   52   11/11/13  
  53. 53. 53   11/11/13  
  54. 54. Summary
  55. 55. Be Expendable. Feed your brain. 55   11/11/13  
  56. 56. Acknowledgements •  •  •  •  •  •  •    56   hkp://www.flickr.com/photos/levork     hkp://www.flickr.com/photos/puggles   hkp://www.flickr.com/photos/byteorder   hkp://www.flickr.com/photos/egoant   hkp://www.flickr.com/photos/happymonkey   Kyle  LaGno     Greg  Connor   11/11/13  
  57. 57. Thanks! sigje@yahoo.com http://www.slideshare.net/sigje/ presentation-lisa 57   11/11/13  

×