  • 1. To Cloud or Not to Cloud?
    Greg Lindahl, CTO
    @glindahl – greg@blekko.com
  • 2. About Us
     • Web-scale search engine with our own crawl & index
     • Public launch, November 2010
     • $60 M raised
     • 800 servers, 16 PB spinning rust, ½ PB flash disk
  • 3. blekko.com  
  • 4. izik  –  tablet  search  
  • 5. The wiring diagram
     [Diagram: Web, Crawler, Extractor, Ranker, Indexer, Lookup, Query Analyzer, Front End Query, SERP, DIG, KB]
  • 6. Hijacking a meetup topic
     • Original topic was “virtualization or not”
     • But really, virtualization is an implementation detail these days
       – cloud => virtual
       – virtual => public or private cloud (probably)
     • This talk: public cloud vs. not
     • I’m trying to list a bunch of things that you should think about … your situation probably differs from mine
  • 7. The question
     • It’s 2007, and your CEO asks you: Should our new startup use this newfangled cloud computing stuff or not?
  • 8. Why cloud at all?
     • Flexible
       – prototyping & development
       – testing at scale
       – scale up for high usage and back down later
     • Turns CapEx into OpEx
       – startups prefer paying over time
       – “money tomorrow is cheaper than money today”, if you’re successful
     {btw, plenty of banks will loan against equipment.}
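To make the CapEx vs. OpEx point concrete, here is a minimal sketch that discounts a stream of monthly rental payments back to today's money and compares the result with an upfront purchase. Every price and rate is hypothetical. The discount rate stands in for a startup's cost of capital: the more successful (and cash-hungry) the company, the higher it is.

```python
def present_value(monthly_payments, annual_discount_rate):
    """Discount end-of-month payments back to today's money."""
    # Convert the annual rate to the equivalent compounded monthly rate.
    monthly_rate = (1 + annual_discount_rate) ** (1 / 12) - 1
    return sum(
        payment / (1 + monthly_rate) ** (month + 1)
        for month, payment in enumerate(monthly_payments)
    )


# Hypothetical figures: buy a server for 10,000 up front, or rent equivalent
# capacity for 400 per month over a 36-month life.
UPFRONT_COST = 10_000
MONTHLY_RENT = 400
MONTHS = 36

for rate in (0.0, 0.10, 0.50):
    rent_today = present_value([MONTHLY_RENT] * MONTHS, rate)
    cheaper = "rent" if rent_today < UPFRONT_COST else "buy"
    print(f"discount rate {rate:.0%}: rent is worth {rent_today:,.0f} today -> {cheaper}")
```

With these made-up numbers, buying wins at 0% and 10%, but at 50% the same 36 rental payments are worth less than the purchase price.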
  • 9. Cloud win examples
     • CommonCrawl.org has a web crawl dataset on EC2
       – a Map/Reduce job to read the whole thing is ~$50
     • Fewer ops people is actually true
     • Your company can change direction
  • 10. OK, so what’s bad?
     • Examine the curve of Amazon’s pricing over time and per volume
     • People think it’s a low-priced product, but it’s not.
     • It’s value priced.
     • Not enough competition, yet, to really drive Amazon’s margins down
     • This is good for Amazon, maybe not for you.
  • 11. 6 Reasons to not use Amazon
     • Economy of scale in your favor?
     • Your max::min ratio is not large enough
     • Cloud IOPs are expensive
     • Data is heavy if you use a lot of local disk
     • SSDs are overpriced
     • Ratio of disk capacity or bandwidth :: SSD :: memory :: compute may not be ideal for you
  • 12. Economy of scale
     • “Amazon has 100s of thousands of servers, so they can run them cheaper than I can.”
     • But:
       – you pay retail, not wholesale price
       – there are diminishing returns with size
     • At some point, it’s cheaper to do it yourself
     • 100 servers? 50 servers?
     { blekko had 700 at launch… }
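A linear cost model gives a rough way to locate "at some point". Owning has a fixed monthly overhead plus a per-server cost, and renting has only the retail per-instance price. The sketch below uses made-up prices and ignores the diminishing returns mentioned above. It only shows how a break-even fleet size falls out of those assumptions.

```python
import math


def self_hosted_monthly(servers, fixed_overhead, per_server):
    """Own hardware: a fixed monthly overhead plus an amortized cost per server."""
    return fixed_overhead + servers * per_server


def cloud_monthly(servers, per_instance):
    """Cloud: pay the retail price per instance, with no fixed overhead."""
    return servers * per_instance


def break_even_servers(fixed_overhead, per_server, per_instance):
    """Smallest fleet size at which self-hosting costs no more than renting."""
    margin = per_instance - per_server  # what you save per server by owning it
    if margin <= 0:
        return None  # renting is cheaper per server, so owning never catches up
    return math.ceil(fixed_overhead / margin)


# Hypothetical monthly figures: 20,000 of fixed overhead (ops staff, colo contract),
# 250 per owned server, 650 per rented instance at retail.
n = break_even_servers(fixed_overhead=20_000, per_server=250, per_instance=650)
print(n, self_hosted_monthly(n, 20_000, 250), cloud_monthly(n, 650))
```

Halving the fixed overhead or widening the retail-vs-owned gap moves the break-even point down. That is why the answer may be closer to 50 servers than to 100 for some shops.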
  • 13. Your max::min ratio is not big enough
     • Maybe you use 100x as many servers some days?
       – Cloud is for you!
     • How long do your usage spikes last?
     • Can you predict them far enough in advance?
     • How long does it take you to spin up a new node?
     {blekko’s day::night is only 2x}
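The max::min argument comes down to this: an owned fleet must be sized for the peak all day, while the cloud bills only for the server-hours actually used. The sketch below compares the two for a 2x day/night profile and a 100x one-hour spike. Prices are hypothetical units, and the sketch assumes instant spin-up, which the questions above warn is not free.

```python
def daily_costs(hourly_demand, own_cost_per_server_hour, cloud_cost_per_server_hour):
    """Compare a fleet sized for peak against paying the cloud only for hours used.

    Assumes cloud nodes spin up instantly and are billed per server-hour,
    which understates the cloud cost when spikes are short or unpredictable.
    """
    peak = max(hourly_demand)
    own = peak * own_cost_per_server_hour * len(hourly_demand)  # you pay for peak all day
    cloud = sum(hourly_demand) * cloud_cost_per_server_hour     # you pay for what you use
    return own, cloud


# Hypothetical prices in arbitrary units: an owned server-hour costs 4, a rented one 10.
day_night_2x = [100] * 12 + [200] * 12   # max::min = 2, like blekko's day::night
spiky_100x = [10] * 23 + [1000]          # max::min = 100, for one hour a day

for name, profile in (("2x", day_night_2x), ("100x", spiky_100x)):
    own, cloud = daily_costs(profile, 4, 10)
    print(name, "own:", own, "cloud:", cloud)
```

With these numbers, owning is cheaper for the 2x profile and renting is far cheaper for the 100x spike, even though a rented server-hour costs 2.5 times as much.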
  • 14. Cloud IOPs are expensive
     • I/O operations are expensive to start with
       – “spinning rust” disks only seek so much
     • Networked storage has low bandwidth compared to 10 attached disks
       – 1 Gbyte/sec sustained – woah!
     • Networked disks are more expensive than local
       – better failure behavior, whether I want it or not
  • 15. Data is heavy if you use a lot of local disk
     • I mean: it takes a loooooong time to copy a few tbytes of data onto your local disk over the network
       – 1 gigabit: ½ tbyte/hour
       – 10 gigabit: 5 tbytes/hour
       – even filling your ½ tbyte SSD is kinda slow
     • Slow spin-up/down of nodes hurts your ability to flex up and down
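The per-hour figures above are line-rate arithmetic: bits per second divided by 8, times 3,600 seconds. Here is a small helper for estimating how long a new node takes to fill its local disk. The 4 TB disk size and the `efficiency` knob for protocol overhead are assumptions for illustration.

```python
def hours_to_copy(terabytes, link_gbit_per_s, efficiency=1.0):
    """Hours to move `terabytes` (decimal TB) over a link of `link_gbit_per_s`.

    `efficiency` is a hypothetical knob for protocol overhead and contention;
    1.0 means the full line rate is achieved, which real transfers rarely manage.
    """
    total_bytes = terabytes * 1e12
    bytes_per_second = link_gbit_per_s * 1e9 / 8 * efficiency
    return total_bytes / bytes_per_second / 3600


for link in (1, 10):
    # Throughput per hour at full line rate, then time to fill a hypothetical 4 TB of local disk.
    tb_per_hour = link * 1e9 / 8 * 3600 / 1e12
    print(f"{link} Gbit/s: {tb_per_hour:.2f} TB/hour, 4 TB takes {hours_to_copy(4, link):.1f} h")
```

At full line rate, 1 Gbit/s moves 0.45 TB an hour, which the slide rounds to ½ tbyte/hour. A node with a few TB of local data therefore needs most of a working day to come up on a 1 Gbit/s link.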
  • 16. SSDs are overpriced (by cloud providers)
     • SSDs are completely awesome for read-heavy analytics queries
     • SSDs wear out with writes
     • No cloud provider charges a fee for writes?
     • Instead, they assume all their customers are average
     • … and so they charge way too much to customers who are smart about not writing too much
     { blekko is great at not writing to our SSDs }
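Pricing for the average customer overcharges light writers. A toy wear model shows why: a drive is retired either when its write endurance is used up or when it ages out, and its cost is spread over whichever lifetime is shorter. The drive price, endurance rating, and write volumes below are hypothetical. They are not any provider's real figures.

```python
def fair_monthly_cost(drive_price, endurance_tb_written, service_life_months, tb_written_per_month):
    """Monthly cost of a drive retired by wear-out or by age, whichever comes first."""
    if tb_written_per_month <= 0:
        lifetime_months = service_life_months
    else:
        lifetime_months = min(service_life_months, endurance_tb_written / tb_written_per_month)
    return drive_price / lifetime_months


# Hypothetical drive: costs 1,000, rated for 1,000 TB written, retired after 48 months anyway.
PRICE, ENDURANCE, LIFE = 1_000, 1_000, 48

average_customer = fair_monthly_cost(PRICE, ENDURANCE, LIFE, tb_written_per_month=50)
read_heavy_customer = fair_monthly_cost(PRICE, ENDURANCE, LIFE, tb_written_per_month=2)

# A flat price set for the average writer overcharges the careful one.
flat_price = average_customer
print(f"flat price {flat_price:.2f}, read-heavy fair cost {read_heavy_customer:.2f}, "
      f"overcharge x{flat_price / read_heavy_customer:.1f}")
```

In this toy model the read-heavy customer's drive lasts its full 48 months instead of 20, so a flat price based on the average writer charges it 2.4 times its actual wear cost.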
  • 17. Ratios available might not fit your usage
     • Amazon tries pretty hard:
       – high-memory, high-CPU, GPU, high-I/O, high-storage
       – weirder ones are less flexible
     • It’s still easy to not fit into that set of cookie cutters
     • Not fitting == wasted money
       – idle resources that you’ve paid for
       – moves the break-even point to smaller node count
     { blekko crawler nodes: 10 local disks (capacity, bandwidth, seeks), 2 SSDs, 96 gigs RAM }
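"Not fitting == wasted money" can be measured. The number of instances you rent is set by whichever resource runs out first, and everything else you bought sits idle. The sketch below uses 100 nodes shaped like the crawler node above as the workload. The two instance types are made up and are not real Amazon offerings.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    disks: int
    ssds: int
    ram_gb: int


def instances_needed(need: Resources, offer: Resources) -> int:
    """Instances of one type needed so that every resource dimension is covered."""
    counts = []
    for required, per_instance in (
        (need.disks, offer.disks),
        (need.ssds, offer.ssds),
        (need.ram_gb, offer.ram_gb),
    ):
        if required == 0:
            continue
        if per_instance == 0:
            raise ValueError("instance type lacks a resource the workload needs")
        counts.append(math.ceil(required / per_instance))
    return max(counts, default=0)


def idle_resources(need: Resources, offer: Resources) -> Resources:
    """Resources paid for but not used once the binding dimension is satisfied."""
    n = instances_needed(need, offer)
    return Resources(
        n * offer.disks - need.disks,
        n * offer.ssds - need.ssds,
        n * offer.ram_gb - need.ram_gb,
    )


# 100 nodes shaped like the blekko crawler node: 10 disks, 2 SSDs, 96 GB RAM each.
workload = Resources(disks=100 * 10, ssds=100 * 2, ram_gb=100 * 96)

# Two hypothetical instance types.
storage_heavy = Resources(disks=24, ssds=2, ram_gb=64)
memory_heavy = Resources(disks=4, ssds=2, ram_gb=244)

for name, offer in (("storage-heavy", storage_heavy), ("memory-heavy", memory_heavy)):
    print(name, instances_needed(workload, offer), idle_resources(workload, offer))
```

RAM is the binding constraint for the storage-heavy shape, so 2,600 disks sit idle. Disks bind for the memory-heavy shape, so about 51 TB of RAM goes unused. Either way, the idle capacity pushes the break-even point toward a smaller fleet.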
  • 18. So…
     • For us, it was easy to predict the right answer
     • Our SWAG for launch day was 600 servers
       – and our entire index in SSD
       – and we can’t scale down from that
     • Amazon wasn’t renting SSDs yet
     • If you’re going to run your own servers, you need to start early
  • 19. How about you?
     • Real-time (RT) analytics is a complicated subject
     • Two main thrusts
       – Pre: pre-compute aggregate numbers, query those
       – Mem: stick a subset of your big data that fits into RAM or SSD, do complicated queries against those
     { blekko only does Pre }
  • 20. Pre
     • Needs to be wired into your stream of data generation, e.g. your webserver
     • Summary data can be pretty small
     • Doesn’t really matter where you put it
     • Not much impact on the cloud/no-cloud decision
     { blekko pre-computes a lot of things using “combinators” in our home-grown NoSQL, optionally stuffing them into our SSD caching system }
  • 21. Combinators reduce the total work
     [Figure: Server 1 and Server 2, each running Process 1 and Process 2, emit increments (+4, +3, +4, …) that are summed into partial totals (+7, +11) and a single total (+18) written to Disks 1, 2 and 3.]
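The combinator idea in the figure is to sum increments close to where they are produced, so that fewer, larger updates reach disk. A plain counter is enough to sketch it. This is a hypothetical illustration in Python and not blekko's combinator interface, which the slides don't describe. The page-hit events are made up.

```python
from collections import Counter


def combine(increments):
    """Merge (key, delta) pairs into a single delta per key before anything is written."""
    totals = Counter()
    for key, delta in increments:
        totals[key] += delta
    return totals


# Hypothetical page-hit events from two processes on one server.
process_1 = [("/a", 1), ("/b", 1), ("/a", 1)]
process_2 = [("/a", 1), ("/b", 1), ("/a", 1), ("/c", 1)]

# Each process combines locally, then the server merges the partial totals.
per_server = Counter()
for partial in (combine(process_1), combine(process_2)):
    per_server.update(partial)  # Counter.update adds counts rather than replacing them

raw_writes = len(process_1) + len(process_2)
combined_writes = len(per_server)
print(dict(per_server), raw_writes, "->", combined_writes)
```

Seven raw increments become three writes here. The savings grow with traffic, because the number of writes tracks the number of distinct keys rather than the number of events.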
  • 22. Mem
     • Even a decimated subset of your fresh data can involve a lot of write bandwidth
       – Sometimes referred to as “high velocity”
     • High BW probably needs to go near your big data store
     • Analytics probably isn’t going to influence the cloud/not-cloud decision
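Even a decimated stream carries real write bandwidth, and simple multiplication shows how much. The sketch also shows one common way to decimate, assumed here rather than taken from the talk: hashing a key so the same user or URL is consistently kept or dropped. The event rate, event size, and key format are hypothetical.

```python
import hashlib


def keep(key: str, one_in: int) -> bool:
    """Deterministic 1-in-N sample: a given key is always kept or always dropped."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % one_in == 0


def write_bandwidth_mb_per_s(events_per_s, bytes_per_event, one_in):
    """Write bandwidth (decimal MB/s) needed to store a 1-in-N sample of the stream."""
    return events_per_s * bytes_per_event / one_in / 1e6


# Hypothetical stream: 200,000 events/s of 2,000 bytes each, sampled 1 in 10.
print(write_bandwidth_mb_per_s(200_000, 2_000, 10), "MB/s")

sampled = sum(keep(f"user-{i}", 10) for i in range(100_000))
print(sampled, "of 100000 keys kept")
```

A 1-in-10 sample of a 400 MB/s stream still needs 40 MB/s of sustained writes. That is why the Mem store probably belongs near the big data store rather than across a slow link.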
  • 23. Discuss!
     • Discuss
     • For more about blekko’s setup:
       – 3-part blog series at highscalability.com
       – Please search [high scalability blekko] in your search engine of choice
       – greg@blekko.com --- @glindahl