[G2] FaCE deview_2012
Transcript

  • 1. Flash-Based Extended Cache for Higher Throughput and Faster Recovery
       Woon-hak Kang, Sang-won Lee, and Bongki Moon
       12. 9. 19.
  • 2. Outline: Introduction; Related work; Flash as Cache Extension (FaCE) – Design choice, Two optimizations; Recovery in FaCE; Performance Evaluation; Conclusion
  • 3. Outline: Introduction; Related work; Flash as Cache Extension (FaCE) – Design choice, Two optimizations; Recovery in FaCE; Performance Evaluation; Conclusion
  • 4. Introduction
       •  Flash Memory Solid State Drive (SSD)
          –  NAND flash memory based non-volatile storage
       •  Characteristics
          –  No mechanical parts: low access latency and high random IOPS
          –  Multi-channel and multi-plane: intrinsic parallelism, high concurrency
          –  No overwriting: erase-before-overwrite, read cost << write cost
          –  Limited life span: limited # of erasures per flash block
       Image from: http://www.legitreviews.com/article/1197/2/
  • 5. Introduction (2)
       •  IOPS (I/Os Per Second) matters in OLTP
       •  IOPS/$: SSDs >> HDDs
          –  e.g. SSD 63 (= 28,495 IOPS / $450) vs. HDD 1.7 (= 409 IOPS / $240)
       •  GB/$: HDDs >> SSDs
          –  e.g. SSD 0.073 (= 32GB / $440) vs. HDD 0.617 (= 146.8GB / $240)
       •  Therefore, it is more sensible to use SSDs to supplement HDDs rather than to replace them
          –  SSDs as a cache between RAM and HDDs
          –  To provide both the performance of SSDs and the capacity of HDDs at as little cost as possible
  • 6. Introduction (3)
       •  A few existing flash-based cache schemes
          –  e.g. Oracle Exadata, IBM, MS
          –  Pages cached in SSDs are overwritten; the write pattern in SSDs is random
       •  Write bandwidth disparity in SSDs
          –  e.g. random write (25MB/s = 6,314 x 4KB/s) vs. sequential write (243MB/s)

                            4KB Random Throughput (IOPS)    Sequential Bandwidth (MBPS)    Ratio Sequential/Random write
                            Read        Write               Read        Write
       SSD mid A            28,495      6,314               251         243                 9.85
       SSD mid B            35,601      2,547               259         80                  8.04
       HDD Single           409         343                 156         154                 114.94
       HDD Single (x8)      2,598       2,502               848         843                 86.25
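  The rightmost column ("Ratio Sequential/Random write") appears to be the sequential-write bandwidth divided by the bandwidth implied by the 4KB random-write IOPS; this reading of the column is an inference from the numbers, not stated on the slide. A worked example for SSD mid A, assuming 4KB pages and 1MB = 1024KB:

      \text{ratio} = \frac{B_{\text{seq.\,write}}}{\text{IOPS}_{\text{rand.\,write}} \times 4\,\text{KB}}
                   = \frac{243\ \text{MB/s}}{6{,}314 \times 4\,\text{KB/s}}
                   \approx \frac{243}{24.7} \approx 9.85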
  • 7. Introduction (4)
       •  FaCE (Flash as Cache Extension) – main contributions
          –  Write-optimized flash cache scheme: e.g. 3x higher throughput than the existing ones
          –  Faster database recovery support by exploiting the non-volatile cache pages in SSDs for recovery: e.g. 4x faster recovery time
       [Diagram: DRAM – SSD – HDD hierarchy; the SSD absorbs random reads and sequential writes (low cost, high throughput), the HDD sees random reads and writes; the non-volatility of the flash cache is used for faster recovery]
  • 8. Contents: Introduction; Related work; Flash as Cache Extension (FaCE) – Design choice, Two optimizations; Recovery in FaCE; Performance Evaluation; Conclusion
  • 9. Related work
       •  How to adopt SSDs in the DBMS area?
       1.  SSD as faster disk
          –  VLDB '08, Koltsidas et al., "Flashing up the Storage Layer"
          –  VLDB '09, Canim et al., "An Object Placement Advisor for DB2 Using Solid State Storage"
          –  SIGMOD '08, Lee et al., "A Case for Flash Memory SSD in Enterprise Database Applications"
       2.  SSD as DRAM buffer extension
          –  VLDB '10, Canim et al., "SSD Bufferpool Extensions for Database Systems"
          –  SIGMOD '11, Do et al., "Turbocharging DBMS Buffer Pool Using SSDs"
  • 10. Lazy Cleaning (LC) [SIGMOD '11]
       •  Cache on exit
       •  Write-back policy
       •  LRU-based SSD cache replacement policy
          –  Incurs almost entirely random writes against the SSD
       •  No efficient recovery mechanism provided
       [Diagram: RAM buffer (LRU) over a flash-memory SSD over the HDD; flash hits served from the SSD, evictions cause random writes to the SSD, pages are fetched from the HDD on a miss, and dirty pages are staged out to the HDD]
  • 11. Contents: Introduction; Related work; Flash as Cache Extension (FaCE) – Design choices, Two optimizations; Recovery in FaCE; Performance Evaluation; Conclusion
  • 12. FaCE: Design Choices
       1.  When to cache pages in the SSD?
       2.  What pages to cache in the SSD?
       3.  Sync policy b/w SSD and HDD
       4.  SSD cache replacement policy
  • 13. Design Choices: When/What/Sync Policy
       •  When: on entry vs. on exit
       •  What: clean vs. dirty vs. both
       •  Sync policy: write-through vs. write-back
       [Diagram: RAM buffer (LRU) evicts into the flash cache extension, with fetch on miss and dirty pages staged out to the HDD; callouts – "On exit: dirty pages as well as clean pages" and "Sync policy: write-back sync, for performance"]
  • 14. Design Choices: SSD Cache Replacement Policy
       •  What to do when a page is evicted from the DRAM buffer and the SSD cache is full
       •  LRU vs. FIFO (First-In-First-Out) – the LRU option:
          –  Write miss: LRU-based victim selection, write-back if the victim is dirty, and overwrite the old victim page with the new page being evicted
          –  Write hit: overwrite the old copy in the flash cache with the updated page being evicted
       [Diagram: evictions from the RAM buffer (LRU) cause random writes against the SSD flash cache extension, with the HDD below]
  • 15. Design Choices: SSD Cache Replacement Policy
       •  LRU vs. FIFO (First-In-First-Out) – the FIFO option:
          –  Victims are chosen from the rear end of the flash cache: "sequential writes" against the SSD
          –  Write hit: no additional action is taken, in order not to incur random writes
             •  Multiple versions of a page may coexist in the SSD cache
       [Diagram: RAM buffer (LRU) evicts into the flash cache extension, which is managed as a Multi-Version FIFO (mvFIFO); HDD below]
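  The mvFIFO policy can be illustrated with a minimal sketch. This is a toy model, not the authors' implementation: the names (Frame, face_enqueue), the array-backed "SSD", and the printf calls standing in for real I/O are assumptions made only for illustration. Pages are appended at one end of the FIFO and victims are taken from the other (oldest) end, so the SSD only ever sees appends.

      #include <stdio.h>
      #include <stdbool.h>

      #define CACHE_SLOTS 8              /* flash cache capacity in pages */

      typedef struct {
          int  page_no;                  /* which database page this slot holds    */
          bool dirty;                    /* modified since it was last on the HDD? */
          bool valid;                    /* still the newest cached version?       */
      } Frame;

      static Frame cache[CACHE_SLOTS];
      static int   next_slot = 0, count = 0;   /* circular FIFO: insert at next_slot */

      /* Mark any older cached version of page_no as stale (multi-version FIFO). */
      static void invalidate_old(int page_no) {
          for (int i = 0; i < CACHE_SLOTS; i++)
              if (cache[i].valid && cache[i].page_no == page_no)
                  cache[i].valid = false;
      }

      /* Called when a page is evicted from the DRAM buffer: the page is always
       * appended at the insertion end of the FIFO (sequential SSD writes); a
       * write hit never overwrites the old copy in place. */
      void face_enqueue(int page_no, bool dirty) {
          if (count == CACHE_SLOTS) {                 /* cache full: evict oldest  */
              Frame *victim = &cache[next_slot];      /* oldest entry sits here    */
              if (victim->valid && victim->dirty)
                  printf("write back page %d to HDD\n", victim->page_no);
              /* invalidated or clean victims are simply discarded */
              count--;
          }
          invalidate_old(page_no);
          cache[next_slot] = (Frame){ page_no, dirty, true };
          next_slot = (next_slot + 1) % CACHE_SLOTS;
          count++;
          printf("append page %d to SSD (dirty=%d)\n", page_no, dirty);
      }

      int main(void) {
          /* Three dirty evictions of page 42 create three cached versions, but
           * only the newest one reaches the HDD once the old ones cycle out. */
          for (int v = 1; v <= 3; v++) face_enqueue(42, true);
          for (int p = 0; p < CACHE_SLOTS; p++) face_enqueue(100 + p, false);
          return 0;
      }

  Running the sketch shows the write-reduction effect of the next slide: the three dirty versions of page 42 result in a single HDD write-back, because the stale versions are discarded when they become victims.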
  • 16. Write Reduction in mvFIFO
       •  Example: reduce three writes to the HDD to one
          –  Three dirty evictions of page P leave three versions (P-v1, P-v2, P-v3) in the flash cache; when they reach the victim end, the invalidated versions are simply discarded and only the newest version is written back to the HDD
       [Diagram: versions of page P in the FIFO flash cache between the RAM buffer (LRU) and the HDD – multiple versions, choose victim, invalidated versions discarded, newest version written back to HDD]
  • 17. Design Choices: SSD Cache Replacement Policy
       •  LRU vs. FIFO

                                              LRU         FIFO
          Write pattern                       Random      Sequential
          Write performance                   Low         High
          # of copies per page                Single      Multiple
          Space utilization                   High        Low
          Hit ratio & write reduction         High        Low

       •  Trade-off: hit ratio <-> write performance
          –  The write-performance benefit of FIFO >> the performance gain from the higher hit ratio of LRU
  • 18. mvFIFO: Two Optimizations
       •  Group Replacement (GR)
          –  Multiple pages are replaced as a group in order to exploit the internal parallelism of modern SSDs
          –  The replacement depth is limited by the parallelism size (channels x planes)
          –  GR can improve SSD I/O throughput
       •  Group Second Chance (GSC)
          –  GR + second chance
          –  If a victim candidate page is valid and referenced, it is re-enqueued into the SSD cache
             •  A variant of the "clock" replacement algorithm, adapted for FaCE
          –  GSC can achieve a higher hit ratio and more write reductions
  • 19. Group Replacement (GR)
       •  Single group read from the SSD (64/128 pages)
       •  Batch random writes to the HDD
       •  Single group write to the SSD
       [Diagram: when the flash cache becomes full, a group of victim frames is read into RAM and their valid and dirty flags are checked before they are flushed; 1. fetch on miss from the HDD, 2. evict from the RAM buffer (LRU) into the flash cache extension]
  • 20. Group Second Chance (GSC)
       •  GR + second chance
       [Diagram: when the flash cache becomes full, each victim candidate's reference bit is checked along with its valid and dirty flags; if the reference bit is on, the page gets a second chance and is re-enqueued into the flash cache; 1. fetch on miss from the HDD, 2. evict from the RAM buffer (LRU) into the flash cache extension]
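  A sketch of one group eviction under GR with the second-chance rule of GSC added. Again this is only an illustration under assumed names (Slot, evict_group, GROUP_SIZE); the real group size follows the SSD's channel x plane parallelism (the 64/128 pages of the previous slides), and the printf calls stand in for the batched HDD write-back and the re-enqueue into the FIFO.

      #include <stdio.h>
      #include <stdbool.h>

      #define GROUP_SIZE 4   /* toy value; FaCE uses groups of 64/128 pages */

      typedef struct {
          int  page_no;
          bool dirty;        /* needs an HDD write-back if it leaves the cache  */
          bool valid;        /* newest version of this page in the flash cache? */
          bool referenced;   /* hit since it entered the cache (second chance)  */
      } Slot;

      /* Drain one group from the victim end of the FIFO flash cache.
       * Returns how many slots were actually freed. */
      int evict_group(Slot group[GROUP_SIZE]) {
          int freed = 0;
          for (int i = 0; i < GROUP_SIZE; i++) {
              Slot *s = &group[i];
              if (!s->valid) {                      /* stale version: discard   */
                  freed++;
              } else if (s->referenced) {           /* second chance: keep it   */
                  s->referenced = false;
                  printf("re-enqueue page %d into the flash cache\n", s->page_no);
              } else {
                  if (s->dirty)                      /* batched write-back      */
                      printf("write back page %d to HDD\n", s->page_no);
                  freed++;
              }
          }
          return freed;
      }

      int main(void) {
          Slot group[GROUP_SIZE] = {
              { 10, true,  true,  false },  /* dirty, cold  -> written back, freed */
              { 11, false, false, false },  /* stale        -> discarded           */
              { 12, true,  true,  true  },  /* recently hit -> gets a 2nd chance   */
              { 13, false, true,  false },  /* clean, cold  -> silently dropped    */
          };
          printf("freed %d of %d slots\n", evict_group(group), GROUP_SIZE);
          return 0;
      }

  Handling the group as a whole lets the SSD reads and the HDD write-backs be issued as batches, which is what exploits the device parallelism, while the second chance keeps recently referenced pages cached and raises the hit ratio, matching the GR/GSC trade-off on slide 18.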
  • 21. Contents: Introduction; Related work; Flash as Cache Extension (FaCE) – Design choice, Two optimizations; Recovery in FaCE; Performance Evaluation; Conclusion
  • 22. Recovery Issues in SSD Cache
       •  With the write-back sync policy, many recent copies of data pages are kept in the SSD, not in the HDD
       •  Therefore, the database in the HDD is in an inconsistent state after a system failure
       [Diagram: the new version of page P sits in the SSD flash cache and the old version in the HDD; the mapping information (metadata) in RAM is lost on a crash, leaving the HDD in an inconsistent state]
  • 23. Recovery Issues in SSD Cache
       •  With the write-back sync policy, many recent copies of data pages are kept in the SSD, not in the HDD
       •  Therefore, the database in the HDD is in an inconsistent state after a system failure
       •  In this situation, one recovery approach with a flash cache is to view the database on the hard disk as the only persistent DB [SIGMOD '11]
          –  Periodically checkpoint updated pages from the SSD cache as well as the DRAM buffer to the HDD
          –  Excessive checkpoint cost
       [Diagram: the new version of page P in the SSD flash cache is checkpointed down to the persistent DB on the HDD, which still holds the old version]
  • 24. Recovery Issues in SSD Cache (2)
       •  Fortunately, because SSDs are non-volatile, pages cached in the SSD are still alive even after a system failure
       •  However, the SSD mapping information is gone
       •  Two approaches for recovering the metadata:
          1.  Rebuild the lost metadata by scanning all the pages cached in the SSD (naive approach) – time-consuming scanning
          2.  Write the metadata persistently whenever it changes [DaMoN '11] – run-time overhead for managing the metadata persistently
       [Diagram: the new version of page P in the SSD flash cache, the old version in the persistent DB on the HDD; option 1 requires a full scan of the SSD, option 2 flushes the mapping information on every update]
  • 25. Recovery in FaCE
       •  Metadata checkpointing
          –  Because a data page entering the SSD cache is written to the rear in chronological order, the metadata can be written regularly as a single large segment
       [Diagram: a 64K metadata (page info) segment in RAM is periodically checkpointed to the SSD as a mapping segment; after a crash, recovery restores the mapping by scanning these metadata segments]
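  A minimal sketch of the metadata-checkpointing idea. The file name, the MapEntry layout, and the segment size are assumptions for illustration only; the point is that, because pages enter the flash cache in FIFO order, the mapping can be flushed periodically as large sequential segments and reloaded after a crash, leaving only the cache frames written after the last checkpointed position to be scanned.

      #include <stdio.h>
      #include <stdlib.h>

      #define SEG_ENTRIES 1024   /* map entries flushed together as one segment */

      typedef struct { int page_no; int slot; char dirty; } MapEntry;

      /* Append one metadata segment to a checkpoint file as a single large
       * write, together with the FIFO position it covers up to. */
      void checkpoint_segment(FILE *ckpt, const MapEntry *seg, int covered_upto) {
          fwrite(&covered_upto, sizeof covered_upto, 1, ckpt);
          fwrite(seg, sizeof(MapEntry), SEG_ENTRIES, ckpt);
          fflush(ckpt);                        /* one sequential flush per segment */
      }

      /* On restart: reload every checkpointed segment; only the cache frames
       * appended after the last covered position still need an SSD scan. */
      int recover(FILE *ckpt, MapEntry *map, int map_cap) {
          int covered = 0, n = 0;
          MapEntry seg[SEG_ENTRIES];
          while (fread(&covered, sizeof covered, 1, ckpt) == 1 &&
                 fread(seg, sizeof(MapEntry), SEG_ENTRIES, ckpt) == SEG_ENTRIES) {
              for (int i = 0; i < SEG_ENTRIES && n < map_cap; i++)
                  map[n++] = seg[i];
          }
          printf("metadata restored up to FIFO position %d; "
                 "scan only the tail written after it\n", covered);
          return covered;
      }

      int main(void) {
          MapEntry seg[SEG_ENTRIES];
          for (int i = 0; i < SEG_ENTRIES; i++)
              seg[i] = (MapEntry){ .page_no = 1000 + i, .slot = i, .dirty = (char)(i % 2) };

          FILE *f = fopen("face_ckpt.bin", "w+b");
          if (!f) return 1;
          checkpoint_segment(f, seg, SEG_ENTRIES);     /* periodic checkpoint */

          rewind(f);                                   /* simulate a restart  */
          MapEntry *map = malloc(sizeof(MapEntry) * SEG_ENTRIES);
          if (!map) { fclose(f); return 1; }
          recover(f, map, SEG_ENTRIES);

          free(map);
          fclose(f);
          remove("face_ckpt.bin");
          return 0;
      }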
  • 26. Contents: Introduction; Related work; Flash as Cache Extension (FaCE) – Design choice, Two optimizations; Recovery in FaCE; Performance Evaluation; Conclusion
  • 27. Experimental Set-Up
       •  FaCE implementation in PostgreSQL
          –  3 functions in the buffer manager: bufferAlloc(), getFreeBuffer(), bufferSync()
          –  2 functions in bootstrap for recovery: startupXLOG(), initBufferPool()
       •  Experiment setup
          –  CentOS Linux
          –  Intel Core i7-860 2.8 GHz (quad core) and 4GB DRAM
          –  Disks: 8 RAIDed 15k rpm Seagate SAS HDDs (146.8GB)
          –  SSD: Samsung MLC (256GB)
       •  Workloads
          –  TPC-C with 500 warehouses (50GB) and 50 concurrent clients
          –  BenchmarkSQL
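  The slide only names the modified PostgreSQL routines. The following is a schematic sketch, not PostgreSQL code, of the two paths those routines correspond to: the buffer-miss path consults the flash cache before the HDD, and the DRAM-eviction path stages the victim into the flash cache. The face_* helpers and the in-memory arrays standing in for the SSD and HDD are assumptions for illustration.

      #include <stdbool.h>
      #include <stdio.h>
      #include <string.h>

      #define PAGE_SZ  16
      #define SSD_SLOTS 4

      static char hdd[8][PAGE_SZ];                    /* stand-in for the disk */
      static struct { int page_no; char data[PAGE_SZ]; bool used; } ssd[SSD_SLOTS];
      static int ssd_next = 0;

      static bool face_lookup(int page_no, char *buf) {
          for (int i = 0; i < SSD_SLOTS; i++)
              if (ssd[i].used && ssd[i].page_no == page_no) {
                  memcpy(buf, ssd[i].data, PAGE_SZ);
                  return true;                        /* flash cache hit */
              }
          return false;
      }

      static void face_enqueue(int page_no, const char *buf) {
          ssd[ssd_next].page_no = page_no;            /* append in FIFO order */
          memcpy(ssd[ssd_next].data, buf, PAGE_SZ);
          ssd[ssd_next].used = true;
          ssd_next = (ssd_next + 1) % SSD_SLOTS;
      }

      /* Buffer-miss path: consult the flash cache before reading the HDD. */
      static void read_page(int page_no, char *buf) {
          if (face_lookup(page_no, buf)) { printf("flash hit: page %d\n", page_no); return; }
          memcpy(buf, hdd[page_no], PAGE_SZ);
          printf("HDD read: page %d\n", page_no);
      }

      /* DRAM-eviction path: stage the victim into the flash cache (cache on exit). */
      static void evict_page(int page_no, const char *buf) { face_enqueue(page_no, buf); }

      int main(void) {
          char buf[PAGE_SZ];
          strcpy(hdd[3], "page-3");
          read_page(3, buf);      /* first access: HDD read                     */
          evict_page(3, buf);     /* DRAM eviction goes to the flash cache      */
          read_page(3, buf);      /* second access: served from the flash cache */
          return 0;
      }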
  • 28. Transaction Throughput
       [Chart: transactions per minute (up to ~7,000) vs. |Flash cache|/|Database| (%) from 4% to 28%, comparing HDD only, SSD only, LC, FaCE-basic, FaCE+GR, and FaCE+GSC; the annotated speedups are roughly 1.5x for LC, 2.1x–2.6x for FaCE-basic, 3.1x for FaCE+GR, and 3.9x for FaCE+GSC]
  • 29. Hit Ratio, Write Reduction, and I/O Throughput
       [Chart: flash cache hit ratio (%) vs. flash cache size (2GB–10GB) for LC, FaCE-basic, FaCE+GR, and FaCE+GSC; hit ratios lie roughly between 65% and 95%, with LC the highest, followed by FaCE+GSC, FaCE-basic, and FaCE+GR]
  • 30. Hit Ratio, Write Reduction, and I/O Throughput
       [Chart: write reduction ratio (%) by the flash cache vs. flash cache size (2GB–10GB) for LC, FaCE-basic, FaCE+GR, and FaCE+GSC; values range roughly from 40% to 100%]
  • 31. Hit Ratio, Write Reduction, and I/O Throughput
       [Chart: throughput of 4KB-page I/O vs. flash cache size (2GB–10GB) for LC, FaCE-basic, FaCE+GR, and FaCE+GSC; FaCE+GSC reaches the highest throughput (up to ~14,000 on a 16,000 scale), followed by FaCE+GR and FaCE-basic, with LC the lowest]
  • 32. Recovery Performance
       •  4.4x faster recovery than the HDD-only approach
       [Chart: recovery time comparison; the labels suggest a redo time of 823 for the HDD-only case vs. about 186 (plus ~2 for metadata recovery) with FaCE, roughly a 4.4x reduction]
  • 33. Contents: Introduction; Related work; Flash as Cache Extension (FaCE) – Design choice, Two optimizations; Recovery in FaCE; Performance Evaluation; Conclusion
  • 34. Conclusion
       •  We presented a low-overhead caching method called FaCE that utilizes flash memory as an extension of the DRAM buffer for a recoverable database
       •  FaCE can maximize the I/O throughput of a flash caching device by turning small random writes into large sequential ones
       •  FaCE also takes advantage of the non-volatility of flash memory to accelerate system restart after a failure
  • 35. Q&A