Your SlideShare is downloading. ×
0
Flash-­‐Based	  Extended	  Cache	  	                     for	  Higher	  Throughput	  and	  Faster	  Recovery              ...
Outline•  IntroducIon	  •  Related	  work	  •  Flash	  as	  Cache	  Extension	  (FaCE)	     –    Design	  choice	     –   ...
Outline•  IntroducIon	  •  Related	  work	  •  Flash	  as	  Cache	  Extension	  (FaCE)	     –    Design	  choice	     –   ...
IntroducIon•  Flash	  Memory	  Solid	  State	  Drive(SSD   )	      –  NAND	  flash	  memory	  based	  non-­‐volaI       le	...
IntroducIon(2)•  IOPS	  (IOs	  Per	  Second)	  maXers	  in	  OLTP	  •  IOPS/$:	  SSDs	  >>	  HDDs	      –  e.g.	  SSD	  	 ...
IntroducIon(3)   •  A	  few	  exisIng	  flash-­‐based	  cache	  schemes	                –  e.g.	  Oracle	  Exadata,	  IBM,	...
IntroducIon(4)	  •  FaCE	  (Flash	  as	  Cache	  Extension)	  –	  main	  contribuIons	      –  Write-­‐opImized	  flash	  c...
Contents•  IntroducIon	  •  Related	  work	  •  Flash	  as	  Cache	  Extension	  (FaCE)	     –    Design	  choice	     –  ...
Related	  work•  How	  to	  adopt	  SSDs	  in	  the	  DBMS	  area?	  1.  SSD	  as	  faster	  disk	       –     VLDB	  ‘08,...
Lazy	  Cleaning	  (LC)	  [SIGMOD’11]	  •  Cache	  on	  exit	  •  Write-­‐back	  policy	  •  LRU-­‐based	  SSD	  cache	  re...
Contents•  IntroducIon	  •  Related	  work	  •  Flash	  as	  Cache	  Extension	  (FaCE)	     –    Design	  choices	     – ...
FaCE:	  Design	  Choices1.  When	  to	  cache	  pages	  in	  SSD?	  2.  What	  pages	  to	  cache	  in	  SSD?	  3.  Sync	 ...
Design	  Choices:	  When/What/Sync	  Policy •  When	  :	  on	  entry	  vs.	  on	  exit	   •  What	  :	  clean	  vs.	  dirt...
Design	  Choices:	  SSD	  Cache	  Replacement	  Policy•  What	  to	  do	  when	  a	  page	  is	  evicted	  from	  DRAM	  b...
Design	  Choices:	  SSD	  Cache	  Replacement	  Policy•  LRU	  vs.	  FIFO	  (First-­‐In-­‐First-­‐Out)	      –  VicIms	  a...
Write	  ReducIon	  in	  mvFIFO•  Example	     –  Reduce	  three	  writes	  to	  HDD	  to	  one	  Versions	  of	  Page	  P ...
Design	  Choices:	  SSD	  Cache	  Replacement	  Policy•  LRU	  vs.	  FIFO	                                             LRU...
mvFIFO:	  Two	  OpImizaIons•  Group	  Replacement	  (GR)	  	  	      –  MulIple	  pages	  are	  replaced	  in	  a	  group	...
Group	  Replacement	  (GR)	  •  Single	  group	  read	  from	  SSD   	  (64/128	  pages)	  •  Batch	  random	  writes	  to...
Group	  Second	  Chance	  (GSC)	  •  GR	  +	  Second	  Chance	                         reference	  bit	  is	  ON          ...
Contents•  IntroducIon	  •  Related	  work	  •  Flash	  as	  Cache	  Extension	  (FaCE)	     –    Design	  choice	     –  ...
Recovery	  Issues	  in	  SSD	  Cache•  With	  write-­‐back	  sync	  policy,	  many	  recent	  copies	  of	  data	  pages	 ...
Recovery	  Issues	  in	  SSD	  Cache•  With	  write-­‐back	  sync	  policy,	  many	  recent	  copies	  of	  data	  pages	 ...
Recovery	  Issues	  in	  SSD	  Cache(2)•  Fortunately,	  because	  SSDs	  are	  non-­‐vola=le,	  pages	  cached	  in	  SSD...
Recovery	  in	  FaCE•  Metadata	  checkpoinIng	     –  Because	  a	  data	  page	  entering	  SSD	  cache	  is	  wriXen	  ...
Contents•  IntroducIon	  •  Related	  work	  •  Flash	  as	  Cache	  Extension	  (FaCE)	     –    Design	  choice	     –  ...
Experimental	  Set-­‐Up•  FaCE	  ImplementaIon	  in	  PostgreSQL	      –  3	  funcIons	  in	  buffer	  mgr.	  :	  bufferAllo...
TransacIon	  Throughput                                                    HDD	  only	         SSD	  only	          LC	   ...
Hit	  RaIo,	  Write	  ReducIon,	  and	  I/O	  Throughput                                                                Fl...
Hit	  RaIo,	  Write	  ReducIon,	  and	  I/O	  Throughput                                                              Writ...
Hit	  RaIo,	  Write	  ReducIon,	  and	  I/O	  Throughput                                                                  ...
Recovery	  Performance•  4.4x	  faster	  recovery	  than	  HDD	  only	  approac   h	                Metadata	  recovery	  ...
Contents•  IntroducIon	  •  Related	  work	  •  Flash	  as	  Cache	  Extension	  (FaCE)	     –    Design	  choice	     –  ...
Conclusion•  We	  presented	  a	  low-­‐overhead	  caching	  method	     called	  FaCE	  that	  uIlizes	  flash	  memory	  ...
QnA
Upcoming SlideShare
Loading in...5
×

[G2]fa ce deview_2012

1,222

Published on

Published in: Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
1,222
On Slideshare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
Downloads
40
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Transcript of "[G2]fa ce deview_2012"

  1. 1. Flash-­‐Based  Extended  Cache     for  Higher  Throughput  and  Faster  Recovery Woon-­‐hak  Kang,  Sang-­‐won  Lee,  and  Bongki  Moon  12.  9.  19. 1
  2. 2. Outline•  IntroducIon  •  Related  work  •  Flash  as  Cache  Extension  (FaCE)   –  Design  choice   –  Two  opImizaIons  •  Recovery  in  FaCE  •  Performance  EvaluaIon  •  Conclusion   2
  3. 3. Outline•  IntroducIon  •  Related  work  •  Flash  as  Cache  Extension  (FaCE)   –  Design  choice   –  Two  opImizaIons  •  Recovery  in  FaCE  •  Performance  EvaluaIon  •  Conclusion   3
  4. 4. IntroducIon•  Flash  Memory  Solid  State  Drive(SSD )   –  NAND  flash  memory  based  non-­‐volaI le  storage  •  CharacterisIcs   –  No  mechanical  parts   •  Low  access  latency  and  High  random  IOP S   –  MulI-­‐channel  and  mulI-­‐plane   •  Intrinsic  parallelism,  high  concurrency   –  No  overwriIng   •  Erase-­‐before-­‐overwriIng   •  Read  cost  <<  Write  cost   –  Limited  life  span   •  #  of  erasures  of  the  flash  block   4 Image  from  :  hXp://www.legitreviews.com/arIcle/1197/2/
  5. 5. IntroducIon(2)•  IOPS  (IOs  Per  Second)  maXers  in  OLTP  •  IOPS/$:  SSDs  >>  HDDs   –  e.g.  SSD    63  (=  28,495  IOPS  /  450$)  vs.  HDD  1.7  (=  409  IOPS  /  240$)  •  GB/$:  HDDs  >>  SSDs   –  e.g.  SSD    0.073  (=  32GB  /  440$)  vs.    HDD  0.617  (=  146.8GB  /  240$)  •  Therefore,  it  is  more  sensible  to  use  SSDs  to  su pplement  HDDs,  rather  than  to  replace  them   –  SSDs  as  cache  between  RAM  and    HDDs   –  To  provide  both  the  performance  of  SSDs  and  the  c apacity  of  HDDs  as  liXle  cost  as  possible   5
  6. 6. IntroducIon(3) •  A  few  exisIng  flash-­‐based  cache  schemes   –  e.g.  Oracle  Exadata,  IBM,  MS   –  Pages  cached  in  SSDs  are  overwriXen;  the  write  paXern  in  SS Ds  is  random   •  Write  bandwidth  disparity  in  SSDs   –  e.g.  random  write  (25MB/s  =  6,314  x  4KBs/s  )  vs.  sequenIal  w rite  (243MB/s)  vs.     4KB  Random  Throughput  ( Ra=o  Sequen=al/Random Sequen=al  Bandwidth  (MBPS) IOPS)  write Read Write Read WriteSSD  mid  A 28,495 6,314 251 243 9.85SSD  mid  B 35,601 2,547 259 80 8.04HDD  Single     409 343 156 154 114.94HDD  Single  (x8) 2,598 2,502 848 843 86.25 6
  7. 7. IntroducIon(4)  •  FaCE  (Flash  as  Cache  Extension)  –  main  contribuIons   –  Write-­‐opImized  flash  cache  scheme:  e.g.  3x  higher  throughput  t han  the  exisIng  ones   –  Faster  database  recovery  support  by  exploiIng  the  non-­‐volaIle  c ache  pages  in  SSDs  for  recovery:  e.g.  4x  faster  recovery  Ime DRAM Random  Read   Sequen=al  Write   (Low  cost) (à  High  throughput) Random   Read Non-­‐vola=lity  of  flash   SSD cache  for  recovery   Random   (faster  recovery) Write HDD 7
  8. 8. Contents•  IntroducIon  •  Related  work  •  Flash  as  Cache  Extension  (FaCE)   –  Design  choice   –  Two  opImizaIons  •  Recovery  in  FaCE  •  Performance  EvaluaIon  •  Conclusion   8
  9. 9. Related  work•  How  to  adopt  SSDs  in  the  DBMS  area?  1.  SSD  as  faster  disk   –  VLDB  ‘08,  Koltsidas  et  al.,  “Flashing  up  the  Storage  Layer”   –  VLDB  ’09,  Canim  et  al.  “An  Object  Placement  Advisor  for  DB2  Usin g  Solid  State  Storage”   –  SIGMOD  ‘08,  Lee  et  al.,  "A  Case  for  Flash  Memory  SSD  in  Enterpris e  Database  ApplicaIons"    2.  SSD  as  DRAM  buffer  extension   –  VLDB  ’10,  Canim  et  al.,  “SSD  Bufferpool  extensions  for  Database  s ystems”   –  SIGMOD  ’11,  Do  et  al.,  “Turbocharging    DBMS  Buffer  Pool  Using  SS Ds” 9
  10. 10. Lazy  Cleaning  (LC)  [SIGMOD’11]  •  Cache  on  exit  •  Write-­‐back  policy  •  LRU-­‐based  SSD  cache  replacement  policy   –  To  incur  almost  random  writes  against  SSD  •  No  efficient  recovery  mechanism  provided Flash  hit Random  writes  Evict RAM Buffer (LRU) Flash memory SSD Fetch  on  miss Stage  out  dirty  pages HDD 10
  11. 11. Contents•  IntroducIon  •  Related  work  •  Flash  as  Cache  Extension  (FaCE)   –  Design  choices   –  Two  opImizaIons  •  Recovery  in  FaCE  •  Performance  EvaluaIon  •  Conclusion   11
  12. 12. FaCE:  Design  Choices1.  When  to  cache  pages  in  SSD?  2.  What  pages  to  cache  in  SSD?  3.  Sync  policy  b/w  SSD  and  HDD  4.  SSD  Cache  Replacement  Policy   12
  13. 13. Design  Choices:  When/What/Sync  Policy •  When  :  on  entry  vs.  on  exit   •  What  :  clean  vs.  dirty  vs.  both   •  Sync  policy  :  write-­‐thru  vs.  write-­‐back   On  exit  :  dirty  pages  athe  ell  as   Sync  policy  :  for   s  w performance,  write-­‐back  sync   clean  pages Evict     RAM Buffer Flash as Cache Extension (LRU)Fetch  on  miss Stage  out  dirty  pages   HDD 13
  14. 14. Design  Choices:  SSD  Cache  Replacement  Policy•  What  to  do  when  a  page  is  evicted  from  DRAM  buffe r  and  SSD  cache  is  full  •  LRU  vs.  FIFO  (First-­‐In-­‐First-­‐Out)   –  Write  miss:  LRU-­‐based  vicIm  selecIon,  write-­‐back  if  dirt y  vicIm,  and  overwrite  the  old  vicIm  page  with  the  new  page  being  evicted   –  Write  hit:  overwrite  the  old  copy  in  flash  cache  with  the   updated  page  being  evicted   Random   writes   Evict     RAM Buffer against  SSD Flash as Cache Extension (LRU) HDD 14
  15. 15. Design  Choices:  SSD  Cache  Replacement  Policy•  LRU  vs.  FIFO  (First-­‐In-­‐First-­‐Out)   –  VicIms  are  chosen  from  the  rear  end  of  flash  cache   :  “sequenIal  writes”  against  SSD   –  Write  hit  :  no  addiIonal  acIon  is  taken  in  order  not   to  incur  random  writes.   •  mulIple  versions  in  SSD  cache   Evict     RAM Buffer Flash as Cache Extension (LRU) Multi-Version FIFO (mvFIFO) HDD 15
  16. 16. Write  ReducIon  in  mvFIFO•  Example   –  Reduce  three  writes  to  HDD  to  one  Versions  of  Page  P Mul=ple   Choose   Invalidated     Invalidated   Write-­‐back   version Discard Vic=m version to  HDD Page  P-­‐v2 Page  P-­‐v1 Page  P-­‐v3 Evict     RAM Buffer Flash as Cache Extension (LRU) HDD 16
  17. 17. Design  Choices:  SSD  Cache  Replacement  Policy•  LRU  vs.  FIFO   LRU FIFO Write  paXern Random Sequen=al Write  performance Low High   #  of  copy  pages Single MulIple Space  uIlizaIon High Low Hit  raIo  &  write  reducIon High Low •  Trade-­‐off  :  hit-­‐raIo  <>  write  performance   –  Write  performance  benefit  by  FIFO  >>  Performance  gain  from  higher  hit  raIo  by  LRU   17
  18. 18. mvFIFO:  Two  OpImizaIons•  Group  Replacement  (GR)       –  MulIple  pages  are  replaced  in  a  group  in  order  to  exploi t  the  internal  parallelism  in  modern  SSDs   –  Replacement  depth  is  limited  by  parallelism  size  (chann el  *  plane)   –  GR  can  improve  SSD  I/O  throughput  •  Group  Second  Chance  (GSC)       –  GR  +  Second  chance   –  if  a  vicIm  candidate  page  is  valid  and  referenced,  will  re -­‐enque  the  vicIm  to  SSD  cache   •  A  variant  of  “clock”  replacement  algorithm  for  the  FaCE   –  GSC  can  achieve  higher  hit  raIo  and  more  write  reducI ons   18
  19. 19. Group  Replacement  (GR)  •  Single  group  read  from  SSD  (64/128  pages)  •  Batch  random  writes  to  HD RAM Check  valid  and   dirty  flag   D   Flash  Cache  •  Single  group  write  to  SSD   becomes  FULL 2.  Evict     RAM Buffer Flash as Cache Extension (LRU)1.  Fetch  on  miss HDD 19
  20. 20. Group  Second  Chance  (GSC)  •  GR  +  Second  Chance   reference  bit  is  ON Check  reference  bit,   RAM if  true  galid  them   Check  vave  and   dirty  flag 2nd  chance Flash  Caches   become  FULL 2.  Evict     RAM Buffer Flash as Cache Extension (LRU)1.  Fetch  on  miss HDD 20
  21. 21. Contents•  IntroducIon  •  Related  work  •  Flash  as  Cache  Extension  (FaCE)   –  Design  choice   –  Two  opImizaIons  •  Recovery  in  FaCE  •  Performance  EvaluaIon  •  Conclusion   21
  22. 22. Recovery  Issues  in  SSD  Cache•  With  write-­‐back  sync  policy,  many  recent  copies  of  data  pages  ar e  kept  in  SSD,  not  in  HDD.  •  Therefore,  database  in  HDD  is  in  an  inconsistent  state  ayer  syste m  failure     New  version   of  page  PRAM SSD Mapping Information (Metadata) Crash Inconsistent   as Cache Extension Flash state Old  version   of  page  P HDD 22
  23. 23. Recovery  Issues  in  SSD  Cache•  With  write-­‐back  sync  policy,  many  recent  copies  of  data  pages  are  kept   in  SSD,  not  in  HDD.  •  Therefore,  database  in  HDD  is  in  an  inconsistent  state  ayer  system  failu re    •  In  this  situa=on,  one  recovery  approach  with  flash  cache  is  to  view  da tabase  in  harddisk  as  the  only  persistent  DB  [SIGMOD  11]   –  Periodically  checkpoint  updated  pages  from  SSD  cache  as  well  as  DRAM  bu ffer  to  HDD     New  version   of  page  PRAM SSD Mapping Information Excessive   Checkpoint Checkpoint Checkpoint   Flash as Cache Extension Cost Persistent     DB HDD Old  version   of  page  P 23
  24. 24. Recovery  Issues  in  SSD  Cache(2)•  Fortunately,  because  SSDs  are  non-­‐vola=le,  pages  cached  in  SSD  are  al ive  even  ayer  system  failure.  •  SSD  mapping  informaIon  has  gone  •  Two  approaches  for  recovering  metadata.   1.  Rebuild  lost  metadata  by  scanning  the  whole  pages  cached  in  SSD  (Naïve  approach)  –  Time-­‐consuming  scanning   2.  Write  metadata  persistently  whenever  metadata  is  changed  [DaMon  11]  –  Run-­‐Ime  overhead  for  managing  metadata  persistently   New  version   of  page  PRAM SSD Mapping Information Full  Scanning Flash as Cache Extension Flush  every  update Persistent     DB HDD Old  version   of  page  P 24
  25. 25. Recovery  in  FaCE•  Metadata  checkpoinIng   –  Because  a  data  page  entering  SSD  cache  is  wriXen  t o  the  rear  in  chronological  order,  metadata  can  be   wriXen  regularly  in  a  single  large  segment   64K     RAM Recovery  :   SSD Metadata page   info. Mapping Segment Scanning   segment Periodically  checkpoint   Flash as Cache Extension Crash metadata HDD 25
  26. 26. Contents•  IntroducIon  •  Related  work  •  Flash  as  Cache  Extension  (FaCE)   –  Design  choice   –  Two  opImizaIons  •  Recovery  in  FaCE  •  Performance  EvaluaIon  •  Conclusion   26
  27. 27. Experimental  Set-­‐Up•  FaCE  ImplementaIon  in  PostgreSQL   –  3  funcIons  in  buffer  mgr.  :  bufferAlloc(),  getFreeBuffer(),  buff erSync()   –  2  funcIons  in  bootstrap  for  recovery  :  startupXLOG(),  initBuff erPool()  •  Experiment  Setup   –  Centos  Linux   –  Intel  Core  i7-­‐860  2.8  GHz  (quad  core)  and  4G  DRAM   –  Disks  :  8  RAIDed  15k  rpm  Seagate  SAS  HDDs  (146.8GB)   –  SSD  :  Samsung  MLC  (256GB)  •  Workloads   –  TPC-­‐C  with  500  warehouses  (50GB)  and  50  concurrent  clients   –  BenchmarkSQL   27
  28. 28. TransacIon  Throughput HDD  only   SSD  only   LC   FaCE   FaCE+GR   FaCE+GSC   7000   3.9x 6000   FaCE+GSC FaCE+GR 3.1x 5000  Transac=ons  Per  minute 2.6x 4000   FaCE-­‐basic 2.6x 2.4x 2.1x 3000   1.5x LC 2000   SSD  only 1000   HDD  only 0   4   8   12   16   20   24   28   |Flash  cache|/|Database|  (%) 28
  29. 29. Hit  RaIo,  Write  ReducIon,  and  I/O  Throughput Flash  Cache  Hit  Ra=o 100   95   90   LC 85   FaCE+GSCHit  ra=o  (%) 80   75   FaCE-­‐basic 70   FaCE+GR 65   60   2GB   4GB   6GB   8GB   10GB   Flash  cache  size LC   FaCE   FaCE+GR   FaCE+GSC   29
  30. 30. Hit  RaIo,  Write  ReducIon,  and  I/O  Throughput Write  Reduc=on  Ra=o  By  Flash  Cache Write  Reduc=on  Ra=o   100   Flash  Cache  Hit  Ra=o By  Flash  Cache 100   100   90   95   90   90   80   85   80   LC Ra=o(%)  Hit  ra=o  (%) Ra=o(%)   70   80   70   FaCE+GSC 75   60   60   FaCE-­‐basic 70   50   50   FaCE+GR 65   60   40   40   2GB   2GB   4GB   6GB   4GB  8GB   10GB   6GB   2GB   4GB   8GB   6GB   8GB   10GB   10GB   Flash  cache  size Flash  cache  size Flash  cache  size LC   FaCE   FaCE+GR   FaCE+GSC   FaCE   LC   FaCE+GR   LC   FaCE+GSC   FaCE   FaCE+GR   FaCE+GSC   30
  31. 31. Hit  RaIo,  Write  ReducIon,  and  I/O  Throughput Write  Reduc=on  Ra=o   Throughput  of  4KB-­‐page  I/O Throughput  of  4KB-­‐page   Flash  Cache  Hit  Ra=o 16000   Throughput  of  4KB-­‐page   By  Flash  Cache I/O 100   I/O 14000   100   FaCE+GSC 16000   16000   95   14000   90   12000   14000   90   12000   FaCE+GRThroughput  (4KB) 10000   80   12000   Throughput  (4KB) 85   10000   Hit  ra=o  (%) Throughput  (4KB) Ra=o(%)   10000   80  8000   70   8000   FaCE-­‐basic 8000   75  6000   6000   60   6000   70  4000   4000   4000   65  2000   LC 50   2000   2000   60   40   0   0   2GB   4GB   6GB   8GB   10GB   2GB   4GB   0   6GB  8GB   10GB   6GB   2GB   4GB   6GB   8GB   10GB   2GB   4GB   8GB   10GB   Flash  cache  size Flash  cache  size 4GB   6GB   2GB   8GB   10GB   Flash  cache  size Flash  cache  size Flash  cache  size LC   FaCE   FaCE+GR   FaCE+GSC   LC   FaCE   FaCE+GR   FaCE+GSC   LC   FaCE   FaCE+GR   FaCE+GSC   LC   FaCE   FaCE+GR   FaCE+GSC   LC   FaCE   FaCE+GR   FaCE+GSC   31
  32. 32. Recovery  Performance•  4.4x  faster  recovery  than  HDD  only  approac h   Metadata  recovery  :  2 redo  Ime  :me  :  823 redo  I  186 32
  33. 33. Contents•  IntroducIon  •  Related  work  •  Flash  as  Cache  Extension  (FaCE)   –  Design  choice   –  Two  opImizaIons  •  Recovery  in  FaCE  •  Performance  EvaluaIon  •  Conclusion   33
  34. 34. Conclusion•  We  presented  a  low-­‐overhead  caching  method   called  FaCE  that  uIlizes  flash  memory  as  an  ext ension  to  a  DRAM  buffer  for  a  recoverable  data base.  •  FaCE  can  maximized  the  I/O  throughput  of  a  fla sh  caching  device  by  turning  small  random  writ es  to  large  sequenIal  ones  •  Also,  FaCE  takes  advantage  of  the  non-­‐volaIlity  of  flash  memory  to  accelerate  the  system  resta rt  from  a  failure.   34
  35. 35. QnA
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×