Persistence of memory: In-memory Is Not Often the Answer

An examination of how in-memory databases require careful revision of application code to deliver the expected bang for the buck.

Published in: Data & Analytics
On the Persistence of Memory… In Database Systems

Picture credit: Creative Commons

By Neil Raden
Hired Brains, Inc.
December, 2012

© 2012 Hired Brains Inc. All Rights Reserved
Table of Contents

Executive Summary
The Basics
Database Memory and Processing Models
In-Memory Database
Why is in-memory, a fairly old concept, interesting again?
Limitations of iMDB
    Cost
    Persistence
    Volume
    Dual-Purpose OLTP and Analytics
    Not so "green"
The Hybrid DBMS
Compare and Contrast
Conclusion
About the Author
Executive Summary

The recent drop in computer memory prices and the introduction of early In-Memory database implementations have raised the level of interest in in-memory databases, but the topic is not new. In fact, there are literally dozens of in-memory database products, some in production for decades, but due to the prohibitive cost differential between memory-based systems and disk-based systems, none have found a place beyond certain niche markets. The remarkable drop in the cost of memory, combined with an equally remarkable growth in density and capacity, is now driving the discussion into the mainstream of computing architectures.

For the purposes of discussion, we refer to in-memory database systems as iMDB and to current relational database systems incorporating large memory models with attached storage (including traditional magnetic disk and solid-state devices) as hybrid-DBMS. Though the discussion is occasionally technical, our conclusions are that:

• iMDB leverage lower-cost RAM for storage but still lack persistence and data scalability, while the iMDB architecture limits the types of solutions it can support.
• Hybrid-DBMS is a proven technology and provides high performance and a flexible architecture to support a variety of analytics applications.

The Basics
All database management systems (DBMS), in fact virtually all programs in conventional computing environments, behave exactly the same way. A central processing unit (CPU) performs a single very low-level instruction on a single piece of data. While complex application programs like DBMS have many layers of functionality and can be described logically as a set of higher-level interworking pieces, the CPU has utterly no insight into this; it just chugs along one instruction at a time. If you were to sit inside a CPU and watch its stream of sequential processes, you would be unable to determine what the controlling program was doing. So database software, or really any software, is just a logical structure that encapsulates all of the smaller steps. When things get calculated, they bear no resemblance to the whole. A CPU doesn't know what a join or an index is.

How those bits of work are presented to the CPU is the heart of the application design. In other words, though there is no difference in how CPUs execute from one application to another, the order of those instructions is the key to performance.

Each step in execution is composed of a single instruction and a single piece of data (though today's CPUs are composed of multiple "cores," essentially multiple CPUs on a single chip). The instruction and the data have to be presented to the CPU through memory, either system RAM or a memory cache on the CPU itself.
It makes no difference whether the application is "in-memory" or disk-based; the CPU has to be presented with the instruction (actually, the "instruction set" is burned into the CPU; what is presented to it is a reference indicating which instruction to execute). For this reason, an in-memory architecture, where all instructions and data are in RAM, should in theory provide superior performance compared to a DBMS that must fetch data from remote mechanical disk drives.

Solid-state drives (SSD) mentioned above use solid-state memory chips, typically flash memory (NAND), instead of spinning magnetic disk drives. Flash memory is less expensive and slower than RAM/SRAM, but it is non-volatile, meaning it retains data
without power; it does not lose data in the case of a system shutdown. RAM is volatile, must be powered continuously, and requires backup, typically to conventional disk drives, for reliability.

One could say that a DBMS with SSD instead of traditional disks could be an in-memory device, but there are two fundamental differences. First, the "memory" chips of an SSD are part of a disk drive "card" or assembly that uses the same block addressing as the disks it replaces. In other words, even though the seek time for finding data on an SSD is at least an order of magnitude shorter than on a spinning magnetic disk (this is a generalization), there is still a call for external data, handled by the disk controller and passed to RAM. An interesting arrangement, typically used for add-on accelerators rather than primary database operations, is SSDs constructed from SRAM, cutting the seek time on the drives. This is a special-purpose and very expensive architecture and is not further considered here.

Database Memory and Processing Models

To clear up confusion between the various models for memory in databases, it's useful to describe the predominant versions. The two predominant memory models for the most common database systems are shared memory and shared nothing. In both, memory is used only for processing, not for persistent storage. This is the essential difference between today's iMDBs and more conventional on-disk or hybrid systems.
In the shared memory model, all database operations use the same single aggregation of memory, and the system allocates its memory and processing tasks. All memory is available to every processor. In a shared nothing system, each separate node of processors and memory does its own work in parallel, and the nodes are typically controlled by a
master node (which can be physical or virtual). In reality, nodes in a shared nothing environment may themselves operate as independent shared memory nodes. But in neither case is data stored in memory until it is called for. The exception is when data is cached (frequently used data is "pinned" in memory), but it is still volatile and the data can be flushed at any time.

iMDB operate more or less like shared memory systems, but everything, including operating systems, software programs (executables), workspace, indexes and data, is stored in RAM. When these systems are scaled out with multiple nodes connected by a network, they operate more like a grid or distributed network than like a true MPP-engineered system. However, the concepts of shared memory versus shared nothing are a little obsolete now, as CPUs themselves are multi-core, meaning the processors themselves are capable of parallel processing, provided the software program (the DBMS) has been designed to take advantage of it.

This description is a simplification and there are many exceptions, but in general, no database management system stores data persistently in memory except, of course, iMDB. The difference between the various memory models described above is how memory is used for processing data.

In-Memory Database

It is an unassailable truth that data processed from memory is orders of magnitude faster than data retrieved from a disk drive, but that is only a small part of the story.
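The "orders of magnitude" claim can be made concrete with typical round-number latency figures. The figures below are illustrative assumptions, not measurements:

```python
# Illustrative, order-of-magnitude access latencies (assumptions, not
# benchmarks): DRAM ~100 ns, NAND SSD read ~100 microseconds, HDD seek ~10 ms.
NS = 1e-9  # one nanosecond, in seconds

latency_s = {
    "RAM": 100 * NS,
    "SSD": 100_000 * NS,
    "HDD": 10_000_000 * NS,
}

# RAM is roughly five orders of magnitude faster than a disk seek,
# and SSD sits about two orders of magnitude above RAM.
print(round(latency_s["HDD"] / latency_s["RAM"]))  # -> 100000
print(round(latency_s["HDD"] / latency_s["SSD"]))  # -> 100
```

At these assumed figures, the gap between RAM and a mechanical seek is roughly 100,000x, which is the raw advantage the in-memory argument rests on.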
Historically, CPUs have been "I/O bound," meaning they spent a significant amount of time waiting for requested data to arrive, requiring extreme countermeasures in software design to minimize the latency. With data streaming to processors at the speed of random-access memory (RAM), just the opposite situation
can occur: the CPUs may become flooded with data and unable to process it as quickly as it is presented. The point cannot be stressed enough: merely boosting the available RAM does not guarantee smooth, faster execution of existing programs. This turn of events calls for careful engineering and balance. In other words, the performance of complex applications is rarely resolved by changing one thing; it usually requires rethinking the whole approach. The result is that software migration to in-memory usually requires a great deal of re-work; it is not just move and drop.

Even the notion of iMDB is a bit of a misnomer, as there is still the requirement for separate conventional storage devices for mirroring everything for persistence, and keeping the iMDB refreshed and reliable. Systems can fail, which means in-memory systems still have to maintain multiple copies of the data, and a complete reload if the system fails. Adding all of these factors together can make the effort quite expensive despite the seemingly reasonable price of memory today (though at multiple terabytes, you will feel the pinch). In addition, to make maximum use of RAM, all database systems use compression of data to one degree or another. iMDBs typically employ aggressive compression algorithms to maximize the amount of data that can be put in working memory. The back-up of an iMDB is usually lightly- or un-compressed so it can be read by other processes, among other reasons.
Assuming a realistic 3.5x compression for an iMDB (not all RAM is available for the data), the back-up drives will need to be 5X the size of RAM, there may be multiple archives, and the backups themselves will likely be mirrored. With even average-sized analytical data warehouses today running about 50 terabytes (there are, of course, much larger ones), an iMDB to accommodate those
would need 75-100TB of separate disk drives to handle back-ups, snapshots, logs and staging areas.

Another thing to consider is that a database still has to perform all of the database functions, from loading data to presenting it as the result of a query. Conventional relational database technology, including those platforms that are designed specifically for data warehousing and analytical work as opposed to transactional processing, must employ a host of services to be useful to an enterprise, including:

• Workload management for efficient management of resources
• Security
• Reliability
• High availability
• Use of performance statistics for query optimization

They must also support, in addition to traditional row-based schema, columnar organization of the data, which is particularly effective for wide tables with many attributes but less effective with more normalized schema, and which has some serious drawbacks in the ability to update the database in real time. But columnar orientation is not a feature limited to iMDBs; most analytical database systems incorporate, or even operate solely in, columnar mode.

Why is in-memory, a fairly old concept, interesting again?

iMDBs have been used for quite some time, but they have always been limited primarily by three factors: the cost of memory, the size of the database, and the persistence of data. Today, a dollar will buy 500 to 1000 times as much memory as it did in 1995, and the capacity per square inch of the chips has increased in inverse proportion. Memory
speeds increased as well, though not as dramatically. If the amount of data that could
be stored in early in-memory systems was too small for most applications, 1000 times more memory might be enough for in-memory to be feasible.

This extremely simplified diagram depicts the essential (but certainly not all) differences between an iMDB and a hybrid-DBMS. The iMDB maximizes the use of RAM but uses essentially the same hardware architecture of two CPUs with levels of on-board cache, with RAM holding the entire database, the database software, working space, caches and embedded functionality. The only difference in the hybrid-DBMS is less reliance on RAM and the ability to address vastly greater amounts of data from the storage subsystem. The hybrid-DBMS has documented databases of greater than a petabyte. iMDB typically scale out to 16 servers with up to 1 terabyte of RAM each, but with a significant amount of RAM taken up by the operating system, working memory, etc. Therefore, even with 5x compression, the maximum amount of uncompressed data per
system is no more than about 40TB. Given the expense of these large iMDB systems, scaling out to the sizes that are needed today is difficult.

Limitations of iMDB

In-memory databases are constrained by key overwhelming limitations:

• No matter how inexpensive RAM is today compared to its historical cost, it is still considerably more expensive than its alternatives, limiting its usefulness for enterprise-level systems.
• Data cannot persist in memory indefinitely. It is inevitable that something will fail, which requires mechanisms to protect the data that can erode the value proposition.
• With today's data volumes, it is still not practical to use an in-memory approach for a data warehouse.
• iMDB rely on the system being up 24/7.

Cost

Though RAM is 10,000 times faster to read than a mechanical disk drive, data volumes today are enormous and growing. A petabyte-sized in-memory database would cost more than $5 million, perhaps twice that. SSD for that capacity would cost 1/5 to 1/10 the price. And a hybrid-DBMS with a hot/warm/cold hierarchical storage architecture would cost far less than that.

Persistence

In-memory architecture still requires conventional storage. RAM is volatile, and if something fails, or even just hiccups, there can be data loss. Therefore, everything in memory has to have a copy on less volatile storage devices. Updating the memory requires log files, "snapshots" and "checkpoints," which can slow down processing.
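The cost and persistence arithmetic above can be sanity-checked with a small sketch. All of the inputs are illustrative assumptions drawn from the figures quoted in the text, not vendor pricing:

```python
# Sanity-checking the Cost and Persistence figures above. All inputs are
# illustrative assumptions, not vendor pricing.

PB_GB = 1_000_000  # decimal gigabytes per petabyte

# Cost: >$5M for a petabyte of RAM works out to roughly $5/GB; SSD at
# 1/5 to 1/10 of that price is roughly $0.50-$1.00/GB.
ram_per_gb = 5_000_000 / PB_GB
ssd_per_gb = (ram_per_gb / 10, ram_per_gb / 5)

# Persistence: with backup drives sized at ~5x RAM and the backups
# themselves mirrored, a system with 10 TB of RAM needs ~100 TB of disk.
def backup_disk_tb(ram_tb, backup_ratio=5, mirror_copies=2):
    """Disk footprint needed behind an iMDB of `ram_tb` terabytes of RAM."""
    return ram_tb * backup_ratio * mirror_copies

print(ram_per_gb)          # -> 5.0 (USD per GB of RAM)
print(ssd_per_gb)          # -> (0.5, 1.0)
print(backup_disk_tb(10))  # -> 100
```

The multipliers compound quickly: the persistent layer behind an iMDB can easily be an order of magnitude larger than the memory it protects.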
Volume

In-memory cannot economically, or even practically, scale to the volumes of today's data warehouses. Ten years ago, a terabyte-size data warehouse was remarkable, but today there are dozens, perhaps even more than a hundred, greater than a petabyte, one thousand times larger. Projections are that this growth rate is not diminishing.

Dual-Purpose OLTP and Analytics

Some iMDB products promise the ability to perform OLTP and analytical processing on the same platform, with the same data. This would be a real advantage, as it would alleviate the need to extract and transform data from operational systems and provide analytical support without additional infrastructure. Unfortunately, this is currently impossible.

iMDB platforms generally cannot support OLTP because they have to wait for a transaction to complete on disk to be ACID compliant. When data is updated in memory, it is held in log files usually stored on SSD drives. iMDB platforms use this disk-based "persistent" layer to weather a node failure, which, in a narrow sense, suggests they have ACID properties. When the iMDB node comes back up (after the failed part is replaced or the cold standby node takes over), the data that is resident on the disk "persistent" layer is reloaded back into memory. This can be done in one of two ways: "lazy," where the data is reloaded as queries enter the system and request a specific table (which doesn't really make sense, since the iMDB appears in memory as one dimensional table), or "full," where queries must wait until all the data is reloaded.
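The two reload strategies can be sketched in miniature. The class and table names below are hypothetical, not any vendor's API; `persistent` stands in for the disk/flash persistent layer:

```python
# Toy sketch of the two recovery reload strategies described above.
# Names are hypothetical, not a vendor API.

persistent = {"orders": [1, 2, 3], "customers": ["a", "b"]}

class LazyReload:
    """Fault each table into memory the first time a query touches it."""
    def __init__(self, disk):
        self.disk, self.mem = disk, {}
    def query(self, table):
        if table not in self.mem:      # reload on demand
            self.mem[table] = self.disk[table]
        return self.mem[table]

class FullReload:
    """Block at startup until every table is back in memory."""
    def __init__(self, disk):
        self.mem = dict(disk)          # eager, complete reload
    def query(self, table):
        return self.mem[table]

lazy = LazyReload(persistent)
lazy.query("orders")  # only "orders" has been reloaded so far
```

The trade-off is visible even at this scale: lazy reload answers the first queries sooner but pays the reload cost unpredictably, while full reload makes every query wait for recovery to finish.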
In both cases, the log files stored on disk or flash have to be read and applied.

There are features to handle different kinds of failure, though. Both the SSD area and the disk persistent layer have RAID capability to cover for a disk failure. So, if a node has a problem but keeps power, then all may be OK; it is an error-dependent issue. If there is a problem with a memory chip, it is unlikely the data will survive, requiring a
total reload. If a node loses power, then a total reload of all the data that was on that node is required.

Not so "green"

At a time when most vendors are formulating a "green" message, it turns out that iMDBs require a lot of power, considerably more than spinning drives and significantly more than solid-state drives. RAM is volatile and needs to be powered 24/7 if the data is to persist.

The Hybrid DBMS

iMDB vendors often portray disk-based systems as dinosaurs that have outlived their usefulness, but in fact they are the result of 30 years of research and development by some of the most brilliant minds in the technology industry and have hardly been standing still. In the same way relational database technology gradually gained new hardware capabilities and evolved to become the hybrid-DBMS, it seems likely that the major database vendors will continue to evolve to leverage the advantages of more memory over disk drives. The dramatic cost reductions of memory have benefits that accrue to hybrid-DBMSs too: solid-state disk drives replacing traditional magnetic drives, with improvements in I/O speed. Teradata Virtual Storage, for example, automatically manages the movement of hot and cold data. Large memory models are common, too, even if the persistent data remains on attached storage instead of completely in memory.

Another consideration is that for most database applications, there is a clear difference between hot and cold data.
In other words, data that is in use at the moment as opposed to data that is used less frequently. This tilts the decision between disk-only and in-memory toward an in-between alternative: a hybrid scheme with large memory, SSD drives, and less expensive, slower HDD for warm or cold data. Hybrid-DBMS leverage the speed of SSD to reduce query response time by cutting the painful delays introduced by lengthy I/O queues in HDD storage. A query requires many I/O
operations to complete, so the time spent with I/O requests in storage queues has a direct impact. Not only does the speed and parallel channel capability of SSD result in 40X faster I/O completions, but the queues in the HDD are shortened by aiming 80% of I/O at the SSD; this can result in up to a 60X improvement in average response times.

A hybrid scheme requires not only a physical assemblage of devices, but also an intelligent data manager that continually and transparently optimizes the architecture by moving data to its best location. The figure below represents Teradata's version of such a system.²

Notice that in this scheme, each node is balanced with a combination of CPUs and their characteristics, the amount of RAM, and the storage devices. This provides for an optimum balance between processing, memory and addressable storage, which leads to optimal performance. It does, however, somewhat limit configuration flexibility, as the drives and CPUs are fixed.

² Teradata are working on extending the data management to the memory layer.
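The interaction behind the 40X/60X figures above, faster SSD completions compounding with shorter HDD queues, can be sketched as weighted-average arithmetic. The queue-shortening factor below is an assumption, and reaching the quoted 60X would require more aggressive inputs than this sketch uses:

```python
# Weighted-average response-time sketch for the hybrid I/O claim above.
# All parameters are illustrative assumptions, not measurements.

def avg_response(ssd_share, ssd_speedup=40, hdd_queue_factor=1.0,
                 hdd_baseline=1.0):
    """Mean I/O time when `ssd_share` of requests go to SSD and the
    residual HDD requests see queue delays scaled by `hdd_queue_factor`."""
    ssd_time = hdd_baseline / ssd_speedup
    hdd_time = hdd_baseline * hdd_queue_factor
    return ssd_share * ssd_time + (1 - ssd_share) * hdd_time

baseline = avg_response(0.0)                       # all I/O queued at HDD
hybrid = avg_response(0.8, hdd_queue_factor=0.2)   # HDD queues ~5x shorter
print(round(baseline / hybrid, 1))                 # -> 16.7 under these inputs
```

The point of the sketch is the compounding: steering 80% of I/O to SSD helps twice, once through faster completions and once through the emptier HDD queues the remaining 20% waits in.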
Compare and Contrast

Today there are two ways to store data electronically: on arrays of solid-state memory chips (on either a memory bus or an SSD) or on magnetic disk drives. Solid-state chips are obviously faster than magnetic drives (although in some cases the differential can be overcome with good platform design and workload management). Solid-state chips are considerably more expensive than magnetic drives, and volatile RAM chips are considerably more expensive (and faster) than non-volatile flash. We can't see the future with perfect clarity, but it is likely that for the foreseeable future this stratification of memory and storage will not change, even as the price/performance of each continues to improve. The faster RAM chips will remain volatile, making full in-memory databases impractical for most uses.

iMDB lack the balance of CPU and storage, which can lead to flooding of the CPUs. iMDB trade the potential for I/O latency for the very real possibility of RAM out-performing the processors. Without an I/O bottleneck, processors can become saturated. This is something that software developers should be aware of and design for, but given the relative recency of certain iMDBs, these features may not be well developed. Client applications may need to be rewritten not only to take advantage of the memory resources but to keep from bogging down.

iMDB rely on large banks of very fast, expensive RAM, but also on other types of memory and storage to operate, for high availability and for backup.
Hybrid-DBMS rely on the same collection of memory and storage types, but in different proportions. A hybrid system uses solid-state memory judiciously and attempts to keep as much data pinned in memory as possible for active work, but relies on only one mechanism for persistent storage.

Conclusion
iMDB vendors claim that in-memory will replace the traditional hybrid-DBMS, but unless there are new laws of physics, holding persistent data for months or years simply isn't feasible without resorting to a hybrid in-memory and disk-based system. In a way, one can think of an iMDB as merely an accelerator for a conventional database, because it cannot meet the requirements of durability on its own.

On the other hand, hybrid-DBMS are based on proven data warehousing technologies, offer flexible architectures, and deliver high performance with automatic storage management.

It would be easy to predict that iMDBs, and that includes DBMS with all SSD drives, will eventually overtake disk-based systems. However, the cost of memory will still be greater, no matter what it is, than disk drives, and though it is impossible to predict, the amount of data captured and analyzed will likely continue to grow at a rate faster than the price/GB of memory improves.
ABOUT THE AUTHOR

Neil Raden, based in Santa Fe, NM, is an industry analyst and active consultant, widely published author and speaker, and the founder of Hired Brains, Inc., http://www.hiredbrains.com. Hired Brains provides consulting, systems integration and implementation services in Data Warehousing, Business Intelligence, "big data," Decision Automation and Advanced Analytics for clients worldwide. Hired Brains Research provides consulting, market research, product marketing and advisory services to the software industry.

Neil was a contributing author to one of the first (1995) books on designing data warehouses, and he is more recently the co-author of Smart (Enough) Systems: How to Deliver Competitive Advantage by Automating Hidden Decisions, Prentice-Hall, 2007. He welcomes your comments at nraden@hiredbrains.com or at his blog at Competing on Decisions.
