Greenplum Database on HDFS
Greenplum Database on HDFS Presentation Transcript

  • 1. Greenplum Database on HDFS (GOH). Presenter: Lei Chang <lei.chang@emc.com>. © Copyright 2012 EMC Corporation. All rights reserved.
  • 2. Outline
    – Introduction
    – Architecture
    – Features
    – Performance study
  • 3. EMC Greenplum Unified Analytics Platform
  • 4. GOH use cases
    – All Greenplum customers who want to minimize the amount of duplicate storage they have to buy for analytics: managing scale is much easier when you focus on the growth of one pool rather than many fragmented pools.
    – Customers who want the functionality of GPDB together with the generality and storage provided by their HBase store.
    – Potential ability to plug various storage systems such as Isilon, Atmos, MapR Filesystem, CloudStore, GPFS, Lustre, PVFS, and Ceph into the GPDB/Hadoop software stack.
  • 5. [Architecture diagram: a GPDB master host and segment hosts (primary and mirror segments) connected via the interconnect; segments read/write AO tables in an HDFS filespace, with metadata operations going to the NameNode and block replication across DataNodes in Rack1 and Rack2.]
  • 6. GOH features
    – A pluggable storage layer: if a new file system supports the full semantics of the HDFS interface, it can be added as GPDB AO table storage.
    – Attributed filespaces.
    – HDFS filespaces are natively supported.
    – Full transaction support for AO tables on HDFS.
    – HDFS truncation capability to support the transaction capability of GOH.
    – An HDFS native C interface to eliminate the concurrency limitation of the current Java JNI based client.
    – All current GPDB functionality: fault tolerance et al.
  • 7. Pluggable storage: user interface

    CREATE FUNCTION open_func AS ( obj_file, link_symbol )

    CREATE FILESYSTEM filesystemname [OWNER ownername] (
        connect = connect_func,
        open    = open_func,
        close   = close_func,
        read    = read_func,
        write   = write_func,
        seek    = seek_func,
        ...
    )
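The DDL above registers a named table of I/O callbacks that the database invokes instead of the local file system. A minimal sketch of that callback-table idea, in Python; all class and function names here are illustrative assumptions, not GPDB API:

```python
# Hypothetical model of the CREATE FILESYSTEM registry: a named set of
# I/O callbacks the engine dispatches through. Not actual GOH code.

class PluggableFilesystem:
    """Registry entry created by a CREATE FILESYSTEM-style statement."""
    def __init__(self, name, callbacks):
        required = {"connect", "open", "close", "read", "write", "seek"}
        missing = required - callbacks.keys()
        if missing:
            raise ValueError(f"missing callbacks: {sorted(missing)}")
        self.name = name
        self.callbacks = callbacks

    def call(self, op, *args):
        # The engine looks up the registered function and delegates to it.
        return self.callbacks[op](*args)

# A trivial in-memory "filesystem" standing in for an HDFS driver.
store = {}
fs = PluggableFilesystem("demo_hdfs", {
    "connect": lambda uri: uri,
    "open":    lambda path: store.setdefault(path, bytearray()),
    "close":   lambda handle: None,
    "read":    lambda handle, n: bytes(handle[:n]),
    "write":   lambda handle, data: handle.extend(data),
    "seek":    lambda handle, off: off,
})

f = fs.call("open", "/gp-data/gpseg0/file1")
fs.call("write", f, b"tuple data")
print(fs.call("read", f, 5))
```

Any backend that can fill in this callback set (as the slide says, anything supporting the full HDFS interface semantics) could be plugged in the same way.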
  • 8. Attributed filespaces
    – The number of replicas for the tables in the filespace.
    – Whether mirroring is supported for the tables stored in the filespace.
    – Other attributes…
  • 9. Example SQL

    CREATE FILESPACE goh ON HDFS
    (
        1: hdfs://name-node/users/changl1/gp-data/gohmaster/gpseg-1,
        2: hdfs://name-node/users/changl1/gp-data/goh/gpseg0,
        3: hdfs://name-node/users/changl1/gp-data/goh/gpseg1
    ) WITH (NUMREPLICA = 3, MIRRORING = false);
  • 10. Transaction support
    – When a load transaction is aborted, some garbage data is left at the end of the file. In HDFS-like systems, data cannot be truncated or overwritten, so we need a way to handle the partial data in order to support transactions.
      – Option 1: Load data into a separate HDFS file. Requires an unlimited number of files.
      – Option 2: Use metadata to record the boundary of the garbage data, and implement a vacuum-like mechanism.
      – Option 3: Implement HDFS truncation.
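Option 2 can be pictured as tracking a committed logical end-of-file in metadata: readers stop at the committed boundary, so garbage appended by an aborted load stays invisible until a vacuum pass reclaims it. A toy sketch under those assumptions (names are hypothetical, not GOH code):

```python
# Illustrative model of Option 2: per-file metadata records the committed
# logical length; the physical file is append-only and never truncated.

class AppendOnlyFile:
    def __init__(self):
        self.data = bytearray()      # physical file contents (append-only)
        self.committed_len = 0       # metadata: committed logical EOF

    def load(self, payload, commit):
        self.data.extend(payload)    # the append happens unconditionally
        if commit:
            self.committed_len = len(self.data)  # advance logical EOF
        # on abort: physical bytes remain, but the logical EOF does not move

    def read(self):
        # Readers never see past the committed boundary.
        return bytes(self.data[:self.committed_len])

    def garbage_bytes(self):
        # What a vacuum-like mechanism would eventually reclaim.
        return len(self.data) - self.committed_len

f = AppendOnlyFile()
f.load(b"good-batch;", commit=True)
f.load(b"aborted-batch;", commit=False)   # simulated aborted transaction
print(f.read())            # only the committed data
print(f.garbage_bytes())   # trailing garbage awaiting vacuum
```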
  • 11. HDFS C client: why
    – libhdfs (the current HDFS C client) is based on JNI, which makes it difficult for GOH to support a large number of concurrent queries.
    – Example:
      – 6 segments on each segment host
      – 50 concurrent queries
      – each query may have 12 or more QE processes that do scans
      – so there will be about 600 processes, starting 600 JVMs to access HDFS
      – if each JVM uses 500 MB of memory, the JVMs will consume 600 × 500 MB = 300 GB of memory
      – thus naïve use of libhdfs is not suitable for GOH; we currently have three options to solve this problem.
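The slide's arithmetic, written out (the 500 MB per-JVM footprint is the slide's assumption, not a measurement):

```python
# Back-of-the-envelope cost of one JVM per scanning QE process under
# JNI-based libhdfs, using the figures from the slide.

concurrent_queries = 50
qe_processes_per_query = 12   # scan processes per query (from the slide)
jvm_memory_mb = 500           # assumed footprint of a single JVM

processes = concurrent_queries * qe_processes_per_query
total_memory_gb = processes * jvm_memory_mb / 1000

print(processes)         # JVM-backed processes accessing HDFS
print(total_memory_gb)   # memory consumed by HDFS client JVMs alone
```

The memory bill scales linearly with query concurrency, which is why a native C client without per-process JVMs is attractive.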
  • 12. HDFS client: three options
    – Option 1: use HDFS FUSE. FUSE introduces some performance overhead, and its scalability has not been verified yet.
    – Option 2: implement a C RPC interface that communicates directly with the NameNode and DataNodes. This requires many changes whenever the RPC protocol changes.
    – Option 3: implement a webhdfs-based C client. webhdfs is based on HTTP, which also introduces some cost; performance should be benchmarked. The webhdfs-based method has several benefits, such as ease of implementation and low maintenance cost.
    – Currently, we have implemented options 2 and 3.
  • 13. HDFS truncate
    – API
      – truncate (DistributedFileSystem): truncate a file to a specified length
      – void truncate(Path src, long length) throws IOException;
    – Semantics
      – Only a single writer/appender/truncater is allowed, and truncate may only be called on closed files.
      – HDFS guarantees the atomicity of a truncate operation: it either succeeds or fails, and never leaves the file in an undefined state.
      – Concurrent readers may read the content of a file that is being truncated by a concurrent truncate operation, but they must be able to read all the data not affected by it.
  • 14. HDFS truncate implementation (HDFS-3107)
    – Get the lease of the to-be-truncated file (F).
    – If the truncate point is at a block boundary:
      – Delete the tail blocks as an atomic operation.
    – Otherwise (truncate point not at a block boundary):
      – Copy the last block (B) of the result file (R) to a temporary file (T).
      – Remove the tail blocks of file F (including B, B+1, …), then concat F and T to get R.
    – Release the lease for the file.
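The two branches above can be sketched with blocks modeled as byte strings: at a block boundary, truncation just drops whole tail blocks; otherwise the partial last block is rewritten via a temporary copy. This is a toy illustration of the control flow, not HDFS code:

```python
# Toy model of the HDFS-3107 truncate steps: a file is a list of
# fixed-size blocks; leases and atomicity are outside this sketch.

BLOCK_SIZE = 4

def truncate(blocks, length):
    """Truncate a file (list of BLOCK_SIZE-byte blocks) to `length` bytes."""
    full_blocks, remainder = divmod(length, BLOCK_SIZE)
    if remainder == 0:
        # Truncate point at a block boundary: atomically drop tail blocks.
        return blocks[:full_blocks]
    # Not at a boundary: copy the partial last block of the result to a
    # "temporary file" T, drop the tail blocks of F, then concat F and T.
    temp = blocks[full_blocks][:remainder]
    return blocks[:full_blocks] + [temp]

file_blocks = [b"abcd", b"efgh", b"ijkl"]
print(truncate(file_blocks, 8))   # boundary case: whole tail block dropped
print(truncate(file_blocks, 6))   # partial case: last block shortened
```

The boundary case is the cheap one; the partial-block case is what forces the copy-and-concat dance, since HDFS blocks cannot be overwritten in place.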
  • 15. Performance study (to be added)
  • 16. Thank you!