A Performance Comparison of Container-based Virtualization Systems for MapReduce Clusters

1. A Performance Comparison of Container-based Virtualization Systems for MapReduce Clusters
   Miguel G. Xavier, Marcelo V. Neves, Cesar A. F. De Rose
   miguel.xavier@acad.pucrs.br
   Faculty of Informatics, PUCRS
   Porto Alegre, Brazil
   February 13, 2014
2. Outline
   • Introduction
   • Container-based Virtualization
   • MapReduce
   • Evaluation
   • Conclusion
3. Introduction
   • Virtualization
     • Allows resources to be shared
     • Hardware independence, availability, isolation and security
     • Better manageability
     • Widely used in datacenters/cloud computing
   • MapReduce clusters and virtualization
     • Usage scenarios
     • Better resource sharing
     • Cloud computing
   • However, hypervisor-based technologies have traditionally been avoided in MapReduce environments
4. Container-based Virtualization
   • A group of processes on a Linux box, put together in an isolated environment
   • A lightweight virtualization layer
   • Non-virtualized drivers
   • Shared operating system
   [Diagram: container-based virtualization (guest processes sharing the host OS) vs. hypervisor-based virtualization (guest OSes on a virtualization layer above the host OS and hardware)]
5. Container-based Virtualization
   Each container has:
   • Its own network interface (and IP address)
     • Bridged, routed, …
   • Its own filesystem
   • Isolation (security)
     • Containers A and B cannot see each other
   • Isolation (resource usage)
     • RAM, CPU, I/O
   • Current systems
     • Linux-VServer, OpenVZ, LXC
6. Container-based Virtualization
   Implements Linux namespaces:
   • Mount – mounting/unmounting file systems
   • UTS – hostname, domain name
   • IPC – SysV message queues, semaphores, shared memory segments
   • Network – IPv4/IPv6 stacks, routing, firewall, /proc/net, sockets
   • PID – its own set of PIDs
   • Chroot acts as the filesystem namespace
   • Current systems
     • Linux-VServer, OpenVZ, LXC
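The namespaces listed above are the kernel primitives containers are built from. As a minimal illustration (my sketch, not code from the talk), the clone() call below starts a child in fresh UTS, PID and mount namespaces; the hostname "container-demo", the 1 MiB stack and the printed message are arbitrary, and running it requires root:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static char child_stack[1024 * 1024];   /* stack for the cloned child */

    /* Runs inside fresh UTS, PID and mount namespaces: the hostname set
     * here is invisible to the host, and getpid() reports 1. */
    static int child_fn(void *arg) {
        (void)arg;
        sethostname("container-demo", 14);   /* only visible in this UTS ns */
        printf("pid inside new PID namespace: %d\n", (int)getpid());
        return 0;
    }

    int main(void) {
        int flags = CLONE_NEWUTS | CLONE_NEWPID | CLONE_NEWNS | SIGCHLD;
        /* clone() takes the top of the child's stack (stacks grow down) */
        pid_t pid = clone(child_fn, child_stack + sizeof(child_stack), flags, NULL);
        if (pid == -1) { perror("clone"); exit(EXIT_FAILURE); }
        waitpid(pid, NULL, 0);
        return 0;
    }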
7. Container-based Systems
   • Linux-VServer
     • Implements its own features in the Linux kernel
     • Limits the scope of the file system for different processes through the traditional chroot
   • OpenVZ
   • Linux Containers (LXC)
     • Based on cgroups
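Beneath LXC, the cgroup limits are plain files. A hedged sketch of what that looks like, using the cgroup v1 interface that was current at the time: the mount point /sys/fs/cgroup/cpu and the group name "demo" are assumptions, and the program needs root:

    #include <stdio.h>
    #include <sys/stat.h>
    #include <unistd.h>

    static void write_str(const char *path, const char *val) {
        FILE *f = fopen(path, "w");
        if (!f) { perror(path); return; }
        fprintf(f, "%s", val);
        fclose(f);
    }

    int main(void) {
        /* create a new cgroup under the cpu controller */
        mkdir("/sys/fs/cgroup/cpu/demo", 0755);

        /* CFS bandwidth: 50 ms of CPU time per 100 ms period = 0.5 CPU */
        write_str("/sys/fs/cgroup/cpu/demo/cpu.cfs_period_us", "100000");
        write_str("/sys/fs/cgroup/cpu/demo/cpu.cfs_quota_us",  "50000");

        /* move this process into the group; its children inherit it */
        char pid[32];
        snprintf(pid, sizeof pid, "%d", (int)getpid());
        write_str("/sys/fs/cgroup/cpu/demo/tasks", pid);
        return 0;
    }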
8. Hypervisor- vs. Container-based Systems

   Hypervisor                   Container
   Different kernel per OS      Single kernel
   Device emulation             Syscalls
   Many FS caches               Single FS cache
   Limits per machine           Limits per process
   High performance overhead    Low performance overhead
9. MapReduce
   • MapReduce
     • A parallel programming model
     • Simplicity, efficiency and high scalability
     • It has become a de facto standard for large-scale data analysis
   • MapReduce has also attracted the attention of the HPC community
     • A simpler approach to the parallelism problem
     • Highly visible cases where MapReduce has been successfully used by companies such as Google, Yahoo!, Facebook and Amazon
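To make the model concrete, here is a toy, single-process word count (my sketch, not the authors' code) that mimics the three MapReduce phases: map emits (word, 1) pairs, the shuffle groups equal keys by sorting, and reduce sums each group. A real job distributes each phase across the cluster.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define MAX_PAIRS 100000   /* fixed cap keeps the sketch short */

    static char *keys[MAX_PAIRS];   /* emitted keys; every value is an implicit 1 */
    static int nkeys = 0;

    /* map: split one input line into (word, 1) pairs */
    static void map(char *line) {
        for (char *w = strtok(line, " \t\n"); w; w = strtok(NULL, " \t\n"))
            if (nkeys < MAX_PAIRS)
                keys[nkeys++] = strdup(w);
    }

    static int cmp(const void *a, const void *b) {
        return strcmp(*(char *const *)a, *(char *const *)b);
    }

    int main(void) {
        char line[4096];
        while (fgets(line, sizeof line, stdin))
            map(line);
        /* shuffle: sorting brings equal keys together, standing in for the
         * grouping step between the map and reduce phases */
        qsort(keys, nkeys, sizeof *keys, cmp);
        /* reduce: sum the 1s for each run of equal keys */
        for (int i = 0; i < nkeys; ) {
            int j = i;
            while (j < nkeys && strcmp(keys[i], keys[j]) == 0) j++;
            printf("%s\t%d\n", keys[i], j - i);
            i = j;
        }
        return 0;
    }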
10. MapReduce and Containers
    • Apache Mesos
      • Shares a cluster between multiple different frameworks
      • Creates another level of resource management
      • Management is taken away from the cluster's RMS
    • Apache YARN
      • Hadoop Next Generation
      • Better job scheduling/monitoring
      • Uses virtualization to share a cluster among different applications
11. Evaluation
    Experimental environment:
    • Hadoop cluster composed of 4 nodes
    • Two processors with 8 cores (without hardware threads) per node
    • 16 GB of memory per node
    • 146 GB of disk per node
    • Performance analysis
      • Through micro-benchmarks:
        • HDFS evaluation (TestDFSIO)
        • NameNode evaluation (NNBench)
        • MapReduce evaluation (MRBench)
      • Through macro-benchmarks (WordCount, TeraSort)
    • Isolation analysis
      • Through the Isolation Benchmark Suite (IBS)
    • At least 50 executions were performed for each experiment
12. HDFS Evaluation
    Settings:
    • Replication factor of 3
    • File sizes from 100 MB to 3000 MB

    • All container-based systems have performance similar to native
    • The OpenVZ results represent a loss of 3 Mbps
    • This is due to the CFQ I/O scheduler
    [Plot: throughput (Mbps) vs. file size for native, LXC, OpenVZ and VServer]
13. HDFS Evaluation
    • All container-based systems obtained performance results similar to native
    • Linux-VServer uses the physical (host-based) network
    [Plot: throughput (Mbps) vs. file size for native, LXC, OpenVZ and VServer]
14. NameNode Evaluation using NNBench
    • Generates operations on 1000 files on HDFS

                         Native   LXC     OpenVZ   VServer
    Create/Write (ms)    0.51     0.52    0.51     0.49
    Open/Read (ms)       54.65    56.89   51.96    48.90

    • The NNBench benchmark was chosen to evaluate the NameNode component
    • Linux-VServer reaches an average open/read latency of 48 ms, while LXC obtained the worst result with an average of 56 ms
    • The differences are not significant when the absolute numbers are considered
    • However, no exception was observed under heavy HDFS management stress, and all systems were able to respond as effectively as native
15. MapReduce Evaluation using MRBench

                      Native   LXC     OpenVZ   VServer
    Execution time    14251    13577   14304    13614

    • The MRBench results show that the MapReduce layer suffers no substantial effect while running on the different container-based virtualization systems
16. Analyzing Performance with WordCount
    • 30 GB of input data
    • The peak of performance degradation, seen for OpenVZ, is explained by the I/O scheduler overhead
    [Bar chart: execution time (seconds) for Native, LXC, OpenVZ and VServer on WordCount]
17. Analyzing Performance with TeraSort
    • An HDFS block size of 64 MB
    • The standard map/reduce sort
    • Steps:
      • Generate 30 GB of input data
      • Run the sort on that input data
    [Bar chart: execution time (seconds) for Native, LXC, OpenVZ and VServer on TeraSort]
18. Performance Isolation
    [Diagram: isolation methodology]
    • Run the baseline application alone in container A and measure its execution time
    • Run the baseline application in container A again while a stress test runs in container B, and measure its execution time
    • Compare the two execution times to obtain the performance degradation (%)
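The diagram reduces to the usual relative-slowdown metric; reading it that way (my interpretation, the deck does not spell it out), with T_alone the baseline run time and T_stressed the run time next to the stress container:

    degradation (%) = (T_stressed - T_alone) / T_alone * 100

Under this reading, a 100 s baseline that stretches to 108.3 s beside a memory-stress container yields the 8.3% figure on the next slide.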
19. Performance Isolation

           CPU   Memory   I/O    Fork bomb
    LXC    0%    8.3%     5.5%   0%

    • We chose LXC as the representative container-based virtualization system to be evaluated
    • The per-container CPU usage limits work well: no significant impact was noted
    • A little performance degradation under memory and I/O stress needs to be taken into account
    • The fork-bomb stress test reveals that LXC has a security subsystem that ensures feasibility
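The deck does not name the subsystem that contains the fork bomb. For illustration only, one classic kernel facility with this effect is the per-user process cap RLIMIT_NPROC, sketched below; this is an assumption about the general mechanism, not a claim about LXC's implementation, and the cap of 128 is arbitrary:

    #include <stdio.h>
    #include <sys/resource.h>

    int main(void) {
        /* cap the number of processes the real user ID may own */
        struct rlimit lim = { .rlim_cur = 128, .rlim_max = 128 };
        if (setrlimit(RLIMIT_NPROC, &lim) != 0) {
            perror("setrlimit");
            return 1;
        }
        /* From here on, once 128 processes exist for this user, fork()
         * fails with EAGAIN, so a fork bomb exhausts its own quota
         * instead of taking down the whole machine. */
        return 0;
    }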
20. Conclusions
    • We found that all container-based systems reach near-native performance for MapReduce workloads
    • The performance-isolation results revealed that LXC has improved its capabilities for restricting resources among containers
    • Although some works already take advantage of container-based systems on MapReduce clusters, this work demonstrated the benefits of using container-based systems to support MapReduce clusters
21. Future Work
    • We plan to study performance isolation at the network level
    • We plan to study scalability while increasing the number of nodes
    • We plan to study aspects of green computing, such as the trade-off between performance and energy consumption
22. Thank you for your attention!
