Slide 1: A comparison of performance between KVM and Docker instances in OpenStack
High Energy Accelerator Research Organization (KEK), Japan
HEPiX Fall 2015 Workshop at BNL
Wataru Takase
Slide 2: KEK site will become Cloudy
• Integrate private cloud into batch service
• Deploy CVMFS Stratum 0 and 1
• Start online storage service
[Architecture diagram: jobs reach the batch scheduler, which sends resource requests to a mediator; the mediator requests the private cloud to launch worker-node (WN) instances from the resource pool. Both the cloud-launched instances and the static batch worker nodes mount CVMFS. This talk focuses on the private-cloud part.]
Slide 3: Performance Investigation of Cloud
• Is virtual machine performance good?
• What about container technology?
• What about the impact of concurrency on performance?
• Measured KVM and Docker performance in OpenStack using Rally:
  – Cloud performance
  – Instance performance
Slide 4: KVM? Docker?
OpenStack (IaaS) supports multiple compute drivers, including the Libvirt driver, the Docker driver, the Bare Metal driver, the VMware driver, and the Xen API driver.
• KVM: a VM hypervisor; each VM runs on virtual hardware.
• Docker: a container manager; containers share the host kernel and hardware.
(Diagram source: https://www.docker.com/whatisdocker)
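The slides do not show the Nova configuration, but the choice between the two drivers is made in nova.conf; a minimal sketch for Kilo-era Nova, assuming the stock libvirt driver and the nova-docker driver named on slide 7:

    # KVM through the libvirt driver (hypothetical nova.conf excerpt)
    [DEFAULT]
    compute_driver = libvirt.LibvirtDriver
    [libvirt]
    virt_type = kvm

    # Docker through the nova-docker driver (hypothetical nova.conf excerpt)
    [DEFAULT]
    compute_driver = novadocker.virt.docker.DockerDriver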
Slide 5: What is Rally?
• Benchmarking tool for OpenStack
• Generates real workload
• Provides more than 100 test scenarios:
  – Boot server and migrate it
  – Create image and boot server
  – Create volume and attach to server
  – ...

Example_of_rally_benchmark_input.yaml:

    ---
    NovaServers.boot_and_delete_server:
      -
        args:
          image:
            name: "centos-cloud:7.1"
          flavor:
            name: "m1.xsmall"
          min_sleep: 60
          max_sleep: 60
        runner:
          type: "constant"
          times: 32
          concurrency: 32
        context:
          users:
            tenants: 1
            users_per_tenant: 32
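The slides do not show the invocation; with a standard Rally deployment, the task file above would be run roughly as follows (the report file name is illustrative):

    # Run the benchmark task, then render an HTML report
    rally task start Example_of_rally_benchmark_input.yaml
    rally task report --out boot_and_delete_report.html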
Slide 6: Test environment

Physical servers:

| Server | OS | Kernel | CPU | CPU cores | RAM (GB) | Disk (GB) |
| OpenStack controller | CentOS 7.1.1503 | 3.10.0-229 | Intel(R) Xeon(R) CPU E5649 x2 | 24* | 32 | 300 |
| OpenStack compute | CentOS 7.1.1503 | 3.10.0-229 | Intel(R) Xeon(R) CPU E5-2630 v3 x2 | 32* | 64 | 3800 |
| Rally | CentOS 7.1.1503 | 3.10.0-229 | AMD Opteron(TM) Processor 6212 | 8 | 16 | 1700 |
* HT is enabled

Instance image and flavor:

| OS | Kernel | vCPU | RAM (GB) | Disk (GB) |
| CentOS 7.1.1503 | 3.10.0-229 | 1 | 1.8 | 10 |

Software:
• OpenStack Kilo (RDO), 1 controller + 1 compute node
• nova-network
• Rally (2d874a7)
• Sysbench 0.4.12
Slide 7: Test environment (compute node configuration)

Compute node (host):
| File system | XFS on LVM on hardware RAID 5 |
| IO scheduler | Deadline |
| Clocksource | TSC |

KVM:
| QEMU | 1.5.3 |
| libvirt | 1.2.8 |
| Image format | qcow2 |
| Block device driver | VirtIO |
| Cache mode | none |
| Guest file system | XFS |
| Guest clocksource | TSC |

Docker:
| Docker | 1.6.2 |
| Nova Docker driver | nova-docker stable/kilo (d556444) |
| Storage driver | OverlayFS |
Slide 8: Benchmark Scenarios
1. Measure cloud performance
   – Boot a server and then delete it
2. Measure instance performance
   – Boot a server and run Sysbench (test=cpu)
   – Boot a server and run Sysbench (test=memory, memory-oper=read)
   – Boot a server and run Sysbench (test=memory, memory-oper=write)
   – Boot a server and run Sysbench (test=fileio, file-test-mode=seqrd)
   – Boot a server and run Sysbench (test=fileio, file-test-mode=rndrd)
   – Boot a server and run Sysbench (test=fileio, file-test-mode=seqwr)
   – Boot a server and run Sysbench (test=fileio, file-test-mode=rndwr)
Each scenario launches 32 instances; the number of concurrent requests is varied from 1 to 32 (a possible driver loop is sketched below).
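One plausible way to drive such a concurrency sweep, assuming one Rally task file per concurrency level (the file names and the specific levels are illustrative; the slides only state that concurrency ranges from 1 to 32):

    # Each task file is identical except for runner.concurrency;
    # times stays at 32, so every run launches 32 instances in total.
    for c in 1 2 4 8 16 32; do
        rally task start "boot_and_delete_c${c}.yaml"
    done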
Slide 9: Boot a server and then delete
[Timeline diagram: Rally issues 32 requests to OpenStack, with between 1 and 32 of them in flight concurrently. For each instance the measured phases are: building time (request until the instance is Active), networking time (waiting until ping and ssh succeed), a fixed 60-second wait, and deleting time (until the instance is Deleted).]
Slide 10: Build a server
• At high concurrency, KVM is around 20% better.
(N = 96; error bars: SD)
Slide 11: Wait for networking
[Chart: networking time broken down into "Boot OS + cloud-init" and "cloud-init + start sshd".]
(N = 96; error bars: SD)
Slide 12: Building + Networking
(N = 96; error bars: SD)
Slide 13: Delete a server
(N = 96; error bars: SD)
Slide 14: Instance Performance Comparison
[Timeline diagram: as on slide 9, Rally issues 32 requests with 1 to 32 of them concurrent; once an instance is built and reachable via ping and ssh, Sysbench runs inside it to measure performance, the instance waits, and is then deleted.]
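The slides do not show the task file for this part; a plausible sketch using Rally's standard VMTasks.boot_runcommand_delete scenario (the script file, interpreter, and username are hypothetical; only image, flavor, and the 32/32 runner settings come from the deck):

    ---
    VMTasks.boot_runcommand_delete:
      -
        args:
          image:
            name: "centos-cloud:7.1"
          flavor:
            name: "m1.xsmall"
          command:
            interpreter: "/bin/sh"
            script_file: "run_sysbench_cpu.sh"  # hypothetical wrapper around sysbench
          username: "centos"                    # assumed login user of the image
        runner:
          type: "constant"
          times: 32
          concurrency: 32
        context:
          users:
            tenants: 1
            users_per_tenant: 32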
Slide 15: CPU (test=cpu, cpu-max-prime=20000, num-threads=1)
• At low concurrency, KVM is 2-7% worse than native.
• Once the number of concurrent requests exceeds 2, Docker is about 2% worse than native.
(N = 32; error bars: SD)
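For reference, the parameters above correspond to a sysbench 0.4-style command line along these lines (a sketch; the slide lists only the options):

    # CPU test: compute primes up to 20000, single thread
    sysbench --test=cpu --cpu-max-prime=20000 --num-threads=1 run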
Slide 16: Memory
• test=memory, memory-oper=read, memory-block-size=1K, memory-total-size=100G, max-time=300, num-threads=1
• test=memory, memory-oper=write, memory-block-size=1K, memory-total-size=100G, max-time=300, num-threads=1
• At low concurrency, KVM is 3-10% worse than native.
• Docker is 2-5% worse than native (concurrent requests: 1-16).
(N = 32; error bars: SD)
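The equivalent command lines would look roughly like this (a sketch assembled from the listed options):

    # Memory read test: 1K blocks, 100G total, capped at 300 s
    sysbench --test=memory --memory-oper=read --memory-block-size=1K \
             --memory-total-size=100G --max-time=300 --num-threads=1 run
    # Memory write test: identical apart from the operation
    sysbench --test=memory --memory-oper=write --memory-block-size=1K \
             --memory-total-size=100G --max-time=300 --num-threads=1 run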
Slide 17: File IO (read)
• test=fileio, file-test-mode=seqrd, file-block-size=4K, file-total-size=8G, file-num=128, file-extra-flags=direct, max-time=300, num-threads=1
• test=fileio, file-test-mode=rndrd, file-block-size=4K, file-total-size=8G, file-num=128, file-extra-flags=direct, max-time=300, num-threads=1
• At low concurrency, KVM sequential read is 60-70% worse than native.
• KVM random read is a few percent worse than native.
• Docker achieves native performance.
(N = 32; error bars: SD)
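Sysbench's fileio test needs a prepare step to create its working set; a sketch of the sequential-read run (the random-read run, and the write runs on the next slide, differ only in --file-test-mode):

    # Create 128 files totalling 8G
    sysbench --test=fileio --file-total-size=8G --file-num=128 prepare
    # Sequential read, 4K blocks, O_DIRECT, capped at 300 s
    sysbench --test=fileio --file-test-mode=seqrd --file-block-size=4K \
             --file-total-size=8G --file-num=128 --file-extra-flags=direct \
             --max-time=300 --num-threads=1 run
    # Remove the test files
    sysbench --test=fileio --file-total-size=8G --file-num=128 cleanup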
Slide 18: File IO (write)
• test=fileio, file-test-mode=seqwr, file-block-size=4K, file-total-size=8G, file-num=128, file-extra-flags=direct, max-time=300, num-threads=1
• test=fileio, file-test-mode=rndwr, file-block-size=4K, file-total-size=8G, file-num=128, file-extra-flags=direct, max-time=300, num-threads=1
• At low concurrency, KVM is 70-80% worse than native.
• With a single request, Docker sequential write is 15% worse than native.
• Apart from that, Docker achieves almost native performance.
(N = 32; error bars: SD)
Slide 19: Summary and Conclusion
• Cloud performance comparison
  – A Docker instance becomes ready (building + networking) faster than a KVM instance.
• Instance performance comparison
  – CPU and memory performance: native > Docker > KVM
  – KVM: file IO performance is poor compared to native.
  – Docker: read performance is almost the same as native; write performance is near native.
• Docker seems to be a good candidate for the future.
  – The nova-docker driver still lacks some features and has some bugs.
• More investigation is needed:
  – Security
  – Stability
  – Other benchmarks (network, volume, tuned KVM)
