Denser containers with PFCache
Pavel Emelyanov
ContainerCon, Seattle, 2015
Agenda
• How to store container files
• Why shared templates matter
• What can be deduplicated and what should be
• PFCache
• Q&A
How to store container files
[Diagram: container processes use a filesystem; the full stack adds block device / network, host filesystem, host block device, hardware]
How to store container files (1)
[Diagram: the same storage stack, with these options marked]
• chroot()
• Union FS
How to store container files (2)
[Diagram: the same storage stack, with these options marked]
• Loop device
• ZFS ZVol
• BTRFS subvolume
• PLoop
What's PLoop
• Loop device plus
– AIO for better performance
– Snapshots
– QCOW2-like format for thin provisioning
– Thin provisioning itself
• Upstreaming work in progress
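To make the thin-provisioning bullet concrete, here is a minimal Python sketch of an image that allocates backing blocks only on first write. The class name `ThinImage`, the 4096-byte block size, and the in-memory block map are illustrative assumptions; this is not PLoop's on-disk format or API.

```python
class ThinImage:
    """Toy thin-provisioned image: blocks are allocated on first write.

    Illustrative assumption only; not PLoop's real format.
    """

    def __init__(self, virtual_blocks, block_size=4096):
        self.virtual_blocks = virtual_blocks  # size promised to the container
        self.block_size = block_size
        self.block_map = {}                   # logical block -> index in self.data
        self.data = []                        # allocated blocks, in allocation order

    def _check(self, block):
        if not 0 <= block < self.virtual_blocks:
            raise IndexError("block outside the virtual size")

    def write(self, block, payload):
        self._check(block)
        if len(payload) > self.block_size:
            raise ValueError("payload larger than one block")
        padded = payload.ljust(self.block_size, b"\0")
        if block not in self.block_map:
            # First write to this block: allocate backing space now.
            self.block_map[block] = len(self.data)
            self.data.append(padded)
        else:
            # Overwrite: reuse the block that is already allocated.
            self.data[self.block_map[block]] = padded

    def read(self, block):
        self._check(block)
        if block not in self.block_map:
            # Never written: reads as zeroes and costs no space.
            return b"\0" * self.block_size
        return self.data[self.block_map[block]]

    def allocated_bytes(self):
        return len(self.data) * self.block_size


image = ThinImage(virtual_blocks=1024)
image.write(7, b"hello")
image.write(7, b"world")   # overwrite reuses the same backing block
image.write(900, b"tail")
```

Only two blocks are backed by real space here, although the image advertises 1024 of them.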
How to store container files (3)
[Diagram: the same storage stack, with these options marked]
• LVM
• DM-thin
How to store container files (4)
[Diagram: the same storage stack, with these options marked]
• NBD
• Ceph RBD
• iSCSI
How to store container files (5)
[Diagram: the same storage stack, with these options marked]
• NFS
• GFS2
• OCFS
• Ceph
Containers vs Templates
• Containers ...
– are massively cloned from pre-created “templates”
– do not have direct access to the underlying (block) storage
• Identical data can be effectively deduplicated
– Higher density
– Lower IO and/or memory consumption
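The density argument above can be illustrated with a small Python sketch that measures how much data is shared between containers cloned from one template. The function name `duplicate_bytes`, the in-memory file trees, and the sample contents are hypothetical; files are treated as identical when their SHA-1 digests match.

```python
import hashlib


def duplicate_bytes(containers):
    """Return (total_bytes, unique_bytes) across all container files.

    `containers` maps a container name to {path: file contents}.
    Files count as identical when their SHA-1 digests match.
    """
    size_by_digest = {}
    total = 0
    for files in containers.values():
        for content in files.values():
            total += len(content)
            size_by_digest[hashlib.sha1(content).hexdigest()] = len(content)
    return total, sum(size_by_digest.values())


# Hypothetical template: two large files shared by every clone.
template = {"/bin/sh": b"x" * 1000, "/lib/libc.so": b"y" * 5000}

containers = {}
for i in range(3):
    files = dict(template)                          # clone of the template
    files["/etc/hostname"] = f"ct{i}\n".encode()    # small per-container change
    containers[f"ct{i}"] = files

total, unique = duplicate_bytes(containers)
print(f"stored without dedup: {total} bytes, with dedup: {unique} bytes")
```

The more containers are cloned from the same template, the larger the gap between the two numbers becomes.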
Who can do shared templates
Storage OpenVZ Docker LXC
Union FSs + + +
Btrfs +
DM-thin +
PLoop +
Ceph
ZFS +
What can be de-duplicated
[Diagram, built up over three slides: container processes, filesystem, block device / network; then the page cache with its cached pages; then the IO flow]
What is deduplicated
Storage Memory IO
Union FSs + +
Btrfs +/-
DM-thin
PLoop + +
Ceph
ZFS
Additional OpenVZ constraints
• Container disks are independent image files
– Can be easily copied across nodes
– No single (shared) point of failure
• Deduplicated data is volatile
– “Templates” can be lost (e.g. while migrating)
– A shared-data pool that grows too big can be easily shrunk
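A rough Python sketch of what "volatile" means in the constraints above: entries of the shared pool may disappear at any time, and a read then falls back to the container's own image. The names `VolatilePool` and `read_file` and the oldest-first eviction order are assumptions made for illustration, not the actual PFCache policy.

```python
from collections import OrderedDict


class VolatilePool:
    """Toy shared pool whose entries may vanish at any time."""

    def __init__(self):
        self.entries = OrderedDict()   # key -> bytes, oldest first

    def add(self, key, data):
        self.entries[key] = data
        self.entries.move_to_end(key)

    def size(self):
        return sum(len(d) for d in self.entries.values())

    def shrink(self, max_bytes):
        # Drop the oldest entries until the pool fits (assumed policy).
        # Containers need no update: they still hold their own copies.
        while self.entries and self.size() > max_bytes:
            self.entries.popitem(last=False)


def read_file(pool, key, private_copy):
    """Prefer the shared copy; fall back to the container's own image."""
    return pool.entries.get(key, private_copy)


pool = VolatilePool()
pool.add("libc", b"c" * 300)
pool.add("sh", b"s" * 100)
pool.shrink(max_bytes=150)   # evicts "libc", keeps "sh"
```

Because every container image stays complete on its own, losing or shrinking the pool costs only density, never data.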
Virtuozzo IO stack
[Diagram: container processes, Ext4, PLoop device, image file]
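PF-Cache
[Diagram: the container stack (container processes, Ext4, PLoop device, image file) next to a second Ext4 on its own PLoop device and image file, which holds the cache area; a cache link (xattr) ties the two together]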
Cache and cache link behavior
• Cache area
– target file name is sha1 sum of the contents
– files are created by user-space daemon
– cache size is limited by ploop
• Cache link
– created automatically upon file creation
– dropped when file is opened for writing
– kept during metadata updates (chown/chmod)
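The rules listed above can be modelled in a few lines of Python. This is a toy in-memory model, not the kernel or daemon code: the class `PFCacheModel`, its method names, and the choice to compute the digest at file creation time are all assumptions made for illustration. Only the SHA-1 naming of cache files and the three cache-link rules come from the slide.

```python
import hashlib


class PFCacheModel:
    """Toy in-memory model of the cache area and cache links."""

    def __init__(self):
        self.cache_area = {}   # cache file name (SHA-1 hex digest) -> contents
        self.files = {}        # path -> {"data", "mode", "link"}

    def create(self, path, data, mode=0o644):
        # Cache link is created automatically upon file creation.
        # (Assumption: the link stores the SHA-1 of the contents.)
        digest = hashlib.sha1(data).hexdigest()
        self.files[path] = {"data": data, "mode": mode, "link": digest}

    def daemon_populate(self, path):
        # Stand-in for the user-space daemon that creates cache files.
        data = self.files[path]["data"]
        digest = hashlib.sha1(data).hexdigest()
        self.cache_area[digest] = data
        return digest

    def read(self, path):
        # Serve from the shared cache when a valid link exists,
        # otherwise from the container's own image.
        entry = self.files[path]
        link = entry["link"]
        if link is not None and link in self.cache_area:
            return self.cache_area[link], "cache"
        return entry["data"], "image"

    def open_for_write(self, path):
        # Link is dropped when the file is opened for writing.
        self.files[path]["link"] = None

    def chmod(self, path, mode):
        # Metadata update: the link is kept.
        self.files[path]["mode"] = mode


model = PFCacheModel()
model.create("/ct1/bin/sh", b"shell binary")
model.create("/ct2/bin/sh", b"shell binary")
digest = model.daemon_populate("/ct1/bin/sh")

print(model.read("/ct2/bin/sh")[1])   # identical file in another container
model.chmod("/ct2/bin/sh", 0o755)
model.open_for_write("/ct2/bin/sh")
print(model.read("/ct2/bin/sh")[1])   # link dropped, served from own image
```

Both containers hit the same cache entry until one of them opens its copy for writing, which is what lets identical files share page cache and IO.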
Density results
Future work
• PLoop is available in OpenVZ & Virtuozzo
– Upstream WIP
• IO deduplication in the upstream
– Issue raised at LSF/MM 2013
– DM-thin/btrfs IO dedup for containers
– KSM++ for VMs
Thank you
xemul@odin.com

PFcache - LinuxCon 2015
