SlideShare a Scribd company logo
1
Maxim Patlasov
<mpatlasov@parallels.com>
Containers in a File
2
Agenda
•Containers overview
•Current way of keeping container files and its problems
•Basics of container-in-a-file approach
•Limitations of existing facilities
•Solution
•Benefits
•Plans for future
3
Virtualization: VM-style vs. containers
CT 1
Hardware Hardware
Host OSHost OS / Hypervisor
OS Virtualization LayerVirtual Machine Monitor
Virtual Hardware Virtual Hardware
Guest OS Guest OS
VM 1 VM 2
CT 2
Containers:
•OpenVZ / Parallels Containers
•Solaris containers/Zones
•FreeBSD jails
VMMs:
•KVM / Qemu
•VMware
•Parallels Hypervisor
4
Containers Virtualization
Each container has its own:
•Files
•Process tree
•Network
•Devices
•IPC objects
5
Containers: file system view
Natural approach: each container chroot-s to a per-container sub-tree of
some system-wide host file system.
CT 1 CT 2
Block device
Host file system
/
1 2
CTs
chroot chroot
6
Problems
•File system journal is a bottleneck:
1. CT1 performs a lot of operations leading to metadata updates (e.g.
truncates). Consequently, the journal is getting full.
2. CT2 issuing a single I/O operation blocks until journal checkpoint is
completed. Up to 15 seconds in some tests!
Host file system100% full
Journal
Lots of random writes
7
Problems (cont.)
•Lots of small-size files I/O on any CT operation (backup, clone)
•Sub-tree disk quota support is absent in mainstream kernel
•Hard to manage:
•No per-container snapshots, live backup is problematic
•Live migration – rsync unreliable, changed inode numbers
•File system type and its properties are fixed for all containers
•Need to limit number of inodes per container
8
Container in a file
Basic idea: assign virtual block device to container, keep container’ file-
system on top of virtual block device.
CT 1
Per-container
file systems
Host file system
Per-container
virtual block devices
virtual block device
image-file image-file
virtual block device
/ /
CT 2
9
LVM limitations
•Not flexible enough – works only on top of block device
•Hard to manage (e.g. how to migrate huge volume?)
•No dynamic allocation (--virtualsize 16T --size 16M)
•Management is not as convenient as for files
Block device
Per-container
file systems
Per-container
logical volumes
CT 1
/ /
CT 2
Logical volume Logical volume
10
Ordinary loop device limitations
•VFS operations leads to double page-caching
•No dynamic allocation (only “plain” format is supported)
•No helps to backup/migrate containers
•No snapshot functionality
Per-container
file systems
Per-container
loop devices
CT 1
/ /
CT 2
/dev/loop1 /dev/loop2
image-file1 image-file2 Host file system
11
Design of new loop device (ploop)
main ploop module
Container file system
kernel block layer
mount
physical medium
image file
Maps virtual
block number
to image
block number
Maps file offset
to physical sector
I/O module
DIO
VFS
format module
plain
QCOW
ploop block device
v-blocks
i-blocks
expandable
NFS
12
Stacked image configuration
Ploop snapshots are represented by image files. They are stuffed in ploop
device in stacked manner and obey the following rules:
•Only top image is opened in RW mode (others – RO)
•Every mapping bears info about ‘level’
•READ leads to access to an image of proper ‘level’
•WRITE may lead to transferring proper block to top image
Snapshot
image1
(RW)
image
(RO)
level 1
level 0
top
WRITE
Copy (if any)
complete
READ
complete
READ
complete
13
Backup via snapshot
Operations involved:
•Create snapshot
•Backup base image
•Merge
CT
image
ploop
bdev
snapshot CT
ploop
bdev
image
(RW)
(RO)
copy
image.tmp
(RW)
merge CT
ploop
bdev
image
(RW)
image.tmp
merge
Key features:
•On-line
•consistent
14
Migrate via write-tracker
Operations involved:
•Turn write-tracker on
•Iterative copy
•Freeze container
•Copy the rest
time
track on, copy all
check tracker, copy modified
check tracker, copy the rest
image a copy of image
< freeze container>
Key features:
•On-line
•I/O efficient
15
Problems solved
•File system journal is not bottleneck anymore
1. CT1 performs a lot of operations leading to metadata updates (e.g.
truncates). Consequently, the journal is getting full.
2. CT2 has its own file system. So, it’s not directly affected by CT1.
CT1 file system100% full
Journal
Lots of random writes
CT2 file system
16
Problems solved (cont.)
•Large-size image files I/O instead of lots of small-size files I/O on
management operations
•Disk quota can be implemented based on virtual device sizes. No need
for sub-tree quotas
•Live backup is easy and consistent
•Live migration is reliable and efficient
•Different containers may use file systems of different types and properties
•No need to limit “number-of-inodes-per-container”
17
Additional benefits
•Efficient container creation
•Snapshot/merge feature
•Can support QCOW2 and other image formats
•Can support arbitrary storage types
•Can help KVM to keep all disk I/O in-kernel
18
Plans
•Implement I/O module on top of vfs ops
•Add support of QCOW2 image format
•Optimizations:
•discard/TRIM events support
•online images defragmentation
•support sparse files
•Integration into mainline kernel
19
Questions

More Related Content

What's hot

BiTteRsweet FS
BiTteRsweet FSBiTteRsweet FS
BiTteRsweet FS
Asociatia ProLinux
 
Process, Threads, Symmetric Multiprocessing and Microkernels in Operating System
Process, Threads, Symmetric Multiprocessing and Microkernels in Operating SystemProcess, Threads, Symmetric Multiprocessing and Microkernels in Operating System
Process, Threads, Symmetric Multiprocessing and Microkernels in Operating System
LieYah Daliah
 
SYNCHRONIZATION IN MULTIPROCESSING
SYNCHRONIZATION IN MULTIPROCESSINGSYNCHRONIZATION IN MULTIPROCESSING
SYNCHRONIZATION IN MULTIPROCESSING
Aparna Bhadran
 
Cache coherence
Cache coherenceCache coherence
Cache coherence
Priyam Pandey
 
A20 dfs2
A20 dfs2A20 dfs2
A20 dfs2
Pradeep Kumar
 
Processes and Threads in Windows Vista
Processes and Threads in Windows VistaProcesses and Threads in Windows Vista
Processes and Threads in Windows Vista
Trinh Phuc Tho
 
18 philbe replication stanford99
18 philbe replication stanford9918 philbe replication stanford99
18 philbe replication stanford99
ashish61_scs
 
Scalability fs v2
Scalability fs v2Scalability fs v2
Scalability fs v2
Nguyen Giang
 
Multithreaded processors ppt
Multithreaded processors pptMultithreaded processors ppt
Multithreaded processors ppt
Siddhartha Anand
 
File replication
File replicationFile replication
File replication
Klawal13
 
Backup, Restore, and Disaster Recovery
Backup, Restore, and Disaster RecoveryBackup, Restore, and Disaster Recovery
Backup, Restore, and Disaster Recovery
MongoDB
 
Lecture 6.1
Lecture  6.1Lecture  6.1
Lecture 6.1
Mr SMAK
 
Multithreading computer architecture
 Multithreading computer architecture  Multithreading computer architecture
Multithreading computer architecture
Haris456
 
Memory management in linux
Memory management in linuxMemory management in linux
Memory management in linux
Dr. C.V. Suresh Babu
 
10 replication
10 replication10 replication
10 replication
ashish61_scs
 
Linux Memory Management
Linux Memory ManagementLinux Memory Management
Linux Memory Management
Suvendu Kumar Dash
 
10 Multicore 07
10 Multicore 0710 Multicore 07
10 Multicore 07
timcrack
 
Real time operating systems (rtos) concepts 5
Real time operating systems (rtos) concepts 5Real time operating systems (rtos) concepts 5
Real time operating systems (rtos) concepts 5
Abu Bakr Ramadan
 
Process management in linux
Process management in linuxProcess management in linux
Process management in linux
Mazenetsolution
 
Parallel computing in bioinformatics t.seemann - balti bioinformatics - wed...
Parallel computing in bioinformatics   t.seemann - balti bioinformatics - wed...Parallel computing in bioinformatics   t.seemann - balti bioinformatics - wed...
Parallel computing in bioinformatics t.seemann - balti bioinformatics - wed...
Torsten Seemann
 

What's hot (20)

BiTteRsweet FS
BiTteRsweet FSBiTteRsweet FS
BiTteRsweet FS
 
Process, Threads, Symmetric Multiprocessing and Microkernels in Operating System
Process, Threads, Symmetric Multiprocessing and Microkernels in Operating SystemProcess, Threads, Symmetric Multiprocessing and Microkernels in Operating System
Process, Threads, Symmetric Multiprocessing and Microkernels in Operating System
 
SYNCHRONIZATION IN MULTIPROCESSING
SYNCHRONIZATION IN MULTIPROCESSINGSYNCHRONIZATION IN MULTIPROCESSING
SYNCHRONIZATION IN MULTIPROCESSING
 
Cache coherence
Cache coherenceCache coherence
Cache coherence
 
A20 dfs2
A20 dfs2A20 dfs2
A20 dfs2
 
Processes and Threads in Windows Vista
Processes and Threads in Windows VistaProcesses and Threads in Windows Vista
Processes and Threads in Windows Vista
 
18 philbe replication stanford99
18 philbe replication stanford9918 philbe replication stanford99
18 philbe replication stanford99
 
Scalability fs v2
Scalability fs v2Scalability fs v2
Scalability fs v2
 
Multithreaded processors ppt
Multithreaded processors pptMultithreaded processors ppt
Multithreaded processors ppt
 
File replication
File replicationFile replication
File replication
 
Backup, Restore, and Disaster Recovery
Backup, Restore, and Disaster RecoveryBackup, Restore, and Disaster Recovery
Backup, Restore, and Disaster Recovery
 
Lecture 6.1
Lecture  6.1Lecture  6.1
Lecture 6.1
 
Multithreading computer architecture
 Multithreading computer architecture  Multithreading computer architecture
Multithreading computer architecture
 
Memory management in linux
Memory management in linuxMemory management in linux
Memory management in linux
 
10 replication
10 replication10 replication
10 replication
 
Linux Memory Management
Linux Memory ManagementLinux Memory Management
Linux Memory Management
 
10 Multicore 07
10 Multicore 0710 Multicore 07
10 Multicore 07
 
Real time operating systems (rtos) concepts 5
Real time operating systems (rtos) concepts 5Real time operating systems (rtos) concepts 5
Real time operating systems (rtos) concepts 5
 
Process management in linux
Process management in linuxProcess management in linux
Process management in linux
 
Parallel computing in bioinformatics t.seemann - balti bioinformatics - wed...
Parallel computing in bioinformatics   t.seemann - balti bioinformatics - wed...Parallel computing in bioinformatics   t.seemann - balti bioinformatics - wed...
Parallel computing in bioinformatics t.seemann - balti bioinformatics - wed...
 

Similar to Containers in a File

PFcache (Linuxcon, Seattle, 2015)
PFcache (Linuxcon, Seattle, 2015)PFcache (Linuxcon, Seattle, 2015)
PFcache (Linuxcon, Seattle, 2015)
Pavel Emelyanov
 
PFcache - LinuxCon 2015
PFcache - LinuxCon 2015PFcache - LinuxCon 2015
PFcache - LinuxCon 2015
OpenVZ
 
Denser containers with PF cache - Pavel Emelyanov
Denser containers with PF cache - Pavel EmelyanovDenser containers with PF cache - Pavel Emelyanov
Denser containers with PF cache - Pavel Emelyanov
OpenVZ
 
Containers and Namespaces in the Linux Kernel
Containers and Namespaces in the Linux KernelContainers and Namespaces in the Linux Kernel
Containers and Namespaces in the Linux Kernel
OpenVZ
 
Deployment Strategies (Mongo Austin)
Deployment Strategies (Mongo Austin)Deployment Strategies (Mongo Austin)
Deployment Strategies (Mongo Austin)
MongoDB
 
Deployment Strategies
Deployment StrategiesDeployment Strategies
Deployment Strategies
MongoDB
 
Cassandra Day Atlanta 2015: Recording the Web: High-Fidelity Storage and Play...
Cassandra Day Atlanta 2015: Recording the Web: High-Fidelity Storage and Play...Cassandra Day Atlanta 2015: Recording the Web: High-Fidelity Storage and Play...
Cassandra Day Atlanta 2015: Recording the Web: High-Fidelity Storage and Play...
DataStax Academy
 
Advanced Storage Area Network
Advanced Storage Area NetworkAdvanced Storage Area Network
Advanced Storage Area Network
Soumee Maschatak
 
File000128
File000128File000128
File000128
Desmond Devendran
 
Exploring Docker Security
Exploring Docker SecurityExploring Docker Security
Exploring Docker Security
Patrick Kleindienst
 
CvmFS Workshop
CvmFS WorkshopCvmFS Workshop
CvmFS Workshop
Steve Traylen
 
Deployment Strategy
Deployment StrategyDeployment Strategy
Deployment Strategy
MongoDB
 
Ceph in the GRNET cloud stack
Ceph in the GRNET cloud stackCeph in the GRNET cloud stack
Ceph in the GRNET cloud stack
Nikos Kormpakis
 
Container & kubernetes
Container & kubernetesContainer & kubernetes
Container & kubernetes
Ted Jung
 
Live Container Migration: OpenStack Summit Barcelona 2016
Live Container Migration: OpenStack Summit Barcelona 2016Live Container Migration: OpenStack Summit Barcelona 2016
Live Container Migration: OpenStack Summit Barcelona 2016
Phil Estes
 
.ppt
.ppt.ppt
”Bare-Metal Container" presented at HPCC2016
”Bare-Metal Container" presented at HPCC2016”Bare-Metal Container" presented at HPCC2016
”Bare-Metal Container" presented at HPCC2016
Kuniyasu Suzaki
 
Considerations when implementing_ha_in_dmf
Considerations when implementing_ha_in_dmfConsiderations when implementing_ha_in_dmf
Considerations when implementing_ha_in_dmf
hik_lhz
 
What'sNnew in 3.0 Webinar
What'sNnew in 3.0 WebinarWhat'sNnew in 3.0 Webinar
What'sNnew in 3.0 Webinar
MongoDB
 
Building the Internet of Things with Thingsquare and Contiki - day 2 part 1
Building the Internet of Things with Thingsquare and Contiki - day 2 part 1Building the Internet of Things with Thingsquare and Contiki - day 2 part 1
Building the Internet of Things with Thingsquare and Contiki - day 2 part 1
Adam Dunkels
 

Similar to Containers in a File (20)

PFcache (Linuxcon, Seattle, 2015)
PFcache (Linuxcon, Seattle, 2015)PFcache (Linuxcon, Seattle, 2015)
PFcache (Linuxcon, Seattle, 2015)
 
PFcache - LinuxCon 2015
PFcache - LinuxCon 2015PFcache - LinuxCon 2015
PFcache - LinuxCon 2015
 
Denser containers with PF cache - Pavel Emelyanov
Denser containers with PF cache - Pavel EmelyanovDenser containers with PF cache - Pavel Emelyanov
Denser containers with PF cache - Pavel Emelyanov
 
Containers and Namespaces in the Linux Kernel
Containers and Namespaces in the Linux KernelContainers and Namespaces in the Linux Kernel
Containers and Namespaces in the Linux Kernel
 
Deployment Strategies (Mongo Austin)
Deployment Strategies (Mongo Austin)Deployment Strategies (Mongo Austin)
Deployment Strategies (Mongo Austin)
 
Deployment Strategies
Deployment StrategiesDeployment Strategies
Deployment Strategies
 
Cassandra Day Atlanta 2015: Recording the Web: High-Fidelity Storage and Play...
Cassandra Day Atlanta 2015: Recording the Web: High-Fidelity Storage and Play...Cassandra Day Atlanta 2015: Recording the Web: High-Fidelity Storage and Play...
Cassandra Day Atlanta 2015: Recording the Web: High-Fidelity Storage and Play...
 
Advanced Storage Area Network
Advanced Storage Area NetworkAdvanced Storage Area Network
Advanced Storage Area Network
 
File000128
File000128File000128
File000128
 
Exploring Docker Security
Exploring Docker SecurityExploring Docker Security
Exploring Docker Security
 
CvmFS Workshop
CvmFS WorkshopCvmFS Workshop
CvmFS Workshop
 
Deployment Strategy
Deployment StrategyDeployment Strategy
Deployment Strategy
 
Ceph in the GRNET cloud stack
Ceph in the GRNET cloud stackCeph in the GRNET cloud stack
Ceph in the GRNET cloud stack
 
Container & kubernetes
Container & kubernetesContainer & kubernetes
Container & kubernetes
 
Live Container Migration: OpenStack Summit Barcelona 2016
Live Container Migration: OpenStack Summit Barcelona 2016Live Container Migration: OpenStack Summit Barcelona 2016
Live Container Migration: OpenStack Summit Barcelona 2016
 
.ppt
.ppt.ppt
.ppt
 
”Bare-Metal Container" presented at HPCC2016
”Bare-Metal Container" presented at HPCC2016”Bare-Metal Container" presented at HPCC2016
”Bare-Metal Container" presented at HPCC2016
 
Considerations when implementing_ha_in_dmf
Considerations when implementing_ha_in_dmfConsiderations when implementing_ha_in_dmf
Considerations when implementing_ha_in_dmf
 
What'sNnew in 3.0 Webinar
What'sNnew in 3.0 WebinarWhat'sNnew in 3.0 Webinar
What'sNnew in 3.0 Webinar
 
Building the Internet of Things with Thingsquare and Contiki - day 2 part 1
Building the Internet of Things with Thingsquare and Contiki - day 2 part 1Building the Internet of Things with Thingsquare and Contiki - day 2 part 1
Building the Internet of Things with Thingsquare and Contiki - day 2 part 1
 

More from OpenVZ

Speeding up ps and top
Speeding up ps and topSpeeding up ps and top
Speeding up ps and top
OpenVZ
 
Live migration: pros, cons and gotchas -- Pavel Emelyanov
Live migration: pros, cons and gotchas -- Pavel EmelyanovLive migration: pros, cons and gotchas -- Pavel Emelyanov
Live migration: pros, cons and gotchas -- Pavel Emelyanov
OpenVZ
 
Live migrating a container: pros, cons and gotchas -- Pavel Emelyanov
Live migrating a container: pros, cons and gotchas -- Pavel EmelyanovLive migrating a container: pros, cons and gotchas -- Pavel Emelyanov
Live migrating a container: pros, cons and gotchas -- Pavel Emelyanov
OpenVZ
 
CRIU: time and space travel for Linux containers -- Kir Kolyshkin
CRIU: time and space travel for Linux containers -- Kir KolyshkinCRIU: time and space travel for Linux containers -- Kir Kolyshkin
CRIU: time and space travel for Linux containers -- Kir Kolyshkin
OpenVZ
 
Тестирование ПО, основанного на сторонних компонентах - Денис Силаков, SECR 2015
Тестирование ПО, основанного на сторонних компонентах - Денис Силаков, SECR 2015Тестирование ПО, основанного на сторонних компонентах - Денис Силаков, SECR 2015
Тестирование ПО, основанного на сторонних компонентах - Денис Силаков, SECR 2015
OpenVZ
 
Живая миграция: плюсы, минусы и подводные камни - Павел Емельянов
Живая миграция: плюсы, минусы и подводные камни - Павел ЕмельяновЖивая миграция: плюсы, минусы и подводные камни - Павел Емельянов
Живая миграция: плюсы, минусы и подводные камни - Павел Емельянов
OpenVZ
 
What's missing from upstream kernel containers? - Sergey Bronnikov
What's missing from upstream kernel containers? - Sergey BronnikovWhat's missing from upstream kernel containers? - Sergey Bronnikov
What's missing from upstream kernel containers? - Sergey Bronnikov
OpenVZ
 
Проблема фрагментации виртуальных дисков и способы её решения -- Дмитрий Монахов
Проблема фрагментации виртуальных дисков и способы её решения -- Дмитрий МонаховПроблема фрагментации виртуальных дисков и способы её решения -- Дмитрий Монахов
Проблема фрагментации виртуальных дисков и способы её решения -- Дмитрий Монахов
OpenVZ
 
Развёртывание приложений Docker в контейнерах Virtuozzo -- Павел Тихомиров
Развёртывание приложений Docker в контейнерах Virtuozzo -- Павел ТихомировРазвёртывание приложений Docker в контейнерах Virtuozzo -- Павел Тихомиров
Развёртывание приложений Docker в контейнерах Virtuozzo -- Павел Тихомиров
OpenVZ
 
CRIU: ускорение запуска PHP в CloudLinux OS -- Руслан Купреев
CRIU: ускорение запуска PHP в CloudLinux OS  -- Руслан КупреевCRIU: ускорение запуска PHP в CloudLinux OS  -- Руслан Купреев
CRIU: ускорение запуска PHP в CloudLinux OS -- Руслан Купреев
OpenVZ
 
LibCT и контейнеры на уровне приложений -- Александр Бурлука
	LibCT и контейнеры на уровне приложений -- Александр Бурлука	LibCT и контейнеры на уровне приложений -- Александр Бурлука
LibCT и контейнеры на уровне приложений -- Александр Бурлука
OpenVZ
 
Управление памятью контейнеров в проекте OpenVZ -- Владимир Давыдов
Управление памятью контейнеров в проекте OpenVZ -- Владимир ДавыдовУправление памятью контейнеров в проекте OpenVZ -- Владимир Давыдов
Управление памятью контейнеров в проекте OpenVZ -- Владимир Давыдов
OpenVZ
 
Живая миграция контейнеров: плюсы, минусы, подводные камни -- Павел Емельянов
Живая миграция контейнеров: плюсы, минусы, подводные камни -- Павел ЕмельяновЖивая миграция контейнеров: плюсы, минусы, подводные камни -- Павел Емельянов
Живая миграция контейнеров: плюсы, минусы, подводные камни -- Павел Емельянов
OpenVZ
 
LibCT: one lib to rule them all -- Andrey Vagin
LibCT: one lib to rule them all -- Andrey VaginLibCT: one lib to rule them all -- Andrey Vagin
LibCT: one lib to rule them all -- Andrey Vagin
OpenVZ
 
CGroups kernel memory controller -- Pavel Emelyanov
CGroups kernel memory controller -- Pavel EmelyanovCGroups kernel memory controller -- Pavel Emelyanov
CGroups kernel memory controller -- Pavel Emelyanov
OpenVZ
 
What's missing from upstream kernel containers? - Kir Kolyshkin, Sergey Bronn...
What's missing from upstream kernel containers? - Kir Kolyshkin, Sergey Bronn...What's missing from upstream kernel containers? - Kir Kolyshkin, Sergey Bronn...
What's missing from upstream kernel containers? - Kir Kolyshkin, Sergey Bronn...
OpenVZ
 
Not so brief history of Linux Containers - Kir Kolyshkin
Not so brief history of Linux Containers - Kir KolyshkinNot so brief history of Linux Containers - Kir Kolyshkin
Not so brief history of Linux Containers - Kir Kolyshkin
OpenVZ
 
Openvz booth
Openvz boothOpenvz booth
Openvz booth
OpenVZ
 
Управление ресурсами в Linux и OpenVZ
Управление ресурсами в Linux и OpenVZ Управление ресурсами в Linux и OpenVZ
Управление ресурсами в Linux и OpenVZ
OpenVZ
 
Containers in a file
Containers in a fileContainers in a file
Containers in a file
OpenVZ
 

More from OpenVZ (20)

Speeding up ps and top
Speeding up ps and topSpeeding up ps and top
Speeding up ps and top
 
Live migration: pros, cons and gotchas -- Pavel Emelyanov
Live migration: pros, cons and gotchas -- Pavel EmelyanovLive migration: pros, cons and gotchas -- Pavel Emelyanov
Live migration: pros, cons and gotchas -- Pavel Emelyanov
 
Live migrating a container: pros, cons and gotchas -- Pavel Emelyanov
Live migrating a container: pros, cons and gotchas -- Pavel EmelyanovLive migrating a container: pros, cons and gotchas -- Pavel Emelyanov
Live migrating a container: pros, cons and gotchas -- Pavel Emelyanov
 
CRIU: time and space travel for Linux containers -- Kir Kolyshkin
CRIU: time and space travel for Linux containers -- Kir KolyshkinCRIU: time and space travel for Linux containers -- Kir Kolyshkin
CRIU: time and space travel for Linux containers -- Kir Kolyshkin
 
Тестирование ПО, основанного на сторонних компонентах - Денис Силаков, SECR 2015
Тестирование ПО, основанного на сторонних компонентах - Денис Силаков, SECR 2015Тестирование ПО, основанного на сторонних компонентах - Денис Силаков, SECR 2015
Тестирование ПО, основанного на сторонних компонентах - Денис Силаков, SECR 2015
 
Живая миграция: плюсы, минусы и подводные камни - Павел Емельянов
Живая миграция: плюсы, минусы и подводные камни - Павел ЕмельяновЖивая миграция: плюсы, минусы и подводные камни - Павел Емельянов
Живая миграция: плюсы, минусы и подводные камни - Павел Емельянов
 
What's missing from upstream kernel containers? - Sergey Bronnikov
What's missing from upstream kernel containers? - Sergey BronnikovWhat's missing from upstream kernel containers? - Sergey Bronnikov
What's missing from upstream kernel containers? - Sergey Bronnikov
 
Проблема фрагментации виртуальных дисков и способы её решения -- Дмитрий Монахов
Проблема фрагментации виртуальных дисков и способы её решения -- Дмитрий МонаховПроблема фрагментации виртуальных дисков и способы её решения -- Дмитрий Монахов
Проблема фрагментации виртуальных дисков и способы её решения -- Дмитрий Монахов
 
Развёртывание приложений Docker в контейнерах Virtuozzo -- Павел Тихомиров
Развёртывание приложений Docker в контейнерах Virtuozzo -- Павел ТихомировРазвёртывание приложений Docker в контейнерах Virtuozzo -- Павел Тихомиров
Развёртывание приложений Docker в контейнерах Virtuozzo -- Павел Тихомиров
 
CRIU: ускорение запуска PHP в CloudLinux OS -- Руслан Купреев
CRIU: ускорение запуска PHP в CloudLinux OS  -- Руслан КупреевCRIU: ускорение запуска PHP в CloudLinux OS  -- Руслан Купреев
CRIU: ускорение запуска PHP в CloudLinux OS -- Руслан Купреев
 
LibCT и контейнеры на уровне приложений -- Александр Бурлука
	LibCT и контейнеры на уровне приложений -- Александр Бурлука	LibCT и контейнеры на уровне приложений -- Александр Бурлука
LibCT и контейнеры на уровне приложений -- Александр Бурлука
 
Управление памятью контейнеров в проекте OpenVZ -- Владимир Давыдов
Управление памятью контейнеров в проекте OpenVZ -- Владимир ДавыдовУправление памятью контейнеров в проекте OpenVZ -- Владимир Давыдов
Управление памятью контейнеров в проекте OpenVZ -- Владимир Давыдов
 
Живая миграция контейнеров: плюсы, минусы, подводные камни -- Павел Емельянов
Живая миграция контейнеров: плюсы, минусы, подводные камни -- Павел ЕмельяновЖивая миграция контейнеров: плюсы, минусы, подводные камни -- Павел Емельянов
Живая миграция контейнеров: плюсы, минусы, подводные камни -- Павел Емельянов
 
LibCT: one lib to rule them all -- Andrey Vagin
LibCT: one lib to rule them all -- Andrey VaginLibCT: one lib to rule them all -- Andrey Vagin
LibCT: one lib to rule them all -- Andrey Vagin
 
CGroups kernel memory controller -- Pavel Emelyanov
CGroups kernel memory controller -- Pavel EmelyanovCGroups kernel memory controller -- Pavel Emelyanov
CGroups kernel memory controller -- Pavel Emelyanov
 
What's missing from upstream kernel containers? - Kir Kolyshkin, Sergey Bronn...
What's missing from upstream kernel containers? - Kir Kolyshkin, Sergey Bronn...What's missing from upstream kernel containers? - Kir Kolyshkin, Sergey Bronn...
What's missing from upstream kernel containers? - Kir Kolyshkin, Sergey Bronn...
 
Not so brief history of Linux Containers - Kir Kolyshkin
Not so brief history of Linux Containers - Kir KolyshkinNot so brief history of Linux Containers - Kir Kolyshkin
Not so brief history of Linux Containers - Kir Kolyshkin
 
Openvz booth
Openvz boothOpenvz booth
Openvz booth
 
Управление ресурсами в Linux и OpenVZ
Управление ресурсами в Linux и OpenVZ Управление ресурсами в Linux и OpenVZ
Управление ресурсами в Linux и OpenVZ
 
Containers in a file
Containers in a fileContainers in a file
Containers in a file
 

Recently uploaded

Unveiling the Advantages of Agile Software Development.pdf
Unveiling the Advantages of Agile Software Development.pdfUnveiling the Advantages of Agile Software Development.pdf
Unveiling the Advantages of Agile Software Development.pdf
brainerhub1
 
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian CompaniesE-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
Quickdice ERP
 
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
kalichargn70th171
 
Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)
Julian Hyde
 
E-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet DynamicsE-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet Dynamics
Hornet Dynamics
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
Rakesh Kumar R
 
E-commerce Application Development Company.pdf
E-commerce Application Development Company.pdfE-commerce Application Development Company.pdf
E-commerce Application Development Company.pdf
Hornet Dynamics
 
Empowering Growth with Best Software Development Company in Noida - Deuglo
Empowering Growth with Best Software  Development Company in Noida - DeugloEmpowering Growth with Best Software  Development Company in Noida - Deuglo
Empowering Growth with Best Software Development Company in Noida - Deuglo
Deuglo Infosystem Pvt Ltd
 
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j
 
Transform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR SolutionsTransform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR Solutions
TheSMSPoint
 
Oracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptxOracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptx
Remote DBA Services
 
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise EditionWhy Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Envertis Software Solutions
 
GreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-JurisicGreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-Jurisic
Green Software Development
 
Oracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptxOracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptx
Remote DBA Services
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata
 
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Neo4j
 
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemUI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
Peter Muessig
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke
 
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
Neo4j
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
Łukasz Chruściel
 

Recently uploaded (20)

Unveiling the Advantages of Agile Software Development.pdf
Unveiling the Advantages of Agile Software Development.pdfUnveiling the Advantages of Agile Software Development.pdf
Unveiling the Advantages of Agile Software Development.pdf
 
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian CompaniesE-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
E-Invoicing Implementation: A Step-by-Step Guide for Saudi Arabian Companies
 
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf8 Best Automated Android App Testing Tool and Framework in 2024.pdf
8 Best Automated Android App Testing Tool and Framework in 2024.pdf
 
Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)Measures in SQL (SIGMOD 2024, Santiago, Chile)
Measures in SQL (SIGMOD 2024, Santiago, Chile)
 
E-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet DynamicsE-commerce Development Services- Hornet Dynamics
E-commerce Development Services- Hornet Dynamics
 
Fundamentals of Programming and Language Processors
Fundamentals of Programming and Language ProcessorsFundamentals of Programming and Language Processors
Fundamentals of Programming and Language Processors
 
E-commerce Application Development Company.pdf
E-commerce Application Development Company.pdfE-commerce Application Development Company.pdf
E-commerce Application Development Company.pdf
 
Empowering Growth with Best Software Development Company in Noida - Deuglo
Empowering Growth with Best Software  Development Company in Noida - DeugloEmpowering Growth with Best Software  Development Company in Noida - Deuglo
Empowering Growth with Best Software Development Company in Noida - Deuglo
 
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit ParisNeo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
Neo4j - Product Vision and Knowledge Graphs - GraphSummit Paris
 
Transform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR SolutionsTransform Your Communication with Cloud-Based IVR Solutions
Transform Your Communication with Cloud-Based IVR Solutions
 
Oracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptxOracle 23c New Features For DBAs and Developers.pptx
Oracle 23c New Features For DBAs and Developers.pptx
 
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise EditionWhy Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
Why Choose Odoo 17 Community & How it differs from Odoo 17 Enterprise Edition
 
GreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-JurisicGreenCode-A-VSCode-Plugin--Dario-Jurisic
GreenCode-A-VSCode-Plugin--Dario-Jurisic
 
Oracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptxOracle Database 19c New Features for DBAs and Developers.pptx
Oracle Database 19c New Features for DBAs and Developers.pptx
 
OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024OpenMetadata Community Meeting - 5th June 2024
OpenMetadata Community Meeting - 5th June 2024
 
Atelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissancesAtelier - Innover avec l’IA Générative et les graphes de connaissances
Atelier - Innover avec l’IA Générative et les graphes de connaissances
 
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s EcosystemUI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
UI5con 2024 - Keynote: Latest News about UI5 and it’s Ecosystem
 
Vitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdfVitthal Shirke Java Microservices Resume.pdf
Vitthal Shirke Java Microservices Resume.pdf
 
GraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph TechnologyGraphSummit Paris - The art of the possible with Graph Technology
GraphSummit Paris - The art of the possible with Graph Technology
 
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf2024 eCommerceDays Toulouse - Sylius 2.0.pdf
2024 eCommerceDays Toulouse - Sylius 2.0.pdf
 

Containers in a File

  • 2. 2 Agenda •Containers overview •Current way of keeping container files and its problems •Basics of container-in-a-file approach •Limitations of existing facilities •Solution •Benefits •Plans for future
  • 3. 3 Virtualization: VM-style vs. containers CT 1 Hardware Hardware Host OSHost OS / Hypervisor OS Virtualization LayerVirtual Machine Monitor Virtual Hardware Virtual Hardware Guest OS Guest OS VM 1 VM 2 CT 2 Containers: •OpenVZ / Parallels Containers •Solaris containers/Zones •FreeBSD jails VMMs: •KVM / Qemu •VMware •Parallels Hypervisor
  • 4. 4 Containers Virtualization Each container has its own: •Files •Process tree •Network •Devices •IPC objects
  • 5. 5 Containers: file system view Natural approach: each container chroot-s to a per-container sub-tree of some system-wide host file system. CT 1 CT 2 Block device Host file system / 1 2 CTs chroot chroot
  • 6. 6 Problems •File system journal is a bottleneck: 1. CT1 performs a lot of operations leading to metadata updates (e.g. truncates). Consequently, the journal is getting full. 2. CT2 issuing a single I/O operation blocks until journal checkpoint is completed. Up to 15 seconds in some tests! Host file system100% full Journal Lots of random writes
  • 7. 7 Problems (cont.) •Lots of small-size files I/O on any CT operation (backup, clone) •Sub-tree disk quota support is absent in mainstream kernel •Hard to manage: •No per-container snapshots, live backup is problematic •Live migration – rsync unreliable, changed inode numbers •File system type and its properties are fixed for all containers •Need to limit number of inodes per container
  • 8. 8 Container in a file Basic idea: assign virtual block device to container, keep container’ file- system on top of virtual block device. CT 1 Per-container file systems Host file system Per-container virtual block devices virtual block device image-file image-file virtual block device / / CT 2
  • 9. 9 LVM limitations •Not flexible enough – works only on top of block device •Hard to manage (e.g. how to migrate huge volume?) •No dynamic allocation (--virtualsize 16T --size 16M) •Management is not as convenient as for files Block device Per-container file systems Per-container logical volumes CT 1 / / CT 2 Logical volume Logical volume
  • 10. 10 Ordinary loop device limitations •VFS operations leads to double page-caching •No dynamic allocation (only “plain” format is supported) •No helps to backup/migrate containers •No snapshot functionality Per-container file systems Per-container loop devices CT 1 / / CT 2 /dev/loop1 /dev/loop2 image-file1 image-file2 Host file system
  • 11. 11 Design of new loop device (ploop) main ploop module Container file system kernel block layer mount physical medium image file Maps virtual block number to image block number Maps file offset to physical sector I/O module DIO VFS format module plain QCOW ploop block device v-blocks i-blocks expandable NFS
  • 12. 12 Stacked image configuration Ploop snapshots are represented by image files. They are stuffed in ploop device in stacked manner and obey the following rules: •Only top image is opened in RW mode (others – RO) •Every mapping bears info about ‘level’ •READ leads to access to an image of proper ‘level’ •WRITE may lead to transferring proper block to top image Snapshot image1 (RW) image (RO) level 1 level 0 top WRITE Copy (if any) complete READ complete READ complete
  • 13. 13 Backup via snapshot Operations involved: •Create snapshot •Backup base image •Merge CT image ploop bdev snapshot CT ploop bdev image (RW) (RO) copy image.tmp (RW) merge CT ploop bdev image (RW) image.tmp merge Key features: •On-line •consistent
  • 14. 14 Migrate via write-tracker Operations involved: •Turn write-tracker on •Iterative copy •Freeze container •Copy the rest time track on, copy all check tracker, copy modified check tracker, copy the rest image a copy of image < freeze container> Key features: •On-line •I/O efficient
  • 15. 15 Problems solved •File system journal is not bottleneck anymore 1. CT1 performs a lot of operations leading to metadata updates (e.g. truncates). Consequently, the journal is getting full. 2. CT2 has its own file system. So, it’s not directly affected by CT1. CT1 file system100% full Journal Lots of random writes CT2 file system
  • 16. 16 Problems solved (cont.) •Large-size image files I/O instead of lots of small-size files I/O on management operations •Disk quota can be implemented based on virtual device sizes. No need for sub-tree quotas •Live backup is easy and consistent •Live migration is reliable and efficient •Different containers may use file systems of different types and properties •No need to limit “number-of-inodes-per-container”
  • 17. 17 Additional benefits •Efficient container creation •Snapshot/merge feature •Can support QCOW2 and other image formats •Can support arbitrary storage types •Can help KVM to keep all disk I/O in-kernel
  • 18. 18 Plans •Implement I/O module on top of vfs ops •Add support of QCOW2 image format •Optimizations: •discard/TRIM events support •online images defragmentation •support sparse files •Integration into mainline kernel