SlideShare a Scribd company logo
Linux Kernel Extensions
for Databases
Alexander Krizhanovsky
Tempesta Technologies, Inc.
ak@tempesta-tech.com
Who am I?
CEO & CTO at NatSys Lab & Tempesta Technologies
Tempesta Technologies (Seattle, WA)
● Subsidiary of NatSys Lab. developing Tempesta FW – a first and
only hybrid of HTTP accelerator and firewall for DDoS mitigation &
WAF
NatSys Lab (Moscow, Russia)
● Custom software development in:
– high performance network traffic processing
– databases
The begin
(many years ago)
Database to store Instant messenger's history
Plenty of data (NoSQL, 3-touple key)
High performance
Some consistency (no transactions)
2-3 months (quick prototype)
Simple DBMS
Disclamer:
memory & I/O only,
no index,
no locking,
no queries
“DBMS” means
InnoDB
Linux VMM?
open(O_DIRECT): OS kernel bypass
«In short, the whole "let's bypass the OS" notion is just fundamentally
broken. It sounds simple, but it sounds simple only to an idiot who writes
databases and doesn't even UNDERSTAND what an OS is meant to
do.»
Linus Torvalds
«Re: O_DIRECT question»
https://lkml.org/lkml/2007/1/11/129
mmap(2)!
Automatic page eviction
Transparrent persistency
I/O is managed by OS
...and ever radix tree index for free!
x86-64 page table
(radix tree)
A tree in the tree
a.0
c.0
a.1
c.1
Page table
a b
c
Application Tree
mmap(2): index for free
$ grep 6f0000000000 /proc/[0-9]*/maps
$
DbItem *db = mmap(0x6f0000000000, 0x40000000 /* 1GB */, ...);
DbItem *x = (DbItem *)(0x6f0000000000 + key);
...or just an array
DbItem *db = mmap(0, 0x40000000 /* 1GB */, ...);
DbItem *x = &db[key];
Virtual memory isn't for free
TLB cache is small (~1024 entries, i.e. 4MB)
TLB cache miss is up to 4 memory transfers
Spacial locality is crucial: 1 address outlier is up to 12KB
…but Linux VMM coalesces
memory areas
Context switch of user-space
processes invalidates TLB
...but threads and user/kernel
context switches are cheap
Lesson 1
Large mmap()'s are expensive
Spacial locality is your friend
Kernel mappings are resistant to context switches
DBMS vs OS
Stonebreaker, "Operating System Support for Database
Management”, 1981
● OS buffers (pages) force out with controlled order
● Record-oriented FS (block != record)
● Data consistency control (transactions)
● FS blocks physical contiguity
● Tree structured FS: a tree in a tree
● Scheduling, process management and IPC
Filesystem: extents
Modern Linux filesystems: BtrFS, EXT4, XFS
Large contigous space is allocated at once
Per-extent addressing
Lesson 2
There are no (or small) file blocks fragmentation
There are no trees inside extent
fallocate(2) became my new friend
Transactions and consistency control
(InnoDB case)
Lesson 3
Atomicity: which pages and when are written
Database operates on record granularity
Q: Can modern filesystem do this for us?
Log-enhanced filesystems
XFS – metadata only
EXT4 – metadata and data
Log writes are sequential, data updates are batched
Double writes on data updates
Log-structured filesystems
BSD LFS, Nilfs2
Dirty data blocks are written to next available segment
Changed metadata is also written to new location
...so poor performance on data updates
Inodes aren't at fixed location → inode map
Garbage collection of dead blocks (with significant overhead)
Poor fragmentation on large files → slow updates
Copy-On-Write filesystems
BtrFS, ZFS
Whole tree branches are COW'ed
Constant root place
Still fragmentation issues and heavy write loads
Very poor at random writes (OLTP), better for OLAP
Soft Updates
BSD UFS2
Proper ordering to keep filesystem structure consistent (metadata)
Garbage collection to gather lost data blocks
Knows about filesystem metadata, not about stored data
Lesson 4:
Data consistency control
Can log-enhanced data journaling FS replace doublewrite buffer?
https://www.percona.com/blog/2015/06/17/update-on-the-innodb-double-write-buffer-and-ext4-
transactions/
, by Yves Trudeau, Percona.
NOT!
● Filesystem gurantees data block consistency, not group of blocks!
Lesson 5
Modern Linux filesystems are unstructured
Page eviction
Typically current process reclaims memory
kswapd – alloc_pages() slow path
OOM
active list
inactive list
add
freeP
P
P
P
P P
P
P
referenced
File synchronization syscalls
open(.., O_SYNC | O_DSYNC)
fsync(int fd)
fdatasync(int fd)
msync(int *addr, size_t len,..)
sync_file_range(int fd, off64_t off, off64_t nbytes,..)
File synchronization syscalls
open(.., O_SYNC | O_DSYNC)
fsync(int fd)
fdatasync(int fd)
msync(int *addr, size_t len,..)
sync_file_range(int fd, off64_t off, off64_t nbytes,..)
No page subset syncrhonization
write(fd, buf, 1GB) – isn't atomic against system failure
Some pages can be flushed before synchronization
Flush out advises
posix_fadvise(int fd, off_t offset, off_t len, int advice)
● POSIX_FADV_DONTNEED – invalidate specified pages
int invalidate_inode_page(struct page *page) {
if (PageDirty(page) || PageWriteback(page))
return 0;
madvise(void *addr, size_t length, int advice)
MADV_DONTNEED – unmap page table entries, initializes dirty
pages flushing
Lesson 6:
Linux VMM as DBMS engine?
Linux VMM
● evicts dirty pages
● it doesn't know exactly whether they're still needed (DONTNEED!)
● nobody knows when the pages are synced
● checkpoint is typically full database file sync
● performance: locked scans for free/clean pages by timeouts and
no-memory
Don't use mmap() if you want consistency!
Transactional filesystems: Reiser4
Hybrid TM: Journaling or Write-Anywhere (Copy-On-Write)
Only small data block writes are transactional
Full transaction support for large writes isn't implemented
Transactional filesystems: BtrFS
Uses log-trees, so [probaly] can be used instead of doublewrite buffer
ioctl(): BTRFS_IOC_TRANS_START and BTRFS_IOC_TRANS_END
Transactional filesystems: others
Valor
R.P.Spillane et al, “Enabling Transactional File Access via Lightweight Kernel
Extensions”, FAST'09
● Transactions: kernel module betweem VFS and filesystem
● New transactional syscalls (log begin, log append, log resolve,
transaction sync, locking)
● patched pdflush for eviction in proper order
Windows TxF
● deprecated
Transactional operating systems
TxOS
D.E.Porter et al., “Operating System Transactions”, SOSP'09
● Almost any sequence of syscalls can run in transactional context
● New transactional syscalls (sys_xbegin, sys_xend, sys_xabort)
● Alters kernel data structures by transactional headers
● Shadow-copies consistent data structures
● Properly resolves conflicts between transactional and non-
transactional calls
The same OS does the right job
Failure-atomic msync()
S.Park et al., “Failure-Atomic msync(): A Simple and Efficient Mechanism for
Preserving the Integrity of Durable Data”, Eurosys'13.
● No voluntary page writebacks: MAP_ATOMIC for mmap()
● Jounaled writeback
– msync()
– REDO logging
– page writebacks
Record-oriented filesystem
OpenVMS Record Management Service (RMS)
● Record formats: fixed length, variable length, stream
● Access methods: sequential, relative record number,
record address, index
● sys$get() & sys$put() instead of read() and write()
TempestaDB
Is part of TempestaFW (a hybrid of firewall and Web-accelerator)
In-memory database for Web-cache and firewall rules (must be fast!)
Stonebreaker's “The Traditional RDBMS Wisdom is All Wrong”
Accessed from kernel space (softirq!) as well as user space
Can be concurrently accessed by many processes
In-progress development
Kernel database for Web-accelerator?
Transport
http://natsys-lab.blogspot.ru/2015/03/linux-netlink-mmap-bulk-data-transfer.html
Collect query results → copy to some buffer
Zero-copy mmap() to user-space
Show to user
TempestaDB internals
Preallocates large pool of huge pages at boot time
● so full DB file mmap() is compensated by huge pages
● 2MB extent = huge page
Tdbfs is used for custom mmap() and msync() for persistency
mmap() => record-orientation out of the box
No-steal force or no-force buffer management
no need for doublewrite buffer
undo and redo logging is up to application
Automatic cache eviction
TempestaDB internals
TempestaDB: trx write (no-steal)
TempestaDB: commit
TempestaDB: commit (force)
TempestaDB: commit (no-force)
TempestaDB: cache eviction
NUMA replication
NUMA sharding
Memory optimized
Cache conscious Burst Hash Trie
● short offsets instead of pointers
● (almost) lock-free
lock-free block allocator for virtually contiguous memory
Burst Hash Trie
Burst Hash Trie
Burst Hash Trie
Burst Hash Trie
Burst Hash Trie: transactions
Thanks!
Availability: https://github.com/tempesta-tech/tempesta
Blog: http://natsys-lab.blogspot.com
E-mail: ak@tempesta-tech.com
We are hiring!

More Related Content

What's hot

PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016
PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016
PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016
Tomas Vondra
 
Как PostgreSQL работает с диском
Как PostgreSQL работает с дискомКак PostgreSQL работает с диском
Как PostgreSQL работает с диском
PostgreSQL-Consulting
 
ToroDB: scaling PostgreSQL like MongoDB / Álvaro Hernández Tortosa (8Kdata)
ToroDB: scaling PostgreSQL like MongoDB / Álvaro Hernández Tortosa (8Kdata)ToroDB: scaling PostgreSQL like MongoDB / Álvaro Hernández Tortosa (8Kdata)
ToroDB: scaling PostgreSQL like MongoDB / Álvaro Hernández Tortosa (8Kdata)
Ontico
 
PostgreSQL performance archaeology
PostgreSQL performance archaeologyPostgreSQL performance archaeology
PostgreSQL performance archaeology
Tomas Vondra
 
Being closer to Cassandra by Oleg Anastasyev. Talk at Cassandra Summit EU 2013
Being closer to Cassandra by Oleg Anastasyev. Talk at Cassandra Summit EU 2013Being closer to Cassandra by Oleg Anastasyev. Talk at Cassandra Summit EU 2013
Being closer to Cassandra by Oleg Anastasyev. Talk at Cassandra Summit EU 2013
odnoklassniki.ru
 
PostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFSPostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFS
Tomas Vondra
 
Frontera распределенный робот для обхода веба в больших объемах / Александр С...
Frontera распределенный робот для обхода веба в больших объемах / Александр С...Frontera распределенный робот для обхода веба в больших объемах / Александр С...
Frontera распределенный робот для обхода веба в больших объемах / Александр С...
Ontico
 
100500 способов кэширования в Oracle Database или как достичь максимальной ск...
100500 способов кэширования в Oracle Database или как достичь максимальной ск...100500 способов кэширования в Oracle Database или как достичь максимальной ск...
100500 способов кэширования в Oracle Database или как достичь максимальной ск...
Ontico
 
XtraDB 5.7: key performance algorithms
XtraDB 5.7: key performance algorithmsXtraDB 5.7: key performance algorithms
XtraDB 5.7: key performance algorithms
Laurynas Biveinis
 
My talk from PgConf.Russia 2016
My talk from PgConf.Russia 2016My talk from PgConf.Russia 2016
My talk from PgConf.Russia 2016
Alex Chistyakov
 
Spilo, отказоустойчивый PostgreSQL кластер / Oleksii Kliukin (Zalando SE)
Spilo, отказоустойчивый PostgreSQL кластер / Oleksii Kliukin (Zalando SE)Spilo, отказоустойчивый PostgreSQL кластер / Oleksii Kliukin (Zalando SE)
Spilo, отказоустойчивый PostgreSQL кластер / Oleksii Kliukin (Zalando SE)
Ontico
 
Как мы сделали PHP 7 в два раза быстрее PHP 5 / Дмитрий Стогов (Zend Technolo...
Как мы сделали PHP 7 в два раза быстрее PHP 5 / Дмитрий Стогов (Zend Technolo...Как мы сделали PHP 7 в два раза быстрее PHP 5 / Дмитрий Стогов (Zend Technolo...
Как мы сделали PHP 7 в два раза быстрее PHP 5 / Дмитрий Стогов (Zend Technolo...
Ontico
 
Add a bit of ACID to Cassandra. Cassandra Summit EU 2014
Add a bit of ACID to Cassandra. Cassandra Summit EU 2014Add a bit of ACID to Cassandra. Cassandra Summit EU 2014
Add a bit of ACID to Cassandra. Cassandra Summit EU 2014
odnoklassniki.ru
 
Как построить видеоплатформу на 200 Гбитс / Ольховченков Вячеслав (Integros)
Как построить видеоплатформу на 200 Гбитс / Ольховченков Вячеслав (Integros)Как построить видеоплатформу на 200 Гбитс / Ольховченков Вячеслав (Integros)
Как построить видеоплатформу на 200 Гбитс / Ольховченков Вячеслав (Integros)
Ontico
 
Setting up mongo replica set
Setting up mongo replica setSetting up mongo replica set
Setting up mongo replica set
Sudheer Kondla
 
Новые возможности полнотекстового поиска в PostgreSQL / Олег Бартунов (Postgr...
Новые возможности полнотекстового поиска в PostgreSQL / Олег Бартунов (Postgr...Новые возможности полнотекстового поиска в PostgreSQL / Олег Бартунов (Postgr...
Новые возможности полнотекстового поиска в PostgreSQL / Олег Бартунов (Postgr...
Ontico
 
Мастер-класс "Логическая репликация и Avito" / Константин Евтеев, Михаил Тюр...
Мастер-класс "Логическая репликация и Avito" / Константин Евтеев,  Михаил Тюр...Мастер-класс "Логическая репликация и Avito" / Константин Евтеев,  Михаил Тюр...
Мастер-класс "Логическая репликация и Avito" / Константин Евтеев, Михаил Тюр...
Ontico
 
PostgreSQL
PostgreSQLPostgreSQL
PostgreSQL
Reuven Lerner
 
XtraDB 5.6 and 5.7: Key Performance Algorithms
XtraDB 5.6 and 5.7: Key Performance AlgorithmsXtraDB 5.6 and 5.7: Key Performance Algorithms
XtraDB 5.6 and 5.7: Key Performance Algorithms
Laurynas Biveinis
 
Open Source SQL databases enters millions queries per second era
Open Source SQL databases enters millions queries per second eraOpen Source SQL databases enters millions queries per second era
Open Source SQL databases enters millions queries per second era
Sveta Smirnova
 

What's hot (20)

PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016
PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016
PostgreSQL na EXT4, XFS, BTRFS a ZFS / FOSDEM PgDay 2016
 
Как PostgreSQL работает с диском
Как PostgreSQL работает с дискомКак PostgreSQL работает с диском
Как PostgreSQL работает с диском
 
ToroDB: scaling PostgreSQL like MongoDB / Álvaro Hernández Tortosa (8Kdata)
ToroDB: scaling PostgreSQL like MongoDB / Álvaro Hernández Tortosa (8Kdata)ToroDB: scaling PostgreSQL like MongoDB / Álvaro Hernández Tortosa (8Kdata)
ToroDB: scaling PostgreSQL like MongoDB / Álvaro Hernández Tortosa (8Kdata)
 
PostgreSQL performance archaeology
PostgreSQL performance archaeologyPostgreSQL performance archaeology
PostgreSQL performance archaeology
 
Being closer to Cassandra by Oleg Anastasyev. Talk at Cassandra Summit EU 2013
Being closer to Cassandra by Oleg Anastasyev. Talk at Cassandra Summit EU 2013Being closer to Cassandra by Oleg Anastasyev. Talk at Cassandra Summit EU 2013
Being closer to Cassandra by Oleg Anastasyev. Talk at Cassandra Summit EU 2013
 
PostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFSPostgreSQL on EXT4, XFS, BTRFS and ZFS
PostgreSQL on EXT4, XFS, BTRFS and ZFS
 
Frontera распределенный робот для обхода веба в больших объемах / Александр С...
Frontera распределенный робот для обхода веба в больших объемах / Александр С...Frontera распределенный робот для обхода веба в больших объемах / Александр С...
Frontera распределенный робот для обхода веба в больших объемах / Александр С...
 
100500 способов кэширования в Oracle Database или как достичь максимальной ск...
100500 способов кэширования в Oracle Database или как достичь максимальной ск...100500 способов кэширования в Oracle Database или как достичь максимальной ск...
100500 способов кэширования в Oracle Database или как достичь максимальной ск...
 
XtraDB 5.7: key performance algorithms
XtraDB 5.7: key performance algorithmsXtraDB 5.7: key performance algorithms
XtraDB 5.7: key performance algorithms
 
My talk from PgConf.Russia 2016
My talk from PgConf.Russia 2016My talk from PgConf.Russia 2016
My talk from PgConf.Russia 2016
 
Spilo, отказоустойчивый PostgreSQL кластер / Oleksii Kliukin (Zalando SE)
Spilo, отказоустойчивый PostgreSQL кластер / Oleksii Kliukin (Zalando SE)Spilo, отказоустойчивый PostgreSQL кластер / Oleksii Kliukin (Zalando SE)
Spilo, отказоустойчивый PostgreSQL кластер / Oleksii Kliukin (Zalando SE)
 
Как мы сделали PHP 7 в два раза быстрее PHP 5 / Дмитрий Стогов (Zend Technolo...
Как мы сделали PHP 7 в два раза быстрее PHP 5 / Дмитрий Стогов (Zend Technolo...Как мы сделали PHP 7 в два раза быстрее PHP 5 / Дмитрий Стогов (Zend Technolo...
Как мы сделали PHP 7 в два раза быстрее PHP 5 / Дмитрий Стогов (Zend Technolo...
 
Add a bit of ACID to Cassandra. Cassandra Summit EU 2014
Add a bit of ACID to Cassandra. Cassandra Summit EU 2014Add a bit of ACID to Cassandra. Cassandra Summit EU 2014
Add a bit of ACID to Cassandra. Cassandra Summit EU 2014
 
Как построить видеоплатформу на 200 Гбитс / Ольховченков Вячеслав (Integros)
Как построить видеоплатформу на 200 Гбитс / Ольховченков Вячеслав (Integros)Как построить видеоплатформу на 200 Гбитс / Ольховченков Вячеслав (Integros)
Как построить видеоплатформу на 200 Гбитс / Ольховченков Вячеслав (Integros)
 
Setting up mongo replica set
Setting up mongo replica setSetting up mongo replica set
Setting up mongo replica set
 
Новые возможности полнотекстового поиска в PostgreSQL / Олег Бартунов (Postgr...
Новые возможности полнотекстового поиска в PostgreSQL / Олег Бартунов (Postgr...Новые возможности полнотекстового поиска в PostgreSQL / Олег Бартунов (Postgr...
Новые возможности полнотекстового поиска в PostgreSQL / Олег Бартунов (Postgr...
 
Мастер-класс "Логическая репликация и Avito" / Константин Евтеев, Михаил Тюр...
Мастер-класс "Логическая репликация и Avito" / Константин Евтеев,  Михаил Тюр...Мастер-класс "Логическая репликация и Avito" / Константин Евтеев,  Михаил Тюр...
Мастер-класс "Логическая репликация и Avito" / Константин Евтеев, Михаил Тюр...
 
PostgreSQL
PostgreSQLPostgreSQL
PostgreSQL
 
XtraDB 5.6 and 5.7: Key Performance Algorithms
XtraDB 5.6 and 5.7: Key Performance AlgorithmsXtraDB 5.6 and 5.7: Key Performance Algorithms
XtraDB 5.6 and 5.7: Key Performance Algorithms
 
Open Source SQL databases enters millions queries per second era
Open Source SQL databases enters millions queries per second eraOpen Source SQL databases enters millions queries per second era
Open Source SQL databases enters millions queries per second era
 

Similar to Linux Kernel Extension for Databases / Александр Крижановский (Tempesta Technologies)

Distributed File System
Distributed File SystemDistributed File System
Distributed File System
Ntu
 
Evolution of the Windows Kernel Architecture, by Dave Probert
Evolution of the Windows Kernel Architecture, by Dave ProbertEvolution of the Windows Kernel Architecture, by Dave Probert
Evolution of the Windows Kernel Architecture, by Dave Probert
yang
 
Oct2009
Oct2009Oct2009
Oct2009
guest81ab2b4
 
I/O System and Case study
I/O System and Case studyI/O System and Case study
I/O System and Case study
Lavanya G
 
A Hudi Live Event: Shaping a Database Experience within the Data Lake with Ap...
A Hudi Live Event: Shaping a Database Experience within the Data Lake with Ap...A Hudi Live Event: Shaping a Database Experience within the Data Lake with Ap...
A Hudi Live Event: Shaping a Database Experience within the Data Lake with Ap...
nadine39280
 
XFS.ppt
XFS.pptXFS.ppt
XFS.ppt
DmitryIg
 
Operating system
Operating systemOperating system
Operating system
SKMohamedKasim
 
2337610
23376102337610
2337610
hantfhan
 
Bigdata and Hadoop
 Bigdata and Hadoop Bigdata and Hadoop
Bigdata and Hadoop
Girish L
 
storage & file strucure in dbms
storage & file strucure in dbmsstorage & file strucure in dbms
storage & file strucure in dbms
sachin2690
 
Scalable Web Solutions - Use Case: Regulatory Reform In Vietnam On eZ Publish...
Scalable Web Solutions - Use Case: Regulatory Reform In Vietnam On eZ Publish...Scalable Web Solutions - Use Case: Regulatory Reform In Vietnam On eZ Publish...
Scalable Web Solutions - Use Case: Regulatory Reform In Vietnam On eZ Publish...
Ivo Lukač
 
I/O System and Case Study
I/O System and Case StudyI/O System and Case Study
I/O System and Case Study
GRamya Bharathi
 
Windows xp
Windows xpWindows xp
Ospresentation 120112074429-phpapp02 (1)
Ospresentation 120112074429-phpapp02 (1)Ospresentation 120112074429-phpapp02 (1)
Ospresentation 120112074429-phpapp02 (1)
Vivian Vhaves
 
Operating System 4 1193308760782240 2
Operating System 4 1193308760782240 2Operating System 4 1193308760782240 2
Operating System 4 1193308760782240 2
mona_hakmy
 
Operating System 4
Operating System 4Operating System 4
Operating System 4
tech2click
 
009709863.pdf
009709863.pdf009709863.pdf
009709863.pdf
KalsoomTahir2
 
02.28.13 WANdisco ApacheCon 2013
02.28.13 WANdisco ApacheCon 201302.28.13 WANdisco ApacheCon 2013
02.28.13 WANdisco ApacheCon 2013
WANdisco Plc
 
Operating system
Operating systemOperating system
Operating system
sweetysweety8
 
The Storage Systems
The Storage Systems The Storage Systems
The Storage Systems
Dhaivat Zala
 

Similar to Linux Kernel Extension for Databases / Александр Крижановский (Tempesta Technologies) (20)

Distributed File System
Distributed File SystemDistributed File System
Distributed File System
 
Evolution of the Windows Kernel Architecture, by Dave Probert
Evolution of the Windows Kernel Architecture, by Dave ProbertEvolution of the Windows Kernel Architecture, by Dave Probert
Evolution of the Windows Kernel Architecture, by Dave Probert
 
Oct2009
Oct2009Oct2009
Oct2009
 
I/O System and Case study
I/O System and Case studyI/O System and Case study
I/O System and Case study
 
A Hudi Live Event: Shaping a Database Experience within the Data Lake with Ap...
A Hudi Live Event: Shaping a Database Experience within the Data Lake with Ap...A Hudi Live Event: Shaping a Database Experience within the Data Lake with Ap...
A Hudi Live Event: Shaping a Database Experience within the Data Lake with Ap...
 
XFS.ppt
XFS.pptXFS.ppt
XFS.ppt
 
Operating system
Operating systemOperating system
Operating system
 
2337610
23376102337610
2337610
 
Bigdata and Hadoop
 Bigdata and Hadoop Bigdata and Hadoop
Bigdata and Hadoop
 
storage & file strucure in dbms
storage & file strucure in dbmsstorage & file strucure in dbms
storage & file strucure in dbms
 
Scalable Web Solutions - Use Case: Regulatory Reform In Vietnam On eZ Publish...
Scalable Web Solutions - Use Case: Regulatory Reform In Vietnam On eZ Publish...Scalable Web Solutions - Use Case: Regulatory Reform In Vietnam On eZ Publish...
Scalable Web Solutions - Use Case: Regulatory Reform In Vietnam On eZ Publish...
 
I/O System and Case Study
I/O System and Case StudyI/O System and Case Study
I/O System and Case Study
 
Windows xp
Windows xpWindows xp
Windows xp
 
Ospresentation 120112074429-phpapp02 (1)
Ospresentation 120112074429-phpapp02 (1)Ospresentation 120112074429-phpapp02 (1)
Ospresentation 120112074429-phpapp02 (1)
 
Operating System 4 1193308760782240 2
Operating System 4 1193308760782240 2Operating System 4 1193308760782240 2
Operating System 4 1193308760782240 2
 
Operating System 4
Operating System 4Operating System 4
Operating System 4
 
009709863.pdf
009709863.pdf009709863.pdf
009709863.pdf
 
02.28.13 WANdisco ApacheCon 2013
02.28.13 WANdisco ApacheCon 201302.28.13 WANdisco ApacheCon 2013
02.28.13 WANdisco ApacheCon 2013
 
Operating system
Operating systemOperating system
Operating system
 
The Storage Systems
The Storage Systems The Storage Systems
The Storage Systems
 

More from Ontico

One-cloud — система управления дата-центром в Одноклассниках / Олег Анастасье...
One-cloud — система управления дата-центром в Одноклассниках / Олег Анастасье...One-cloud — система управления дата-центром в Одноклассниках / Олег Анастасье...
One-cloud — система управления дата-центром в Одноклассниках / Олег Анастасье...
Ontico
 
Масштабируя DNS / Артем Гавриченков (Qrator Labs)
Масштабируя DNS / Артем Гавриченков (Qrator Labs)Масштабируя DNS / Артем Гавриченков (Qrator Labs)
Масштабируя DNS / Артем Гавриченков (Qrator Labs)
Ontico
 
Создание BigData-платформы для ФГУП Почта России / Андрей Бащенко (Luxoft)
Создание BigData-платформы для ФГУП Почта России / Андрей Бащенко (Luxoft)Создание BigData-платформы для ФГУП Почта России / Андрей Бащенко (Luxoft)
Создание BigData-платформы для ФГУП Почта России / Андрей Бащенко (Luxoft)
Ontico
 
Готовим тестовое окружение, или сколько тестовых инстансов вам нужно / Алекса...
Готовим тестовое окружение, или сколько тестовых инстансов вам нужно / Алекса...Готовим тестовое окружение, или сколько тестовых инстансов вам нужно / Алекса...
Готовим тестовое окружение, или сколько тестовых инстансов вам нужно / Алекса...
Ontico
 
Новые технологии репликации данных в PostgreSQL / Александр Алексеев (Postgre...
Новые технологии репликации данных в PostgreSQL / Александр Алексеев (Postgre...Новые технологии репликации данных в PostgreSQL / Александр Алексеев (Postgre...
Новые технологии репликации данных в PostgreSQL / Александр Алексеев (Postgre...
Ontico
 
Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...
Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...
Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...
Ontico
 
Опыт разработки модуля межсетевого экранирования для MySQL / Олег Брославский...
Опыт разработки модуля межсетевого экранирования для MySQL / Олег Брославский...Опыт разработки модуля межсетевого экранирования для MySQL / Олег Брославский...
Опыт разработки модуля межсетевого экранирования для MySQL / Олег Брославский...
Ontico
 
ProxySQL Use Case Scenarios / Alkin Tezuysal (Percona)
ProxySQL Use Case Scenarios / Alkin Tezuysal (Percona)ProxySQL Use Case Scenarios / Alkin Tezuysal (Percona)
ProxySQL Use Case Scenarios / Alkin Tezuysal (Percona)
Ontico
 
MySQL Replication — Advanced Features / Петр Зайцев (Percona)
MySQL Replication — Advanced Features / Петр Зайцев (Percona)MySQL Replication — Advanced Features / Петр Зайцев (Percona)
MySQL Replication — Advanced Features / Петр Зайцев (Percona)
Ontico
 
Внутренний open-source. Как разрабатывать мобильное приложение большим количе...
Внутренний open-source. Как разрабатывать мобильное приложение большим количе...Внутренний open-source. Как разрабатывать мобильное приложение большим количе...
Внутренний open-source. Как разрабатывать мобильное приложение большим количе...
Ontico
 
Подробно о том, как Causal Consistency реализовано в MongoDB / Михаил Тюленев...
Подробно о том, как Causal Consistency реализовано в MongoDB / Михаил Тюленев...Подробно о том, как Causal Consistency реализовано в MongoDB / Михаил Тюленев...
Подробно о том, как Causal Consistency реализовано в MongoDB / Михаил Тюленев...
Ontico
 
Балансировка на скорости проводов. Без ASIC, без ограничений. Решения NFWare ...
Балансировка на скорости проводов. Без ASIC, без ограничений. Решения NFWare ...Балансировка на скорости проводов. Без ASIC, без ограничений. Решения NFWare ...
Балансировка на скорости проводов. Без ASIC, без ограничений. Решения NFWare ...
Ontico
 
Перехват трафика — мифы и реальность / Евгений Усков (Qrator Labs)
Перехват трафика — мифы и реальность / Евгений Усков (Qrator Labs)Перехват трафика — мифы и реальность / Евгений Усков (Qrator Labs)
Перехват трафика — мифы и реальность / Евгений Усков (Qrator Labs)
Ontico
 
И тогда наверняка вдруг запляшут облака! / Алексей Сушков (ПЕТЕР-СЕРВИС)
И тогда наверняка вдруг запляшут облака! / Алексей Сушков (ПЕТЕР-СЕРВИС)И тогда наверняка вдруг запляшут облака! / Алексей Сушков (ПЕТЕР-СЕРВИС)
И тогда наверняка вдруг запляшут облака! / Алексей Сушков (ПЕТЕР-СЕРВИС)
Ontico
 
Как мы заставили Druid работать в Одноклассниках / Юрий Невиницин (OK.RU)
Как мы заставили Druid работать в Одноклассниках / Юрий Невиницин (OK.RU)Как мы заставили Druid работать в Одноклассниках / Юрий Невиницин (OK.RU)
Как мы заставили Druid работать в Одноклассниках / Юрий Невиницин (OK.RU)
Ontico
 
Разгоняем ASP.NET Core / Илья Вербицкий (WebStoating s.r.o.)
Разгоняем ASP.NET Core / Илья Вербицкий (WebStoating s.r.o.)Разгоняем ASP.NET Core / Илья Вербицкий (WebStoating s.r.o.)
Разгоняем ASP.NET Core / Илья Вербицкий (WebStoating s.r.o.)
Ontico
 
Apache Ignite Persistence: зачем Persistence для In-Memory, и как он работает...
Apache Ignite Persistence: зачем Persistence для In-Memory, и как он работает...Apache Ignite Persistence: зачем Persistence для In-Memory, и как он работает...
Apache Ignite Persistence: зачем Persistence для In-Memory, и как он работает...
Ontico
 
Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...
Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...
Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...
Ontico
 
Как мы учились чинить самолеты в воздухе / Евгений Коломеец (Virtuozzo)
Как мы учились чинить самолеты в воздухе / Евгений Коломеец (Virtuozzo)Как мы учились чинить самолеты в воздухе / Евгений Коломеец (Virtuozzo)
Как мы учились чинить самолеты в воздухе / Евгений Коломеец (Virtuozzo)
Ontico
 
Java и Linux — особенности эксплуатации / Алексей Рагозин (Дойче Банк)
Java и Linux — особенности эксплуатации / Алексей Рагозин (Дойче Банк)Java и Linux — особенности эксплуатации / Алексей Рагозин (Дойче Банк)
Java и Linux — особенности эксплуатации / Алексей Рагозин (Дойче Банк)
Ontico
 

More from Ontico (20)

One-cloud — система управления дата-центром в Одноклассниках / Олег Анастасье...
One-cloud — система управления дата-центром в Одноклассниках / Олег Анастасье...One-cloud — система управления дата-центром в Одноклассниках / Олег Анастасье...
One-cloud — система управления дата-центром в Одноклассниках / Олег Анастасье...
 
Масштабируя DNS / Артем Гавриченков (Qrator Labs)
Масштабируя DNS / Артем Гавриченков (Qrator Labs)Масштабируя DNS / Артем Гавриченков (Qrator Labs)
Масштабируя DNS / Артем Гавриченков (Qrator Labs)
 
Создание BigData-платформы для ФГУП Почта России / Андрей Бащенко (Luxoft)
Создание BigData-платформы для ФГУП Почта России / Андрей Бащенко (Luxoft)Создание BigData-платформы для ФГУП Почта России / Андрей Бащенко (Luxoft)
Создание BigData-платформы для ФГУП Почта России / Андрей Бащенко (Luxoft)
 
Готовим тестовое окружение, или сколько тестовых инстансов вам нужно / Алекса...
Готовим тестовое окружение, или сколько тестовых инстансов вам нужно / Алекса...Готовим тестовое окружение, или сколько тестовых инстансов вам нужно / Алекса...
Готовим тестовое окружение, или сколько тестовых инстансов вам нужно / Алекса...
 
Новые технологии репликации данных в PostgreSQL / Александр Алексеев (Postgre...
Новые технологии репликации данных в PostgreSQL / Александр Алексеев (Postgre...Новые технологии репликации данных в PostgreSQL / Александр Алексеев (Postgre...
Новые технологии репликации данных в PostgreSQL / Александр Алексеев (Postgre...
 
Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...
Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...
Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Deve...
 
Опыт разработки модуля межсетевого экранирования для MySQL / Олег Брославский...
Опыт разработки модуля межсетевого экранирования для MySQL / Олег Брославский...Опыт разработки модуля межсетевого экранирования для MySQL / Олег Брославский...
Опыт разработки модуля межсетевого экранирования для MySQL / Олег Брославский...
 
ProxySQL Use Case Scenarios / Alkin Tezuysal (Percona)
ProxySQL Use Case Scenarios / Alkin Tezuysal (Percona)ProxySQL Use Case Scenarios / Alkin Tezuysal (Percona)
ProxySQL Use Case Scenarios / Alkin Tezuysal (Percona)
 
MySQL Replication — Advanced Features / Петр Зайцев (Percona)
MySQL Replication — Advanced Features / Петр Зайцев (Percona)MySQL Replication — Advanced Features / Петр Зайцев (Percona)
MySQL Replication — Advanced Features / Петр Зайцев (Percona)
 
Внутренний open-source. Как разрабатывать мобильное приложение большим количе...
Внутренний open-source. Как разрабатывать мобильное приложение большим количе...Внутренний open-source. Как разрабатывать мобильное приложение большим количе...
Внутренний open-source. Как разрабатывать мобильное приложение большим количе...
 
Подробно о том, как Causal Consistency реализовано в MongoDB / Михаил Тюленев...
Подробно о том, как Causal Consistency реализовано в MongoDB / Михаил Тюленев...Подробно о том, как Causal Consistency реализовано в MongoDB / Михаил Тюленев...
Подробно о том, как Causal Consistency реализовано в MongoDB / Михаил Тюленев...
 
Балансировка на скорости проводов. Без ASIC, без ограничений. Решения NFWare ...
Балансировка на скорости проводов. Без ASIC, без ограничений. Решения NFWare ...Балансировка на скорости проводов. Без ASIC, без ограничений. Решения NFWare ...
Балансировка на скорости проводов. Без ASIC, без ограничений. Решения NFWare ...
 
Перехват трафика — мифы и реальность / Евгений Усков (Qrator Labs)
Перехват трафика — мифы и реальность / Евгений Усков (Qrator Labs)Перехват трафика — мифы и реальность / Евгений Усков (Qrator Labs)
Перехват трафика — мифы и реальность / Евгений Усков (Qrator Labs)
 
И тогда наверняка вдруг запляшут облака! / Алексей Сушков (ПЕТЕР-СЕРВИС)
И тогда наверняка вдруг запляшут облака! / Алексей Сушков (ПЕТЕР-СЕРВИС)И тогда наверняка вдруг запляшут облака! / Алексей Сушков (ПЕТЕР-СЕРВИС)
И тогда наверняка вдруг запляшут облака! / Алексей Сушков (ПЕТЕР-СЕРВИС)
 
Как мы заставили Druid работать в Одноклассниках / Юрий Невиницин (OK.RU)
Как мы заставили Druid работать в Одноклассниках / Юрий Невиницин (OK.RU)Как мы заставили Druid работать в Одноклассниках / Юрий Невиницин (OK.RU)
Как мы заставили Druid работать в Одноклассниках / Юрий Невиницин (OK.RU)
 
Разгоняем ASP.NET Core / Илья Вербицкий (WebStoating s.r.o.)
Разгоняем ASP.NET Core / Илья Вербицкий (WebStoating s.r.o.)Разгоняем ASP.NET Core / Илья Вербицкий (WebStoating s.r.o.)
Разгоняем ASP.NET Core / Илья Вербицкий (WebStoating s.r.o.)
 
Apache Ignite Persistence: зачем Persistence для In-Memory, и как он работает...
Apache Ignite Persistence: зачем Persistence для In-Memory, и как он работает...Apache Ignite Persistence: зачем Persistence для In-Memory, и как он работает...
Apache Ignite Persistence: зачем Persistence для In-Memory, и как он работает...
 
Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...
Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...
Механизмы мониторинга баз данных: взгляд изнутри / Дмитрий Еманов (Firebird P...
 
Как мы учились чинить самолеты в воздухе / Евгений Коломеец (Virtuozzo)
Как мы учились чинить самолеты в воздухе / Евгений Коломеец (Virtuozzo)Как мы учились чинить самолеты в воздухе / Евгений Коломеец (Virtuozzo)
Как мы учились чинить самолеты в воздухе / Евгений Коломеец (Virtuozzo)
 
Java и Linux — особенности эксплуатации / Алексей Рагозин (Дойче Банк)
Java и Linux — особенности эксплуатации / Алексей Рагозин (Дойче Банк)Java и Linux — особенности эксплуатации / Алексей Рагозин (Дойче Банк)
Java и Linux — особенности эксплуатации / Алексей Рагозин (Дойче Банк)
 

Recently uploaded

A review on techniques and modelling methodologies used for checking electrom...
A review on techniques and modelling methodologies used for checking electrom...A review on techniques and modelling methodologies used for checking electrom...
A review on techniques and modelling methodologies used for checking electrom...
nooriasukmaningtyas
 
22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt
KrishnaveniKrishnara1
 
Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...
bijceesjournal
 
basic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdfbasic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdf
NidhalKahouli2
 
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
171ticu
 
Modelagem de um CSTR com reação endotermica.pdf
Modelagem de um CSTR com reação endotermica.pdfModelagem de um CSTR com reação endotermica.pdf
Modelagem de um CSTR com reação endotermica.pdf
camseq
 
Engine Lubrication performance System.pdf
Engine Lubrication performance System.pdfEngine Lubrication performance System.pdf
Engine Lubrication performance System.pdf
mamamaam477
 
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Sinan KOZAK
 
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesHarnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
Christina Lin
 
ISPM 15 Heat Treated Wood Stamps and why your shipping must have one
ISPM 15 Heat Treated Wood Stamps and why your shipping must have oneISPM 15 Heat Treated Wood Stamps and why your shipping must have one
ISPM 15 Heat Treated Wood Stamps and why your shipping must have one
Las Vegas Warehouse
 
Literature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptxLiterature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptx
Dr Ramhari Poudyal
 
Heat Resistant Concrete Presentation ppt
Heat Resistant Concrete Presentation pptHeat Resistant Concrete Presentation ppt
Heat Resistant Concrete Presentation ppt
mamunhossenbd75
 
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
IJECEIAES
 
Question paper of renewable energy sources
Question paper of renewable energy sourcesQuestion paper of renewable energy sources
Question paper of renewable energy sources
mahammadsalmanmech
 
Properties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptxProperties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptx
MDSABBIROJJAMANPAYEL
 
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student MemberIEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
VICTOR MAESTRE RAMIREZ
 
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
Yasser Mahgoub
 
CSM Cloud Service Management Presentarion
CSM Cloud Service Management PresentarionCSM Cloud Service Management Presentarion
CSM Cloud Service Management Presentarion
rpskprasana
 
Generative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of contentGenerative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of content
Hitesh Mohapatra
 
Understanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine LearningUnderstanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine Learning
SUTEJAS
 

Recently uploaded (20)

A review on techniques and modelling methodologies used for checking electrom...
A review on techniques and modelling methodologies used for checking electrom...A review on techniques and modelling methodologies used for checking electrom...
A review on techniques and modelling methodologies used for checking electrom...
 
22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt
 
Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...Comparative analysis between traditional aquaponics and reconstructed aquapon...
Comparative analysis between traditional aquaponics and reconstructed aquapon...
 
basic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdfbasic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdf
 
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样学校原版美国波士顿大学毕业证学历学位证书原版一模一样
学校原版美国波士顿大学毕业证学历学位证书原版一模一样
 
Modelagem de um CSTR com reação endotermica.pdf
Modelagem de um CSTR com reação endotermica.pdfModelagem de um CSTR com reação endotermica.pdf
Modelagem de um CSTR com reação endotermica.pdf
 
Engine Lubrication performance System.pdf
Engine Lubrication performance System.pdfEngine Lubrication performance System.pdf
Engine Lubrication performance System.pdf
 
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
 
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming PipelinesHarnessing WebAssembly for Real-time Stateless Streaming Pipelines
Harnessing WebAssembly for Real-time Stateless Streaming Pipelines
 
ISPM 15 Heat Treated Wood Stamps and why your shipping must have one
ISPM 15 Heat Treated Wood Stamps and why your shipping must have oneISPM 15 Heat Treated Wood Stamps and why your shipping must have one
ISPM 15 Heat Treated Wood Stamps and why your shipping must have one
 
Literature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptxLiterature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptx
 
Heat Resistant Concrete Presentation ppt
Heat Resistant Concrete Presentation pptHeat Resistant Concrete Presentation ppt
Heat Resistant Concrete Presentation ppt
 
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
 
Question paper of renewable energy sources
Question paper of renewable energy sourcesQuestion paper of renewable energy sources
Question paper of renewable energy sources
 
Properties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptxProperties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptx
 
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student MemberIEEE Aerospace and Electronic Systems Society as a Graduate Student Member
IEEE Aerospace and Electronic Systems Society as a Graduate Student Member
 
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
 
CSM Cloud Service Management Presentarion
CSM Cloud Service Management PresentarionCSM Cloud Service Management Presentarion
CSM Cloud Service Management Presentarion
 
Generative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of contentGenerative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of content
 
Understanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine LearningUnderstanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine Learning
 

Linux Kernel Extension for Databases / Александр Крижановский (Tempesta Technologies)

  • 1. Linux Kernel Extensions for Databases Alexander Krizhanovsky Tempesta Technologies, Inc. ak@tempesta-tech.com
  • 2. Who am I? CEO & CTO at NatSys Lab & Tempesta Technologies Tempesta Technologies (Seattle, WA) ● Subsidiary of NatSys Lab. developing Tempesta FW – a first and only hybrid of HTTP accelerator and firewall for DDoS mitigation & WAF NatSys Lab (Moscow, Russia) ● Custom software development in: – high performance network traffic processing – databases
  • 3. The begin (many years ago) Database to store Instant messenger's history Plenty of data (NoSQL, 3-touple key) High performance Some consistency (no transactions) 2-3 months (quick prototype)
  • 4. Simple DBMS Disclamer: memory & I/O only, no index, no locking, no queries “DBMS” means InnoDB
  • 6. open(O_DIRECT): OS kernel bypass «In short, the whole "let's bypass the OS" notion is just fundamentally broken. It sounds simple, but it sounds simple only to an idiot who writes databases and doesn't even UNDERSTAND what an OS is meant to do.» Linus Torvalds «Re: O_DIRECT question» https://lkml.org/lkml/2007/1/11/129
  • 7. mmap(2)! Automatic page eviction Transparrent persistency I/O is managed by OS ...and ever radix tree index for free!
  • 9. A tree in the tree a.0 c.0 a.1 c.1 Page table a b c Application Tree
  • 10. mmap(2): index for free $ grep 6f0000000000 /proc/[0-9]*/maps $ DbItem *db = mmap(0x6f0000000000, 0x40000000 /* 1GB */, ...); DbItem *x = (DbItem *)(0x6f0000000000 + key);
  • 11. ...or just an array DbItem *db = mmap(0, 0x40000000 /* 1GB */, ...); DbItem *x = &db[key];
  • 12. Virtual memory isn't for free TLB cache is small (~1024 entries, i.e. 4MB) TLB cache miss is up to 4 memory transfers Spacial locality is crucial: 1 address outlier is up to 12KB …but Linux VMM coalesces memory areas Context switch of user-space processes invalidates TLB ...but threads and user/kernel context switches are cheap
  • 13. Lesson 1 Large mmap()'s are expensive Spacial locality is your friend Kernel mappings are resistant to context switches
  • 14. DBMS vs OS Stonebreaker, "Operating System Support for Database Management”, 1981 ● OS buffers (pages) force out with controlled order ● Record-oriented FS (block != record) ● Data consistency control (transactions) ● FS blocks physical contiguity ● Tree structured FS: a tree in a tree ● Scheduling, process management and IPC
  • 15. Filesystem: extents Modern Linux filesystems: BtrFS, EXT4, XFS Large contigous space is allocated at once Per-extent addressing
  • 16. Lesson 2 There are no (or small) file blocks fragmentation There are no trees inside extent fallocate(2) became my new friend
  • 17. Transactions and consistency control (InnoDB case)
  • 18. Lesson 3 Atomicity: which pages and when are written Database operates on record granularity Q: Can modern filesystem do this for us?
  • 19. Log-enhanced filesystems XFS – metadata only EXT4 – metadata and data Log writes are sequential, data updates are batched Double writes on data updates
  • 20. Log-structured filesystems BSD LFS, Nilfs2 Dirty data blocks are written to next available segment Changed metadata is also written to new location ...so poor performance on data updates Inodes aren't at fixed location → inode map Garbage collection of dead blocks (with significant overhead) Poor fragmentation on large files → slow updates
  • 21. Copy-On-Write filesystems BtrFS, ZFS Whole tree branches are COW'ed Constant root place Still fragmentation issues and heavy write loads Very poor at random writes (OLTP), better for OLAP
  • 22. Soft Updates BSD UFS2 Proper ordering to keep filesystem structure consistent (metadata) Garbage collection to gather lost data blocks Knows about filesystem metadata, not about stored data
  • 23. Lesson 4: Data consistency control Can log-enhanced data journaling FS replace doublewrite buffer? https://www.percona.com/blog/2015/06/17/update-on-the-innodb-double-write-buffer-and-ext4- transactions/ , by Yves Trudeau, Percona. NOT! ● Filesystem gurantees data block consistency, not group of blocks!
  • 24. Lesson 5 Modern Linux filesystems are unstructured
  • 25. Page eviction Typically current process reclaims memory kswapd – alloc_pages() slow path OOM active list inactive list add freeP P P P P P P P referenced
  • 26. File synchronization syscalls open(.., O_SYNC | O_DSYNC) fsync(int fd) fdatasync(int fd) msync(int *addr, size_t len,..) sync_file_range(int fd, off64_t off, off64_t nbytes,..)
  • 27. File synchronization syscalls open(.., O_SYNC | O_DSYNC) fsync(int fd) fdatasync(int fd) msync(int *addr, size_t len,..) sync_file_range(int fd, off64_t off, off64_t nbytes,..) No page subset syncrhonization write(fd, buf, 1GB) – isn't atomic against system failure Some pages can be flushed before synchronization
  • 28. Flush out advises posix_fadvise(int fd, off_t offset, off_t len, int advice) ● POSIX_FADV_DONTNEED – invalidate specified pages int invalidate_inode_page(struct page *page) { if (PageDirty(page) || PageWriteback(page)) return 0; madvise(void *addr, size_t length, int advice) MADV_DONTNEED – unmap page table entries, initializes dirty pages flushing
  • 29. Lesson 6: Linux VMM as DBMS engine? Linux VMM ● evicts dirty pages ● it doesn't know exactly whether they're still needed (DONTNEED!) ● nobody knows when the pages are synced ● checkpoint is typically full database file sync ● performance: locked scans for free/clean pages by timeouts and no-memory Don't use mmap() if you want consistency!
  • 30. Transactional filesystems: Reiser4 Hybrid TM: Journaling or Write-Anywhere (Copy-On-Write) Only small data block writes are transactional Full transaction support for large writes isn't implemented
  • 31. Transactional filesystems: BtrFS Uses log-trees, so [probaly] can be used instead of doublewrite buffer ioctl(): BTRFS_IOC_TRANS_START and BTRFS_IOC_TRANS_END
  • 32. Transactional filesystems: others Valor R.P.Spillane et al, “Enabling Transactional File Access via Lightweight Kernel Extensions”, FAST'09 ● Transactions: kernel module betweem VFS and filesystem ● New transactional syscalls (log begin, log append, log resolve, transaction sync, locking) ● patched pdflush for eviction in proper order Windows TxF ● deprecated
  • 33. Transactional operating systems TxOS D.E.Porter et al., “Operating System Transactions”, SOSP'09 ● Almost any sequence of syscalls can run in transactional context ● New transactional syscalls (sys_xbegin, sys_xend, sys_xabort) ● Alters kernel data structures by transactional headers ● Shadow-copies consistent data structures ● Properly resolves conflicts between transactional and non- transactional calls
  • 34. The same OS does the right job Failure-atomic msync() S.Park et al., “Failure-Atomic msync(): A Simple and Efficient Mechanism for Preserving the Integrity of Durable Data”, Eurosys'13. ● No voluntary page writebacks: MAP_ATOMIC for mmap() ● Jounaled writeback – msync() – REDO logging – page writebacks
  • 35. Record-oriented filesystem OpenVMS Record Management Service (RMS) ● Record formats: fixed length, variable length, stream ● Access methods: sequential, relative record number, record address, index ● sys$get() & sys$put() instead of read() and write()
  • 36. TempestaDB Is part of TempestaFW (a hybrid of firewall and Web-accelerator) In-memory database for Web-cache and firewall rules (must be fast!) Stonebreaker's “The Traditional RDBMS Wisdom is All Wrong” Accessed from kernel space (softirq!) as well as user space Can be concurrently accessed by many processes In-progress development
  • 37. Kernel database for Web-accelerator?
  • 39. TempestaDB internals Preallocates large pool of huge pages at boot time ● so full DB file mmap() is compensated by huge pages ● 2MB extent = huge page Tdbfs is used for custom mmap() and msync() for persistency mmap() => record-orientation out of the box No-steal force or no-force buffer management no need for doublewrite buffer undo and redo logging is up to application Automatic cache eviction
  • 41. TempestaDB: trx write (no-steal)
  • 48. Memory optimized Cache conscious Burst Hash Trie ● short offsets instead of pointers ● (almost) lock-free lock-free block allocator for virtually contiguous memory
  • 53. Burst Hash Trie: transactions