
Using ZFS file system with MySQL


This slide deck was presented at Mydbops Database Meetup 4 by Bajrang Panigrahi (Zenefits). ZFS offers filesystem features that can enhance MySQL, such as transparent compression and quick snapshots.



  1. MySQL on ZFS. Bajrang Panigrahi, August 2019
  2. ZFS Principles
     ● Pooled storage
       ● Completely eliminates the antique notion of volumes
       ● Does for storage what VM did for memory
     ● Transactional object system
       ● Always consistent on disk – no fsck, ever
     ● Provable end-to-end data integrity
       ● Detects and corrects silent data corruption
     ● Simple administration
       ● Concisely express your intent
  3. FS/Volume Model vs Pooled Storage
     Traditional volumes
     ● Abstraction: virtual disk
     ● Partition/volume for each FS
     ● Grow/shrink by hand
     ● Each FS has limited bandwidth
     ● Storage is fragmented, stranded
     ZFS pooled storage
     ● Abstraction: malloc/free
     ● No partitions to manage
     ● Grow/shrink automatically
     ● All bandwidth always available
     ● All storage in the pool is shared
     (Diagram: one volume and filesystem per disk on the left; many ZFS filesystems sharing a single storage pool on the right.)
  4. (Architecture diagram.)
     Traditional stack: NFS / SMB / local files → VFS → filesystem (e.g. UFS, ext3) → volume manager (e.g. LVM, SVM) → block interface.
     ZFS stack: NFS / SMB / local files → VFS → ZPL (ZFS POSIX Layer) for the file interface, plus ZVOL (ZFS Volume) exported over iSCSI / FC / SCSI target for the block interface → DMU (Data Management Unit), providing atomic transactions on objects → SPA (Storage Pool Allocator), handling block allocate+write, read, and free.
  5. Benefits of ZFS
     ● Copy-on-write (CoW) filesystem
     ● Throttles writes
     ● Data integrity and resiliency
     ● Self-healing of data
     ● Block size matching (allows variable block size)
     ● Snapshots and clones
     ● Active development community
  6. Copy-on-Write Transactions
     1. Initial block tree
     2. COW some blocks
     3. COW indirect blocks
     4. Rewrite uberblock (atomic)
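The four steps above can be sketched as a toy copy-on-write tree in Python. This is purely an illustration of the idea, not ZFS code; all names (`Node`, `cow_update`) are hypothetical:

```python
# Toy copy-on-write tree: modifying a leaf never overwrites existing
# nodes. New copies are made along the path to the root, and at the
# end the root pointer (ZFS's "uberblock") is swapped atomically.

class Node:
    def __init__(self, children=None, data=None):
        self.children = children or []
        self.data = data

def cow_update(root, path, new_data):
    """Return a new root; only nodes along `path` are copied."""
    if not path:
        return Node(data=new_data)
    idx = path[0]
    new_children = list(root.children)  # shallow copy: siblings are shared
    new_children[idx] = cow_update(root.children[idx], path[1:], new_data)
    return Node(children=new_children, data=root.data)

leaf_a, leaf_b = Node(data="A"), Node(data="B")
old_root = Node(children=[leaf_a, leaf_b])
new_root = cow_update(old_root, [1], "B2")  # "rewrite uberblock" = adopt new_root
```

Because `old_root` is never touched, it remains a consistent point-in-time view of the tree, which is exactly why ZFS snapshots are nearly free.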
  7. Block Pointer Structure in ZFS
     (Diagram of the block pointer: three DVAs – vdev / offset / ASIZE triples – for up to three copies of the data, the second copy used for metadata and the third for pool-wide metadata; LSIZE / PSIZE and compression fields; level, type, and flag bits (B/D/X); logical and physical birth txg (when the block was written); fill count; and a 256-bit checksum of the data this block points to.)
  8. End-to-End Data Integrity in ZFS
     Disk block checksums
     ● Checksum stored with the data block
     ● Any self-consistent block will pass
     ● Can't detect stray writes
     ● Inherent FS/volume interface limitation
     ● Validates only the media: catches bit rot, but not phantom writes, misdirected reads and writes, DMA parity errors, driver bugs, or accidental overwrites
     ZFS data authentication
     ● Checksum stored in the parent block pointer
     ● Fault isolation between data and checksum
     ● Checksum hierarchy forms a self-validating Merkle tree
     ● Validates the entire I/O path: catches bit rot, phantom writes, misdirected reads and writes, DMA parity errors, driver bugs, and accidental overwrites
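The "checksum in the parent" idea can be shown with a tiny Merkle sketch in Python (an illustration only; ZFS uses 256-bit checksums such as fletcher4 or SHA-256 inside block pointers, not this layout):

```python
import hashlib

def h(b: bytes) -> bytes:
    """256-bit checksum of a block."""
    return hashlib.sha256(b).digest()

# The parent block stores the checksum of each child block, so a
# corrupted child fails validation even if the child itself is
# self-consistent (which defeats a checksum stored *with* the data).
child1, child2 = b"data-block-1", b"data-block-2"
parent = h(child1) + h(child2)   # parent = concatenated child checksums
root_cksum = h(parent)           # and so on up to the uberblock

def verify(parent: bytes, blocks) -> bool:
    return all(parent[i * 32:(i + 1) * 32] == h(b)
               for i, b in enumerate(blocks))
```

A stray write that replaces `child1` with other self-consistent data still fails `verify`, because the expected checksum lives in the (separately stored) parent.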
  9. Self-Healing of Data in ZFS
     1. The application issues a read; the checksum reveals that the block is corrupt on disk.
     2. ZFS tries the next disk in the mirror; the checksum indicates that this copy is good.
     3. ZFS returns good data to the application and repairs the damaged block.
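The three-step read path can be mimicked with a toy mirrored read in Python (pure illustration; real ZFS does this in the kernel against on-disk vdevs):

```python
import hashlib

def h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

# Toy two-way mirror: two copies of a block, checksum stored elsewhere
# (in the parent block pointer, per the previous slide).
mirror = [bytearray(b"good data"), bytearray(b"good data")]
checksum = h(bytes(mirror[0]))

mirror[0][:] = b"corrupted"  # silent corruption on disk 0

def read_with_self_heal(mirror, checksum):
    """Return a copy that matches the checksum, repairing bad copies."""
    for copy in mirror:
        if h(bytes(copy)) == checksum:          # step 2: found a good copy
            for other in mirror:                # step 3: repair the damage
                if h(bytes(other)) != checksum:
                    other[:] = copy
            return bytes(copy)
    raise IOError("all copies corrupt")         # step 1 alone: unrecoverable

data = read_with_self_heal(mirror, checksum)
```

The application only ever sees `data`; the repair of disk 0 happens as a side effect of the read.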
  10. Initial Use Case at Zenefits
     We use AWS snapshots to rebuild new DBs for dev/ops, but the first access to the data is slow because "new volumes created from existing EBS snapshots load lazily in the background."
     Data from multiple DB clusters is needed to generate the dev/ops DB, so we use multi-source replication.
  11. Alternatives
     ● Attach multiple EBS volumes to a slave MySQL instance and rotate them on each fresh snapshot request.
       Con: requires additional EBS volumes and still suffers the initial query-load problem (taking a snapshot every 15 minutes).
     ● Use Percona XtraBackup for an incremental data copy to the spoof instance.
       Con: requires an additional EBS volume, and the MySQL service must be shut down for the entire period the backup is being restored.
     ● Use the ZFS filesystem to take snapshots at the filesystem level.
  12. Setting up ZFS for MySQL
     ● Create a pool named "zp1" (pool options must precede the pool name):
       zpool create -f -o ashift=12 -o autoexpand=on -O compression=gzip zp1 mirror /dev/xvdm /dev/xvdn
     ● Create a new filesystem in pool "zp1" (Ansible zfs module):
       - name: Create a new file system called data2 in pool zp1
         zfs:
           name: zp1/mysql
           state: present
           extra_zfs_properties:
             setuid: off
             compression: gzip
             recordsize: 128k
             atime: off
             primarycache: metadata
  13. Setting up ZFS for MySQL
     ● Create the required datasets to run MySQL:
       zp1/mysql        1.19T  4.92T   100K  /zp1/mysql
       zp1/mysql/data   1.18T  4.92T  1.17T  /data2/data
       zp1/mysql/logs   9.97G  4.92T  8.84G  /data2/logs
       zp1/mysql/tmp     216K  4.92T   152K  /data2/tmp
     ● MySQL configuration for ZFS:
       innodb_doublewrite = 0
       innodb_checksum_algorithm = none
       innodb_use_native_aio = 0
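Put together, the slide's settings map onto a my.cnf fragment like the following. The datadir/tmpdir/log paths follow the dataset mountpoints above; placing the binary log under /data2/logs is our assumption, not stated on the slide:

```ini
# my.cnf sketch for MySQL on ZFS (paths assume the mountpoints above)
[mysqld]
datadir = /data2/data
tmpdir  = /data2/tmp
log-bin = /data2/logs/mysql-bin   # assumed log placement

# ZFS already checksums every block and writes copy-on-write, so
# InnoDB's doublewrite buffer and page checksums are redundant here.
innodb_doublewrite        = 0
innodb_checksum_algorithm = none

# Native AIO is commonly disabled on ZFS setups like this one.
innodb_use_native_aio     = 0
```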
  14. ZPOOL Status
       zpool status
         pool: zp1
        state: ONLINE
         scan: none requested
       config:
         NAME        STATE   READ WRITE CKSUM
         zp1         ONLINE     0     0     0
           mirror-0  ONLINE     0     0     0
             xvdm    ONLINE     0     0     0
             xvdn    ONLINE     0     0     0
       errors: No known data errors
  15. ZFS List
       NAME               USED   AVAIL  REFER  MOUNTPOINT
       zp1                1.20T  4.92T   104K  /zp1
       zp1/mslave03       1.11G  4.92T   100K  /zp1/mslave03
       zp1/mslave03/data  1.11G  4.92T  1.17T  /data3/data
       zp1/mslave03/logs   308K  4.92T   340K  /data3/logs
       zp1/mslave03/tmp     96K  4.92T   128K  /data3/tmp
       zp1/mslave04        686M  4.92T   100K  /zp1/mslave04
       zp1/mslave04/data   686M  4.92T  1.17T  /data4/data
       zp1/mslave04/logs   300K  4.92T   332K  /data4/logs
       zp1/mslave04/tmp     96K  4.92T   128K  /data4/tmp
       zp1/mysql          1.19T  4.92T   100K  /zp1/mysql
       zp1/mysql/data     1.18T  4.92T  1.17T  /data2/data
       zp1/mysql/logs     10.2G  4.92T  8.78G  /data2/logs
       zp1/mysql/tmp       216K  4.92T   152K  /data2/tmp
  16. Incremental Send and Receive
     ● Full send of the first snapshot ("FromSnap"):
       zfs send zp1/mysql/data@monday | ssh host zfs receive zp1/recvd/fs
     ● Incremental send from "FromSnap" to "ToSnap":
       zfs send -i @monday zp1/mysql/data@tuesday | ssh ..
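A repeatable cycle built from the two commands above might look like the following dry-run sketch. It only prints the commands instead of executing them; the dataset, remote host, and snapshot names follow the slide and are examples only:

```shell
#!/bin/sh
# Dry-run sketch of a snapshot + incremental send cycle.
DATASET="zp1/mysql/data"
REMOTE="host"
FROM="monday"
TO="tuesday"

# One-time full send of the first snapshot ("FromSnap") ...
echo "zfs send ${DATASET}@${FROM} | ssh ${REMOTE} zfs receive zp1/recvd/fs"
# ... then cheap incrementals between "FromSnap" and "ToSnap";
# only blocks changed between the two snapshots cross the wire.
echo "zfs send -i @${FROM} ${DATASET}@${TO} | ssh ${REMOTE} zfs receive zp1/recvd/fs"
```

In a real rotation, FROM/TO would advance each period and old snapshots would be destroyed after a successful receive.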
  17. ZFS Design: Local Clones (diagram)
  18. ZFS Design: Remote Clones (diagram)
  19. ZFS Usage Metrics
       KEY                        Old_ENV              New_ENV
       Performance (page load)    2-3 minutes          ~15 secs
       Faster data snapshots      15 minutes           ~2-4 secs
       Cloning / EBS attachment   > 20 minutes         ~3-5 secs
       Costs                      Higher*              Lower
       Monitoring / alerting      Only Slack messages  Jenkins + PagerDuty
  20. ZFS Performance Benchmarking (charts)
  21. ZFS Challenges
     ● Fragmentation
     ● Complex to tweak and tune
     ● Requires extra free space, or pool performance can suffer
  22. Further ...
     ● High read throughput (>= 83.88 million)
     ● MySQL queries up to 76.2K/sec
     ● InnoDB file I/O writes up to 150K
     ● Enterprise-grade transactional filesystem
     ● Automatically reconstructs data after detecting an error
     ● Combines multiple physical media devices into one logical volume using ZPOOL
     ● Snapshot and mirroring capabilities; can quickly compress data (LZ4)
     Enjoy a user-friendly, high-volume storage system.
  23. Thank you.