I can't believe this is butter - A Tour of btrfs

Btrfs ("Butter FS") is a new copy-on-write filesystem for Linux that aims to implement advanced features while focusing on fault tolerance, repair and easy administration. Initially developed by Oracle, Btrfs is licensed under the GPL and open to contributions from anyone.

This tutorial takes users through some of the new features of the btrfs filesystem, including:

- Creating/mounting the filesystem
- Setting up mirroring/striping
- Adding/removing devices
- Rebalancing data
- Growing/shrinking volumes
- Creating snapshots/subvolumes
- Booting from snapshots

  1. I can’t believe this is butter! A tour of btrfs
     Avi Miller, Principal Program Manager
     Copyright © 2011, Oracle and/or its affiliates. All rights reserved. Insert Informaion Protection Policy Classification from Slide 7
  2. The Btrfs Filesystem
     • Jointly developed by a number of companies
       – Oracle, Red Hat, Fujitsu, Intel, SUSE and many others
     • All data and metadata is written via copy-on-write
     • CRCs maintained for all metadata and data
     • Efficient writable snapshots
     • Multi-device support
     • Online resize and defrag
     • Transparent compression
     • Efficient storage for small files
     • SSD optimisations and TRIM support
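
The copy-on-write, checksum and snapshot bullets above fit together in a simple way, which the toy Python model below tries to show. It is a sketch under stated assumptions, not btrfs code: every write appends a new block instead of overwriting in place, each block carries a checksum that is verified on read, and a snapshot is just a copy of the small name-to-block map, so it shares every data block with the live tree. `ToyCowStore` and its methods are hypothetical names, and `zlib.crc32` stands in for the crc32c checksum btrfs uses by default.

```python
import zlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Block:
    data: bytes
    csum: int  # checksum stored with the block (zlib.crc32 as a stand-in)


@dataclass
class ToyCowStore:
    """Hypothetical copy-on-write block store; not the btrfs on-disk format."""
    blocks: List[Block] = field(default_factory=list)   # append-only "disk"
    tree: Dict[str, int] = field(default_factory=dict)  # name -> block index

    def write(self, name: str, data: bytes) -> None:
        # Never overwrite in place: append a new block, then repoint the map.
        self.blocks.append(Block(data, zlib.crc32(data)))
        self.tree[name] = len(self.blocks) - 1

    def read(self, name: str, tree: Optional[Dict[str, int]] = None) -> bytes:
        tree = self.tree if tree is None else tree
        block = self.blocks[tree[name]]
        # Verify the checksum on every read before returning the data.
        if zlib.crc32(block.data) != block.csum:
            raise IOError(f"checksum mismatch reading {name!r}")
        return block.data

    def snapshot(self) -> Dict[str, int]:
        # A snapshot copies only the small map; all data blocks stay shared.
        return dict(self.tree)


store = ToyCowStore()
store.write("a.txt", b"version 1")
snap = store.snapshot()
store.write("a.txt", b"version 2")  # the old block is left untouched
print(store.read("a.txt"))          # b'version 2'
print(store.read("a.txt", snap))    # b'version 1'
```

Because the old block is never overwritten, taking the snapshot costs only a copy of the map, which is the intuition behind "efficient writable snapshots".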
  3. Btrfs Progress
     • Extensive performance and stability fixes
     • Significant code cleanups
     • Efficient free space caching across reboots
     • Delayed metadata insertion and deletion
     • Background scrubbing
     • New LZO compression mode
     • New Snappy compression mode in development
     • Batched discard (via ioctl)
     • Per-inode flags to control COW, compression
     • Automatic file defrag option
  4. Logging improvements
     • Btrfs fsync log was rewriting some items over and over
     • New code from Fujitsu bumps the metadata generation numbers inside a transaction
     • Cuts down log traffic by 75%
     • Will go into 3.2 merge window
  5. Metadata Fragmentation
     • Btrfs btree uses key ordering to group related items into the same metadata block
     • COW tends to fragment the btree over time
     • Larger block sizes lower metadata overhead and improve performance
     • Larger block sizes provide inexpensive btree defragmentation
     • E.g.: Intel 120GB MLC drive:
       – 4KB random reads: 78MB/s
       – 8KB random reads: 137MB/s
       – 16KB random reads: 186MB/s
     • Code queued up for Linux 3.3 allows larger block sizes
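
The drive figures show why larger blocks help even though each request is bigger: the drive completes fewer reads per second, but each read moves more data, so throughput still rises. A quick back-of-the-envelope check, assuming decimal units (1 MB = 1000 KB) since the slide does not specify:

```python
# Random-read throughput quoted on the slide for an Intel 120GB MLC drive.
throughput_mb_s = {4: 78, 8: 137, 16: 186}  # read size in KB -> MB/s

# Assumes decimal units (1 MB = 1000 KB): reads/s = bytes per second / bytes per read.
reads_per_s = {kb: mb * 1000 / kb for kb, mb in throughput_mb_s.items()}

for kb, mb in throughput_mb_s.items():
    gain = mb / throughput_mb_s[4]  # throughput relative to 4 KB reads
    print(f"{kb:>2} KB reads: {mb:>3} MB/s, ~{reads_per_s[kb]:,.0f} reads/s, "
          f"{gain:.2f}x the 4 KB throughput")
```

Moving from 4 KB to 16 KB reads cuts the request rate by roughly 40% (about 19,500 to 11,625 reads/s) yet delivers about 2.4 times the data per second.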
  6. Scrubbing
     • Btrfs CRCs allow us to verify data stored on disk
     • CRC errors can be corrected by reading a good copy of the block from another drive
     • New scrubbing code scans the allocated data and metadata blocks (Arne Jansen)
     • Any CRC errors are fixed during the scan if a second copy exists
     • Will be extended to track and offline bad devices
     • First Demo: btrfs filesystem creation and scrubbing
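
The scrub loop described on this slide can be modelled in a few lines. This is a simplified sketch, not the btrfs scrub implementation: it assumes exactly two mirrored copies, keeps the expected checksums in a separate list, and uses `zlib.crc32` in place of crc32c. `scrub` is a hypothetical function name.

```python
import zlib
from typing import List, Tuple

# Hypothetical model: each logical block is stored on two mirrored "drives",
# and the expected checksums are kept separately (btrfs keeps its checksums
# in a dedicated checksum tree). zlib.crc32 stands in for crc32c.
Mirror = List[bytearray]


def scrub(mirrors: Tuple[Mirror, Mirror], csums: List[int]) -> Tuple[int, int]:
    """Verify every copy of every block and rewrite bad copies from a good one.

    Returns (repaired, unrecoverable) block counts.
    """
    repaired = unrecoverable = 0
    for i, expected in enumerate(csums):
        good = [m[i] for m in mirrors if zlib.crc32(m[i]) == expected]
        if not good:
            unrecoverable += 1  # every copy is bad: nothing to repair from
            continue
        for m in mirrors:
            if zlib.crc32(m[i]) != expected:
                m[i] = bytearray(good[0])  # overwrite with the verified copy
                repaired += 1
    return repaired, unrecoverable


data = [b"alpha", b"beta", b"gamma"]
csums = [zlib.crc32(d) for d in data]
drive_a = [bytearray(d) for d in data]
drive_b = [bytearray(d) for d in data]
drive_b[1][0] ^= 0xFF  # simulate a corrupted byte on the second drive

result = scrub((drive_a, drive_b), csums)
print(result)  # (1, 0)
```

The checksum is what makes the repair safe: without it, a mirror could only tell that two copies disagree, not which one is correct.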
  7. Discard/Trim
     • Trim and discard notify storage that we’re done with a block
     • Btrfs now supports both real-time trim and batched
     • Real-time trims blocks as they are freed
     • Batched trims all free space via an ioctl
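
One way to picture the difference between the two modes: real-time discard issues one trim per extent at the moment it is freed, while a batched pass can walk the free space later and trim larger contiguous ranges in fewer requests. The Python below is a hypothetical model of that difference only; it does not call any ioctl, and the extent values are illustrative.

```python
from typing import List, Tuple

Extent = Tuple[int, int]  # (start, length) in bytes


def merge_extents(extents: List[Extent]) -> List[Extent]:
    """Coalesce overlapping or adjacent free extents into maximal ranges."""
    merged: List[Extent] = []
    for start, length in sorted(extents):
        if merged and start <= merged[-1][0] + merged[-1][1]:
            last_start, last_len = merged[-1]
            end = max(last_start + last_len, start + length)
            merged[-1] = (last_start, end - last_start)
        else:
            merged.append((start, length))
    return merged


# Extents in the order they were freed (illustrative values).
freed = [(0, 4096), (8192, 4096), (4096, 4096), (65536, 4096)]

# Real-time: one trim per extent, issued as each extent is freed.
realtime_cmds = list(freed)
# Batched: a later pass over all free space trims maximal contiguous ranges.
batched_cmds = merge_extents(freed)

print(len(realtime_cmds), "real-time trims")  # 4 real-time trims
print(batched_cmds)                           # [(0, 12288), (65536, 4096)]
```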
  8. Drive swapping
     • GSOC project
     • Current raid rebuild works via rebalance code
     • Moves all extents to new locations as it rebuilds
     • Drive swapping replaces an existing drive in-place
     • Uses extent-allocation map to limit bytes read
     • Can also restripe between RAID levels
       – Pull request sent this morning!
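
The value of consulting the extent-allocation map is easy to quantify: a block-level copy would read the whole device, while an in-place swap only needs the extents that are actually allocated. A hypothetical sketch with invented numbers for a 500 GiB device:

```python
from typing import List, Tuple

Extent = Tuple[int, int]  # (physical start, length) on the old device, in bytes


def bytes_to_copy(allocated: List[Extent]) -> int:
    """Only allocated extents need copying when replacing a drive in place."""
    return sum(length for _start, length in allocated)


GIB = 1024 ** 3
device_size = 500 * GIB
# Hypothetical allocation map for the device being replaced.
allocated = [(0, 40 * GIB), (100 * GIB, 25 * GIB), (300 * GIB, 10 * GIB)]

needed = bytes_to_copy(allocated)
print(f"copy {needed / GIB:.0f} GiB of {device_size / GIB:.0f} GiB "
      f"({needed / device_size:.0%} of the device)")
# copy 75 GiB of 500 GiB (15% of the device)
```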
  9. Efficient backups
     • Advanced btrfs send/receive tool in development (Jan Schmidt)
     • Transmits in neutral format so corruptions are not duplicated
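
The idea behind a neutral stream is that the sender describes what changed between two snapshots as file-level operations, and the receiver replays them on its own copy, writing fresh blocks and checksums instead of copying raw on-disk structures. A toy sketch of that idea; the operation names and the `send`/`receive` functions are hypothetical and do not reflect the real btrfs send stream format.

```python
from typing import Dict, List, Tuple

Snapshot = Dict[str, bytes]  # path -> file contents (toy model)
Op = Tuple[str, str, bytes]  # (operation, path, payload)


def send(parent: Snapshot, child: Snapshot) -> List[Op]:
    """Describe child relative to parent as file-level operations."""
    ops: List[Op] = []
    for path in sorted(parent.keys() - child.keys()):
        ops.append(("unlink", path, b""))
    for path in sorted(child):
        if parent.get(path) != child[path]:
            ops.append(("write", path, child[path]))
    return ops


def receive(base: Snapshot, ops: List[Op]) -> Snapshot:
    """Replay the stream on the receiver's own copy of the parent."""
    result = dict(base)
    for op, path, payload in ops:
        if op == "unlink":
            del result[path]
        elif op == "write":
            result[path] = payload
        else:
            raise ValueError(f"unknown operation {op!r}")
    return result


old = {"etc/hosts": b"127.0.0.1 localhost\n", "tmp/scratch": b"x"}
new = {"etc/hosts": b"127.0.0.1 localhost\n::1 localhost\n", "home/notes": b"hi"}
stream = send(old, new)
print([(op, path) for op, path, _ in stream])
# [('unlink', 'tmp/scratch'), ('write', 'etc/hosts'), ('write', 'home/notes')]
```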
  10. Embedded Systems
     • Btrfs is fairly friendly to small machines
     • Btrfs is not quite as friendly to small disks
       – But this is getting better
     • Btrfs works very well overall on low-end flash
  11. RAID5/6
     • Initial implementation from Intel some time ago
     • Merge pending completion of fsck work
     • Will also add triple mirroring
     • Mixed RAID modes for metadata and data are included
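
For readers new to parity RAID: RAID5 protects a stripe with one parity block computed as the XOR of the data blocks, so any single missing block can be rebuilt by XOR-ing the survivors; RAID6 adds a second, differently computed syndrome so two failures can be survived. A minimal sketch of the RAID5 case only, with a hypothetical function name and none of btrfs's chunk layout:

```python
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks (RAID5-style parity)."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


# One stripe of three data blocks plus one parity block.
stripe = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(stripe)
print(parity)  # b'@@@@'

# Lose the second data block and rebuild it from the survivors plus parity.
survivors = [stripe[0], stripe[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'BBBB'
```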
  12. When Bad Things Happen to Good Data
     • Beta filesystem recovery tool from Josef Bacik
       – Risk-free: copies data out of the corrupt FS
     • Tree root history log to recover from many hardware errors
     • New fsck releases on the way to replace in place
       – Chris Mason is talking on btrfs in L.A. on Saturday *cough*
     • git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-progs.git recovery-beta
     • Second Demo: btrfs filesystem recovery
  13. Billions of Files?
     • Dramatic differences in filesystem writeback patterns
     • Sequential I/O still matters on modern SSDs
     • Btrfs COW allows flexible writeback patterns
     • Ext4 and XFS tend to get stuck behind their logs
       – XFS has improved significantly
     • Btrfs tends to produce more sequential writes and more random reads
       – Writeback regression in current kernels: we’re working on it!
  14. File Creation Benchmark Summary
     • Btrfs duplicates metadata by default
       – 2x the writes
     • Btrfs stores the file name three times
     • Btrfs and XFS are CPU-bound on SSD
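
A back-of-the-envelope sketch of what the first two bullets mean together: each name is recorded three times (in btrfs these are the directory item, the directory index item and the inode back-reference), and with duplicated metadata every one of those copies is written twice. The file count and name length below are illustrative assumptions, and the hypothetical function counts only name bytes, not keys, item headers or the rest of the inode.

```python
def name_bytes_written(num_files: int, name_len: int,
                       copies_of_name: int = 3, metadata_dup: int = 2) -> int:
    """Bytes of file-name data written to metadata, ignoring item headers,
    keys and everything else stored in the btree leaves."""
    return num_files * name_len * copies_of_name * metadata_dup


# One million files with 20-byte names (illustrative, not from the slides).
total = name_bytes_written(1_000_000, 20)
print(f"{total / 1e6:.0f} MB of name data")  # 120 MB of name data
```

That is six times the 20 MB of raw name data, before any other metadata is counted.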
  15. File Creation Throughput (chart)
  16. IOPS (chart)
  17. I/O Animations
     • Ext4 is seeking between a large number of disk areas
     • XFS is walking forward through a series of distinct areas
     • Both XFS and Ext4 show heavy log activity
     • Btrfs is doing sequential writes and some random reads
     • http://oss.oracle.com/~mason/seekwatcher/
  18. Root filesystem snapshots: yum-plugin-fs-snapshot
     • Yum plugin to trigger a snapshot for all upgrades/installs
     • Can be used as an instant rollback mechanism
     • Currently supports btrfs snapshots
     • Requires btrfs root
     • Demo: convert / to btrfs and yum-plugin-fs-snapshot
  19. Thank You!
     • Avi Miller: avi.miller@oracle.com
     • http://btrfs.wiki.kernel.org
     • Oracle Linux 6.2
       – http://oracle.com/linux
       – http://edelivery.oracle.com/linux
     • UEK2 Beta
       – http://public-yum.oracle.com/beta/
       – http://oss.oracle.com/git/linux-2.6-unbreakable-beta.git/
  20. Q&A
