ZFS (on Linux)
use your disks in best possible ways

Dobrica Pavlinušić
http://blog.rot13.org
CUC sys.track 2013-10-21
What are we going to talk about?
● ZFS history
● Disks or SSD and for what?
● Installation
● Create pool, filesystem and/or block device
● ARC, L2ARC, ZIL
● snapshots, send/receive
● scrub, disk reliability (smart)
● tuning zfs
● downsides
ZFS history
2001 – Development of ZFS started with two engineers at Sun Microsystems.
2005 – Source code was released as part of OpenSolaris.
2006 – Development of FUSE port for Linux started.
2007 – Apple started porting ZFS to Mac OS X.
2008 – A port to FreeBSD was released as part of FreeBSD 7.0.
2008 – Development of a native Linux port started.
2009 – Apple's ZFS project closed. The MacZFS project continued to develop the code.
2010 – OpenSolaris was discontinued, the last release was forked. Further development of ZFS on
Solaris was no longer open source.
2010 – illumos was founded as the truly open source successor to OpenSolaris. Development of ZFS
continued in the open. Ports of ZFS to other platforms continued porting upstream changes from
illumos.
2012 – Feature flags were introduced to replace legacy on-disk version numbers, enabling easier
distributed evolution of the ZFS on-disk format to support new features.
2013 – Alongside the stable version of MacZFS, ZFS-OSX used ZFS on Linux as a basis for the next
generation of MacZFS.
2013 – The first stable release of ZFS on Linux.
2013 – Official announcement of the OpenZFS project.
Terminology
● COW - copy on write
○ doesn’t modify data in-place on disk
● checksums - protect data integrity
● vdev - disks with redundancy
● pool - collection of vdevs with fs or block dev
● slog - sync log to improve COW performance
● arc - RAM based cache (ECC memory recommended)
● l2arc - disk/SSD cache for arc spill over
Capacitors!
Disks or SSD?
● ZFS is designed to use rotating platters to
store data and RAM/SSD for speedup
● Use JBOD disks and non-RAID controllers!
● Use disks which have fast error reporting
(older WD green with firmware upgrade)
● SSD with capacitors required for SLOG or
VDEVs if you care about your data!
● ZoL doesn’t use TRIM (sigh!) - overprovision
SSD for durability (80% capacity for vdev,
10% for SLOG)
Which kernel?
● x86_64 (for i386 use zfs-fuse ;-), 32-bit ARM
support under development (for NAS boxes)
● 3.2.0 (wheezy) or later
● Voluntary Kernel Preemption
arh-hw:~# grep CONFIG_PREEMPT_VOLUNTARY /boot/config-3.2.0-4-amd64
CONFIG_PREEMPT_VOLUNTARY=y

● http://zfsonlinux.org/debian.html
○ DKMS, in Debian experimental

● ZFS CDDL incompatible with GPLv2 - it will
be out-of-tree forever!
ZFS redundancy options
● vdev
○ mirror
○ RAIDZ1
○ RAIDZ2
○ RAIDZ3
[Diagram: example disk layouts, data disks numbered, redundancy disks marked R]
● each pool can have multiple vdevs, ZFS will spread writes over them
● mirrors or 2^n data+redundancy disks in single vdev for best performance
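The "2^n data + redundancy" rule can be turned into concrete disk counts. A minimal sketch; the function name raidz_width is made up for illustration and only does the arithmetic, it does not touch any pool:

```shell
#!/bin/sh
# raidz_width PARITY N: total disks in a RAIDZ vdev with 2^N data disks
# plus PARITY redundancy disks (PARITY is 1, 2 or 3 for RAIDZ1..RAIDZ3).
# Hypothetical helper, not part of ZFS.
raidz_width() {
    parity=$1
    n=$2
    echo $(( (1 << n) + parity ))
}

# Print the vdev widths that follow from the rule for n = 1..3.
for parity in 1 2 3; do
    for n in 1 2 3; do
        echo "raidz$parity: $(raidz_width "$parity" "$n") disks ($((1 << n)) data + $parity redundancy)"
    done
done
```

So RAIDZ1 comes out at 3, 5 or 9 disks, RAIDZ2 at 4, 6 or 10, and RAIDZ3 at 5, 7 or 11.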
Create pool
zpool create -o ashift=12 tank
([raidz1-3] /dev/disk/by-id/…) ...
● ashift=12 (align to 4k boundary)
● You WILL NOT be able to shrink pool!
● use /dev/disk/by-id/ to create pool!
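ashift is the base-2 logarithm of the sector size the pool aligns to, so ashift=12 means 4096 bytes. A small sketch of that arithmetic (ashift_to_bytes is an illustrative name, not a ZFS tool):

```shell
#!/bin/sh
# ashift_to_bytes ASHIFT: sector size in bytes for a given ashift value.
# Hypothetical helper for illustration only.
ashift_to_bytes() {
    echo $(( 1 << $1 ))
}

ashift_to_bytes 9     # classic 512-byte sectors
ashift_to_bytes 12    # 4k sectors, as used in the zpool create line above
```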
zpool history [-il] see what your pool did!
use sparse files to create degraded pool
dd if=/dev/zero of=/zfs1 bs=1 count=1 seek=512G
zpool offline <pool> /zfs1
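A sparse file takes almost no space until it is written, which is why it can stand in for a missing disk. A minimal sketch, assuming GNU coreutils truncate is available; make_sparse and the temporary path are made up for this example, and nothing here touches a pool:

```shell
#!/bin/sh
# make_sparse PATH SIZE: create a sparse file with the given apparent size.
# Hypothetical helper; uses truncate from GNU coreutils.
make_sparse() {
    truncate -s "$2" "$1"
}

# Demonstration in a throwaway directory.
tmpdir=$(mktemp -d)
make_sparse "$tmpdir/zfs1" 1G
# First column of ls -s is allocated blocks, the size column is the apparent size.
ls -ls "$tmpdir/zfs1"
rm -r "$tmpdir"
```

The file would then be given to zpool create as one member of a redundant vdev and taken offline right away, so the pool runs degraded until a real disk replaces it.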
Create file system or volume
Turn compression on!
zfs set compression=lz4 <pool|fs>
zfs create <pool>/fs
zfs create -V 512M <pool>/block/disk1
zfs set primarycache=none <pool>
● Don’t use dedup, even devs don’t like it :-)
○ ~500 bytes of memory per block, CPU overhead due
to hashing, non-linear access to data

● You can have zfs pool inside zfs volume!
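To see why dedup gets expensive, multiply the number of blocks by the per-block cost. A rough sketch using the ~500 bytes per block figure from this slide; dedup_ram_bytes is an illustrative name, and the real cost depends on the ZFS version and on how much data is unique:

```shell
#!/bin/sh
# dedup_ram_bytes DATA_BYTES BLOCK_BYTES [BYTES_PER_BLOCK]
# Rough estimate of dedup table memory: one entry per unique block.
# Hypothetical helper; 500 bytes per block is the slide's estimate.
dedup_ram_bytes() {
    data_bytes=$1
    block_bytes=$2
    per_block=${3:-500}
    echo $(( data_bytes / block_bytes * per_block ))
}

# 1 TiB of unique data in 128 KiB blocks (the default recordsize).
dedup_ram_bytes $((1 << 40)) $((128 * 1024))
# Same data in 8 KiB blocks, e.g. a small volblocksize: 16 times more entries.
dedup_ram_bytes $((1 << 40)) $((8 * 1024))
```

With these assumptions 1 TiB costs about 4 GB of RAM at 128 KiB blocks and about 67 GB at 8 KiB blocks.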
ARC, L2ARC - read cache
ARC uses ~50% of RAM available!
cat /proc/spl/kstat/zfs/arcstats or arcstat.pl

L2ARC - Any SSD is good enough for it, you
might use /dev/zram for it to get compression!
khugepaged eats 100% CPU?
echo 0 > /sys/kernel/mm/transparent_hugepage/khugepaged/defrag
echo never > /sys/kernel/mm/transparent_hugepage/defrag

Possible to use zram for L2ARC to get
compression!
L2ARC headers must fit in ARC (RAM)!
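A quick way to compare the "~50% of RAM" rule with what the ARC is doing right now. This is a sketch, not a monitoring tool; half_of_ram_kb is a made-up helper, and the arcstats part only runs if the ZFS module is loaded:

```shell
#!/bin/sh
# half_of_ram_kb [MEMINFO_FILE]: print 50% of MemTotal in kB.
# Hypothetical helper; reads /proc/meminfo unless another file is given.
half_of_ram_kb() {
    awk '/^MemTotal:/ { print int($2 / 2) }' "${1:-/proc/meminfo}"
}

if [ -r /proc/meminfo ]; then
    echo "half of RAM: $(half_of_ram_kb) kB"
fi

# arcstats rows are "name type data"; size is the current ARC size and
# c_max the upper limit, both in bytes.
if [ -r /proc/spl/kstat/zfs/arcstats ]; then
    awk '$1 == "size" || $1 == "c_max" { print $1, $3 }' /proc/spl/kstat/zfs/arcstats
fi
```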
ZIL - (sync)log - NOT write cache!
● put log on separate device!
● ZFS assumes it’s fastest storage (battery
backed RAM, SLC SSD, mirror it!)
● logbias = throughput
● sync = always if you have slog device!
● zil_slog_limit - log/vdev target split
○ 1 MB => idea is to keep slog always fast

● does NOT play nice with iSCSI write-back
snapshots
● Copy on write semantics (LVM isn’t!)
○ Point in time view of filesystem

● Can be cloned to create writable copy
○ and promoted to master copy -> rollback!

zfs snapshot filesystem@snapshot
zfs rollback filesystem@snapshot2
zfs list -t snapshot -r filesystem
zfs set snapdir=visible filesystem
zfs clone snapshot filesystem|volume
zfs diff snapshot snapshot|filesystem
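The snapshot, clone and promote steps can be strung together like this. A dry-run sketch: by default it only prints the commands, and the dataset name tank/data, the tag and the "-clone" suffix are made up for illustration:

```shell
#!/bin/sh
# RUN=echo (the default here) prints commands instead of executing them.
# Set RUN= to run them for real on a test pool.
RUN=${RUN-echo}

# snapshot_cycle FILESYSTEM TAG: snapshot, list, clone, then promote the clone.
# Hypothetical helper; dataset names are placeholders.
snapshot_cycle() {
    fs=$1
    tag=$2
    $RUN zfs snapshot "$fs@$tag"
    $RUN zfs list -t snapshot -r "$fs"
    $RUN zfs clone "$fs@$tag" "$fs-clone"
    $RUN zfs promote "$fs-clone"
}

snapshot_cycle tank/data before-upgrade
```

After the promote, the clone owns the snapshot and the original filesystem depends on it, which is what makes the "promote instead of rollback" trick work.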
zfs send/receive
● Snapshot filesystem or full pool
● Transfer (incremental) snapshot to another
pool -> disaster recovery
○ this will uncompress your pool, have enough CPU!

● LVM snapshots, rsync and shell script from hell or snapshot manager
○ http://sysadmin-cookbook.rot13.org/#zfs
○ https://github.com/bassu/bzman
○ https://github.com/briner/dolly
○ https://code.google.com/p/zxfer/

● http://burp.grke.org/ - librsync, VSS (Volume
Shadow Copy Service) on Windows
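An incremental transfer sends only the difference between two snapshots. A minimal sketch that prints the pipeline instead of running it; incremental_send and all dataset and snapshot names are placeholders:

```shell
#!/bin/sh
# incremental_send SRC_FS OLD_SNAP NEW_SNAP DST_FS
# Print the send/receive pipeline for the changes between two snapshots.
# Hypothetical helper; it does not run zfs itself.
incremental_send() {
    printf 'zfs send -i %s@%s %s@%s | zfs receive %s\n' "$1" "$2" "$1" "$3" "$4"
}

incremental_send tank/data monday tuesday backup/data
```

To send to another machine, the receive side would typically sit behind ssh, and the first transfer has to be a full send without -i.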
Disk reliability
● Disks will fail! That’s why we are using ZFS (or some
RAID) in the first place!
● zpool scrub pool
○ at least weekly, that’s what checksums are for!
○ if a disk disappears during scrub and comes back after
reboot, it will automatically resilver data
http://en.wikipedia.org/wiki/ZFS#Error_rates_in_hard_disks
http://en.wikipedia.org/wiki/ZFS#Silent_data_corruption
● tell kernel that device died:
echo 1 > /sys/block/<sdX>/device/delete
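A weekly scrub is easy to automate. A sketch of a helper that starts a scrub on every imported pool; scrub_all_pools and the ZPOOL override variable are made up here so the loop can be tried without real pools:

```shell
#!/bin/sh
# ZPOOL can point at a stub for testing; defaults to the real zpool command.
ZPOOL=${ZPOOL:-zpool}

# scrub_all_pools: start a scrub on every pool reported by "zpool list".
# Hypothetical helper; -H drops headers, -o name prints only pool names.
scrub_all_pools() {
    "$ZPOOL" list -H -o name | while read -r pool; do
        "$ZPOOL" scrub "$pool"
    done
}

# Call scrub_all_pools from a weekly cron job; check the result later
# with "zpool status".
```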
(not so)smart - better than nothing
● Lies, damn lies and smart counters!
http://research.google.com/pubs/pub32774.html
● smartctl -t long /dev/sd? weekly!
http://sysadmin-cookbook.rot13.org/#smart_test_relocate_pl

● Log output of all drives (and controllers!) and store it in
git for easy git log -p
http://sysadmin-cookbook.rot13.org/#dump_smart_sh

● Look out for write counters on SSD to detect
wearout early
● Check error recovery with smartctl -l scterc /dev/sdx
○ newer disks disable that in firmware (sigh!)
● Relocate known bad sectors
○ http://sysadmin-cookbook.rot13.org/#smart_test_relocate_pl
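Keeping SMART output under version control can be as simple as one text file per drive. This is a sketch of the idea, not the dump_smart_sh script linked above; dump_smart and the SMARTCTL override variable are made up for illustration:

```shell
#!/bin/sh
# SMARTCTL can point at a stub for testing; defaults to the real smartctl.
SMARTCTL=${SMARTCTL:-smartctl}

# dump_smart OUTDIR DEVICE...: write "smartctl -a" output to one file per drive.
# Hypothetical helper; file names are derived from the device name.
dump_smart() {
    outdir=$1
    shift
    for dev in "$@"; do
        "$SMARTCTL" -a "$dev" > "$outdir/$(basename "$dev").txt"
    done
}

# Example (needs root and real drives):
#   dump_smart /var/log/smart /dev/sda /dev/sdb
# then commit the directory to git to get "git log -p" history.
```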
IOPS - ZFS tuning - zpool iostat -v
● mirrors
○ always faster than any RAID (1-disk perf!)
○ read-only load which doesn’t fit in ARC

● RDBMS - tune recordsize, logbias,
primarycache=metadata
You shouldn't need to tune zfs, but...
cat /etc/modprobe.d/zfs.conf
options zfs zfs_nocacheflush=1 zfs_arc_max=154618822656 zfs_arc_min=1073741824
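The ARC limits are given in bytes, which is hard to read. A small sketch that builds the same zfs_arc_max and zfs_arc_min values from GiB; gib_to_bytes and arc_options are illustrative names, and zfs_nocacheflush is left out on purpose because it is only safe with protected write caches:

```shell
#!/bin/sh
# gib_to_bytes GIB: convert GiB to bytes.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# arc_options MAX_GIB MIN_GIB: print a modprobe options line for the ARC limits.
# Hypothetical helper; the output is meant for /etc/modprobe.d/zfs.conf.
arc_options() {
    echo "options zfs zfs_arc_max=$(gib_to_bytes "$1") zfs_arc_min=$(gib_to_bytes "$2")"
}

# 144 GiB maximum and 1 GiB minimum reproduce the values shown above.
arc_options 144 1
```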

meta-data heavy workloads (rsync)
● increase /sys/module/zfs/parameters/zfs_arc_meta_{limit,prune}
● zfs set primarycache=metadata <filesystem>

http://www.nanowolk.nl/ext/2013_02_zfs_sequential_read_write_performance/
ZFS downsides
● out-of-kernel due to CDDL
○ DKMS and distribution support mitigate this
● performance not main goal
○ xfs is still fastest Linux fs, run it on RAID!
● not ready for SSD pools without TRIM
● doesn’t support shrinking of pool
● you can’t remove dedup metadata
● doesn’t have rebalance (as btrfs does)
○ zfs send/receive as workaround
● storage appliance model due to memory usage vs
mixed workload servers
○ doesn’t support O_DIRECT -> double buffering
References
● OpenZFS web site http://open-zfs.org/
● zfs-discuss mailing list http://zfsonlinux.org/lists.html
● ZFS on Linux / OpenZFS presentation
http://events.linuxfoundation.org/sites/events/files/slides/OpenZFS%20-%20LinuxCon_0.pdf
● How disks fail
http://blog.backblaze.com/2013/11/12/how-long-do-disk-drives-last/
● Understanding the Robustness of SSDs under Power Fault
https://www.usenix.org/system/files/conference/fast13/fast13-final80.pdf
● zxfer http://forums.freebsd.org/showthread.php?t=24113
http://bit.ly/cuc2013-zfs

More Related Content

Viewers also liked

Dockerize the World - presentation from Hradec Kralove
Dockerize the World - presentation from Hradec KraloveDockerize the World - presentation from Hradec Kralove
Dockerize the World - presentation from Hradec Kralovedamovsky
 
Debian Cloud - building the Debian AMIs
Debian Cloud - building the Debian AMIsDebian Cloud - building the Debian AMIs
Debian Cloud - building the Debian AMIsJames Bromberger
 
Pubic Diplomacy and Web 2.0
Pubic Diplomacy and Web 2.0Pubic Diplomacy and Web 2.0
Pubic Diplomacy and Web 2.0stefan.geens
 
Creating And Customizing Your Blackboard Class
Creating And Customizing Your Blackboard ClassCreating And Customizing Your Blackboard Class
Creating And Customizing Your Blackboard Class1LifelongLearner
 
The Constellation Query Language
The Constellation Query LanguageThe Constellation Query Language
The Constellation Query LanguageClifford Heath
 
Slobodni softver za digitalne arhive: EPrints u Knjižnici Filozofskog fakulte...
Slobodni softver za digitalne arhive: EPrints u Knjižnici Filozofskog fakulte...Slobodni softver za digitalne arhive: EPrints u Knjižnici Filozofskog fakulte...
Slobodni softver za digitalne arhive: EPrints u Knjižnici Filozofskog fakulte...Dobrica Pavlinušić
 
Social Media & Web 2.0 Services for Choirs
Social Media & Web 2.0 Services for ChoirsSocial Media & Web 2.0 Services for Choirs
Social Media & Web 2.0 Services for ChoirsDr Stylianos Mystakidis
 
Mojo Facets – so, you have data and browser?
Mojo Facets – so, you have data and browser?Mojo Facets – so, you have data and browser?
Mojo Facets – so, you have data and browser?Dobrica Pavlinušić
 
This is an interesting metadata source. Can I import it into Koha?
This is an interesting metadata source. Can I import it into Koha?This is an interesting metadata source. Can I import it into Koha?
This is an interesting metadata source. Can I import it into Koha?Dobrica Pavlinušić
 
Post-relational databases: What's wrong with web development? v3
Post-relational databases: What's wrong with web development? v3Post-relational databases: What's wrong with web development? v3
Post-relational databases: What's wrong with web development? v3Dobrica Pavlinušić
 
Εκπαίδευση Web 2.0 στο Δημόσιο
Εκπαίδευση Web 2.0 στο ΔημόσιοΕκπαίδευση Web 2.0 στο Δημόσιο
Εκπαίδευση Web 2.0 στο ΔημόσιοDr Stylianos Mystakidis
 
Spectacular Subcultures: From luz to hacktivism
Spectacular Subcultures: From luz to hacktivismSpectacular Subcultures: From luz to hacktivism
Spectacular Subcultures: From luz to hacktivismPaleFire
 

Viewers also liked (19)

Dockerize the World - presentation from Hradec Kralove
Dockerize the World - presentation from Hradec KraloveDockerize the World - presentation from Hradec Kralove
Dockerize the World - presentation from Hradec Kralove
 
Debian Cloud - building the Debian AMIs
Debian Cloud - building the Debian AMIsDebian Cloud - building the Debian AMIs
Debian Cloud - building the Debian AMIs
 
Pubic Diplomacy and Web 2.0
Pubic Diplomacy and Web 2.0Pubic Diplomacy and Web 2.0
Pubic Diplomacy and Web 2.0
 
Creating And Customizing Your Blackboard Class
Creating And Customizing Your Blackboard ClassCreating And Customizing Your Blackboard Class
Creating And Customizing Your Blackboard Class
 
The Constellation Query Language
The Constellation Query LanguageThe Constellation Query Language
The Constellation Query Language
 
What Is Powerpoint
What Is PowerpointWhat Is Powerpoint
What Is Powerpoint
 
Web scale monitoring
Web scale monitoringWeb scale monitoring
Web scale monitoring
 
Slobodni softver za digitalne arhive: EPrints u Knjižnici Filozofskog fakulte...
Slobodni softver za digitalne arhive: EPrints u Knjižnici Filozofskog fakulte...Slobodni softver za digitalne arhive: EPrints u Knjižnici Filozofskog fakulte...
Slobodni softver za digitalne arhive: EPrints u Knjižnici Filozofskog fakulte...
 
Social Media & Web 2.0 Services for Choirs
Social Media & Web 2.0 Services for ChoirsSocial Media & Web 2.0 Services for Choirs
Social Media & Web 2.0 Services for Choirs
 
Ppt Demo Slideshare
Ppt Demo SlidesharePpt Demo Slideshare
Ppt Demo Slideshare
 
Test
TestTest
Test
 
Open Workshop on Information Literacy
Open Workshop on Information LiteracyOpen Workshop on Information Literacy
Open Workshop on Information Literacy
 
Mojo Facets – so, you have data and browser?
Mojo Facets – so, you have data and browser?Mojo Facets – so, you have data and browser?
Mojo Facets – so, you have data and browser?
 
This is an interesting metadata source. Can I import it into Koha?
This is an interesting metadata source. Can I import it into Koha?This is an interesting metadata source. Can I import it into Koha?
This is an interesting metadata source. Can I import it into Koha?
 
Intro to Haml
Intro to HamlIntro to Haml
Intro to Haml
 
Post-relational databases: What's wrong with web development? v3
Post-relational databases: What's wrong with web development? v3Post-relational databases: What's wrong with web development? v3
Post-relational databases: What's wrong with web development? v3
 
Εκπαίδευση Web 2.0 στο Δημόσιο
Εκπαίδευση Web 2.0 στο ΔημόσιοΕκπαίδευση Web 2.0 στο Δημόσιο
Εκπαίδευση Web 2.0 στο Δημόσιο
 
Spectacular Subcultures: From luz to hacktivism
Spectacular Subcultures: From luz to hacktivismSpectacular Subcultures: From luz to hacktivism
Spectacular Subcultures: From luz to hacktivism
 
Euronem Zambia 2008
Euronem Zambia 2008Euronem Zambia 2008
Euronem Zambia 2008
 

More from Dobrica Pavlinušić

Mainline kernel on ARM Tegra20 devices that are left behind on 2.6 kernels
Mainline kernel on ARM Tegra20 devices that are left behind on 2.6 kernelsMainline kernel on ARM Tegra20 devices that are left behind on 2.6 kernels
Mainline kernel on ARM Tegra20 devices that are left behind on 2.6 kernelsDobrica Pavlinušić
 
Linux+sensor+device-tree+shell=IoT !
Linux+sensor+device-tree+shell=IoT !Linux+sensor+device-tree+shell=IoT !
Linux+sensor+device-tree+shell=IoT !Dobrica Pavlinušić
 
Let's hack cheap hardware 2016 edition
Let's hack cheap hardware 2016 editionLet's hack cheap hardware 2016 edition
Let's hack cheap hardware 2016 editionDobrica Pavlinušić
 
Raspberry Pi - best friend for all your GPIO needs
Raspberry Pi - best friend for all your GPIO needsRaspberry Pi - best friend for all your GPIO needs
Raspberry Pi - best friend for all your GPIO needsDobrica Pavlinušić
 
Cheap, good, hackable tools from China: AVR component tester
Cheap, good, hackable tools from China: AVR component testerCheap, good, hackable tools from China: AVR component tester
Cheap, good, hackable tools from China: AVR component testerDobrica Pavlinušić
 
FSEC 2014 - I can haz your board with JTAG
FSEC 2014 - I can haz your board with JTAGFSEC 2014 - I can haz your board with JTAG
FSEC 2014 - I can haz your board with JTAGDobrica Pavlinušić
 
Hardware hacking for software people
Hardware hacking for software peopleHardware hacking for software people
Hardware hacking for software peopleDobrica Pavlinušić
 
Security of Linux containers in the cloud
Security of Linux containers in the cloudSecurity of Linux containers in the cloud
Security of Linux containers in the cloudDobrica Pavlinušić
 
KohaCon11: Integrating Koha with RFID system
KohaCon11: Integrating Koha with RFID systemKohaCon11: Integrating Koha with RFID system
KohaCon11: Integrating Koha with RFID systemDobrica Pavlinušić
 
Free Libre Open Source Software at FFZG library
Free Libre Open Source Software at FFZG libraryFree Libre Open Source Software at FFZG library
Free Libre Open Source Software at FFZG libraryDobrica Pavlinušić
 
Virtualization which isn't: LXC (Linux Containers)
Virtualization which isn't: LXC (Linux Containers)Virtualization which isn't: LXC (Linux Containers)
Virtualization which isn't: LXC (Linux Containers)Dobrica Pavlinušić
 
Post-relational databases: What's wrong with web development?
Post-relational databases: What's wrong with web development?Post-relational databases: What's wrong with web development?
Post-relational databases: What's wrong with web development?Dobrica Pavlinušić
 
Kako napraviti Google od zgrade sa računalima?
Kako napraviti Google od zgrade sa računalima?Kako napraviti Google od zgrade sa računalima?
Kako napraviti Google od zgrade sa računalima?Dobrica Pavlinušić
 
Virtual LDAP - kako natjerati strgane aplikacije da koriste LDAP
Virtual LDAP - kako natjerati strgane aplikacije da koriste LDAPVirtual LDAP - kako natjerati strgane aplikacije da koriste LDAP
Virtual LDAP - kako natjerati strgane aplikacije da koriste LDAPDobrica Pavlinušić
 

More from Dobrica Pavlinušić (20)

Mainline kernel on ARM Tegra20 devices that are left behind on 2.6 kernels
Mainline kernel on ARM Tegra20 devices that are left behind on 2.6 kernelsMainline kernel on ARM Tegra20 devices that are left behind on 2.6 kernels
Mainline kernel on ARM Tegra20 devices that are left behind on 2.6 kernels
 
Linux+sensor+device-tree+shell=IoT !
Linux+sensor+device-tree+shell=IoT !Linux+sensor+device-tree+shell=IoT !
Linux+sensor+device-tree+shell=IoT !
 
bro - what is in my network?
bro - what is in my network?bro - what is in my network?
bro - what is in my network?
 
Let's hack cheap hardware 2016 edition
Let's hack cheap hardware 2016 editionLet's hack cheap hardware 2016 edition
Let's hack cheap hardware 2016 edition
 
Raspberry Pi - best friend for all your GPIO needs
Raspberry Pi - best friend for all your GPIO needsRaspberry Pi - best friend for all your GPIO needs
Raspberry Pi - best friend for all your GPIO needs
 
Cheap, good, hackable tools from China: AVR component tester
Cheap, good, hackable tools from China: AVR component testerCheap, good, hackable tools from China: AVR component tester
Cheap, good, hackable tools from China: AVR component tester
 
Ganeti - build your own cloud
Ganeti - build your own cloudGaneti - build your own cloud
Ganeti - build your own cloud
 
FSEC 2014 - I can haz your board with JTAG
FSEC 2014 - I can haz your board with JTAGFSEC 2014 - I can haz your board with JTAG
FSEC 2014 - I can haz your board with JTAG
 
Hardware hacking for software people
Hardware hacking for software peopleHardware hacking for software people
Hardware hacking for software people
 
Gnu linux on arm for $50 - $100
Gnu linux on arm for $50 - $100Gnu linux on arm for $50 - $100
Gnu linux on arm for $50 - $100
 
Security of Linux containers in the cloud
Security of Linux containers in the cloudSecurity of Linux containers in the cloud
Security of Linux containers in the cloud
 
SysAdmin cookbook
SysAdmin cookbookSysAdmin cookbook
SysAdmin cookbook
 
Printing on Linux, simple right?
Printing on Linux, simple right?Printing on Linux, simple right?
Printing on Linux, simple right?
 
KohaCon11: Integrating Koha with RFID system
KohaCon11: Integrating Koha with RFID systemKohaCon11: Integrating Koha with RFID system
KohaCon11: Integrating Koha with RFID system
 
Deploy your own P2P network
Deploy your own P2P networkDeploy your own P2P network
Deploy your own P2P network
 
Free Libre Open Source Software at FFZG library
Free Libre Open Source Software at FFZG libraryFree Libre Open Source Software at FFZG library
Free Libre Open Source Software at FFZG library
 
Virtualization which isn't: LXC (Linux Containers)
Virtualization which isn't: LXC (Linux Containers)Virtualization which isn't: LXC (Linux Containers)
Virtualization which isn't: LXC (Linux Containers)
 
Post-relational databases: What's wrong with web development?
Post-relational databases: What's wrong with web development?Post-relational databases: What's wrong with web development?
Post-relational databases: What's wrong with web development?
 
Kako napraviti Google od zgrade sa računalima?
Kako napraviti Google od zgrade sa računalima?Kako napraviti Google od zgrade sa računalima?
Kako napraviti Google od zgrade sa računalima?
 
Virtual LDAP - kako natjerati strgane aplikacije da koriste LDAP
Virtual LDAP - kako natjerati strgane aplikacije da koriste LDAPVirtual LDAP - kako natjerati strgane aplikacije da koriste LDAP
Virtual LDAP - kako natjerati strgane aplikacije da koriste LDAP
 

Recently uploaded

Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 

Recently uploaded (20)

Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & Application
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 

ZFS (on Linux) - use your disks in best possible ways

  • 1. ZFS (on Linux) use your disks in best possible ways Dobrica Pavlinušić http://blog.rot13.org CUC sys.track 2013-10-21
  • 2. What are we going to talk about? ● ● ● ● ● ● ● ● ● ZFS history Disks or SSD and for what? Installation Create pool, filesystem and/or block device ARC, L2ARC, ZIL snapshots, send/receive scrub, disk reliability (smart) tuning zfs downsides
  • 3. ZFS history 2001 – Development of ZFS started with two engineers at Sun Microsystems. 2005 – Source code was released as part of OpenSolaris. 2006 – Development of FUSE port for Linux started. 2007 – Apple started porting ZFS to Mac OS X. 2008 – A port to FreeBSD was released as part of FreeBSD 7.0. 2008 – Development of a native Linux port started. 2009 – Apple's ZFS project closed. The MacZFS project continued to develop the code. 2010 – OpenSolaris was discontinued, the last release was forked. Further development of ZFS on Solaris was no longer open source. 2010 – illumos was founded as the truly open source successor to OpenSolaris. Development of ZFS continued in the open. Ports of ZFS to other platforms continued porting upstream changes from illumos. 2012 – Feature flags were introduced to replace legacy on-disk version numbers, enabling easier distributed evolution of the ZFS on-disk format to support new features. 2013 – Alongside the stable version of MacZFS, ZFS-OSX used ZFS on Linux as a basis for the next generation of MacZFS. 2013 – The first stable release of ZFS on Linux. 2013 – Official announcement of the OpenZFS project.
  • 4. Terminology ● COW - copy on write ○ doesn’t modify data in-place on disk ● ● ● ● checksums - protect data integrity vdev - disks with redundancy pool - collection of vdevs with fs or block dev slog - sync log to improve COW performance ● arc - RAM based cache (ECC memory recommended) ● l2arc - disk/SSD cache for arc spill over
  • 6. Disks or SSD? ● ZFS is designed to use rotating platters to store data and RAM/SSD for speedup ● Use JBOD disks and non-RAID controllers! ● Use disks which have fast error reporting (older WD green with firmware upgrade) ● SSD with capacitors required for SLOG or VDEVs if you care about your data! ● ZoL doesn’t use TRIM (sigh!) - overprovision SSD for durability (80% capacity for vdev, 10% for SLOG)
  • 7. Which kernel? ● x86_64 (for i386 use zfs-fuse ;-), 32-bit ARM support under development (for NAS boxes) ● 3.2.0 (wheezy) or later ● Voluntary Kernel Preemption arh-hw:~# grep CONFIG_PREEMPT_VOLUNTARY /boot/config-3.2.0-4-amd64 CONFIG_PREEMPT_VOLUNTARY=y ● http://zfsonlinux.org/debian.html ○ DKMS, in Debian experimental ● ZFS CDDL incompatible with GPLv2 - it will be out-of-tree forever!
  • 8. ZFS redundancy options ● vdev ○ ○ ○ ○ mirror RAIDZ1 RAIDZ2 RAIDZ3 1 1 1 1 R 2 2 2 3 3 3 4 4 4 R 5 5 6 6 7 7 8 ● each pool can have multiple R R vdevs, ZFS will spread writes over them ● mirrors or 2^n data+redundancy disks in single vdev for best performance 8 9 10 11 12 R R R
  • 9. Create pool
      zpool create -o ashift=12 tank ([raidz1-3] /dev/disk/by-id/…) ...
    ● ashift=12 (align to 4k boundary)
    ● You WILL NOT be able to shrink pool!
    ● use /dev/disk/by-id/ to create pool!
    ● zpool history [-il] - see what your pool did!
    ● use sparse files to create degraded pool:
      dd if=/dev/zero of=/zfs1 bs=1 count=1 seek=512G
      zpool offline <pool> /zfs1
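Why ashift=12 means 4k: ashift is the base-2 logarithm of the sector size the pool aligns to. A small sketch of that arithmetic follows; `ashift_for` is a hypothetical helper, not a ZFS command. On Linux the reported physical sector size can usually be read from /sys/block/<sdX>/queue/physical_block_size, but some drives report 512 even when they use 4k sectors internally, so treat that value with care.

```shell
# Hypothetical helper: ashift for a power-of-two sector size in bytes.
ashift_for() {
    size=$1
    shift_bits=0
    # Count how many times the size can be halved before reaching 1.
    while [ "$size" -gt 1 ]; do
        size=$(( size / 2 ))
        shift_bits=$(( shift_bits + 1 ))
    done
    echo "$shift_bits"
}

ashift_for 512    # prints 9
ashift_for 4096   # prints 12
```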
  • 10. Create file system or volume
    Turn compression on!
      zfs set compression=lz4 <pool|fs>
      zfs create <pool>/fs
      zfs create -V 512M <pool>/block/disk1
      zfs set primarycache=none <pool>
    ● Don’t use dedup, even devs don’t like it :-)
      ○ ~500 bytes of memory per block, CPU overhead due to hashing, non-linear access to data
    ● You can have zfs pool inside zfs volume!
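To see why the ~500 bytes per block matters, here is a back-of-the-envelope estimate. It is a sketch only: `dedup_ram_mib` is a hypothetical helper, the 500-byte figure is the one from the slide, and it assumes every block has the same size (128 KiB is the default recordsize; volumes typically use much smaller blocks).

```shell
# Hypothetical helper: rough dedup table size in MiB.
# $1 = amount of stored data in GiB, $2 = block size in KiB.
dedup_ram_mib() {
    data_gib=$1
    block_kib=$2
    blocks=$(( data_gib * 1024 * 1024 / block_kib ))
    # ~500 bytes of memory per block, as quoted on the slide.
    echo $(( blocks * 500 / 1024 / 1024 ))
}

dedup_ram_mib 1024 128   # 1 TiB of 128 KiB blocks: prints 4000
dedup_ram_mib 1024 8     # 1 TiB of 8 KiB blocks:   prints 64000
```

With small blocks the table quickly outgrows the RAM of a typical server.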
  • 11. ARC, L2ARC - read cache
    ● ARC uses ~50% of RAM available!
      cat /proc/spl/kstat/zfs/arcstats or arcstat.pl
    ● L2ARC - Any SSD is good enough for it, you might use /dev/zram for it to get compression!
    ● khugepaged eats 100% CPU?
      echo 0 > /sys/kernel/mm/transparent_hugepage/khugepaged/defrag
      echo never > /sys/kernel/mm/transparent_hugepage/defrag
    ● L2ARC headers must fit in ARC (RAM)!
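If arcstat.pl is not at hand, the overall ARC hit ratio can be pulled straight out of the arcstats file, which lists one counter per line as "name type data". A minimal sketch; `arc_hit_ratio` is a hypothetical helper that only uses the `hits` and `misses` counters.

```shell
# Hypothetical helper: print the ARC hit ratio in percent from an arcstats-style file.
arc_hit_ratio() {
    awk '$1 == "hits"   { h = $3 }
         $1 == "misses" { m = $3 }
         END {
             if (h + m > 0) printf "%.1f\n", 100 * h / (h + m)
             else print "n/a"
         }' "$1"
}

# Only runs on a machine where the ZFS module is loaded.
if [ -r /proc/spl/kstat/zfs/arcstats ]; then
    arc_hit_ratio /proc/spl/kstat/zfs/arcstats
fi
```

The counters are cumulative since module load, so this is a long-term average, not the current rate.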
  • 12. ZIL - (sync)log - NOT write cache!
    ● put log on separate device!
    ● ZFS assumes it’s fastest storage (battery backed RAM, SLC SSD, mirror it!)
    ● logbias = throughput
    ● sync = always if you have slog device!
    ● zil_slog_limit - log/vdev target split
      ○ 1 MB => idea is to keep slog always fast
    ● does NOT play nice with iSCSI write-back
  • 13. snapshots
    ● Copy on write semantics (LVM isn’t!)
      ○ Point in time view of filesystem
    ● Can be cloned to create writable copy
      ○ and promoted to master copy -> rollback!
      zfs snapshot filesystem@snapshot
      zfs rollback filesystem@snapshot2
      zfs list -t snapshot -r filesystem
      zfs set snapdir=visible filesystem
      zfs clone snapshot filesystem|volume
      zfs diff snapshot snapshot|filesystem
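The snapshot -> clone -> promote sequence might look like the sketch below. The dataset names (tank/data, tank/data-test, before-upgrade) are made up, and the `run` wrapper and `DRY_RUN` variable are assumptions of this example: by default it only prints the commands so it can be read and tried without touching a pool.

```shell
# DRY_RUN defaults to 1, so the commands are printed instead of executed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical helper: take a snapshot, clone it, make the clone the master copy.
snapshot_clone_promote() {
    fs=$1      # existing filesystem, e.g. tank/data
    snap=$2    # snapshot name
    clone=$3   # name of the writable copy
    run zfs snapshot "$fs@$snap"
    run zfs clone "$fs@$snap" "$clone"
    run zfs promote "$clone"
}

snapshot_clone_promote tank/data before-upgrade tank/data-test
```

Setting DRY_RUN=0 would execute the three zfs commands for real.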
  • 14. zfs send/receive
    ● Snapshot filesystem or full pool
    ● Transfer (incremental) snapshot to another pool -> disaster recovery
      ○ this will uncompress your pool, have enough CPU!
    ● LVM snapshots, rsync and shell script from hell or snapshot manager
      ○ http://sysadmin-cookbook.rot13.org/#zfs
      ○ https://github.com/bassu/bzman
      ○ https://github.com/briner/dolly
      ○ https://code.google.com/p/zxfer/
    ● http://burp.grke.org/ - librsync, VSS (Volume Shadow Copy Service) on Windows
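An incremental transfer boils down to one pipeline. The sketch below only builds and prints that pipeline; `send_cmd`, the host name and the dataset names are hypothetical, and transport over ssh is an assumption of this example rather than something the slide prescribes.

```shell
# Hypothetical helper: print the send/receive pipeline for one filesystem.
# An empty previous snapshot means a full (initial) send.
send_cmd() {
    fs=$1; prev=$2; cur=$3; host=$4; target=$5
    if [ -n "$prev" ]; then
        echo "zfs send -i $fs@$prev $fs@$cur | ssh $host zfs receive $target"
    else
        echo "zfs send $fs@$cur | ssh $host zfs receive $target"
    fi
}

send_cmd tank/data ""         2013-10-20 backuphost backup/data
send_cmd tank/data 2013-10-20 2013-10-21 backuphost backup/data
```

The first call prints a full send, the second an incremental one that transfers only the changes between the two snapshots.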
  • 15. Disk reliability
    ● Disks will fail! That’s why we are using ZFS (or some RAID) in the first place!
    ● zpool scrub pool
      ○ at least weekly, that’s what checksums are for!
      ○ if a disk disappears during scrub and comes back after reboot, it will automatically resilver data
      http://en.wikipedia.org/wiki/ZFS#Error_rates_in_hard_disks
      http://en.wikipedia.org/wiki/ZFS#Silent_data_corruption
    ● tell kernel that device died:
      echo 1 > /sys/block/<sdX>/device/delete
  • 16. (not so)smart - better than nothing
    ● Lies, damn lies and smart counters!
      http://research.google.com/pubs/pub32774.html
    ● smartctl -t long /dev/sd? weekly!
      http://sysadmin-cookbook.rot13.org/#smart_test_relocate_pl
    ● Log output of all drives (and controllers!) and store it in git for easy git log -p
      http://sysadmin-cookbook.rot13.org/#dump_smart_sh
    ● Look out for write counters on SSD to early detect wearout
    ● Check error recovery with smartctl -l scterc /dev/sdx
      ○ newer disks disable that in firmware (sigh!)
    ● Relocate known bad sectors
      ○ http://sysadmin-cookbook.rot13.org/#smart_test_relocate_pl
  • 17. IOPS - ZFS tuning - zpool iostat -v
    ● mirrors
      ○ always faster than any RAID (1-disk perf!)
      ○ read-only load which doesn’t fit in ARC
    ● RDBMS - tune recordsize, logbias, primarycache=metadata
    ● You shouldn't need to tune zfs, but...
      cat /etc/modprobe.d/zfs.conf
      options zfs zfs_nocacheflush=1 zfs_arc_max=154618822656 zfs_arc_min=1073741824
    ● meta-data heavy workloads (rsync)
      ○ increase /sys/module/zfs/parameters/zfs_arc_meta_{limit,prune}
      ○ zfs set primarycache=metadata <filesystem>
    http://www.nanowolk.nl/ext/2013_02_zfs_sequential_read_write_performance/
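The zfs_arc_max and zfs_arc_min values are plain byte counts, which is easy to get wrong by hand. A small sketch that reproduces the two ARC numbers from the slide (144 GiB and 1 GiB); `gib_to_bytes` is a hypothetical helper, and the sizes are the presenter's, not a recommendation.

```shell
# Hypothetical helper: convert GiB to bytes for use in /etc/modprobe.d/zfs.conf.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# Prints: options zfs zfs_arc_max=154618822656 zfs_arc_min=1073741824
echo "options zfs zfs_arc_max=$(gib_to_bytes 144) zfs_arc_min=$(gib_to_bytes 1)"
```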
  • 18. ZFS downsides
    ● out-of-kernel due to CDDL
      ○ DKMS and distribution support mitigate this
    ● performance not main goal
      ○ xfs is still fastest Linux fs, run it on RAID!
    ● not ready for SSD pools without TRIM
    ● doesn’t support shrinking of pool
    ● you can’t remove dedup metadata
    ● doesn’t have rebalance (as btrfs does)
      ○ zfs send/receive as workaround
    ● storage appliance model due to memory usage vs mixed workload servers
      ○ doesn’t support O_DIRECT -> double buffering
  • 19. References
    ● OpenZFS web site
      http://open-zfs.org/
    ● zfs-discuss mailing list
      http://zfsonlinux.org/lists.html
    ● ZFS on Linux / OpenZFS presentation
      http://events.linuxfoundation.org/sites/events/files/slides/OpenZFS%20-%20LinuxCon_0.pdf
    ● How disks fail
      http://blog.backblaze.com/2013/11/12/how-long-do-disk-drives-last/
    ● Understanding the Robustness of SSDs under Power Fault
      https://www.usenix.org/system/files/conference/fast13/fast13-final80.pdf
    ● zxfer
      http://forums.freebsd.org/showthread.php?t=24113