Divide and conquer – Shared disk cluster file systems shipped with the Linux kernel
Udo Seidel
Shared file systems
●   Multiple servers access the same data
●   Different approaches
    ●   Network based, e.g. NFS, CIFS
    ●   Clustered
        –   Shared disk, e.g. CXFS, CFS, GFS(2), OCFS2
        –   Distributed parallel, e.g. Lustre, Ceph
History
●   GFS(2)
    ●   First version in the mid-90s
    ●   Started on IRIX, later ported to Linux
    ●   Commercial background: Sistina and Red Hat
    ●   Part of the vanilla Linux kernel since 2.6.19
●   OCFS2
    ●   OCFS1 for database files only
    ●   First version in 2005
    ●   Part of the vanilla Linux kernel since 2.6.16
Features/Challenges/More
●   As similar as possible to local file systems
    ●   Internal setup
    ●   Management
●   Cluster awareness
    ●   Data integrity
    ●   Allocation
Framework
●   Bridges the gap between single-node and cluster operation
●   3 main components
    ●   Cluster-ware
    ●   Locking
    ●   Fencing
Framework GFS2 (I)
●   General-purpose cluster-ware
    ●   More flexible
    ●   More options/functions
    ●   More complexity
    ●   Configuration files in XML
●   Locking uses the cluster framework too
●   system-config-cluster OR Conga OR vi & scp
Framework GFS2 (II)
# cat /etc/cluster/cluster.conf
<?xml version="1.0" ?>
<cluster config_version="3" name="gfs2">
<fence_daemon post_fail_delay="0" post_join_delay="3"/>
<clusternodes>
<clusternode name="node0" nodeid="1" votes="1">
<fence/>
</clusternode>
<clusternode name="node1" nodeid="2" votes="1">
...
</cluster>
#
Framework OCFS2 (I)
●   Cluster-ware just for OCFS2
    ●   Less flexible
    ●   Fewer options/functions
    ●   Less complexity
    ●   Configuration file in ASCII
●   Locking uses the cluster framework too
●   ocfs2console OR vi & scp
Framework OCFS2 (II)
# cat /etc/ocfs2/cluster.conf
node:
        ip_port = 7777
        ip_address = 192.168.0.1
        number = 0
        name = node0
        cluster = ocfs2
...
cluster:
        node_count = 2
#
Locking
●   Distributed Lock Manager (DLM)
●   Based on the VMS DLM
●   Lock modes
    ●   Exclusive Lock (EX)
    ●   Protected Read (PR)
    ●   No Lock (NL)
    ●   Concurrent Write Lock (CW) – GFS2 only
    ●   Concurrent Read Lock (CR) – GFS2 only
    ●   Protected Write (PW) – GFS2 only
Locking - Compatibility
                            Existing Lock
Requested Lock    NL     CR     CW     PR     PW     EX
NL                Yes    Yes    Yes    Yes    Yes    Yes
CR                Yes    Yes    Yes    Yes    Yes    No
CW                Yes    Yes    Yes    No     No     No
PR                Yes    Yes    No     Yes    No     No
PW                Yes    Yes    No     No     No     No
EX                Yes    No     No     No     No     No
Reading example: a PR request is granted only if every existing lock is NL, CR or PR.
Fencing
●   Separation of host and storage
    ●   Power fencing
        –   Power switch, e.g. APC
        –   Server side, e.g. IPMI, iLO
        –   Useful in other scenarios
        –   Post-mortem analysis more difficult
    ●   I/O fencing
        –   SAN switch, e.g. Brocade, QLogic
        –   Possible to investigate an "unhealthy" server
Fencing - GFS2
●   Both fencing methods
●   Part of the cluster configuration
●   Cascading possible
Fencing - OCFS2
●   Only power fencing
    ●   Only self fencing
GFS2 – Internals (I)
●   Superblock
    ●   Starts at block 128
    ●   Expected data + cluster information
    ●   Pointers to master and root directory
●   Resource groups
    ●   Comparable to the cylinder groups of traditional Unix file systems
    ●   Allocatable from different cluster nodes -> locking granularity
GFS2 – Internals (II)
●   Master directory
    ●   Contains meta data, e.g. journal index, quota, ...
    ●   Not visible to ls and co.
    ●   File-system-unique and cluster-node-specific files
●   Journaling file system
    ●   One journal per cluster node
    ●   Each journal accessible by all nodes (recovery)
GFS2 – Internals (III)
●   Inode/Dinode
    ●   Usual information, e.g. owner, mode, time stamps
    ●   Pointers to blocks: either data or pointers
    ●   Only one level of indirection
    ●   "Stuffing"
●   Directory management via extendible hashing
●   Meta file statfs
    ●   statfs()
    ●   Tuning via sysfs
GFS2 – Internals (IV)
●   Meta files
    ●   jindex directory containing the journals
        –   journalX
    ●   rindex – resource group index
    ●   quota
    ●   per_node directory containing node-specific files
GFS2 – what else
●   Extended attributes (xattr)
●   ACLs
●   Local mode = one-node access
OCFS2 – Internals (I)
●   Superblock
    ●   Starts at block 3 (1+2 for OCFS1)
    ●   Expected data + cluster information
    ●   Pointers to master and root directory
    ●   Up to 6 backups
        –   At pre-defined offsets
        –   At 2^n GByte, n = 0, 2, 4, 6, 8, 10
●   Cluster groups
    ●   Comparable to the cylinder groups of traditional Unix file systems
OCFS2 – Internals (II)
●   Master or system directory
    ●   Contains meta data, e.g. journal index, quota, ...
    ●   Not visible to ls and co.
    ●   File-system-unique and cluster-node-specific files
●   Journaling file system
    ●   One journal per cluster node
    ●   Each journal accessible by all nodes (recovery)
OCFS2 – Internals (III)
●   Inode
    ●   Usual information, e.g. owner, mode, time stamps
    ●   Pointers to blocks: either data or pointers
    ●   Only one level of indirection
●   global_inode_alloc
    ●   Global meta data file
    ●   inode_alloc is the node-specific counterpart
●   slot_map
    ●   Global meta data file
    ●   Active cluster nodes
OCFS2 – Internals (IV)
●   orphan_dir
    ●   Local meta data file
    ●   Cluster-aware deletion of files still in use
●   truncate_log
    ●   Local meta data file
    ●   Deletion cache
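The system files from the last two slides can be inspected with debugfs.ocfs2. A minimal sketch, assuming an OCFS2 volume on /dev/sdb1 (a made-up device name; the exact subcommand spelling may differ between ocfs2-tools versions):

# debugfs.ocfs2 -R "ls -l //" /dev/sdb1

"//" denotes the hidden system directory, so the listing shows entries such as slot_map, orphan_dir and truncate_log without mounting the file system.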
OCFS2 – what else
●   Two versions: 1.2 and 1.4
    ●   Mount compatible
    ●   Framework not network compatible
    ●   New features disabled by default
●   For 1.4:
    ●   Extended attributes (xattr)
    ●   Inode-based snapshotting
    ●   Preallocation
File system management
●   Known/expected tools + cluster details
    ●   mkfs
    ●   mount/umount
    ●   fsck
●   File-system-specific tools
    ●   gfs2_XXXX
    ●   tunefs.ocfs2, debugfs.ocfs2
GFS2 management (I)
●   File system creation needs additional information (see example below)
    ●   Cluster name
    ●   Unique file system identifier (string)
    ●   Optional:
        –   Locking mode to be used
        –   Number of journals
    ●   Tuning by changing default sizes for journals, resource groups, ...
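A minimal mkfs.gfs2 sketch matching the slide above; the cluster name "gfs2" is taken from the cluster.conf example earlier, while the file system name "data", the device path and the journal count are made-up example values:

# mkfs.gfs2 -p lock_dlm -t gfs2:data -j 2 /dev/clustervg/gfs2lv

-p selects the locking protocol (lock_dlm for cluster use, lock_nolock for single-node use), -t sets clustername:fsname, and -j sets the number of journals – one per node that will mount the file system.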
GFS2 management (II)
●   Mount/umount
    ●   No real syntax surprise (see example below)
    ●   First node checks all journals
    ●   Enabling ACL, quota, single-node mode
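A hedged mount sketch; device and mount point are made-up examples, and the option names should be double-checked against the gfs2 mount documentation of the distribution in use:

# mount -t gfs2 -o acl /dev/clustervg/gfs2lv /mnt/data
# mount -t gfs2 -o lockproto=lock_nolock /dev/clustervg/gfs2lv /mnt/data

The first line enables POSIX ACLs; the second overrides the locking protocol for single-node (local) access.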
GFS2 management (III)
●   File system check (see example below)
    ●   Journal recovery of node X by node Y
    ●   Done by one node
    ●   File system offline everywhere else
    ●   Known phases
        –   Journals
        –   Meta data
        –   References: data blocks, inodes
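A minimal fsck sketch, assuming the file system is unmounted on all cluster nodes; the device path is a made-up example:

# fsck.gfs2 -y /dev/clustervg/gfs2lv

-y answers yes to all repair questions.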
GFS2 tuning (I)
●   gfs2_tool (examples below)
    ●   Most powerful
        –   Display superblock
        –   Change superblock settings (locking mode, cluster name)
        –   List meta data
        –   Freeze/unfreeze file system
        –   Special attributes, e.g. appendonly, noatime
    ●   Requires the file system to be online (mostly)
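A few gfs2_tool invocations as a sketch; device and mount point are made-up examples, and subcommand names may differ between gfs2-utils versions:

# gfs2_tool sb /dev/clustervg/gfs2lv all
# gfs2_tool freeze /mnt/data
# gfs2_tool unfreeze /mnt/data

The first command prints the superblock fields; freeze/unfreeze quiesce and release the mounted file system.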
GFS2 tuning (II)
●   gfs2_edit
    ●   Logical extension of gfs2_tool
    ●   More details, e.g. node-specific meta data, block level
●   gfs2_jadd (example below)
    ●   Different sizes possible
    ●   No deletion possible
    ●   Can cause data space shortage
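A hedged sketch of adding a journal with gfs2_jadd; the mount point is a made-up example, and gfs2_jadd operates on a mounted file system:

# gfs2_jadd -j 1 /mnt/data

This adds one additional journal, e.g. before a new node joins the cluster; the space is taken from the data area, which is why it can cause data space shortage.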
GFS2 tuning (III)
●   gfs2_grow (example below)
    ●   Needs space in the meta directory
    ●   Online only
    ●   No shrinking
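A typical online grow sketch when GFS2 sits on a cluster logical volume; volume group, logical volume and mount point names are made-up examples:

# lvextend -L +10G /dev/clustervg/gfs2lv
# gfs2_grow /mnt/data

The underlying device is enlarged first, then gfs2_grow extends the mounted file system into the new space.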
OCFS2 management (I)
●   File system creation (see example below)
    ●   No additional information needed
    ●   Tuning by optional parameters
●   Mount/umount
    ●   No real syntax surprise
    ●   First node checks all journals
    ●   Enabling ACL, quota, single-node mode
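A minimal creation and mount sketch; the label, the number of node slots and the device are made-up example values:

# mkfs.ocfs2 -L data -N 4 /dev/sdb1
# mount -t ocfs2 /dev/sdb1 /mnt/data

-L sets the volume label and -N the number of node slots (and hence journals); both are optional tuning parameters.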
OCFS2 management (II)
●   File system check (see example below)
    ●   Journal recovery of node X by node Y
    ●   Done by one node
    ●   File system offline everywhere else
    ●   Fixed offsets of the superblock backups come in handy
    ●   Known phases
        –   Journals
        –   Meta data
        –   References: data blocks, inodes
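A hedged fsck sketch; the device is a made-up example, and the -r option (recover from a numbered backup superblock) should be verified against the local fsck.ocfs2 man page:

# fsck.ocfs2 -y /dev/sdb1
# fsck.ocfs2 -r 2 /dev/sdb1

The second form uses one of the backup superblocks at the fixed offsets mentioned above when the primary superblock is damaged.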
OCFS2 tuning (I)
●   tunefs.ocfs2 (examples below)
    ●   Display/change file system label
    ●   Display/change number of journals
    ●   Change journal setup, e.g. size
    ●   Grow file system (no shrinking)
    ●   Create backup of superblock
    ●   Display/enable/disable specific file system features
        –   Sparse files
        –   "Stuffed" inodes
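Two tunefs.ocfs2 sketches; label, slot count and device are made-up examples, and option spellings should be checked against the installed ocfs2-tools:

# tunefs.ocfs2 -L newlabel /dev/sdb1
# tunefs.ocfs2 -N 8 /dev/sdb1

The first changes the volume label, the second raises the number of node slots (journals) so more nodes can mount the volume.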
OCFS2 tuning (II)
●   debugfs.ocfs2 (examples below)
    ●   Display file system settings, e.g. superblock
    ●   Display inode information
    ●   Access meta data files
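Two debugfs.ocfs2 one-liners as a sketch; device and file path are made-up examples:

# debugfs.ocfs2 -R "stats" /dev/sdb1
# debugfs.ocfs2 -R "stat /some/file" /dev/sdb1

-R runs a single request non-interactively: "stats" dumps the superblock details, "stat <path>" shows the inode information for a file.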
Volume manager
●   Necessary to handle more than one LUN/partition
●   Cluster-aware
●   Bridges feature gaps, e.g. volume-based snapshotting
●   CLVM (example below)
●   EVMS – OCFS2 only
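A minimal CLVM sketch for creating a cluster-aware volume group and logical volume; device, VG and LV names are made-up examples, and clvmd must already be running on all nodes:

# pvcreate /dev/sdb1
# vgcreate -cy clustervg /dev/sdb1
# lvcreate -L 100G -n gfs2lv clustervg

-cy marks the volume group as clustered, so metadata changes are coordinated across the nodes.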
Key data - comparison

                               GFS2                         OCFS2
Maximum # of cluster nodes     16 supported                 256
                               (theoretical: 256)
Journaling                     Yes                          Yes
Cluster-less/local mode        Yes                          Yes
Maximum file system size       25 TB (theoretical: 8 EB)    16 TB (theoretical: 4 EB)
Maximum file size              25 TB (theoretical: 8 EB)    16 TB (theoretical: 4 EB)
POSIX ACL                      Yes                          Yes
Grow-able                      Yes, online only             Yes, online and offline
Shrinkable                     No                           No
Quota                          Yes                          Yes
O_DIRECT                       On file level                Yes
Extended attributes            Yes                          Yes
Maximum file name length       255                          255
File system snapshots          No                           No
Summary
●   GFS2 has a longer history than OCFS2
●   OCFS2 setup is simpler and easier to maintain
●   GFS2 setup is more flexible and powerful
●   OCFS2 is getting close to GFS2
●   The choice often depends on the Linux vendor
References
http://sourceware.org/cluster/gfs/
http://www.redhat.com/gfs/
http://oss.oracle.com/projects/ocfs2/
http://sources.redhat.com/cluster/wiki/
http://sourceware.org/lvm2/
http://evms.sourceforge.net/
Thank you!