Linux User Group Bulgaria
                      10th annual meeting


                           Cluster Filesystems




                                             Marian Marinov - mm@yuhu.biz
                                           System Architect - Siteground.com
Stara Zagora 09.Jun.2007
Agenda


         1. What is a cluster filesystem?
           ➢ single disk filesystems

           ➢ shared disk filesystems

           ➢ distributed disk filesystems




         2. Shared storage
           ➢ Why do we need shared storage?

           ➢ What shared storage solutions are available
             at the moment?
           ➢ Sample configurations




         3. Cluster filesystems
           ➢ Information

           ➢ Configuration




What is a cluster filesystem?


         1. Single disk filesystems
           ➢ reiserfs

           ➢ ext2/3/4

           ➢ xfs




         2. Shared disk filesystems
           ➢ ocfs2

           ➢ gfs1/2




         3. Distributed filesystems
           ➢ pvfs1/2

           ➢ GFarm




Shared storage


         1. Why do we need shared storage?
           ➢ reliability

           ➢ better disk utilization




         2. What shared storage solutions are available
         at the moment?
           ➢ SAN/NAS (FCP storage solutions)

           ➢ DRBD – Distributed Replicated Block Device

           ➢ GNBD – Global Network Block Device

           ➢ iSCSI over TCP/IP

           ➢ ATA over Ethernet (AoE)




         3. Sample configurations
           ➢ DRBD

           ➢ iSCSI

           ➢ AoE




Shared storage - DRBD


         Basic setups:
          ➢ Master/Slave

          ➢ Master/Master




Shared storage - GNBD


         HMM ?




Shared storage – iSCSI over TCP/IP

          ➢   Can be routed
          ➢   Support for authentication
          ➢   Can run on any disks / files
          ➢   Kernel / user space components for the client & server

        Trivial iSCSI configuration:
          ➢   name – iqn.YYYY-MM.com.example:disk.name
          ➢   add target info to /etc/ietd.conf

          ➢   Lun definitions describe disks to export
          ➢   fileio type for normal disks
          ➢   Special nullio type for testing

        Target iqn.2006-08.com.example:lab.exports
         Lun 0 Path=/dev/sdX,Type=fileio
         Lun 1 Sectors=10000,Type=nullio
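
The target stanza above can also be generated rather than typed by hand. The following is a minimal Python sketch, not part of the talk: the names `Lun`, `render_target` and `IQN_PATTERN` are hypothetical, and the regular expression is only a loose check of the iqn.YYYY-MM naming shape shown on the slide, not a full validator.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Loose check of the iqn.YYYY-MM.<reversed domain>[:<identifier>] shape.
# This is an illustrative assumption, not a complete IQN validator.
IQN_PATTERN = re.compile(r"^iqn\.\d{4}-\d{2}\.[a-z0-9][a-z0-9.-]*(:.+)?$")


@dataclass
class Lun:
    number: int
    io_type: str                   # "fileio" or "nullio", as on the slide
    path: Optional[str] = None     # backing disk or file for fileio
    sectors: Optional[int] = None  # size of a nullio test LUN

    def render(self) -> str:
        """Render one Lun line of the target stanza."""
        if self.io_type == "fileio":
            if self.path is None:
                raise ValueError("fileio LUN needs a path")
            params = f"Path={self.path},Type=fileio"
        elif self.io_type == "nullio":
            if self.sectors is None:
                raise ValueError("nullio LUN needs a sector count")
            params = f"Sectors={self.sectors},Type=nullio"
        else:
            raise ValueError(f"unknown LUN type: {self.io_type}")
        return f" Lun {self.number} {params}"


def render_target(iqn: str, luns: List[Lun]) -> str:
    """Render a Target stanza in the layout used on the slide."""
    if not IQN_PATTERN.match(iqn):
        raise ValueError(f"name does not look like an IQN: {iqn}")
    return "\n".join([f"Target {iqn}"] + [lun.render() for lun in luns])


if __name__ == "__main__":
    print(render_target(
        "iqn.2006-08.com.example:lab.exports",
        [Lun(0, "fileio", path="/dev/sdX"), Lun(1, "nullio", sectors=10000)],
    ))
```

Run as a script, it prints the same three lines as the sample stanza; the result would still have to be appended to /etc/ietd.conf by hand.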

Shared storage – iSCSI over TCP/IP

       Recent releases have a DB-driven configuration.
       Use the “iscsiadm” program to manipulate it;
       “rm -f /var/db/iscsi/*” starts fresh.
       3 steps:
         ➢ Add discovery address

         ➢ Log into target

         ➢ When done, log out of target

       $ iscsiadm -m discovery --type sendtargets --portal examplehost
       [cbb01c] 192.168.1.6:3260,1 iqn.2006-08.com.example:lab.exports
       $ iscsiadm -m node --record cbb01c --login
       $ iscsiadm -m node --record cbb01c --logout
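
When scripting the three steps, the record ID has to be pulled out of the discovery output first. Below is a small Python sketch of that parsing step. It assumes the output has exactly the shape of the sample line above (record ID in brackets, address:port, a comma-separated number taken here to be the portal group tag, then the target name). `DiscoveredTarget`, `parse_discovery` and `node_command` are hypothetical helper names, and IPv6 portals are not handled.

```python
import re
from typing import List, NamedTuple


class DiscoveredTarget(NamedTuple):
    record: str   # record ID printed in square brackets
    address: str  # portal address
    port: int     # portal TCP port
    tpgt: int     # number after the comma (assumed: portal group tag)
    iqn: str      # target name


# Matches lines shaped like the sample discovery output on the slide.
_LINE = re.compile(
    r"^\[(?P<record>[^\]]+)\]\s+"
    r"(?P<address>[^:\s]+):(?P<port>\d+),(?P<tpgt>\d+)\s+"
    r"(?P<iqn>\S+)$"
)


def parse_discovery(output: str) -> List[DiscoveredTarget]:
    """Parse discovery output, skipping lines that do not match."""
    targets = []
    for line in output.splitlines():
        match = _LINE.match(line.strip())
        if match:
            targets.append(DiscoveredTarget(
                record=match["record"],
                address=match["address"],
                port=int(match["port"]),
                tpgt=int(match["tpgt"]),
                iqn=match["iqn"],
            ))
    return targets


def node_command(record: str, action: str) -> List[str]:
    """Build the login/logout command in the form used on the slide."""
    if action not in ("login", "logout"):
        raise ValueError(f"unsupported action: {action}")
    return ["iscsiadm", "-m", "node", "--record", record, f"--{action}"]


if __name__ == "__main__":
    sample = "[cbb01c] 192.168.1.6:3260,1 iqn.2006-08.com.example:lab.exports"
    for target in parse_discovery(sample):
        print(" ".join(node_command(target.record, "login")))
```

The argument list returned by `node_command` could be handed to `subprocess.run`, but the sketch only builds it, so it can be tried on a machine without an initiator installed.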



Shared storage – ATA over Ethernet

     ➢ Very simple standard – the specification is only 6 pages
     ➢ Lightweight client – less CPU overhead than iSCSI

     ➢ Very easy to set up – autoconfiguration via Ethernet broadcast

     ➢ Not routable, no authentication

     ➢ Disks addressed by "shelf" and "slot" numbers



     ➢ "Virtual Blade" (vblade) software – available
     for Linux & FreeBSD
     ➢ very small user space daemon

     ➢ very simple command:

       vbladed <shelf> <slot> <ethn> <device>
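
To make the shelf/slot addressing concrete, here is a short Python sketch that builds the vbladed command line for an export and the device name a Linux client would be expected to see. The helper names are hypothetical. The numeric limits (16-bit shelf, 8-bit slot, all-ones values left for broadcast) and the /dev/etherd/e<shelf>.<slot> naming are assumptions about the AoE protocol and the Linux aoe driver, not something stated on the slide.

```python
from typing import List


def vbladed_argv(shelf: int, slot: int, interface: str, device: str) -> List[str]:
    """Build the argument list for: vbladed <shelf> <slot> <ethn> <device>."""
    # Assumed limits: shelf is a 16-bit field, slot an 8-bit field,
    # with the all-ones values left for broadcast addressing.
    if not 0 <= shelf < 0xFFFF:
        raise ValueError(f"shelf out of range: {shelf}")
    if not 0 <= slot < 0xFF:
        raise ValueError(f"slot out of range: {slot}")
    return ["vbladed", str(shelf), str(slot), interface, device]


def client_device_path(shelf: int, slot: int) -> str:
    """Device node assumed to appear on a Linux client for this export."""
    return f"/dev/etherd/e{shelf}.{slot}"


if __name__ == "__main__":
    # Export /dev/sdX as shelf 0, slot 1 on eth0 (placeholder values).
    print(" ".join(vbladed_argv(0, 1, "eth0", "/dev/sdX")))
    print(client_device_path(0, 1))
```

Each shelf/slot pair must be unique on the Ethernet segment, since it is the only address a blade has.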




Shared storage – ATA over Ethernet


           ➢ Single kernel module
           ➢ Automatically finds blades

           ➢ Additional load time parameters:

           ➢ aoe_iflist – list of interfaces to listen on

           ➢ AoEtools package




Shared storage – ATA over Ethernet

        [Figure: the ATA over Ethernet header]
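
As a stand-in for the header diagram, the sketch below packs and unpacks the common AoE header in Python. The field layout is an assumption taken from the published AoE specification rather than from this slide: a version/flags byte, an error byte, a 16-bit major (shelf) number, an 8-bit minor (slot) number, a command byte and a 32-bit tag, carried in Ethernet frames of type 0x88A2. The function names are hypothetical.

```python
import struct

AOE_ETHERTYPE = 0x88A2  # assumed EtherType for AoE frames

# Network byte order: ver/flags, error, major (shelf), minor (slot),
# command, tag. Ten bytes in total, following the 14-byte Ethernet header.
_AOE_HEADER = struct.Struct("!BBHBBI")


def pack_aoe_header(major: int, minor: int, command: int, tag: int,
                    version: int = 1, flags: int = 0, error: int = 0) -> bytes:
    """Pack the common AoE header; version sits in the high nibble."""
    ver_flags = ((version & 0x0F) << 4) | (flags & 0x0F)
    return _AOE_HEADER.pack(ver_flags, error, major, minor, command, tag)


def unpack_aoe_header(data: bytes) -> dict:
    """Unpack the first ten bytes of an AoE payload into named fields."""
    ver_flags, error, major, minor, command, tag = _AOE_HEADER.unpack_from(data)
    return {
        "version": ver_flags >> 4,
        "flags": ver_flags & 0x0F,
        "error": error,
        "major": major,
        "minor": minor,
        "command": command,
        "tag": tag,
    }


if __name__ == "__main__":
    header = pack_aoe_header(major=0, minor=1, command=0, tag=0x12345678)
    print(header.hex(), unpack_aoe_header(header))
```

The major and minor fields are the same shelf and slot numbers passed to vbladed, which is why no further addressing or session setup is needed.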




Shared storage – AoE vs iSCSI




Shared filesystems - OCFS2


       Where is OCFS1?

       OCFS2 info:
       ➢ General purpose cluster filesystem

       ➢ Almost POSIX compliant

            ➢ fcntl(2) locking

            ➢ shared writable mmap

       ➢ Keeps filesystem operations local

        ➢ reduces lock contention

        ➢ Implements lock caching

            ➢ FS Internal abstraction for cluster locking

       ➢ Uses good practices:

            ➢ Ext3 directory code & group allocation

            ➢ JBD journaling

        ➢ Own heartbeat engine

        ➢ Only concerned with cluster locking



Shared filesystems - OCFS2

       ➢ Supported in the mainline kernel

       ➢ Very easy to set up

       ➢ Standard set of FS utils:
         mkfs.ocfs2, mount.ocfs2, fsck.ocfs2, etc.
       ➢ Cluster aware

       ➢ GUI for the configuration

       ➢ No resize




      Installation:
        ➢ Compile the kernel with OCFS2 support:
          CONFIG_OCFS2_FS=m
        ➢ Download and build the ocfs2-tools sources,
          or use binary packages
        ➢ Configure the o2cb init script

        ➢ Generate /etc/ocfs2/cluster.conf




Shared filesystems - OCFS2

        Sample configuration:

        cluster:
           node_count = 2
           name = testme
        node:
           ip_port = 8989
           ip_address = 10.2.0.4
           number = 0
           name = lu
           cluster = testme
        node:
           ip_port = 9898
           ip_address = 10.2.0.8
           number = 1
           name = gamelon
           cluster = testme
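
Since the same file has to be present on every node, it can be convenient to generate it from one node list. The following Python sketch does that with the values from the sample above. `Node` and `render_cluster_conf` are hypothetical names, and the tab indentation of the parameter lines is an assumption about what the o2cb tools expect, so check the output against the ocfs2-tools documentation before using it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    ip_address: str
    ip_port: int


def render_cluster_conf(cluster: str, nodes: List[Node]) -> str:
    """Render a cluster.conf with one cluster stanza and one stanza per node."""
    # Assumption: stanza headers start in column one and parameter
    # lines are indented with a single tab.
    lines = [
        "cluster:",
        f"\tnode_count = {len(nodes)}",
        f"\tname = {cluster}",
        "",
    ]
    # Node numbers are assigned in list order, starting at 0.
    for number, node in enumerate(nodes):
        lines += [
            "node:",
            f"\tip_port = {node.ip_port}",
            f"\tip_address = {node.ip_address}",
            f"\tnumber = {number}",
            f"\tname = {node.name}",
            f"\tcluster = {cluster}",
            "",
        ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_cluster_conf("testme", [
        Node("lu", "10.2.0.4", 8989),
        Node("gamelon", "10.2.0.8", 9898),
    ]))
```

Deriving node_count and the node numbers from the list keeps them consistent when a node is added later.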
Shared filesystems - GFS2

        Where is GFS1?

        GFS2:
         ➢ Can use different types of DLM

           ➢ GFS DLM

           ➢ Grand Unified Lock Manager (GuLM)

           ➢ Multicast

         ➢ Fencing

           ➢ Internal

           ➢ External

         ➢ Integrated failover

         ➢ Cluster Logical Volume Manager – CLVM

         ➢ Global Network Block Device – GNBD

         ➢ Cluster Configuration System – CCS

         ➢ Cluster manager – CMAN




Distributed filesystems - PVFS2

        PVFS1 works only for Linux 2.4 kernels

        PVFS:
         ➢ split in two parts

         ➢ Metadata server

         ➢ I/O server

         ➢ Designed for HPC clusters

         ➢ Designed to perform best with software written
           with MPICH1 & MPICH2 (MPI-IO interface)
         ➢ No resize




Distributed filesystems - GFarm


        Grid Datafarm File System:
        ➢   Distributed and fault tolerant file system
        ➢   Dispersed storage
        ➢   FUSE Module
        ➢   GFarm APIs

        GFarm node types:
        ➢ Client

        ➢ Filesystem – gfsd

        ➢ Metadata server – gfmd & OpenLDAP or PostgreSQL

        ➢ Metadata cache – gfarm_agent




Distributed filesystems - GFarm


        gfsd - the Gfarm filesystem daemon
        gfmd - the Gfarm filesystem metadata server
        gfarm_agent - the Gfarm metadata cache server
        Gfarm command tools
        ➢ gfls                      ➢ gfcp

        ➢ gfrm                      ➢ gfgrep

        ➢ gfwhere                   ➢ gfwc

        ➢ gfrep                     ➢ gfrun

        ➢ gfhost                    ➢ gfmpirun_p4

        ➢ gfreg                     ...
        ➢ gfexport                  etc.
        ➢ gfkey

        ➢ gfps




Distributed filesystems - GFarm


              Authentication

        gfmd & gfsd can use:
        ➢ shared secret

        ➢ GSI – Grid Security Infrastructure

        ➢ PostgreSQL & OpenLDAP auth methods




Questions?