HA NAS with CephFS
Wyllys Ingersoll – Keeper Technology, LLC
Ceph Day San Jose

Introductions
Keeper Technology
Data and Storage Management Experts
Focus on IC & Commercial Customers for 12 Years
•  Multi-PB Enterprise Systems
•  Imagery, Computer Forensics, Big Data
•  High Volume/High Velocity Data Analysis
•  Full Solution Provider
•  Keeper Products + Partner Products

Overview
•  How we implemented HA NAS gateways using cephfs
•  Cluster configuration
•  Software used
•  Issues encountered
•  Performance statistics

Configuration
Cluster Configuration
•  Ceph Jewel 10.2.5 on Ubuntu-based Linux
•  6 storage servers, ~80 TB usable (3-copy)
   •  ~15 OSDs per server
   •  2, 3, and 4 TB 7200 RPM spinning drives (no SSD)
•  2 gateways
   •  HP DL380 G9 servers w/48 GB RAM
•  3 monitors + 3 MDS servers
   •  HP DL360 G6 servers w/48 GB RAM
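As a rough sanity check on the capacity figure, assuming an average drive size of about 2.7 TB across the 2/3/4 TB mix:

   6 servers x ~15 OSDs x ~2.7 TB  ~= 240 TB raw
   240 TB raw / 3 copies           ~=  80 TB usable
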
Goals
HA NAS Goals
•  Provide NFS and/or SMB filesystem shares with redundancy
•  Failover if a gateway goes down
   •  Clients should not lose data
   •  Minimal interruption of client workflow
   •  “Seamless” for NFSv3 – others are a work in progress
•  Minimum of 2 gateways required

Software
SMB
•  Samba 4.5.5 w/CTDB support
   •  Built with the “--with-cluster-support” flag
•  CTDB is key to HA functionality
   •  CTDB = Clustered Trivial Database
   •  Node monitoring, failover, IP takeover
•  Define multiple floating IP addresses in DNS
•  CTDB configured with the virtual shared IPs and the real IP of each GW (sketch below)
•  CTDB nodes communicate on a private network
   •  Insecure protocol
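A minimal sketch of the CTDB/Samba pieces described above; the node addresses, interface name, and networks are illustrative, not taken from the slides:

   # /etc/ctdb/nodes: real (private-network) IP of each gateway, one per line
   192.168.100.11
   192.168.100.12

   # /etc/ctdb/public_addresses: floating (virtual) IPs that CTDB moves between gateways
   10.0.0.50/24 eth0
   10.0.0.51/24 eth0

   # smb.conf [global]: requires a Samba built with --with-cluster-support
   clustering = yes

The floating addresses can then be published in DNS (e.g. as round-robin A records) so clients always reach whichever gateway currently holds them.
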
Software
NFS
•  Ganesha NFS 2.4.3
   •  User-space NFS service, replaces the kernel NFS server
   •  Built from the github repo code
•  Store the ganesha config on the shared FS
   •  Ex: /cephfs/nfs/ganesha.conf
•  HA gateways must have common NFS Export IDs
•  Use the “VFS” FSAL (not “CEPH”) for Ganesha exports (sample export below)
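A sketch of what one export stanza in the shared ganesha.conf might look like; the Export_Id, paths, and options are illustrative. Because /cephfs is already mounted on the gateway, the export uses the generic VFS FSAL rather than the CEPH FSAL:

   # /cephfs/nfs/ganesha.conf: shared by all gateways so Export_Ids stay in sync
   EXPORT {
       Export_Id = 100;                 # must be identical on every HA gateway
       Path = /cephfs/exports/foobar;   # subdirectory of the mounted cephfs
       Pseudo = /foobar;
       Protocols = 3;                   # NFSv3 is the "seamless" failover case
       Access_Type = RW;
       Squash = No_Root_Squash;
       Transports = TCP;
       FSAL {
           Name = VFS;                  # not "CEPH"
       }
   }
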
Software
NAS with Cephfs
•  Single FS per cluster (for now) - /cephfs
•  Disable snapshots or restrict them to the top level
   •  A hard-linking bug prevents reliable snapshots on subdirs
•  Prefer kernel mounts over FUSE for performance (mount example below)
   •  Kernel 4.8.10
•  Each export is a subdirectory
   •  /cephfs/exports/foobar
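A sketch of the kernel mount used on each gateway (monitor names and the secret-file path are placeholders):

   # kernel cephfs mount: preferred here over FUSE for performance
   mount -t ceph mon1:6789,mon2:6789,mon3:6789:/ /cephfs \
       -o name=admin,secretfile=/etc/ceph/admin.secret

   # the FUSE alternative, used for comparison in the performance tests
   ceph-fuse -m mon1:6789 /cephfs
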
Data Flow
[Diagram: Clients 1-3 connect over a 10 GbE network to a virtual IP; the virtual IP fronts gateways GW-1, GW-2, and GW-3, each running SMB under CTDB, which in turn talk to the Ceph cluster.]

Software
NAS with Cephfs
•  Samba locks stored on the shared FS
   •  Ex: /cephfs/ctdb
•  CTDB monitors the Samba and Ganesha services
   •  Starts and stops them as necessary via “callout” scripts (sketch below)
•  CTDB assigns the virtual IP addresses as needed
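A sketch of how CTDB can be told to manage the NAS services; the callout path follows the example nfs-ganesha-callout script shipped with CTDB, and all paths shown here are illustrative:

   # /etc/default/ctdb (or /etc/sysconfig/ctdb)
   CTDB_RECOVERY_LOCK=/cephfs/ctdb/reclock            # lock file on the shared cephfs
   CTDB_MANAGES_SAMBA=yes                             # CTDB starts/stops/monitors smbd
   CTDB_MANAGES_NFS=yes
   CTDB_NFS_CALLOUT=/etc/ctdb/nfs-ganesha-callout     # drive Ganesha instead of kernel NFS
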
Issues
Issues and Problems
•  Kernel support for cephfs varies
   •  Using “bleeding edge” kernels for best results
•  Cannot set quotas on subdirectories
   •  Kernel cephfs limitation
   •  Cannot limit the size available for a single export
   •  Each share has max size = the entire cephfs data pool
•  Snapshots only at the top level
   •  Cannot snapshot each exported subdirectory (see the example below)
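To make the quota and snapshot limitations concrete (paths and sizes are illustrative): CephFS directory quotas are set as extended attributes, but in this Jewel-era setup they are not enforced by the kernel client used on the gateways, so a per-export cap has no practical effect; snapshots are taken only at the top of the tree:

   # per-directory quota attribute: not enforced by the kernel cephfs client here
   setfattr -n ceph.quota.max_bytes -v 10737418240 /cephfs/exports/foobar

   # snapshot of the whole filesystem at the top level only
   mkdir /cephfs/.snap/nightly-2017-03-17
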
Issues
Adjustments
•  mds_cache_size = 8,000,000 (default was 100k)
   •  Uses more RAM, but we have 48 GB
   •  Avoids “failing to respond to cache pressure” errors
•  Use “default” crush tunables (not “jewel”)
   •  Works better with older kernels
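Roughly, the corresponding settings (note that Jewel's mds_cache_size counts inodes, not bytes):

   # ceph.conf on the MDS nodes
   [mds]
       mds_cache_size = 8000000    # default was 100000

   # revert the crush tunables profile for compatibility with older kernel clients
   ceph osd crush tunables default
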
Performance
Test Methodology
•  FIO parameters (sample invocation below)
   •  Vary block sizes (4K, 64K, 1M, 4M)
   •  Vary # of jobs (1, 16, 32, 64)
   •  iodepth = 1
   •  read/write + randread/randwrite
   •  ioengine = sync
   •  direct = 1
•  4 distinct clients running simultaneously on a 10 GbE link
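One representative fio invocation under this methodology (the directory, size, and runtime are illustrative):

   fio --name=nas-test --directory=/mnt/share \
       --rw=randwrite --bs=64k --numjobs=16 \
       --ioengine=sync --iodepth=1 --direct=1 \
       --size=4g --runtime=120 --time_based --group_reporting
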
Performance
Performance Configuration
[Diagram: Clients 1-4 each mount the shared FS (/cephfs) from a NAS gateway via NFS or SMB; FIO with direct I/O runs on each client to read and write data to the share; the gateway sits in front of the Ceph cluster.]

Performance
[Results charts, one per slide: NFS Read (kernel mount), NFS Write (kernel mount), NFS Read (FUSE mount), NFS Write (FUSE mount), SMB Read (kernel mount), SMB Write (kernel mount), SMB Read (FUSE mount), SMB Write (FUSE mount).]

Summary
•  High-availability NAS is possible with cephfs
•  Some issues remain
   •  Deep snapshots
   •  Quotas/limits on subdirs
•  Performance is “OK”
•  Standard NAS protocol limitations (NFS, SMB)

Thank You
| 21740 Beaumeade Circle | Suite 150 | Ashburn, VA 20147 | P [571] 333 2725 | F [703] 738 7231 | solutions@keepertech.com | www.keepertech.com