
Delex 2020: Deep diving into the dynamic provisioning of GlusterFS volumes in k8s with Heketi


In this talk Artem takes a deep dive into the dynamic provisioning of GlusterFS volumes in Kubernetes with Heketi, with an overview of the stack and a detailed description of a solution architecture built on top of it.

Be ready to look into the source code with Artem and see how Heketi works with GlusterFS. Is it issue-free? Certainly not, so Artem will walk through the most common troubleshooting, describe possible improvements, and show how to develop them from the ground up.
The outcome of the talk: common architectural mistakes, conclusions, and recommendations.

Published in: Technology


  1. Deep diving into the dynamic provisioning of GlusterFS volumes in k8s with Heketi. Artem Romanchik
  2. Key notes: Persistent Volume Claims (PVC), GlusterFS, Heketi, known issues, good advice
  3. Did you work with GlusterFS and Heketi? Most popular answers: "What is that, anyway?", "A little", "BG spared me!" (BG = Boris Grebenshchikov), "I use it in production", "Cool stuff!"
  4. GlusterFS. GlusterFS is a scalable network filesystem suitable for data-intensive tasks such as cloud storage and media streaming. Volume types: Distributed, Replicated, Distributed Replicated, Striped, Distributed Striped.
  5. Heketi. Heketi provides a RESTful management interface that can be used to manage the life cycle of GlusterFS volumes. With Heketi, cloud services like OpenStack Manila, Kubernetes, and OpenShift can dynamically provision GlusterFS volumes. (Diagram: Kubernetes manages volumes through Heketi; GlusterFS serves the volume mounts.)
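As a rough illustration of what "RESTful management" means here, the sketch below builds the JSON body of a volume-create request like the one the Kubernetes provisioner sends to Heketi's /volumes endpoint. The field names follow the request shown later in the talk; the helper function itself is hypothetical and nothing is actually sent over the network.

```python
import json

def build_volume_create_request(size_gb, replica=3, gid=2008):
    """Build a Heketi-style volume-create payload (illustrative only)."""
    return {
        "size": size_gb,    # requested volume size in GiB
        "name": "",         # empty: let Heketi generate vol_<id>
        "durability": {
            "type": "replicate",
            "replicate": {"replica": replica},
            "disperse": {},
        },
        "gid": gid,         # group id applied to the volume root
        "snapshot": {"enable": True, "factor": 1},
    }

body = json.dumps(build_volume_create_request(1))
print(body)
```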
  6. Persistent Volume Claim (PVC). What is it?
  7. PVC from the user side (Pod → PVC → volume mount):
     apiVersion: v1
     kind: PersistentVolumeClaim
     metadata:
       name: mystorage
     spec:
       accessModes:
         - ReadWriteOnce
       resources:
         requests:
           storage: 1Gi
       storageClassName: slow
  8. PVC from the server side (Pod → PVC → StorageClass; the provisioner creates the PV plus a Service and Endpoints that point at the Gluster cluster via Heketi):
     PV fragment:
       glusterfs:
         endpoints: glusterfs-dynamic-service-storage-service-0
         path: vol_c2fe5ec0d33aff8bc91893d9fedf84f7
     StorageClass fragment:
       provisioner:
       parameters:
         resturl: …
         volumetype: replicate:3
     PVC fragment:
       resources:
         requests:
           storage: 1Gi
       storageClassName: slow
  9. PVC from the Heketi side. The Kubernetes provisioner drives Heketi's asynchronous API:
     POST /volumes HTTP/1.1
       {"size":1,"name":"","durability":{"type":"replicate","replicate":{"replica":3},"disperse":{}},"gid":2008,"snapshot":{"enable":true,"factor":1}}
     → Location: /queue/fb82…adc3
     GET /queue/fb82…adc3  (repeated every 2s)
     → Location: /volumes/2927…66b02
     GET /volumes/2927…66b02
     → {"size":1,"name":"vol_29279734e412cac413e2baf5deb66b02","durability":{"type":"replicate","replicate":{"replica":3},"disperse":{}},"gid":2008,"glustervolumeoptions":["",""],"snapshot":{"enable":true,"factor":1}}
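The "poll the queue every 2s" loop can be sketched generically. `fetch` is a stand-in for an HTTP GET against Heketi's queue URL; the status handling assumes Heketi's documented async-operation convention (200 while pending, 303 with a Location header once done), and the simulated responses below replace a real server:

```python
import time

def poll_queue(fetch, interval=2.0, max_tries=100, sleep=time.sleep):
    """Poll an async-operation queue URL until it redirects to the result.

    `fetch` returns (status_code, location); 303 means the operation
    finished and `location` points at the created volume.
    """
    for _ in range(max_tries):
        status, location = fetch()
        if status == 303:
            return location  # e.g. "/volumes/2927..."
        sleep(interval)
    raise TimeoutError("volume creation did not finish in time")

# simulate Heketi answering "pending" twice, then "done"
responses = iter([(200, None), (200, None), (303, "/volumes/2927")])
result = poll_queue(lambda: next(responses), sleep=lambda _: None)
print(result)  # /volumes/2927
```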
  10. PVC from the GlusterFS side. For each brick, Heketi runs roughly:
      1) Create a thin logical volume:
         lvcreate -qq --autobackup=y --poolmetadatasize 8192K --chunksize 256K --size 1048576K --thin vg_name/tp_name --virtualsize 1048576K --name brick_name
      2) Create the mount point and the filesystem:
         mkdir -p /var/lib/heketi/mounts/vg_name/brick_name
         mkfs.xfs -i size=512 -n size=8192 /dev/mapper/vg_name-brick_name
      3) Register the brick in Heketi's own fstab, mount it, and create the brick directory:
         echo "/dev/mapper/vg_name-brick_name /var/lib/heketi/mounts/vg_name/brick_name xfs rw,inode64,noatime,nouuid 1 2" >> /var/lib/heketi/fstab
         mount /var/lib/heketi/mounts/vg_name/brick_name
         mkdir /var/lib/heketi/mounts/vg_name/brick_name/brick
      4) Create and start the Gluster volume:
         gluster --mode=script --timeout=600 volume create vol_name replica 3 brick1 brick2 brick3
         gluster --mode=script --timeout=600 volume set vol_name ID
         gluster --mode=script --timeout=600 volume start vol_name
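A small sketch of how such an lvcreate command line could be assembled from a requested brick size. The 8192K metadata size and 256K chunk size are just the constants visible above, not Heketi's actual sizing logic, and the helper is purely illustrative:

```python
def lvcreate_cmd(vg, tp_name, brick_name, size_kb,
                 metadata_kb=8192, chunk_kb=256):
    """Assemble the thin-LV creation command run for one brick."""
    return (
        f"lvcreate -qq --autobackup=y "
        f"--poolmetadatasize {metadata_kb}K --chunksize {chunk_kb}K "
        f"--size {size_kb}K --thin {vg}/{tp_name} "
        f"--virtualsize {size_kb}K --name {brick_name}"
    )

cmd = lvcreate_cmd("vg_name", "tp_name", "brick_name", 1048576)
print(cmd)
```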
  11. Map of the Heketi world: PVC → StorageClass → PV → Heketi Service → GlusterFS volume (Brick1, Brick2, Brick3), with all state in the Heketi DB.
      PV fragment:
        metadata:
          annotations: a9d8b1ae636258c09af7378946ceac76
          name: pvc-90d794f3-41c8-11ea-8353-06d8b3ea3b88
        glusterfs:
          endpoints: glusterfs-dynamic-test
          path: vol_90da4e99-41c8-11ea-8e54-06029613cf28
      Endpoints fragment:
        subsets:
        - addresses:
          - ip:
          - ip:
          - ip:
          ports:
          - port: 1
            protocol: TCP
      Executor code lives under pkg/remoteexec/ (ssh, kubernetes); heketi.json fragment:
        "port": 8081,
        "glusterfs": {
          "executor": "kubernetes",
          "db": "/var/lib/heketi/heketi.db",
          "kubeexec": {
            "host": "https://kubernetes.default.svc.cluster.local",
            "fstab": "/var/lib/heketi/fstab",
            "backup_lvm_metadata": true
  12. The provisioner side in the Kubernetes source:
      import (
        …
        gcli ""
        gapi ""
        …
      )
      …
      func (p *glusterfsVolumeProvisioner) CreateVolume(gid int) (r *v1.GlusterfsPersistentVolumeSource, size int, volID string, err error) {
        …
        cli := gcli.NewClient(d.url, d.user, d.secretValue)
        …
        volumeReq := &gapi.VolumeCreateRequest{Size: sz, Name: customVolumeName, Clusters: clusterIDs, Gid: gid64, Durability: p.volumeType, GlusterVolumeOptions: p.volumeOptions, Snapshot: snaps}
        volume, err := cli.VolumeCreate(volumeReq)
        …
  13. Not all information can be found in the docs. In the provisioner source:
      type provisionerConfig struct {
        …
        url string
        user string
        volumeType gapi.VolumeDurabilityInfo
        volumeNamePrefix string
        thinPoolSnapFactor float32
        customEpNamePrefix string
        ....
      }
      There is no information about some of these fields in the official docs. Fortunately, there is a pretty good description in the source tree: customepnameprefix: by default, dynamically provisioned volumes have an endpoint and service created with the naming scheme glusterfs-dynamic-<PVC UUID>. With this option present in the storage class, an admin can now set the desired endpoint prefix from the storage class. If the customepnameprefix storage class parameter is set, dynamically provisioned volumes will have an endpoint and service created in the format customepnameprefix-<PVC UUID>, where "-" is the field separator/delimiter.
      Task: custom Gluster volume names.
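The naming rule quoted above is easy to express directly. The default prefix glusterfs-dynamic comes from the description; the helper function itself is a hypothetical illustration:

```python
def endpoint_name(pvc_uuid, custom_prefix=None):
    """Name of the Endpoints/Service object for a provisioned volume.

    With no custom prefix set in the StorageClass, the default
    "glusterfs-dynamic" scheme applies.
    """
    prefix = custom_prefix or "glusterfs-dynamic"
    return f"{prefix}-{pvc_uuid}"

print(endpoint_name("90d794f3-41c8-11ea-8353-06d8b3ea3b88"))
print(endpoint_name("90d794f3-41c8-11ea-8353-06d8b3ea3b88", "myapp"))
```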
  14. External projects as part of Heketi:
      - gorilla/mux: a powerful URL router and dispatcher
      - spf13/cobra: a library for creating powerful modern CLI applications, as well as a program to generate applications and command files
      - BoltDB: an embedded key/value database for Go (see apps/glusterfs/dbcommon.go)
      Flow: heketi-cli volume list → GET http://localhost:8080/volumes → JSON ← /var/lib/heketi.db
  15. Heketi database layout: clusterentries → nodeentries → deviceentries → brickentries, and clusterentries → volumeentries → brickentries (each cluster holds nodes and volumes; each node holds devices; each device and each volume reference bricks).
  16. Heketi database:
      # cat db_before.json | jq '. | map_values(keys)'
      {
        "clusterentries": [ "2e16d5adfb5eababeceb6719e5e808cd" ],
        "volumeentries": [ "08f43b51ac546868c0d1a29e5f3921fb" ],
        "brickentries": [ "077dfbe0c7f0a8949960ef21333a2c11", "1…b0", "1c6…5" ],
        "nodeentries": [ "520bf6e83b4bba8fb5a992a6da6ef041", "d3…a0", "fdb…13" ],
        "deviceentries": [ "0888196c04e9e5f7ad346ba7ec173c01", "38…94", "fb…a6" ],
        "blockvolumeentries": [],
        "dbattributeentries": [],
        "pendingoperations": []
      }
      Volume entry fragment:
      "08f43b51ac546868c0d1a29e5f3921fb": {
        "Info": {
          "size": 2,
          "name": "vol_08f43b51ac546868c0d1a29e5f3921fb",
          "durability": { "type": "replicate", "replicate": { "replica": 3 }, "disperse": {} },
          "gid": 2016,
          "snapshot": { "enable": false, "factor": 1 },
          "id": "08f43b51ac546868c0d1a29e5f3921fb",
          "cluster": "2e16d5adfb5eababeceb6719e5e808cd",
          "mount": { "glusterfs": { "hosts": [ "", "", "" ], "device": "", "options": { "backup-volfile-servers": "," } } },
          "blockinfo": {}
        },
        "Bricks": [ "077dfbe0c7f0a8949960ef21333a2c11", "1…b0", "1c6…5" ],
        "GlusterVolumeOptions": [ ],
        "Pending": { "Id": "" }
      }
      Brick entry fragment:
      "077dfbe0c7f0a8949960ef21333a2c11": {
        "Info": {
          "id": "077dfbe0c7f0a8949960ef21333a2c11",
          "path": "$brick_path_1",
          "device": "vg_z12…13",
          "node": "520bf6e83b4bba8fb5a992a6da6ef041",
          "volume": "08f43b51ac546868c0d1a29e5f3921fb",
          "size": $volume_size_bytes
        },
        "TpSize": $volume_size_bytes,
        "PoolMetadataSize": 12288,
        "Pending": { "Id": "" },
        "LvmThinPool": "tp_077dfbe0c7f0a8949960ef21333a2c11",
        "LvmLv": "",
        "SubType": 1
      }
      Node entry fragment:
      "520bf6e83b4bba8fb5a992a6da6ef041": {
        "State": "online",
        "Info": {
          "zone": 1,
          "hostnames": { "manage": [ "" ], "storage": [ "" ] },
          "cluster": "2e16d5adfb5eababeceb6719e5e808cd",
          "id": "520bf6e83b4bba8fb5a992a6da6ef041"
        },
        "Devices": [ "0888196c04e9e5f7ad346ba7ec173c01" ]
      }
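The jq query above (`map_values(keys)`) can be reproduced over an exported db JSON. The sample dict below stands in for a real db_before.json:

```python
import json

def bucket_summary(db):
    """Mimic `jq '. | map_values(keys)'`: list the entry IDs per bucket."""
    return {bucket: sorted(entries) for bucket, entries in db.items()}

# minimal stand-in for a real Heketi db export
sample = {
    "clusterentries": {"2e16d5adfb5eababeceb6719e5e808cd": {}},
    "volumeentries": {"08f43b51ac546868c0d1a29e5f3921fb": {}},
    "pendingoperations": {},
}
print(json.dumps(bucket_summary(sample), indent=2))
```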
  17. How Heketi manages Gluster. Executors:
      - mock: does not send any commands out to servers; for development and tests
      - ssh: sends commands to real systems over SSH; for GlusterFS on separate servers
      - kubernetes: sends commands via the k8s API; manages GlusterFS k8s pods
      /etc/heketi/heketi.json fragment:
        …
        "glusterfs": {
          "executor": "kubernetes",
          "db": "/var/lib/heketi/heketi.db",
          "kubeexec": {
            "host": "https://kubernetes.default.svc.cluster.local",
            "fstab": "/var/lib/heketi/fstab"    (custom fstab for the GlusterFS pods)
        …
  18. Possible Heketi database problems:
      - A volume exists in GlusterFS but not in Heketi → lost control; solution: add information about the lost volume to the Heketi DB.
      - A volume exists in Heketi but not in GlusterFS → inconsistent DB; solution: delete the volume from the Heketi DB using heketi-cli or the API.
      - Volume settings differ from what the Heketi DB records → inconsistent DB; solution: fix the values in the Heketi DB.
      - A physical device holding GlusterFS bricks was replaced → storage is unhealthy; solution: recreate all volumes.
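The first two rows above reduce to a set difference between the volume names Heketi records and the ones GlusterFS actually reports (e.g. from `gluster volume list`). The sketch below diffs two such name sets; the sample data replaces real cluster output:

```python
def diff_volumes(heketi_volumes, gluster_volumes):
    """Return (lost_control, stale_in_db) volume-name sets."""
    heketi, gluster = set(heketi_volumes), set(gluster_volumes)
    return (
        gluster - heketi,  # in GlusterFS but unknown to Heketi: lost control
        heketi - gluster,  # in the Heketi DB but gone from GlusterFS: stale entry
    )

lost, stale = diff_volumes(
    heketi_volumes=["vol_aaa", "vol_bbb"],
    gluster_volumes=["vol_bbb", "vol_ccc"],
)
print(lost, stale)  # {'vol_ccc'} {'vol_aaa'}
```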
  19. If we replace a physical device. Replacing a broken GlusterFS pod is really easy: restore the LVs from the LVM archive (e.g. /etc/lvm/archive/):
      device=$(grep /dev/ $backup | sed -e 's/\t//g' -e 's/#//g' | cut -d " " -f3)
      device_id=$(grep /dev/ $backup -B1 | head -n1 | cut -d " " -f3)
      pvcreate --uuid $device_id $device --norestorefile
      vgname=$(grep -E 'vg_.*{' $backup | cut -d " " -f1)
      vgcreate $vgname $device
      bricks=$(grep brick_ $backup | sed 's/\t//g' | cut -d " " -f1)
      for brick in $bricks; do
        chunksize=256K; tp_name=…; poolmetadatasize=…; size=…
        lvcreate -qq --autobackup=y --poolmetadatasize $poolmetadatasize --chunksize $chunksize --size $size --thin $vgname/$tp_name --virtualsize $size --name $brick
        mkfs.xfs -i size=512 -n size=8192 /dev/mapper/$vgname-$brick
      done
      Then reset all bricks:
      gluster v status $volume | grep "${brickpath: -25}" | grep " N " && gluster volume reset-brick $volume $brickpath start && gluster volume reset-brick $volume $brickpath $brickpath commit force
      But replacing only a broken device isn't possible from Heketi; we need to replace the whole node manually:
      heketi-cli node add --zone=1 --cluster=$heketiClusterID --management-host-name=$newNodeAddress --storage-host-name=$newNodeAddress
      heketi-cli device add --name $deviceName --node $newNodeID
      heketi-cli node disable $oldNodeID
      heketi-cli node remove $oldNodeID
      heketi-cli device delete $oldDeviceID
      heketi-cli node delete $oldNodeID
  20. Good advice:
      1. Feel free to use Red Hat resources: OpenShift docs (Gluster storage), the Red Hat knowledge base (KB), and the no-cost RHEL Developer Subscription.
      2. Implementation Guide for IBM Blockchain Platform for Multicloud.
      3. You can use our scripts for checking and fixing Heketi and Gluster.
      4. RTFM
  21. Q&A. Answer: 42
  22. Thank you! Artem Romanchik, Targetprocess, Inc.