SlideShare a Scribd company logo
Dan Lambright1
Erasure Codes and
Storage Tiers on
Gluster
Dan Lambright
SA summit
Sep 23, 2014
Dan Lambright2
AGENDA
●
Why erasure codes (ec) in Gluster
●
How ec works
●
Brief peek at underlying mathematics
●
Storage tiering in gluster
●
Demo
●
“One more thing”
Dan Lambright3
Why erasure codes in gluster?
● Desire protection from double failure
● RAID6 controllers are expensive
● Imagine a 64 node volume
● Each brick on a separate bare metal machine
● Cost is 64 x $ for LSI MegaRaid controller
20K
=
Dan Lambright4
Why erasure codes in gluster?
● Triplication (3 way replication) is expensive
● Two redundant disks for every data disk
● 200% overhead! :(
Dan Lambright5
Erasure codes
● Store m disks worth of data on k disks (k>m)
● n redundant disks (k-m),
● can pick n to choose failure tolerance
● A generalization of RAID6
● Distributed across nodes
Dan Lambright6
Overhead analysis
● Can also consider mean time before failure
k total disks n how many
failures
admitted
m number of
data disks
Capacity
overhead
(n/k)
RAID level
3 1 2 33.33% 5
5 1 4 20% 5
6 2 4 33.33% 6
7 3 4 42.86% E
9 1 8 11.11% 5
10 2 8 20% 6
11 3 8 27.27% E
12 4 8 33.33% E
ERASURE CODES PRIMER
Dan Lambright8
ERASURE CODE TERMS
● m data disks
● n parity disks
● k total number disks = m+n
● Symbol – Smallest data unit. w bits.
● Typically w = 8 = a byte
● Chunk (aka fragment) – r symbols per disk
● Stripe – collection of m+n chunks across k disks
● Unit of manipulation for recovery
● Also known as a “slice”
Dan Lambright9
ERASURE CODE TERMS
●
r=6
m=4
n =2
k=6
w=1
symbol
fragment
“Stripe” of
6 fragments
011010
Dan Lambright10
Systematic
● m data chunks, n coding chunks
● (can stripe parity and data chunks on the same disk)
● Reads are simple, only decode on repairs
Slice 1
Slice 2
Slice 3
Dan Lambright11
Non-Systematic
● All k chunks in a stripe are coded
● Do not to distinguish data from code servers
● Encode/decode on writes and reads
Slice 1
Slice 2
Slice 3
Dan Lambright12
Encoding / Decoding Overhead
● Network RTT dominate the encode/decode overhead
●
Packages exist to implement the math
● Intel has fast routines for Inverse, dot product,
encoding, decoding, etc
● Jerasure library from academia
● Gluster's is purpose built and fast
GLUSTER IMPLEMENTATION
Dan Lambright14
GLUSTERFS “Disperse Volumes”
● Done by Datalab corp. by Xavier Hernandez.
● Use case : archiving medical records
● Developed over last 2 years
● Now part of gluster upstream
Dan Lambright15
CLI
Two new options have been added to the 'create' command of the cli interface:
gluster volume create <name> disperse <count> redundancy <count>
Disperse is “k” (total number volumes)
Redundancy is “n”
Dan Lambright16
“Disperse volumes” design choices
● The “symbols” are bytes: w = 8
● The fragment size r = 128
● Algorithm: Reed solomon
● Generator matrix: Vandermonde
● Non–systematic
● Encoding / decoding done on client side
● Modeled after AFR
● Concurrent writes must be processed in order
STORAGE TIERS
Dan Lambright18
Storage Tiers
● Different “subvolume” tiers presented as a single volume
● HDD, SSD, tape, “persistent memory”, etc.
● Plug-in policy describes how data moves between tiers
● V1 policy: Cache
● slow and fast tiers
● CLI to add/remove cache tier from existing volume
Dan Lambright19
Example: Erasure codes + SSD
● User sees one volume
● SSD “caches” ec data
Tiered volume
“cache”:
on SSD
ec
on HDD
Hot Cold
demote
promote
Dan Lambright20
Future : Data classification (DC)
● Add rules to storage graph
● Rule determines subvolume
● File name
● Attribute (size, content)
● Etc.
Filename =
*.lock ?`
Yes No
Secure /
Encrypted
HDD
Dan Lambright21
Future flexibility
● Many use cases
● Compliance
● Multi-tenancy
● Rack-aware placement (for performance)
● Policies described by language
● Arbitrary number of tiers, rules, subvolumes ..
● Template based
DEMO
promote
ONE MORE THING..
promote
Dan Lambright24
Bitrot
● A daemon that scans gluster volumes
● Finds corrupted data
● Digest associated with each file
● Alert / recover on mismatch
● “Plug-ins” to daemon may do other things..
● Tuning parameters to be non-intrusive to performance
● Encryption
● Compression
● Etc.
25
Do it!
● Learn the math:
● http://web.eecs.utk.edu/~plank/plank/papers/FAST-
2013-Tutorial.html
● Get the bits:
● https://forge.gluster.org/disperse
RED HAT CONFIDENTIAL – DO NOT DISTRIBUTE
Thank You!
● dlambright@redhat.com
● RHS:
www.redhat.com/storage/
● GlusterFS:
www.gluster.org
●
@Glusterorg
@RedHatStorage
Gluster
Red Hat Storage
Slides Available on Mojo

More Related Content

What's hot

Scylla Summit 2022: Scylla 5.0 New Features, Part 1
Scylla Summit 2022: Scylla 5.0 New Features, Part 1Scylla Summit 2022: Scylla 5.0 New Features, Part 1
Scylla Summit 2022: Scylla 5.0 New Features, Part 1
ScyllaDB
 
The linux networking architecture
The linux networking architectureThe linux networking architecture
The linux networking architecture
hugo lu
 
BPF - in-kernel virtual machine
BPF - in-kernel virtual machineBPF - in-kernel virtual machine
BPF - in-kernel virtual machine
Alexei Starovoitov
 
Jvm tuning for low latency application & Cassandra
Jvm tuning for low latency application & CassandraJvm tuning for low latency application & Cassandra
Jvm tuning for low latency application & Cassandra
Quentin Ambard
 
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...
Adrian Huang
 
PostgreSQL + ZFS best practices
PostgreSQL + ZFS best practicesPostgreSQL + ZFS best practices
PostgreSQL + ZFS best practices
Sean Chittenden
 
A crash course in CRUSH
A crash course in CRUSHA crash course in CRUSH
A crash course in CRUSH
Sage Weil
 
Linux Kernel Booting Process (1) - For NLKB
Linux Kernel Booting Process (1) - For NLKBLinux Kernel Booting Process (1) - For NLKB
Linux Kernel Booting Process (1) - For NLKB
shimosawa
 
ceph optimization on ssd ilsoo byun-short
ceph optimization on ssd ilsoo byun-shortceph optimization on ssd ilsoo byun-short
ceph optimization on ssd ilsoo byun-short
NAVER D2
 
Percona Live 2012PPT: introduction-to-mysql-replication
Percona Live 2012PPT: introduction-to-mysql-replicationPercona Live 2012PPT: introduction-to-mysql-replication
Percona Live 2012PPT: introduction-to-mysql-replication
mysqlops
 
Ceph Month 2021: RADOS Update
Ceph Month 2021: RADOS UpdateCeph Month 2021: RADOS Update
Ceph Month 2021: RADOS Update
Ceph Community
 
Linux Kernel - Virtual File System
Linux Kernel - Virtual File SystemLinux Kernel - Virtual File System
Linux Kernel - Virtual File System
Adrian Huang
 
Ceph Object Storage Reference Architecture Performance and Sizing Guide
Ceph Object Storage Reference Architecture Performance and Sizing GuideCeph Object Storage Reference Architecture Performance and Sizing Guide
Ceph Object Storage Reference Architecture Performance and Sizing Guide
Karan Singh
 
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA ArchitectureCeph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Danielle Womboldt
 
MariaDB Galera Cluster
MariaDB Galera ClusterMariaDB Galera Cluster
MariaDB Galera Cluster
Abdul Manaf
 
Ceph Block Devices: A Deep Dive
Ceph Block Devices:  A Deep DiveCeph Block Devices:  A Deep Dive
Ceph Block Devices: A Deep Dive
Red_Hat_Storage
 
A Journey into Hexagon: Dissecting Qualcomm Basebands
A Journey into Hexagon: Dissecting Qualcomm BasebandsA Journey into Hexagon: Dissecting Qualcomm Basebands
A Journey into Hexagon: Dissecting Qualcomm Basebands
Priyanka Aash
 
Ceph
CephCeph
Implementing distributed mclock in ceph
Implementing distributed mclock in cephImplementing distributed mclock in ceph
Implementing distributed mclock in ceph
병수 박
 
High-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uringHigh-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uring
ScyllaDB
 

What's hot (20)

Scylla Summit 2022: Scylla 5.0 New Features, Part 1
Scylla Summit 2022: Scylla 5.0 New Features, Part 1Scylla Summit 2022: Scylla 5.0 New Features, Part 1
Scylla Summit 2022: Scylla 5.0 New Features, Part 1
 
The linux networking architecture
The linux networking architectureThe linux networking architecture
The linux networking architecture
 
BPF - in-kernel virtual machine
BPF - in-kernel virtual machineBPF - in-kernel virtual machine
BPF - in-kernel virtual machine
 
Jvm tuning for low latency application & Cassandra
Jvm tuning for low latency application & CassandraJvm tuning for low latency application & Cassandra
Jvm tuning for low latency application & Cassandra
 
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...
qemu + gdb: The efficient way to understand/debug Linux kernel code/data stru...
 
PostgreSQL + ZFS best practices
PostgreSQL + ZFS best practicesPostgreSQL + ZFS best practices
PostgreSQL + ZFS best practices
 
A crash course in CRUSH
A crash course in CRUSHA crash course in CRUSH
A crash course in CRUSH
 
Linux Kernel Booting Process (1) - For NLKB
Linux Kernel Booting Process (1) - For NLKBLinux Kernel Booting Process (1) - For NLKB
Linux Kernel Booting Process (1) - For NLKB
 
ceph optimization on ssd ilsoo byun-short
ceph optimization on ssd ilsoo byun-shortceph optimization on ssd ilsoo byun-short
ceph optimization on ssd ilsoo byun-short
 
Percona Live 2012PPT: introduction-to-mysql-replication
Percona Live 2012PPT: introduction-to-mysql-replicationPercona Live 2012PPT: introduction-to-mysql-replication
Percona Live 2012PPT: introduction-to-mysql-replication
 
Ceph Month 2021: RADOS Update
Ceph Month 2021: RADOS UpdateCeph Month 2021: RADOS Update
Ceph Month 2021: RADOS Update
 
Linux Kernel - Virtual File System
Linux Kernel - Virtual File SystemLinux Kernel - Virtual File System
Linux Kernel - Virtual File System
 
Ceph Object Storage Reference Architecture Performance and Sizing Guide
Ceph Object Storage Reference Architecture Performance and Sizing GuideCeph Object Storage Reference Architecture Performance and Sizing Guide
Ceph Object Storage Reference Architecture Performance and Sizing Guide
 
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA ArchitectureCeph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
Ceph Day Beijing - Ceph All-Flash Array Design Based on NUMA Architecture
 
MariaDB Galera Cluster
MariaDB Galera ClusterMariaDB Galera Cluster
MariaDB Galera Cluster
 
Ceph Block Devices: A Deep Dive
Ceph Block Devices:  A Deep DiveCeph Block Devices:  A Deep Dive
Ceph Block Devices: A Deep Dive
 
A Journey into Hexagon: Dissecting Qualcomm Basebands
A Journey into Hexagon: Dissecting Qualcomm BasebandsA Journey into Hexagon: Dissecting Qualcomm Basebands
A Journey into Hexagon: Dissecting Qualcomm Basebands
 
Ceph
CephCeph
Ceph
 
Implementing distributed mclock in ceph
Implementing distributed mclock in cephImplementing distributed mclock in ceph
Implementing distributed mclock in ceph
 
High-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uringHigh-Performance Networking Using eBPF, XDP, and io_uring
High-Performance Networking Using eBPF, XDP, and io_uring
 

Similar to Erasure codes and storage tiers on gluster

Caching in
Caching inCaching in
Caching in
RichardWarburton
 
Cacheconcurrencyconsistency cassandra svcc
Cacheconcurrencyconsistency cassandra svccCacheconcurrencyconsistency cassandra svcc
Cacheconcurrencyconsistency cassandra svcc
srisatish ambati
 
Caching in (DevoxxUK 2013)
Caching in (DevoxxUK 2013)Caching in (DevoxxUK 2013)
Caching in (DevoxxUK 2013)
RichardWarburton
 
An Introduction to Apache Cassandra
An Introduction to Apache CassandraAn Introduction to Apache Cassandra
An Introduction to Apache Cassandra
Saeid Zebardast
 
Challenges with Gluster and Persistent Memory with Dan Lambright
Challenges with Gluster and Persistent Memory with Dan LambrightChallenges with Gluster and Persistent Memory with Dan Lambright
Challenges with Gluster and Persistent Memory with Dan Lambright
Gluster.org
 
Caching in
Caching inCaching in
Caching in
RichardWarburton
 
cachegrand: A Take on High Performance Caching
cachegrand: A Take on High Performance Cachingcachegrand: A Take on High Performance Caching
cachegrand: A Take on High Performance Caching
ScyllaDB
 
Elasticsearch 101 - Cluster setup and tuning
Elasticsearch 101 - Cluster setup and tuningElasticsearch 101 - Cluster setup and tuning
Elasticsearch 101 - Cluster setup and tuning
Petar Djekic
 
UKOUG 2011: Practical MySQL Tuning
UKOUG 2011: Practical MySQL TuningUKOUG 2011: Practical MySQL Tuning
UKOUG 2011: Practical MySQL Tuning
FromDual GmbH
 
Apache Spark II (SparkSQL)
Apache Spark II (SparkSQL)Apache Spark II (SparkSQL)
Apache Spark II (SparkSQL)
Datio Big Data
 
Seastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephSeastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for Ceph
ScyllaDB
 
Seastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephSeastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for Ceph
ScyllaDB
 
“Show Me the Garbage!”, Garbage Collection a Friend or a Foe
“Show Me the Garbage!”, Garbage Collection a Friend or a Foe“Show Me the Garbage!”, Garbage Collection a Friend or a Foe
“Show Me the Garbage!”, Garbage Collection a Friend or a Foe
Haim Yadid
 
MongoDB Operational Best Practices (mongosf2012)
MongoDB Operational Best Practices (mongosf2012)MongoDB Operational Best Practices (mongosf2012)
MongoDB Operational Best Practices (mongosf2012)
Scott Hernandez
 
NUMA and Java Databases
NUMA and Java DatabasesNUMA and Java Databases
NUMA and Java Databases
Raghavendra Prabhu
 
TeraCache: Efficient Caching Over Fast Storage Devices
TeraCache: Efficient Caching Over Fast Storage DevicesTeraCache: Efficient Caching Over Fast Storage Devices
TeraCache: Efficient Caching Over Fast Storage Devices
Databricks
 
Operation Unthinkable – Software Defined Storage @ Booking.com (Peter Buschman)
Operation Unthinkable – Software Defined Storage @ Booking.com (Peter Buschman)Operation Unthinkable – Software Defined Storage @ Booking.com (Peter Buschman)
Operation Unthinkable – Software Defined Storage @ Booking.com (Peter Buschman)
data://disrupted®
 
Kernel Recipes 2019 - Marvels of Memory Auto-configuration (SPD)
Kernel Recipes 2019 - Marvels of Memory Auto-configuration (SPD)Kernel Recipes 2019 - Marvels of Memory Auto-configuration (SPD)
Kernel Recipes 2019 - Marvels of Memory Auto-configuration (SPD)
Anne Nicolas
 
Low Level CPU Performance Profiling Examples
Low Level CPU Performance Profiling ExamplesLow Level CPU Performance Profiling Examples
Low Level CPU Performance Profiling Examples
Tanel Poder
 
WiredTiger In-Memory vs WiredTiger B-Tree
WiredTiger In-Memory vs WiredTiger B-TreeWiredTiger In-Memory vs WiredTiger B-Tree
WiredTiger In-Memory vs WiredTiger B-Tree
Sveta Smirnova
 

Similar to Erasure codes and storage tiers on gluster (20)

Caching in
Caching inCaching in
Caching in
 
Cacheconcurrencyconsistency cassandra svcc
Cacheconcurrencyconsistency cassandra svccCacheconcurrencyconsistency cassandra svcc
Cacheconcurrencyconsistency cassandra svcc
 
Caching in (DevoxxUK 2013)
Caching in (DevoxxUK 2013)Caching in (DevoxxUK 2013)
Caching in (DevoxxUK 2013)
 
An Introduction to Apache Cassandra
An Introduction to Apache CassandraAn Introduction to Apache Cassandra
An Introduction to Apache Cassandra
 
Challenges with Gluster and Persistent Memory with Dan Lambright
Challenges with Gluster and Persistent Memory with Dan LambrightChallenges with Gluster and Persistent Memory with Dan Lambright
Challenges with Gluster and Persistent Memory with Dan Lambright
 
Caching in
Caching inCaching in
Caching in
 
cachegrand: A Take on High Performance Caching
cachegrand: A Take on High Performance Cachingcachegrand: A Take on High Performance Caching
cachegrand: A Take on High Performance Caching
 
Elasticsearch 101 - Cluster setup and tuning
Elasticsearch 101 - Cluster setup and tuningElasticsearch 101 - Cluster setup and tuning
Elasticsearch 101 - Cluster setup and tuning
 
UKOUG 2011: Practical MySQL Tuning
UKOUG 2011: Practical MySQL TuningUKOUG 2011: Practical MySQL Tuning
UKOUG 2011: Practical MySQL Tuning
 
Apache Spark II (SparkSQL)
Apache Spark II (SparkSQL)Apache Spark II (SparkSQL)
Apache Spark II (SparkSQL)
 
Seastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephSeastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for Ceph
 
Seastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for CephSeastore: Next Generation Backing Store for Ceph
Seastore: Next Generation Backing Store for Ceph
 
“Show Me the Garbage!”, Garbage Collection a Friend or a Foe
“Show Me the Garbage!”, Garbage Collection a Friend or a Foe“Show Me the Garbage!”, Garbage Collection a Friend or a Foe
“Show Me the Garbage!”, Garbage Collection a Friend or a Foe
 
MongoDB Operational Best Practices (mongosf2012)
MongoDB Operational Best Practices (mongosf2012)MongoDB Operational Best Practices (mongosf2012)
MongoDB Operational Best Practices (mongosf2012)
 
NUMA and Java Databases
NUMA and Java DatabasesNUMA and Java Databases
NUMA and Java Databases
 
TeraCache: Efficient Caching Over Fast Storage Devices
TeraCache: Efficient Caching Over Fast Storage DevicesTeraCache: Efficient Caching Over Fast Storage Devices
TeraCache: Efficient Caching Over Fast Storage Devices
 
Operation Unthinkable – Software Defined Storage @ Booking.com (Peter Buschman)
Operation Unthinkable – Software Defined Storage @ Booking.com (Peter Buschman)Operation Unthinkable – Software Defined Storage @ Booking.com (Peter Buschman)
Operation Unthinkable – Software Defined Storage @ Booking.com (Peter Buschman)
 
Kernel Recipes 2019 - Marvels of Memory Auto-configuration (SPD)
Kernel Recipes 2019 - Marvels of Memory Auto-configuration (SPD)Kernel Recipes 2019 - Marvels of Memory Auto-configuration (SPD)
Kernel Recipes 2019 - Marvels of Memory Auto-configuration (SPD)
 
Low Level CPU Performance Profiling Examples
Low Level CPU Performance Profiling ExamplesLow Level CPU Performance Profiling Examples
Low Level CPU Performance Profiling Examples
 
WiredTiger In-Memory vs WiredTiger B-Tree
WiredTiger In-Memory vs WiredTiger B-TreeWiredTiger In-Memory vs WiredTiger B-Tree
WiredTiger In-Memory vs WiredTiger B-Tree
 

More from Red_Hat_Storage

Red Hat Storage Day Dallas - Storage for OpenShift Containers
Red Hat Storage Day Dallas - Storage for OpenShift Containers Red Hat Storage Day Dallas - Storage for OpenShift Containers
Red Hat Storage Day Dallas - Storage for OpenShift Containers
Red_Hat_Storage
 
Red Hat Storage Day Dallas - Red Hat Ceph Storage Acceleration Utilizing Flas...
Red Hat Storage Day Dallas - Red Hat Ceph Storage Acceleration Utilizing Flas...Red Hat Storage Day Dallas - Red Hat Ceph Storage Acceleration Utilizing Flas...
Red Hat Storage Day Dallas - Red Hat Ceph Storage Acceleration Utilizing Flas...
Red_Hat_Storage
 
Red Hat Storage Day Dallas - Defiance of the Appliance
Red Hat Storage Day Dallas - Defiance of the Appliance Red Hat Storage Day Dallas - Defiance of the Appliance
Red Hat Storage Day Dallas - Defiance of the Appliance
Red_Hat_Storage
 
Red Hat Storage Day Dallas - Gluster Storage in Containerized Application
Red Hat Storage Day Dallas - Gluster Storage in Containerized Application Red Hat Storage Day Dallas - Gluster Storage in Containerized Application
Red Hat Storage Day Dallas - Gluster Storage in Containerized Application
Red_Hat_Storage
 
Red Hat Storage Day Dallas - Why Software-defined Storage Matters
Red Hat Storage Day Dallas - Why Software-defined Storage MattersRed Hat Storage Day Dallas - Why Software-defined Storage Matters
Red Hat Storage Day Dallas - Why Software-defined Storage Matters
Red_Hat_Storage
 
Red Hat Storage Day Boston - Why Software-defined Storage Matters
Red Hat Storage Day Boston - Why Software-defined Storage MattersRed Hat Storage Day Boston - Why Software-defined Storage Matters
Red Hat Storage Day Boston - Why Software-defined Storage Matters
Red_Hat_Storage
 
Red Hat Storage Day Boston - Supermicro Super Storage
Red Hat Storage Day Boston - Supermicro Super StorageRed Hat Storage Day Boston - Supermicro Super Storage
Red Hat Storage Day Boston - Supermicro Super Storage
Red_Hat_Storage
 
Red Hat Storage Day Boston - OpenStack + Ceph Storage
Red Hat Storage Day Boston - OpenStack + Ceph StorageRed Hat Storage Day Boston - OpenStack + Ceph Storage
Red Hat Storage Day Boston - OpenStack + Ceph Storage
Red_Hat_Storage
 
Red Hat Ceph Storage Acceleration Utilizing Flash Technology
Red Hat Ceph Storage Acceleration Utilizing Flash Technology Red Hat Ceph Storage Acceleration Utilizing Flash Technology
Red Hat Ceph Storage Acceleration Utilizing Flash Technology
Red_Hat_Storage
 
Red Hat Storage Day Boston - Persistent Storage for Containers
Red Hat Storage Day Boston - Persistent Storage for Containers Red Hat Storage Day Boston - Persistent Storage for Containers
Red Hat Storage Day Boston - Persistent Storage for Containers
Red_Hat_Storage
 
Red Hat Storage Day Boston - Red Hat Gluster Storage vs. Traditional Storage ...
Red Hat Storage Day Boston - Red Hat Gluster Storage vs. Traditional Storage ...Red Hat Storage Day Boston - Red Hat Gluster Storage vs. Traditional Storage ...
Red Hat Storage Day Boston - Red Hat Gluster Storage vs. Traditional Storage ...
Red_Hat_Storage
 
Red Hat Storage Day New York - Red Hat Gluster Storage: Historical Tick Data ...
Red Hat Storage Day New York - Red Hat Gluster Storage: Historical Tick Data ...Red Hat Storage Day New York - Red Hat Gluster Storage: Historical Tick Data ...
Red Hat Storage Day New York - Red Hat Gluster Storage: Historical Tick Data ...
Red_Hat_Storage
 
Red Hat Storage Day New York - QCT: Avoid the mess, deploy with a validated s...
Red Hat Storage Day New York - QCT: Avoid the mess, deploy with a validated s...Red Hat Storage Day New York - QCT: Avoid the mess, deploy with a validated s...
Red Hat Storage Day New York - QCT: Avoid the mess, deploy with a validated s...
Red_Hat_Storage
 
Red Hat Storage Day - When the Ceph Hits the Fan
Red Hat Storage Day -  When the Ceph Hits the FanRed Hat Storage Day -  When the Ceph Hits the Fan
Red Hat Storage Day - When the Ceph Hits the Fan
Red_Hat_Storage
 
Red Hat Storage Day New York - Penguin Computing Spotlight: Delivering Open S...
Red Hat Storage Day New York - Penguin Computing Spotlight: Delivering Open S...Red Hat Storage Day New York - Penguin Computing Spotlight: Delivering Open S...
Red Hat Storage Day New York - Penguin Computing Spotlight: Delivering Open S...
Red_Hat_Storage
 
Red Hat Storage Day New York - Intel Unlocking Big Data Infrastructure Effici...
Red Hat Storage Day New York - Intel Unlocking Big Data Infrastructure Effici...Red Hat Storage Day New York - Intel Unlocking Big Data Infrastructure Effici...
Red Hat Storage Day New York - Intel Unlocking Big Data Infrastructure Effici...
Red_Hat_Storage
 
Red Hat Storage Day New York - New Reference Architectures
Red Hat Storage Day New York - New Reference ArchitecturesRed Hat Storage Day New York - New Reference Architectures
Red Hat Storage Day New York - New Reference Architectures
Red_Hat_Storage
 
Red Hat Storage Day New York - Persistent Storage for Containers
Red Hat Storage Day New York - Persistent Storage for ContainersRed Hat Storage Day New York - Persistent Storage for Containers
Red Hat Storage Day New York - Persistent Storage for Containers
Red_Hat_Storage
 
Red Hat Storage Day New York -Performance Intensive Workloads with Samsung NV...
Red Hat Storage Day New York -Performance Intensive Workloads with Samsung NV...Red Hat Storage Day New York -Performance Intensive Workloads with Samsung NV...
Red Hat Storage Day New York -Performance Intensive Workloads with Samsung NV...
Red_Hat_Storage
 
Red Hat Storage Day New York - Welcome Remarks
Red Hat Storage Day New York - Welcome Remarks Red Hat Storage Day New York - Welcome Remarks
Red Hat Storage Day New York - Welcome Remarks
Red_Hat_Storage
 

More from Red_Hat_Storage (20)

Red Hat Storage Day Dallas - Storage for OpenShift Containers
Red Hat Storage Day Dallas - Storage for OpenShift Containers Red Hat Storage Day Dallas - Storage for OpenShift Containers
Red Hat Storage Day Dallas - Storage for OpenShift Containers
 
Red Hat Storage Day Dallas - Red Hat Ceph Storage Acceleration Utilizing Flas...
Red Hat Storage Day Dallas - Red Hat Ceph Storage Acceleration Utilizing Flas...Red Hat Storage Day Dallas - Red Hat Ceph Storage Acceleration Utilizing Flas...
Red Hat Storage Day Dallas - Red Hat Ceph Storage Acceleration Utilizing Flas...
 
Red Hat Storage Day Dallas - Defiance of the Appliance
Red Hat Storage Day Dallas - Defiance of the Appliance Red Hat Storage Day Dallas - Defiance of the Appliance
Red Hat Storage Day Dallas - Defiance of the Appliance
 
Red Hat Storage Day Dallas - Gluster Storage in Containerized Application
Red Hat Storage Day Dallas - Gluster Storage in Containerized Application Red Hat Storage Day Dallas - Gluster Storage in Containerized Application
Red Hat Storage Day Dallas - Gluster Storage in Containerized Application
 
Red Hat Storage Day Dallas - Why Software-defined Storage Matters
Red Hat Storage Day Dallas - Why Software-defined Storage MattersRed Hat Storage Day Dallas - Why Software-defined Storage Matters
Red Hat Storage Day Dallas - Why Software-defined Storage Matters
 
Red Hat Storage Day Boston - Why Software-defined Storage Matters
Red Hat Storage Day Boston - Why Software-defined Storage MattersRed Hat Storage Day Boston - Why Software-defined Storage Matters
Red Hat Storage Day Boston - Why Software-defined Storage Matters
 
Red Hat Storage Day Boston - Supermicro Super Storage
Red Hat Storage Day Boston - Supermicro Super StorageRed Hat Storage Day Boston - Supermicro Super Storage
Red Hat Storage Day Boston - Supermicro Super Storage
 
Red Hat Storage Day Boston - OpenStack + Ceph Storage
Red Hat Storage Day Boston - OpenStack + Ceph StorageRed Hat Storage Day Boston - OpenStack + Ceph Storage
Red Hat Storage Day Boston - OpenStack + Ceph Storage
 
Red Hat Ceph Storage Acceleration Utilizing Flash Technology
Red Hat Ceph Storage Acceleration Utilizing Flash Technology Red Hat Ceph Storage Acceleration Utilizing Flash Technology
Red Hat Ceph Storage Acceleration Utilizing Flash Technology
 
Red Hat Storage Day Boston - Persistent Storage for Containers
Red Hat Storage Day Boston - Persistent Storage for Containers Red Hat Storage Day Boston - Persistent Storage for Containers
Red Hat Storage Day Boston - Persistent Storage for Containers
 
Red Hat Storage Day Boston - Red Hat Gluster Storage vs. Traditional Storage ...
Red Hat Storage Day Boston - Red Hat Gluster Storage vs. Traditional Storage ...Red Hat Storage Day Boston - Red Hat Gluster Storage vs. Traditional Storage ...
Red Hat Storage Day Boston - Red Hat Gluster Storage vs. Traditional Storage ...
 
Red Hat Storage Day New York - Red Hat Gluster Storage: Historical Tick Data ...
Red Hat Storage Day New York - Red Hat Gluster Storage: Historical Tick Data ...Red Hat Storage Day New York - Red Hat Gluster Storage: Historical Tick Data ...
Red Hat Storage Day New York - Red Hat Gluster Storage: Historical Tick Data ...
 
Red Hat Storage Day New York - QCT: Avoid the mess, deploy with a validated s...
Red Hat Storage Day New York - QCT: Avoid the mess, deploy with a validated s...Red Hat Storage Day New York - QCT: Avoid the mess, deploy with a validated s...
Red Hat Storage Day New York - QCT: Avoid the mess, deploy with a validated s...
 
Red Hat Storage Day - When the Ceph Hits the Fan
Red Hat Storage Day -  When the Ceph Hits the FanRed Hat Storage Day -  When the Ceph Hits the Fan
Red Hat Storage Day - When the Ceph Hits the Fan
 
Red Hat Storage Day New York - Penguin Computing Spotlight: Delivering Open S...
Red Hat Storage Day New York - Penguin Computing Spotlight: Delivering Open S...Red Hat Storage Day New York - Penguin Computing Spotlight: Delivering Open S...
Red Hat Storage Day New York - Penguin Computing Spotlight: Delivering Open S...
 
Red Hat Storage Day New York - Intel Unlocking Big Data Infrastructure Effici...
Red Hat Storage Day New York - Intel Unlocking Big Data Infrastructure Effici...Red Hat Storage Day New York - Intel Unlocking Big Data Infrastructure Effici...
Red Hat Storage Day New York - Intel Unlocking Big Data Infrastructure Effici...
 
Red Hat Storage Day New York - New Reference Architectures
Red Hat Storage Day New York - New Reference ArchitecturesRed Hat Storage Day New York - New Reference Architectures
Red Hat Storage Day New York - New Reference Architectures
 
Red Hat Storage Day New York - Persistent Storage for Containers
Red Hat Storage Day New York - Persistent Storage for ContainersRed Hat Storage Day New York - Persistent Storage for Containers
Red Hat Storage Day New York - Persistent Storage for Containers
 
Red Hat Storage Day New York -Performance Intensive Workloads with Samsung NV...
Red Hat Storage Day New York -Performance Intensive Workloads with Samsung NV...Red Hat Storage Day New York -Performance Intensive Workloads with Samsung NV...
Red Hat Storage Day New York -Performance Intensive Workloads with Samsung NV...
 
Red Hat Storage Day New York - Welcome Remarks
Red Hat Storage Day New York - Welcome Remarks Red Hat Storage Day New York - Welcome Remarks
Red Hat Storage Day New York - Welcome Remarks
 

Erasure codes and storage tiers on gluster

  • 1. Dan Lambright1 Erasure Codes and Storage Tiers on Gluster Dan Lambright SA summit Sep 23, 2014
  • 2. Dan Lambright2 AGENDA ● Why erasure codes (ec) in Gluster ● How ec works ● Brief peek at underlying mathematics ● Storage tiering in gluster ● Demo ● “One more thing”
  • 3. Dan Lambright3 Why erasure codes in gluster? ● Desire protection from double failure ● RAID6 controllers are expensive ● Imagine a 64 node volume ● Each brick on a separate bare metal machine ● Cost is 64 x $ for LSI MegaRaid controller 20K =
  • 4. Dan Lambright4 Why erasure codes in gluster? ● Triplication (3 way replication) is expensive ● Two redundant disks for every data disk ● 200% overhead! :(
  • 5. Dan Lambright5 Erasure codes ● Store m disks worth of data on k disks (k>m) ● n redundant disks (k-m), ● can pick n to choose failure tolerance ● A generalization of RAID6 ● Distributed across nodes
  • 6. Dan Lambright6 Overhead analysis ● Can also consider mean time before failure k total disks n how many failures admitted m number of data disks Capacity overhead (n/k) RAID level 3 1 2 33.33% 5 5 1 4 20% 5 6 2 4 33.33% 6 7 3 4 42.86% E 9 1 8 11.11% 5 10 2 8 20% 6 11 3 8 27.27% E 12 4 8 33.33% E
  • 8. Dan Lambright8 ERASURE CODE TERMS ● m data disks ● n parity disks ● k total number disks = m+n ● Symbol – Smallest data unit. w bits. ● Typically w = 8 = a byte ● Chunk (aka fragment) – r symbols per disk ● Stripe – collection of m+n chunks across k disks ● Unit of manipulation for recovery ● Also known as a “slice”
  • 9. Dan Lambright9 ERASURE CODE TERMS ● r=6 m=4 n =2 k=6 w=1 symbol fragment “Stripe” of 6 fragments 011010
  • 10. Dan Lambright10 Systematic ● m data chunks, n coding chunks ● (can stripe parity and data chunks on the same disk) ● Reads are simple, only decode on repairs Slice 1 Slice 2 Slice 3
  • 11. Dan Lambright11 Non-Systematic ● All k chunks in a stripe are coded ● Do not to distinguish data from code servers ● Encode/decode on writes and reads Slice 1 Slice 2 Slice 3
  • 12. Dan Lambright12 Encoding / Decoding Overhead ● Network RTT dominate the encode/decode overhead ● Packages exist to implement the math ● Intel has fast routines for Inverse, dot product, encoding, decoding, etc ● Jerasure library from academia ● Gluster's is purpose built and fast
  • 14. Dan Lambright14 GLUSTERFS “Disperse Volumes” ● Done by Datalab corp. by Xavier Hernandez. ● Use case : archiving medical records ● Developed over last 2 years ● Now part of gluster upstream
  • 15. Dan Lambright15 CLI Two new options have been added to the 'create' command of the cli interface: gluster volume create <name> disperse <count> redundancy <count> Disperse is “k” (total number volumes) Redundancy is “n”
  • 16. Dan Lambright16 “Disperse volumes” design choices ● The “symbols” are bytes: w = 8 ● The fragment size r = 128 ● Algorithm: Reed solomon ● Generator matrix: Vandermonde ● Non–systematic ● Encoding / decoding done on client side ● Modeled after AFR ● Concurrent writes must be processed in order
  • 18. Dan Lambright18 Storage Tiers ● Different “subvolume” tiers presented as a single volume ● HDD, SSD, tape, “persistent memory”, etc. ● Plug-in policy describes how data moves between tiers ● V1 policy: Cache ● slow and fast tiers ● CLI to add/remove cache tier from existing volume
  • 19. Dan Lambright19 Example: Erasure codes + SSD ● User sees one volume ● SSD “caches” ec data Tiered volume “cache”: on SSD ec on HDD Hot Cold demote promote
  • 20. Dan Lambright20 Future : Data classification (DC) ● Add rules to storage graph ● Rule determines subvolume ● File name ● Attribute (size, content) ● Etc. Filename = *.lock ?` Yes No Secure / Encrypted HDD
  • 21. Dan Lambright21 Future flexibility ● Many use cases ● Compliance ● Multi-tenancy ● Rack-aware placement (for performance) ● Policies described by language ● Arbitrary number of tiers, rules, subvolumes .. ● Template based
  • 24. Dan Lambright24 Bitrot ● A daemon that scans gluster volumes ● Finds corrupted data ● Digest associated with each file ● Alert / recover on mismatch ● “Plug-ins” to daemon may do other things.. ● Tuning parameters to be non-intrusive to performance ● Encryption ● Compression ● Etc.
  • 25. 25 Do it! ● Learn the math: ● http://web.eecs.utk.edu/~plank/plank/papers/FAST- 2013-Tutorial.html ● Get the bits: ● https://forge.gluster.org/disperse
  • 26. RED HAT CONFIDENTIAL – DO NOT DISTRIBUTE Thank You! ● dlambright@redhat.com ● RHS: www.redhat.com/storage/ ● GlusterFS: www.gluster.org ● @Glusterorg @RedHatStorage Gluster Red Hat Storage Slides Available on Mojo