At the Public Sector Red Hat Storage Days on 1/20/16 and 1/21/16, Jason Callaway walked attendees through the basics of scalable POSIX file systems in the cloud.
Scalable POSIX File Systems in the Cloud
1. SCALABLE POSIX FILE SYSTEMS
IN THE CLOUD
Jason Callaway
@jasoncallaway | blog.jasoncallaway.com
January 2016
2. THE RED HAT STORAGE MISSION
To offer a unified, open software-defined storage portfolio that delivers a range of data services for next-generation workloads, thereby accelerating the transition to modern IT infrastructures.
3. Traditional Storage vs. Open, Software-Defined Storage
Traditional storage: complex, proprietary silos — a custom GUI and proprietary software running on proprietary hardware.
Open, software-defined storage: standardized, unified, open platforms — a common control plane (API, GUI) over open source software (Ceph, Gluster, and more) running on standard computers and disks.
4. THE FUTURE OF STORAGE
[Diagram: today's landscape of separate proprietary silos — each with its own custom GUI, proprietary software, and proprietary hardware, and each with its own admins and users.]
5. A RISING TIDE
“By 2020, between 70-80% of unstructured data will be held on
lower-cost storage managed by SDS environments.”
“By 2019, 70% of existing storage array products
will also be available as software only versions”
“By 2016, server-based storage solutions will lower
storage hardware costs by 50% or more.”
Gartner: “IT Leaders Can Benefit From Disruptive Innovation in the Storage Industry”
Gartner: “Innovation Insight: Separating Hype From Hope for Software-Defined Storage”
Market size is projected to increase approximately 20%
year-over-year between 2015 and 2019.
SDS-P MARKET SIZE BY SEGMENT (block storage, file storage, object storage, hyperconverged — source: IDC)
2013: $457B | 2014: $592B | 2015: $706B | 2016: $859B | 2017: $1,029B | 2018: $1,195B | 2019: $1,349B
Software-Defined Storage is leading a shift in the
global storage industry, with far-reaching
effects.
6. Open Software-Defined Storage is a fundamental
reimagining of how storage infrastructure works.
It provides substantial economic and operational
advantages, and it has quickly become ideally
suited for a growing number of use cases.
THE JOURNEY
Today: cloud infrastructure. Emerging: cloud-native apps, analytics. Future: hyperconvergence, containers, ???
8. THE RED HAT STORAGE PORTFOLIO
Open source software: Ceph management and data services, Gluster management and data services — all running on standard hardware.
• Share-nothing, scale-out architecture provides durability and adapts to changing demands
• Self-managing and self-healing features reduce operational overhead
• Standards-based interfaces and full APIs ease integration with applications and systems
• Supported by the experts at Red Hat
9. GROWING INNOVATION COMMUNITIES
• Over 11M downloads in the last 12 months
• Increased development velocity, authorship, and discussion have resulted in rapid feature expansion.
• Contributions from Intel, SanDisk, SUSE, and DTAG.
• Presenting Ceph Days in cities around the world and quarterly virtual Ceph Developer Summit events.
Community stats: 78 authors/mo, 1,500 commits/mo, 258 posters/mo; 41 authors/mo, 259 commits/mo, 166 posters/mo
10. PARTNER SOLUTIONS
All-flash arrays, optimized for Ceph — SanDisk sells the InfiniFlash storage arrays, designed for use with Red Hat Ceph Storage. Optimizations contributed by SanDisk deliver high performance that allows Ceph customers to service new workloads. Our relationship includes:
• Engineering and product collaboration
• Community thought leadership
Systems designed with storage in mind — Supermicro's Red Hat Ceph Storage optimized solutions offer durable, software-defined, scale-out storage platforms in 1U/2U/4U form factors and are designed to maximize performance, density, and capacity. Customers can expect to see:
• Reference architectures, validated for performance, density, and capacity
• Whitepapers and datasheets that support Red Hat Storage solutions
Accelerating software-defined storage — Through silicon innovation and software optimization, Intel pushes the envelope on open, software-defined storage. A key contributor, Intel recently donated significant hardware to the Ceph project. Intel development efforts have included:
• SSD and performance optimizations
• CephFS development
11. REFINEMENTS FOR PETABYTE-SCALE OPERATORS
Optimized for large-scale deployments — Version 1.3 of Red Hat Ceph Storage is the first major release since joining the Red Hat Storage product portfolio, and incorporates feedback from customers who have deployed in production at large scale. Areas of improvement:
• Robustness at scale
• Performance tuning
• Operational efficiency
Enhanced for flexibility and performance — Version 3.1 of Red Hat Gluster Storage contains many new features and capabilities aimed to bolster data protection, performance, security, and client compatibility. New capabilities include:
• Erasure coding
• Tiering
• Bit rot detection
• NFSv4 client support
13. OVERVIEW: RED HAT GLUSTER STORAGE
Nimble file storage for petabyte-scale workloads — purpose-built as a scale-out file store with a straightforward architecture suitable for public, private, and hybrid cloud. Simple to install and configure, with a minimal hardware footprint. Offers mature NFS, SMB and HDFS interfaces for enterprise use.
TARGET USE CASES
• Analytics: machine analytics with Splunk, big data analytics with Hadoop
• Enterprise File Sharing
• Rich Media & Archival: media streaming, active archives
• Enterprise Virtualization
Customer Highlight: Intuit — Intuit uses Red Hat Gluster Storage to provide flexible, cost-effective storage for their industry-leading financial offerings.
14. OVERVIEW:
TERMINOLOGY
Brick: basic unit of storage, represented by an export directory on a server in the
trusted storage pool.
Cluster: a group of linked computers working together closely, in many respects forming a single computer.
FUSE: Filesystem in Userspace (FUSE) is a loadable kernel module for Unix-like
computer operating systems that lets non-privileged users create their own file
systems without editing kernel code.
Geo-Replication: provides a continuous, asynchronous, and incremental replication service from one site to another over Local Area Networks (LANs), Wide Area Networks (WANs), and across the Internet.
15. OVERVIEW:
TERMINOLOGY
Metadata: defined as data providing information about one or more other pieces of
data. There is no special metadata storage concept in GlusterFS. The metadata is
stored with the file data itself.
Namespace: an abstract container or environment created to hold a logical
grouping of unique identifiers or symbols. Each Gluster volume exposes a single
namespace as a POSIX mount point that contains every file in the cluster.
Volume: a logical collection of bricks. Most of the gluster management operations
happen on the volume.
17. OVERVIEW:
VOLUME PERMUTATIONS
Distributed -- Namespace is distributed horizontally across n bricks
Replicated -- Namespace is synchronously replicated to an identical namespace
Striped -- Namespace stripes data across bricks
Dispersed -- Namespace uses Erasure Coding
                      No Redundancy   Replicated (synchronous)   Dispersed (erasure codes)
Distributed                 ✓                    ✓                         ✓
Striped                     ✓                    ✓
Distributed-Striped         ✓                    ✓
Geo-replicated              ✓                    ✓                         ✓
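As a rough illustration of how the basic volume types trade capacity for redundancy, the usable capacity of each can be sketched with back-of-the-envelope math. This is not Gluster code; the brick sizes and erasure-coding parameters below are assumed for the example.

```python
# Hypothetical capacity sketch for the basic GlusterFS volume types.
# Not a Gluster sizing tool -- just illustrates the trade-offs above.

def usable_capacity_tb(brick_tb, bricks, volume_type, replica=2, ec_k=4, ec_m=2):
    """Approximate usable capacity for one volume, ignoring filesystem overhead."""
    raw = brick_tb * bricks
    if volume_type == "distributed":   # files spread across all bricks, no redundancy
        return raw
    if volume_type == "replicated":    # every file stored `replica` times
        return raw / replica
    if volume_type == "dispersed":     # erasure coded: k data + m parity fragments
        return raw * ec_k / (ec_k + ec_m)
    raise ValueError(volume_type)

# Six bricks of 10 TB each:
print(usable_capacity_tb(10, 6, "distributed"))  # 60
print(usable_capacity_tb(10, 6, "replicated"))   # 30.0
print(usable_capacity_tb(10, 6, "dispersed"))    # 40.0
```

The same raw pool yields very different usable capacity depending on the volume permutation chosen, which is why the redundancy scheme is the first sizing decision.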
18. Standard replicated back-ends are very durable, and can
recover very quickly, but they have an inherently large
capacity overhead.
Erasure coding back-ends reconstruct corrupted or lost
data by using information about the data stored
elsewhere in the system.
Providing failure protection with erasure coding
eliminates the need for RAID, consumes far less space
than replication, and can be appropriate for capacity-
optimized use cases.
ERASURE CODING
Storing more data with less hardware
[Diagram: an object/file split into data chunks 1–4 plus parity chunks X and Y, stored across an erasure coded pool/volume in the storage cluster.]
Up to 75% reduction in TCO
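A quick comparison shows where the capacity savings come from. The figures below (3-way replication versus a 4+2 erasure-coded layout) are illustrative assumptions, not Red Hat sizing guidance:

```python
# Raw capacity needed to store a given usable amount of data,
# for N-way replication versus a k+m erasure-coded layout.

def raw_for_replication(usable_tb, copies=3):
    # Each byte is stored `copies` times.
    return usable_tb * copies

def raw_for_erasure_coding(usable_tb, k=4, m=2):
    # Each stripe holds k data fragments plus m parity fragments.
    return usable_tb * (k + m) / k

usable = 100  # TB of application data
rep = raw_for_replication(usable)       # 300 TB raw
ec = raw_for_erasure_coding(usable)     # 150.0 TB raw
print(f"replication: {rep} TB, EC 4+2: {ec} TB, raw saving: {1 - ec/rep:.0%}")
```

With these assumed parameters, erasure coding halves the raw capacity needed versus 3-way replication; wider stripes (larger k) push the savings further, at the cost of slower reconstruction.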
19. Optimally, infrequently accessed data is served from less
expensive storage systems while frequently accessed
data can be served from faster, more expensive ones.
However, manually moving data between storage tiers
can be time-consuming and expensive.
Red Hat Gluster Storage 3.1 now supports automated
promotion and demotion of data between “hot” and
“cold” subvolumes based on frequency of access.
TIERING
Cost-effective flash acceleration
[Diagram: objects/files move between a hot subvolume (flash, replicated) and a cold subvolume (rotational, erasure coded) within the storage cluster.]
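The promotion/demotion policy described above can be sketched as a toy frequency-based rebalancer. Gluster's real implementation operates on subvolumes inside the filesystem; this only illustrates the policy idea, and the threshold and file names are made up:

```python
# Toy illustration of frequency-based tiering: files whose access count
# crosses a threshold are promoted to the hot tier; idle files are demoted.

from collections import Counter

PROMOTE_THRESHOLD = 3  # assumed cutoff for "frequently accessed"

def rebalance_tiers(access_counts, hot, cold):
    for name in list(cold):
        if access_counts[name] >= PROMOTE_THRESHOLD:
            cold.remove(name)
            hot.add(name)        # promote to flash
    for name in list(hot):
        if access_counts[name] < PROMOTE_THRESHOLD:
            hot.remove(name)
            cold.add(name)       # demote to rotational media
    return hot, cold

counts = Counter({"report.csv": 5, "old_log.gz": 0, "video.mp4": 1})
hot, cold = rebalance_tiers(counts, set(), {"report.csv", "old_log.gz", "video.mp4"})
print(sorted(hot), sorted(cold))  # ['report.csv'] ['old_log.gz', 'video.mp4']
```

Automating this loop is exactly what removes the "time-consuming and expensive" manual data movement between tiers.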
20. Bit rot detection is a mechanism that detects data
corruption resulting from silent hardware failures,
which lead to deterioration in performance and integrity.
Red Hat Gluster Storage 3.1 provides a mechanism to
scan data periodically and detect bit-rot.
Using the SHA256 algorithm, checksums are computed
when files are accessed and compared against
previously stored values. If they do not match, an error is
logged for the storage admin.
BIT ROT DETECTION
Detection of silent data corruption
[Diagram: a scrub pass detects one corrupted block among many good ones and raises an alert to the admin.]
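The checksum-compare step described above can be shown in a few lines. Gluster performs this inside its brick processes with signing and scrubbing daemons; this is only a minimal sketch of the idea, with hypothetical file names:

```python
# Minimal sketch of checksum-based bit-rot detection: record a SHA-256
# digest per file, then compare it against the data on a later scrub pass.

import hashlib

def sha256_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Digest recorded when the file was written (the "signing" step).
stored = {"file_a": sha256_of(b"original contents")}

def scrub(name: str, data: bytes) -> bool:
    """Return True if the file still matches its recorded checksum."""
    ok = sha256_of(data) == stored[name]
    if not ok:
        # In Gluster this mismatch would be logged for the storage admin.
        print(f"!!! bit rot detected in {name}")
    return ok

print(scrub("file_a", b"original contents"))   # True
print(scrub("file_a", b"originel contents"))   # False: silent corruption caught
```

Because the corruption is silent, no I/O error ever fires; only the digest mismatch reveals that the bytes on disk have drifted.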
21. Using NFS-Ganesha, an NFS server implementation,
Red Hat Gluster Storage 3.1 provides client access
with simplified failover and failback in the case of a
node or network failure.
Supporting both NFSv3 and NFSv4 clients,
NFS-Ganesha introduces ACLs for additional
security, Kerberos authentication, and dynamic
export management.
SECURITY
Scalable and secure NFSv4 client support
[Diagram: multiple clients mount over NFS through a pair of NFS-Ganesha heads in front of the storage cluster.]
22. OVERVIEW:
MAXIMUMS
• 64 nodes per cluster
• 8 volumes per LVM RAID
• XFS max recommended size for brick
• 100 TB certified / 8 EB maximum on RHEL 6
• 500 TB certified / 8 EB maximum on RHEL 7
• 16 PB usable per distributed-replicated cluster
• (500 TB * 64 nodes / 2 replication factor)
• ~4 PB with EC2 16 TB EBS volumes
• http://blog.gluster.org/category/performance/
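The 16 PB figure follows directly from the per-node limits quoted above; the arithmetic can be checked in a few lines (illustration only, using the slide's own numbers):

```python
# Sanity-checking the usable-capacity maximum quoted above:
# 500 TB certified per node (RHEL 7), 64 nodes per cluster,
# replication factor 2 in a distributed-replicated layout.

nodes = 64
certified_tb_per_node = 500
replication_factor = 2

usable_tb = certified_tb_per_node * nodes // replication_factor
print(usable_tb, "TB =", usable_tb / 1000, "PB")  # 16000 TB = 16.0 PB
```

Note the certified limit, not the 8 EB theoretical XFS maximum, is what bounds the practical cluster size here.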
24. WHAT TOOL DO I USE?

Use Case                                Gluster          Ceph
File (NFS, CIFS, FUSE native client)    Works great!     Tech-preview in June
OLTP-like                               Maybe            Works great!
Block                                   Nope             Works great!
Object                                  Works great!     Works great!
Cloud                                   Works great!     Nope
Big Data                                Works great!     Works great!
Geo-replication                         Works great!     It’s complicated
25. FOCUSED SET OF USE CASES
• Analytics: big data analytics with Hadoop; machine data analytics with Splunk
• Cloud infrastructure: virtual machine storage with OpenStack; object storage for tenant applications
• Rich media and archival: cost-effective storage for rich media streaming; active archives
• Sync and share: file sync and share with ownCloud
• Enterprise virtualization: storage for conventional virtualization with RHEV
26. BIG DATA ANALYTICS
In-place Hadoop analytics in a POSIX-compatible environment
[Diagram: Hadoop MapReduce framework running over the Hadoop Distributed File System vs. over a Red Hat Gluster Storage cluster.]
FEATURES
• Allows the Hortonworks Data Platform 2.1 to be deployed on Red Hat Gluster Storage
• Hadoop tools can operate on data in-place
• Access to the Hadoop ecosystem of tools
• Access to non-Hadoop analytics tools
• Consistent operating model: Hadoop can run directly on Red Hat Gluster Storage nodes
BENEFITS
• Flexible, unified enterprise big data repository
• Better analytics (Hadoop and non-Hadoop)
• Familiar POSIX-compatible file system and tools
• Start small, scale as big data needs grow
• Multi-volume support (HDFS is single-volume)
• Unified management (Hortonworks HDP Ambari and Red Hat Gluster Storage)
27. MACHINE DATA ANALYTICS
High-performance, scale-out, online cold storage for Splunk Enterprise. Hot/warm data is optimized for performance (10s of TB on Splunk server DAS); cold data is optimized for cost, capacity, and elasticity (Red Hat Storage Server on commodity x86 servers).
FEATURES
• Multiple ingest options using NFS & FUSE
• Expand storage pools without incurring downtime
• Support for both clustered and non-clustered configurations
• Run high-speed indexing and search on Splunk’s cold data store
BENEFITS
• Pay-as-you-grow economics for Splunk cold data
• Reduce ingestion time for data with standard protocols
• “Always online”, fast, disk-based storage pools provide constant access to historical data
28. RICH MEDIA
Massively-scalable, flexible, and cost-effective storage for image, video, and audio content. Unstructured image, video, and audio content is stored on a Red Hat Gluster Storage cluster or a Red Hat Ceph Storage cluster.
FEATURES
• Support for multi-petabyte storage clusters on commodity hardware
• Erasure coding and replication for capacity-optimized or performance-optimized pools
• Support for standard file & object protocols
• Snapshots and replication capabilities for high availability and disaster recovery
BENEFITS
• Provides massive and linear scalability in on-premise or cloud environments
• Offers robust data protection with an optimal blend of price & performance
• Standard protocols allow access to broadcast content anywhere, on any device
• Cost-effective, high performance storage for on-demand rich media content
29. ACTIVE ARCHIVES
Open source, capacity-optimized archival storage on commodity hardware. Unstructured file data, unstructured object data, and volume backups land on a Red Hat Gluster Storage cluster or a Red Hat Ceph Storage cluster.
FEATURES
• Cache tiering to enable "temperature"-based storage
• Erasure coding to support archive and cold storage use cases
• Support for industry-standard file and object access protocols
BENEFITS
• Store data based on its access frequency
• Store data on premise or in a public or hybrid cloud
• Achieve durability while reducing raw capacity requirements and limiting cost
• Deploy on industry-standard hardware
30. FILE SYNC AND SHARE
Powerful, software-defined, scale-out, on-premise storage for file sync and share with ownCloud. Web browsers, mobile applications, and desktop OS clients connect through ownCloud Enterprise Edition to Red Hat Gluster Storage.
FEATURES
• Secure file sync and share with enterprise-grade auditing and accounting
• Combined solution of Red Hat Gluster Storage, ownCloud, and HP ProLiant SL4550 Gen 8 servers
• Deployed on-premise, managed by internal IT
• Access sync and share data from mobile devices, desktop systems, web browsers
BENEFITS
• Secure collaboration with consumer-grade ease of use
• Lower risk by storing data on-premise
• Conform to corporate data security and compliance policies
• Lower total cost of ownership with standard, high-density servers and open source
31. ENTERPRISE VIRTUALIZATION
Scalable, reliable storage for Red Hat Enterprise Virtualization
FEATURES
• Reliably store virtual machine images in a distributed Red Hat Gluster Storage volume
• Manage storage through the RHEV-M console
• Deploy on standard hardware of choice
BENEFITS
• Seamlessly grow and shrink storage infrastructure when demand changes
• Reduce operational complexities by eliminating dependency on complex and expensive SAN infrastructures
• Deploy efficiently on less expensive, easier to provision, standard hardware
• Achieve centralized visibility and control of server and storage infrastructure
34. DETAIL: RED HAT GLUSTER STORAGE 3.1
These features were introduced in the most recent release of Red Hat Gluster Storage, and are now supported by Red Hat.
• Device management, dashboard (MGMT): Support in the console for discovery, format, and creation of bricks based on recommended best practices; an improved dashboard that shows vital statistics of pools.
• Snapshots, Geo-replication (MGMT): New support in the console for snapshotting and geo-replication features.
• Tiering (CORE): New features to allow creation of a tier of fast media (SSDs, flash) that accompanies slower media, supporting policy-based movement of data between tiers and enhancing create/read/write performance for many small-file workloads.
• Bit rot detection (CORE): Ability to detect silent data corruption in files via signing and scrubbing, enabling long-term retention and archival of data without fear of “bit rot”.
• Snapshot scheduling (CORE): Ability to schedule periodic execution of snapshots easily, without the complexity of custom automation scripts.
• Backup hooks (CORE): Features that enable incremental, efficient backup of volumes using standard commercial backup tools, providing time-savings over full-volume backups.
• Erasure coding (CORE): Introduction of erasure coded volumes (dispersed volumes) that provide cost-effective durability and increase usable capacity when compared to standard RAID and replication.
35. DETAIL: RED HAT GLUSTER STORAGE 3.1 (continued)
These features were introduced in the most recent release of Red Hat Gluster Storage, and are now supported by Red Hat.
• Small file (PERF): Optimizations to enhance small-file performance, especially with small-file create and write operations.
• Rebalance (PERF): Optimizations that result in enhanced rebalance speed at large scale.
• SELinux enforcing mode (SECURITY): Introduction of the ability to operate with SELinux in enforcing mode, increasing security across an entire deployment.
• NFSv4, multi-headed (PROTOCOL): Support for data access via clustered, active-active NFSv4 endpoints, based on the NFS-Ganesha project.
• SMB 3, subset of features (PROTOCOL): Enhancements to SMB 3 protocol negotiation, copy-data offload, and support for in-flight data encryption.