In this Introduction to GlusterFS webinar, introduction and review of the GlusterFS architecture and key functionalities. Learn how GlusterFS is deployed in the datacenter, in the cloud, or between the two. We’ll also cover a brief update on GlusterFS v3.3 which is currently in beta.
2. Today’s Speakers
Heather Wellington Tom Trainer
Marketing Manager Director
Gluster, Inc. Product Marketing
Gluster, Inc.
A Better Way To Do Storage 2
3. History of Gluster
How it all started
– Backgrounds in high performance, clustered computing
– Working at Lawrence Livermore National Labs
• AB Periasamy & Hitesh Chellani design “Thunder”
• One of the worlds fastest super computers
• On Intel commodity hardware
• Solved filesystem scalability and performance limitations
– Large customer in oil & gas persuaded them to focus on storage
– Gluster founded by Hitesh & AB to bring technology to market
Thunder
Result: award winning technology
A Better Way To Do Storage 3
4. What is the Gluster File System?
Scale-out file storage software for
– Network Attached Storage (NAS)
– Object
– Big Data / Analytics
GlusterFS provides
– Flexibility to deploy in ANY environment
– High availability
– Unified files and objects
rd
– File system for Apache Hadoop August 23
– Scalability to Petabytes & beyond
– Linearly scalable performance
– Superior storage economics
A Better Way To Do Storage 4
5. GlusterFS Architecture Design Goals
Innovation
– Eliminate metadata
– Dramatically improve time
– Unify files and objects
Elasticity
– Flexibility adapt to growth/reduction
– Add, delete volumes & users
– Without disruption
Scale linearly
– Multiple dimensions
• Performance
Performance
• Capacity
– Aggregated resources
Simplicity
Capacity – Ease of management
– No complex Kernel patches
– Run in user space
A Better Way To Do Storage 5
6. Key Differentiators
Software only
Open source
Modular, stackable storage OS architecture
Data stored in native formats
No metadata – Elastic hashing
Unified files and objects
Apache Hadoop file system compatible replacement
Filesystem runs in user space
Virtual Machine (VM) virtual motion enabler
A Better Way To Do Storage 6
7. Software Only - Future Proofing Storage
Hardware agnostic
Superior storage economics & flexibility
– Data center / private cloud use commodity hardware
– Public cloud – i.e. AWS, RackSpace, GoGrid, Nimbula – pay for only what you need
No lock-in
– Hardware vendors-at purchase time or in the future
– Any Cloud – Public, private, and hybrid
– Performance, capacity, or availability levels
– GlusterFS – not proprietary, files are stored in native formats (i.e. EXT4)
A Better Way To Do Storage 7
8. Open Source
200,000+ downloads
Global Adoption – ~16,000 /month
550+ registered deployments
– 45 countries
2,500+ registered users
– Mailing lists, Forums, etc.
Active community
– Diverse testing environments
– Bugs identification and fixes
– Code contributions
Member of broader ecosystem
– OpenStack, Linux Foundation, Open
Virtualization Alliance
A Better Way To Do Storage 8
9. Elastic Hashing
No metadata server
An algorithmic approach
– Unique hash tag for each file stored
– Tags stored within the file system
– Rapid file read – low latency
Traditional Central Metadata Server
Innovative Elastic Approach
A Better Way To Do Storage Traditional Distributed Metadata Server 9
10. A Standard Gluster Deployment
Standard clients running
Clients/Apps Clients/Apps Clients/Apps
standard apps
IP Network Over any standard IP
network
Access to application data,
Gluster Global Namespace (HTTP, NFS, CIFS, Gluster Native) as files and folders and or
objects, in global
Application Data Objects VMs ApacheTM HadoopTM
namespace, using a variety
of standard protocols
VMDK VMDK
Stored in a commoditized,
virtual storage pool virtualized, scale-out,
centrally managed pool of
DAS, SAN, NAS
A Better Way To Do Storage 10
11. Unified On-premise, Private and Public Cloud Storage
Client/Apps Client/Apps Client/Apps
Client/Apps
Client/Apps Client/Apps
Client/Apps Client/Apps
Client/Apps
IP Network
Gluster Global Namespace
Data Center / Private Cloud Public Cloud
Replication
A Better Way To Do Storage 11
12. Deployment Scenarios
Common Solutions Built on GlusterFS
Media serving (CDN)
Large scale file storage
Tier 2 & 3 archive
File sharing
High Performance Computing (HPC) storage
IaaS storage layer
Disaster recovery
Backup & restore
Private cloud
A Better Way To Do Storage 12
13. Introducing GlusterFS 3.3 Beta 1
Next generation file and object storage
– The first system for data storage that enables you to store and access data as
an object and as a file
– Flexible and powerful, it simplifies access and management of data
– Eases migration of legacy, file-based applications to object storage for use in
the cloud
Public beta availability: July 20, 2011
– Broad community testing and participation
– Selected enterprise customer engagements
A Better Way To Do Storage 13
14. The “Traditional” Unified Approach
Files Objects DB’s
“VNX reminds me of my old VHS, DVD and cable box….
….one thing fails and I’m blown out of the water.” NAS Object Block
Beta Customer , 2011
Carved Up
Proprietary Storage Pool
Bolt-on hardware approach
– Combined hardware raises costs
– Higher TCO Traditional Monolithic Hardware
– Paying for what you many not need Bolt-on Approach
(i.e. EMC VNX)
Increased risk
– Common hardware elements can fail
• Power supplies
• Fans
• Cabling…lots of cabling
A Better Way To Do Storage 14
15. Software Approach to File and Object
High performance storage across heterogeneous server environments
Window Access
– Improves Windows performance
– Uses HTTP, not slower CIFS
– We will still support SAMBA
Object Storage
– API
– Internet Protocol (IP)
– ResTFul
– Get/Put
– Buckets
– Objects seen as files
Standards based
– Amazon S3 ReSTFul interface
compatible
– Access data as objects and a NAS
Network Attached Storage (NAS)
interface to access files (NFS, CIFS, – NFS / CIFS / GlusterFS
GlusterFS) – POSIX compliant
– Access files within objects
A Better Way To Do Storage 15
16. GlusterFS 3.3 Unified File & Object Storage
Widely deployable and extremely flexible
– On-premise, virtualized and in public and private clouds
– Deep unification of file and object data storage
• Not just unified at the management layer
• Not a bolt-together hardware product
– Access data within objects as files
– Compatible with Amazon Web Services
• S3
• Create S3 on EC2 and EBS
– Back up objects from the data center to AWS
– Enable S3 functionality in the data center
– Built to run on commodity hardware
A Better Way To Do Storage 16
17. GlusterFS 3.3 Easing & Accelerating Legacy App Migration
Unified file & object storage accelerates legacy app migration to the cloud
Gluster FS 3.3
Object Storage
Data Center, Virtual
Public, Private Cloud
Object
Info
Enterprise Apps / Enterprise Data Center
Enables cloudification of applications
– Removes remaining storage hurdles related to file only
access
– Allows for gradual app migration to the cloud
– Enables moves to both private and public cloud
infrastructures
A Better Way To Do Storage 17
18. Gluster 3.3 Unified File & Object – Use Cases
S3
Data Center: S3 in house &/ integrate with S3
Data Center IaaS: Deliver S3 to clients
– Take control of cloud services
– Reduce AWS S3 costs
– Deliver S3 like global services in-house
– Legacy application migration
IaaS
– Deliver File and S3 Services
– Unified file and object
• Competitive differentiator
– Drastically reduce storage costs Traditional Data Center
Private Cloud
– Increase offerings, revenues and margins
IaaS
A Better Way To Do Storage 18
19. Introducing GlusterFS Compatibility for Apache Hadoop
Enhancement to GlusterFS providing a new file system
option for Apache Hadoop
– Proven scale-out storage solution provides simultaneous file and object access within the
Hadoop
– Introduces a 4th storage option for Hadoop (HDFS, local disk, Kosmos)
Included in GlusterFS 3.3 beta 2 available August 23, 2011
– Community beta for testing and participation
– Select enterprise customer engagements
Requirements driven by community and customer requests
– “Eliminate the 64MB fixed block size imposed by HDFS”
– “Eliminate the centralized metadata server”
– “Give us NAS” (via POSIX compliance)
Benefits
– Out of the box compatibility with MapReduce applications, no rewrite required
– Enables organizations to unify data storage
– Flexible and powerful, it simplifies access and management of data
A Better Way To Do Storage 19
20. Seamless Integration for Hadoop Deployments
Metadata
Server NameNode
MapReduce MapReduce MapReduce MapReduce MapReduce
HDFS HDFS HDFS HDFS GlusterFS GlusterFS
GlusterFS can co-exist HDFS
Does NOT use the NameNode metadata server
Built using the Hadoop file system API
– Requires simple configuration file changes
– C Lib Gluster client enable Gluster direct access
– Java Client
• JNI interface
– gluster_hadoop.jar
• Provides Java binding for Hadoop compatibility
A Better Way To Do Storage 20
21. Seamless Integration for Hadoop Deployments
Metadata
Server NameNode
MapReduce MapReduce MapReduce MapReduce MapReduce
GlusterFS GlusterFS GlusterFS HDFS GlusterFS GlusterFS
Co-exists or replace HDFS
Ultimately eliminate the need for the NameNode
Faster access times – faster filesystem
All the features and benefits of GlusterFS
A Better Way To Do Storage 21
22. Seamless Integration for Hadoop Deployments
Metadata
Server NameNode
MapReduce MapReduce MapReduce MapReduce MapReduce
GlusterFS GlusterFS GlusterFS GlusterFS GlusterFS
NameNode metadata server eliminated
Faster access times – faster filesystem
All the features and benefits of GlusterFS
A Better Way To Do Storage 22
23. Why It’s Different?
No metadata server
– No single point of failure, automated self heal and failover
– No performance bottleneck on data lookups for fast file access
Built in replication
– Synchronous for inter-node replication
– Asynchronous for geo-replication
No block size restrictions
– Ideal for small and large files
POSIX compliant file system
– Out of the box NFS, CIFS and Gluster native access
Expanded data access options
– File and object access to data
– Access files from your object interface and access data within objects as files
– File based applications can access data without modification
Reduces requirement for replicated files from 3 to 2
– 33% capacity savings
A Better Way To Do Storage 23
24. Potential Uses for GlusterFS and Hadoop
Simplify and unify storage deployments
– Centralized data store providing access to more applications
Provide users with file level access to data
– Users can easily brose data using off the shelf tools
Enable legacy applications to access data via NFS
– Analytic apps can access data without modification
Enable object base access to data
– Modern applications can use object based access to data
A Better Way To Do Storage 24
25. Filesystem Runs in User Space
User Space Not tied to kernel
GlusterFS
No reassemblies
Server
(CPU/Mem)
Kernel
Independence
1 TB 1 TB
1 TB 1 TB
1 TB 1 TB
1 TB 1 TB
1 TB 1 TB
1 TB 1 TB
1 TB 1 TB
A Better Way To Do Storage 25
26. The Gluster Connector for OpenStack – July 27, 2011
Enables GlusterFS to be the underlying file system
Connects GlusterFS to Xen and KVM hypervisor
– Unified File and Object storage OpenStack with Gluster
– Highly-available, scale-out NAS
– Alternative to SWIFT
Mobile Apps. Web Clients. Enterprise Software Ecosystem
API Layer
OpenStack Prior to Gluster
…
Compute
Unified File &
Object Storage
SWIFT
OpenStack Imaging Services
A Better Way To Do Storage 26
27. The Gluster Connector for OpenStack – July 27, 2011
Connector enables GlusterFS to be chosen as the filesystem
– Provides:
• Unified File and Object storage
• Highly scalable NAS
• High Availability – synchronous and asynchronous replication
• Preferred, scalable alternative to SWIFT
• Virtual motion of virtual machines (a.k.a. vmotion)
VM VM VM VM VM VM VM VM
Hypervisor Hypervisor
GlusterFS GlusterFS
Server Server
(CPU/Mem) (CPU/Mem)
Virtual storage pool
A Better Way To Do Storage 27
28. Pandora Internet Radio
Problem
• Explosive user & title growth
• As many as 12 file formats for each song
• „Hot‟ content and long tail
Solution
• Three data centers, each with a six-node
GlusterFS cluster
• 1.2 PB of audio served • Replication for high availability
per week • 250+ TB total capacity
• 13 million files Benefits
• Over 50 GB/sec peak • Easily scale capacity
traffic • Centralized management; one administrator
to manage day-to-day operations
• No changes to application
• Higher reliability
A Better Way To Do Storage 28
29. Brightcove
Problem
• Cloud-based online video platform
• Explosive customer & title growth
• Massive video in multiple locations
• Costs rising, esp. with HD formats
Solution
• Complete scale-out based on commodity
DAS/JBOD
• Replication for high availability
• Over 1 PB currently in
• 1PB total capacity
Gluster
• Separate 4 PB project Benefits
in the works • Easily scale capacity
• Centralized management; one administrator
to manage day-to-day operations
• Higher reliability
• Path to multi-site
A Better Way To Do Storage 29
30. Cincinnati Bell Technology Solutions
Problem
• Host a dedicated enterprise cloud solution
• Large scale VMware environment
• Need high availability
Solution
• Large scale VM
• Gluster for VM storage, NFS to clients
storage
• SAS drives on back-end
• Low cost service • Replication for high availability
delivery for enterprise
customer Benefits
• Drastic reduction in • Storage provisioning from 6 wks to 15 min.
provisioning time • Vendor agnostic storage
• Low cost of service delivery
• Elastic growth
A Better Way To Do Storage 30
31. Partners Healthcare
Private Cloud: Centralized Storage as a Service
Problem
• Capacity growth from 144TB to 1+PB
• Multiple distributed users/departments
• Multi OS access - Windows, Linux and Unix
Solution
• Over 500 TB • GlusterFS Cluster
• 9 Sun “Thumper” • Solaris/ZFS/x4500 w/ InfiniBand
systems in cluster • Native CIFS/ NFS access
Benefits
• Capacity on demand / pay as you grow
• Centralized management
• Higher reliability
• OPEX decreased by 10X
A Better Way To Do Storage 31
32. Gluster Enterprise Deployment Options
On-premise/datacenter
Storage Software Appliance
– Deploy on bare metal
– Any hardware on Red Hat Hardware HCL
Virtualization/private cloud
Virtual Machines
– Deployable on the leading virtual machines
Public cloud
Amazon Web Services (AWS)
– Runs within Amazon Machine Image (AMI)
RightScale Cloud Management
– GlusterFS managed via a RightScale ServerTemplate
– Deployable via the RightScale Cloud Management Dashboard
GoGrid Cloud
– Gluster Server Image (GSI) for scale-out NAS on GoGrid cloud
A Better Way To Do Storage 32
34. Summary
GlusterFS is scale-out storage
– NAS
– Object
– Big Data / Analytics
Flexibility, scalability, superior economics
OpenStack cloud
– Unified file and object storage
– Virtual machine (VM) virtual motion
Innovative architecture provides a better way to do
storage
A Better Way To Do Storage 34
35. Questions and Answers
Your turn - ask our experts
Try Gluster for free here: http://www.gluster.com/trybuy/
Additional resources here: http://www.gluster.com/products/resources/
Join the community: http://www.gluster.org/
Follow on twitter: @gluster.
Read our blog: http://blog.gluster.com/
Contact us at: info@gluster.com or 1-800-805-5215
A Better Way To Do Storage 35