This document summarizes the history and development of the Gluster distributed storage system: how Gluster began in Bangalore and grew to support large-scale deployments. Key developments included creating a unified global namespace, adding features through a stackable translator architecture, and developing templates to simplify common use cases. Gluster now provides a simple, scalable, low-cost storage solution, well suited to the growing data demands of cloud computing through its scale-out architecture and open-source model.
1. Gluster: Where We've Been
AB Periasamy
Office of the CTO, Red Hat
John Mark Walker
Gluster Community Guy
2. Topics
The Big Idea
Humble beginnings
From Bangalore to Milpitas
Scale-out + Open source == WINNING
User-space, no metadata server, stackable
Cloud and commoditization
3. A Data Explosion!
74% == Unstructured data annual growth
63,000 PB == Scale-out storage in 2015
40% == Storage-related expense for cloud
44x == Unstructured data volume growth by 2020
4. [Photos: Conference Room at US Head Office; Bengaluru Office]
7. What Can You Store?
Media – Docs, Photos, Video
VM Filesystem – VM Disk Images
Big Data – Log Files, RFID Data
Objects – Long Tail Data
8. The big idea: storage should be simple
Simple, scalable, low-cost
9. What is GlusterFS, Really?
Gluster is a unified, distributed storage system
DHT, stackable, POSIX, Swift, HDFS
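A note on the "no metadata server" point above: in a DHT-style design, any client can compute which brick holds a file by hashing its name, so there is no lookup service to consult. A minimal Python sketch of the idea (illustrative only; Gluster's real elastic-hash translator assigns hash ranges to bricks via directory extended attributes, not a simple modulo, and the brick names below are hypothetical):

import hashlib

# Hypothetical brick list; in GlusterFS these would be exported server directories.
BRICKS = ["server1:/export/brick", "server2:/export/brick", "server3:/export/brick"]

def brick_for(filename):
    # Hash the file name and map it to a brick. Every client computes the
    # same answer independently, so no central metadata server is needed.
    digest = hashlib.md5(filename.encode()).digest()
    return BRICKS[int.from_bytes(digest[:4], "big") % len(BRICKS)]

print(brick_for("photo-001.jpg"))  # deterministic placement, no lookup service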
10. Phase 1: Lego Kit for Storage
"People who think that userspace filesystems are realistic for anything but toys are just misguided" – Linus Torvalds
Goal: create a global namespace
# Server-side volume file: each translator stacks on the one below it.
volume testvol-posix
    type storage/posix              # bottom of the stack: store data in a local directory
    option directory /media/datastore
    option volume-id 329e31c1-04cc-4386-8bb8-xxxx
end-volume

volume testvol-access-control
    type features/access-control    # enforce POSIX permission checks
    subvolumes testvol-posix
end-volume

volume testvol-locks
    type features/locks             # POSIX and internal locking support
    subvolumes testvol-access-control
end-volume

volume testvol-io-threads
    type performance/io-threads     # parallelize file operations across worker threads
    subvolumes testvol-locks
end-volume
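The stack above is the server side of a single brick. The global namespace comes from a client-side volume file that layers a cluster translator over several such bricks; a hand-built sketch in the same style (the host names and volume names here are illustrative, not from the slides):

volume brick1
    type protocol/client             # connect to the stack exported by server1
    option transport-type tcp
    option remote-host server1
    option remote-subvolume testvol-io-threads
end-volume

volume brick2
    type protocol/client             # same exported stack on a second server
    option transport-type tcp
    option remote-host server2
    option remote-subvolume testvol-io-threads
end-volume

volume testvol-dht
    type cluster/distribute          # hash files across the bricks: one namespace, no metadata server
    subvolumes brick1 brick2
end-volume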
12. Versions 1.x – 2.x
Hand-crafted volume definition files
See the volume file examples above
Simple configuration files
Faster than tape? It's good!
19. And now for something completely different
Commoditization and the changing economics of storage
Why we're winning
20. Simple Economics
Simplicity, scalability, lower cost
Virtualized → Scale on Demand
Multi-Tenant → In the Cloud
Automated → Scale Out
Commoditized → Open Source
21. Simplicity Bias
FC, FCoE, iSCSI → HTTP, Sockets
Modified BSD OS → Linux / User Space / C, Python & Java
Appliance based → Application based
23. Thank you!
AB Periasamy
Office of the CTO, Red Hat
ab@redhat.com
John Mark Walker
Gluster Community Guy
johnmark@redhat.com
Editor's Notes
Add examples where complexity has been bad: EMC, Cisco, Brocade et al. made a business out of complexity via certification; if it's too complicated, it doesn't scale.
Discuss approach: how GlusterFS is unique and different from other approaches. Lessons from GNU Hurd: a user-space distributed storage operating system; reimplemented some parts of the OS (scheduler, POSIX locking, RDMA, memory management; cf. JVM, Python, etc.); no metadata separation.
If you have a bunch of files, it should be as simple as an FTP server. In user space, this required FUSE, a POSIX translator, a NAS protocol, and a cluster translator.
Learned about missing features. Found the largest problem and wanted to solve it; patterns emerged: scalable unstructured data storage was the #1 problem people wanted to solve. Had a clearer idea of where we wanted to go: clear direction.
Standalone NFS replacement; active-active replicated storage; scalable, distributed storage… and then scalable, replicated, distributed storage, plus other combos.
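In volume-file terms these combinations were just deeper stacks. A hand-built distributed-replicated layout could be sketched like this, assuming brick1 through brick4 are protocol/client volumes defined as in the earlier client-side example:

volume repl1
    type cluster/replicate           # active-active mirror of the first brick pair
    subvolumes brick1 brick2
end-volume

volume repl2
    type cluster/replicate           # mirror of the second pair
    subvolumes brick3 brick4
end-volume

volume testvol-dist-repl
    type cluster/distribute          # distribute files across the two mirrored pairs
    subvolumes repl1 repl2
end-volume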
Elastic features driven by cloud and virtualization usage: shared storage for virtual guests; flexible, self-service storage; elastic volume management became a requirement; automated provisioning of storage with the CLI (native NFS server? Or 3.2?).
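For the CLI-driven provisioning mentioned above, the 3.x-era gluster command line looked roughly like this (host names and brick paths are placeholders):

gluster volume create testvol replica 2 transport tcp server1:/export/brick server2:/export/brick
gluster volume start testvol
# Grow the volume elastically by adding another replica pair:
gluster volume add-brick testvol server3:/export/brick server4:/export/brick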
Marker framework: the story of why it's necessary. Backup of data in other locales; don't need an entire snapshot. Users wanted continuous, unlimited replication without sysadmin intervention, on demand. It queries the FS to find which files have changed and manages a queue, telling rsync exactly which files to sync. Inotify doesn't scale: if the daemon crashes, it stops tracking changes, so we would have had to write a journaling feature to maintain the change queue. Geo-replication can work on high-latency, flaky networks.
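On the CLI side, starting geo-replication against a remote slave looked roughly like this in the 3.2 era (the volume name and slave location are placeholders):

gluster volume geo-replication testvol remotehost:/data/remote_dir start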