Roberto Franchini 
franchini@celi.it 
Codemotion Milano 
29/11/2014
GlusterFS 
A scalable distributed file system
whoami(1) 
15 years of experience, proud to be a programmer 
Writes software for information extraction, NLP, opinion mining (@scale), and a lot of other buzzwords
Implements scalable architectures 
Plays with servers 
Member of the JUG-Torino coordination team 
franchini@celi.it 
http://www.celi.it http://www.blogmeter.it 
github.com/robfrank github.com/uim-celi 
twitter.com/robfrankie linkedin.com/in/robfrank
The problem 
Identify a distributed and scalable file system for today's and tomorrow's Big Data
Once upon a time 
2008: One NFS share 
1.5TB ought to be enough for anybody 
2010: Herd of shares 
(1.5TB x N) ought to be enough for anybody 
Nobody could stop the data flood 
It was the time for something new
Requirements 
Can be enlarged on demand 
No dedicated HW 
Open source (OS) is preferred and trusted 
No specialized API 
No specialized Kernel 
POSIX compliance 
Zillions of big and small files 
No NAS or SAN (€€€€€)
Clustered Scale-out General Purpose Storage Platform 
− POSIX-y Distributed File System 
− ...and so much more 
Built on commodity systems 
− x86_64 Linux ++ 
− POSIX filesystems underneath (XFS, EXT4) 
No central metadata Server (NO SPOF) 
Modular architecture for scale and functionality
Common use cases 
Large Scale File Server 
Media / Content Distribution Network (CDN) 
Backup / Archive / Disaster Recovery (DR) 
High Performance Computing (HPC) 
Infrastructure as a Service (IaaS) storage layer 
Database offload (blobs) 
Unified Object Store + File Access
Features 
ACL and Quota support 
Fault-tolerance 
Peer to peer 
Self-healing 
Fast setup 
Enlarge on demand 
Shrink on demand 
Snapshot 
On-premises, physical or virtual 
On cloud
Architecture
Architecture 
Peer / Node 
− cluster servers (glusterfs server) 
− Runs the gluster daemons and participates in volumes 
Brick 
− A filesystem mountpoint on servers 
− A unit of storage used as a capacity building block
Bricks on a node
Architecture 
Translator 
− Logic between bricks or subvolumes that generates a subvolume with certain characteristics 
− distribute, replica and stripe are special translators that generate RAID-like configurations 
− performance translators 
Volume 
− Bricks combined and passed through translators 
− Ultimately, what's presented to the end user
Volume
Volume types
Distributed 
The default configuration 
Files “evenly” spread across bricks 
Similar to file-level RAID 0 
Server/Disk failure could be catastrophic
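A minimal sketch of why a plain distributed volume is fragile, assuming a simplified placement rule: each whole file is hashed by name to exactly one brick. The md5-modulo rule is a stand-in, not GlusterFS's actual hash or layout, and the brick names such as `server1:/b1` are hypothetical.

```python
import hashlib


def brick_for(filename: str, bricks: list[str]) -> str:
    """Pick the single brick that stores `filename` (simplified placement).

    Stand-in only: md5 modulo the brick count, not GlusterFS's own hash.
    """
    digest = hashlib.md5(filename.encode("utf-8")).digest()
    return bricks[int.from_bytes(digest[:4], "big") % len(bricks)]


def unavailable_after_failure(files: list[str], bricks: list[str], failed: str) -> list[str]:
    """Files that become unreachable when the brick `failed` goes down.

    With no replica, each file lives on exactly one brick, so every file
    mapped to the failed brick is unreachable until that brick returns.
    """
    return [f for f in files if brick_for(f, bricks) == failed]


if __name__ == "__main__":
    bricks = ["server1:/b1", "server2:/b1", "server3:/b1", "server4:/b1"]
    files = [f"doc-{i}.txt" for i in range(1000)]
    for b in bricks:
        print(b, len(unavailable_after_failure(files, bricks, b)))
```

With four bricks each one owns roughly a quarter of the files, so a single failed server or disk takes that share of the namespace with it; if the disk itself is lost, those files are lost.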
Distributed
Replicated 
Files written synchronously to replica peers 
Files read synchronously, but ultimately serviced by the first responder 
Similar to file-level RAID 1
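A toy model of the replicated behaviour described on the slide: a write returns only after every replica has acknowledged it, and a read is answered by whichever replica responds first. `Brick` and its methods are hypothetical in-memory stand-ins, not a GlusterFS API, and the random sleep only simulates network latency.

```python
import concurrent.futures as cf
import random
import time


class Brick:
    """Hypothetical in-memory brick used only for this illustration."""

    def __init__(self, name: str):
        self.name = name
        self.files: dict[str, bytes] = {}

    def write(self, path: str, data: bytes) -> str:
        self.files[path] = data
        return self.name  # acknowledgement

    def read(self, path: str) -> tuple[str, bytes]:
        time.sleep(random.uniform(0.001, 0.01))  # simulated network latency
        return self.name, self.files[path]


def replicated_write(replicas: list[Brick], path: str, data: bytes) -> list[str]:
    """Synchronous replication: block until every replica has acknowledged."""
    with cf.ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        return list(pool.map(lambda b: b.write(path, data), replicas))


def replicated_read(replicas: list[Brick], path: str) -> tuple[str, bytes]:
    """Query all replicas and return the first answer to arrive."""
    with cf.ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        futures = [pool.submit(b.read, path) for b in replicas]
        return next(cf.as_completed(futures)).result()
```

Because every replica holds the same data, losing one brick of the pair leaves all files readable; the price is that the client sends every write to each replica.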
Replicated
Distributed + replicated 
Files distributed across replicated sets of bricks 
Similar to file-level RAID 10 
Most used layout
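A sketch of how a distributed-replicated volume can be pictured: the ordered brick list is cut into replica sets, and files are then distributed across those sets (the RAID 10 analogy). Grouping consecutive bricks follows the GlusterFS convention for `replica N` volumes; the host-check helper and the brick names are illustrative assumptions.

```python
def replica_sets(bricks: list[str], replica: int) -> list[list[str]]:
    """Cut an ordered brick list into consecutive groups of `replica` bricks."""
    if replica < 1 or len(bricks) % replica:
        raise ValueError("brick count must be a multiple of the replica count")
    return [bricks[i:i + replica] for i in range(0, len(bricks), replica)]


def sets_with_shared_host(sets: list[list[str]]) -> list[list[str]]:
    """Replica sets in which two bricks sit on the same host.

    Such a set does not survive the loss of that host, which defeats
    the purpose of replication.
    """
    return [s for s in sets if len({b.split(":", 1)[0] for b in s}) < len(s)]


if __name__ == "__main__":
    # Interleaving hosts pairs each brick with its twin on the other server.
    good = ["server1:/brick1", "server2:/brick1", "server1:/brick2", "server2:/brick2"]
    # Listing one host's bricks first puts both copies on the same machine.
    bad = ["server1:/brick1", "server1:/brick2", "server2:/brick1", "server2:/brick2"]
    print(sets_with_shared_host(replica_sets(good, 2)))
    print(sets_with_shared_host(replica_sets(bad, 2)))
```

The order in which bricks are listed therefore matters as much as their number.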
Distributed replicated
Striped 
Individual files split among bricks (sparse files) 
Similar to block-level RAID 0 
Limited Use Cases 
HPC Pre/Post Processing 
File size exceeds brick size
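A sketch of striping, assuming fixed-size blocks placed round-robin: each brick ends up holding only some offsets of the file, which is why the per-brick copies are sparse files. The block size is a parameter here and is not taken from the slides.

```python
def stripe(data: bytes, n_bricks: int, block_size: int) -> list[dict[int, bytes]]:
    """Spread fixed-size blocks of one file round-robin over n_bricks.

    Each brick maps file offset -> block, i.e. a sparse view of the file.
    """
    if n_bricks < 1 or block_size < 1:
        raise ValueError("n_bricks and block_size must be positive")
    bricks: list[dict[int, bytes]] = [{} for _ in range(n_bricks)]
    for i, offset in enumerate(range(0, len(data), block_size)):
        bricks[i % n_bricks][offset] = data[offset:offset + block_size]
    return bricks


def reassemble(bricks: list[dict[int, bytes]]) -> bytes:
    """Rebuild the file by reading every brick and ordering blocks by offset."""
    blocks: dict[int, bytes] = {}
    for brick in bricks:
        blocks.update(brick)
    return b"".join(blocks[offset] for offset in sorted(blocks))
```

Every brick is needed to read a striped file back, so a single brick failure damages all striped files; this fits the slide's "limited use cases".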
Striped
Moving parts
Components 
glusterd 
Management daemon 
One instance on each GlusterFS server 
Interfaced through gluster CLI 
glusterfsd 
GlusterFS brick daemon 
One process for each brick on each server 
Managed by glusterd
Components 
glusterfs 
Volume service daemon 
One process for each volume service 
NFS server, FUSE client, Self-Heal, Quota, ... 
mount.glusterfs 
FUSE native client mount extension 
gluster 
Gluster Console Manager (CLI)
Clients
Clients: native 
FUSE kernel module allows the filesystem to be built and operated entirely in userspace 
Specify mount to any GlusterFS server 
Native Client fetches volfile from mount server, then communicates directly with all nodes to access data 
Recommended for high concurrency and high write performance 
Load is inherently balanced across distributed volumes
Clients: NFS 
Standard NFS v3 clients 
Standard automounter is supported 
Mount to any server, or use a load balancer 
GlusterFS NFS server includes Network Lock Manager (NLM) to synchronize locks across clients 
Better performance for reading many small files from a single client 
Load balancing must be managed externally
Clients: libgfapi 
Introduced with GlusterFS 3.4 
User-space library for accessing data in GlusterFS 
Filesystem-like API 
Runs in application process 
no FUSE, no copies, no context switches 
...but same volfiles, translators, etc.
Clients: SMB/CIFS 
In GlusterFS 3.4 – Samba + libgfapi 
No need for local native client mount & re-export 
Significant performance improvements with FUSE removed from the equation 
Must be set up on each server you wish to connect to via CIFS 
CTDB is required for Samba clustering
Clients: HDFS 
Access data within and outside of Hadoop 
No HDFS name node single point of failure / bottleneck 
Seamless replacement for HDFS 
Scales with the massive growth of big data
Scalability
Under the hood 
Elastic Hash Algorithm 
No central metadata 
No Performance Bottleneck 
Eliminates risk scenarios 
Location hashed intelligently on filename 
Unique identifiers (GFID), similar to md5sum
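A minimal sketch of the elastic-hash idea: the 32-bit hash space is split into one contiguous range per brick, and any client finds a file by hashing its name and picking the brick whose range contains the hash, with no metadata server involved. GlusterFS keeps such ranges per directory and uses its own hash function; `zlib.crc32`, the equal-width ranges and the brick names below are stand-ins.

```python
import bisect
import zlib

HASH_SPACE = 2 ** 32
Layout = tuple[list[int], list[str]]  # (sorted range starts, brick per range)


def even_layout(bricks: list[str]) -> Layout:
    """Split the 32-bit hash space into equal contiguous ranges, one per brick.

    The last range also absorbs the remainder of the integer division.
    """
    if not bricks:
        raise ValueError("need at least one brick")
    step = HASH_SPACE // len(bricks)
    return [i * step for i in range(len(bricks))], list(bricks)


def brick_for_hash(h: int, layout: Layout) -> str:
    """Return the brick whose range [start, next_start) contains h."""
    starts, bricks = layout
    return bricks[bisect.bisect_right(starts, h) - 1]


def locate(filename: str, layout: Layout) -> str:
    # Computed locally by every client: no central metadata lookup.
    return brick_for_hash(zlib.crc32(filename.encode("utf-8")), layout)
```

In GlusterFS the layout is stored on the bricks themselves, as extended attributes of each directory, so adding bricks means recomputing ranges with a rebalance rather than updating a central index.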
Scalability 
Diagram: three Gluster Servers, each with ten 3TB disks 
Scale out performance and availability 
Scale out capacity
Scalability 
Add disks to servers to increase storage size 
Add servers to increase bandwidth and storage size 
Add servers to increase availability (replica factor)
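The availability point can be made concrete with a back-of-the-envelope formula, under the strong assumption that bricks fail independently with the same probability p: a file with r replicas is unreachable only when all r copies are down, i.e. with probability p^r. Correlated failures (shared rack, switch or power) make real numbers worse than this sketch suggests.

```python
def p_file_unavailable(p_brick_down: float, replica: int) -> float:
    """Chance that every replica of a file is down at the same time.

    Assumes independent, identically likely brick failures, which real
    clusters only approximate.
    """
    if not 0.0 <= p_brick_down <= 1.0 or replica < 1:
        raise ValueError("need 0 <= p <= 1 and replica >= 1")
    return p_brick_down ** replica
```

Each extra replica multiplies the unavailability by p, which is why raising the replica factor buys availability at the cost of capacity.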
What we do with GlusterFS
What we do with GlusterFS 
Daily production of more than 10GB of Lucene inverted indexes stored on GlusterFS 
more than 200GB/month 
Search stored indexes to extract different sets of documents for every customer 
YES: we open indexes directly on storage (it's POSIX!!!)
2010: first installation 
Version 3.0.x 
8 (not dedicated) servers 
Distributed replicated 
No bound on brick size (!!!!) 
Ca. 4TB available 
NOTE: stuck with 3.0.x until 2012 due to problems with the 3.1 and 3.2 series; meanwhile RH acquired Gluster (RH Storage)
2012: (little) cluster 
New installation, version 3.3.2 
4TB available on 8 servers (Dell C5000) 
still not dedicated 
1 brick per server limited to 1TB 
2TB RAID 1 on each server 
Still in production
2012: enlarge 
New installation, upgrade to 3.3.x 
6TB available on 12 servers (still not dedicated) 
Enlarged to 9TB on 18 servers 
Brick sizes bounded AND unbounded
2013: fail 
18 not dedicated servers: too much 
18 bricks of different sizes 
2 major outages due to bricks running out of space 
Didn’t restart after a move 
but… 
All data were recovered (files are scattered across bricks as regular files: read them from there!)
2014: consolidate 
2 dedicated servers 
12 x 3TB SAS, RAID 6 
4 bricks per server 
28 TB available 
distributed replicated 
4x1Gb bonded NIC 
ca. 40 clients (FUSE) (other servers)
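The capacity figures can be checked with simple arithmetic. A sketch, assuming every server carries the same RAID array, RAID 6 costs two disks of parity, and a replica count of 2 (the slide says "distributed replicated" but does not state the count). Filesystem overhead is ignored.

```python
def raid6_tb(disks: int, disk_tb: float) -> float:
    """Usable size of one RAID 6 array: two disks' worth goes to parity."""
    if disks < 4:
        raise ValueError("RAID 6 needs at least 4 disks")
    return (disks - 2) * disk_tb


def volume_tb(servers: int, tb_per_server: float, replica: int) -> float:
    """Nominal usable size of a distributed-replicated volume.

    Every file is stored `replica` times, so raw brick space is divided by it.
    """
    if replica < 1:
        raise ValueError("replica must be >= 1")
    return servers * tb_per_server / replica


if __name__ == "__main__":
    # 2014: 2 servers, 12 x 3TB in RAID 6 each, assumed replica 2
    print(volume_tb(2, raid6_tb(12, 3), 2))
    # 2012: one 1TB brick per server, assumed replica 2
    for servers in (8, 12, 18):
        print(servers, volume_tb(servers, 1, 2))
```

For the 2014 setup this gives 30 TB nominal against the 28 TB reported on the slide; the source does not explain the gap. With one 1TB brick per server and an assumed replica 2, 8, 12 and 18 servers give 4, 6 and 9 TB, matching the 2012 figures.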
Consolidate 
Gluster Server 1: brick 1, brick 2, brick 3, brick 4 
Gluster Server 2: brick 1, brick 2, brick 3, brick 4
Scale up 
Gluster Server 1: brick 11, brick 12, brick 13, brick 31 
Gluster Server 2: brick 21, brick 22, brick 32, brick 24 
Gluster Server 3: brick 31, brick 32, brick 23, brick 14
Do 
Dedicated servers (physical or virtual) 
RAID 6 or RAID 10 (with small files) 
Multiple bricks of the same size 
Plan to scale
Do not 
Multi-purpose servers 
Bricks of different sizes 
Very small files 
Write directly to bricks
Some raw tests 
read 
Total transferred file size: 23.10G bytes 
43.46M bytes/sec 
write 
Total transferred file size: 23.10G bytes 
38.53M bytes/sec
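To put the raw numbers in perspective, a small helper converts size and rate into elapsed time and compares the rate with the nominal line rate of one 1 Gb/s link (125 MB/s before protocol overhead). It assumes decimal units (1G = 1000M), which the output does not state; note also that a bonded NIC does not necessarily speed up a single stream.

```python
GBIT_LINK_MB_S = 1e9 / 8 / 1e6  # nominal line rate of one 1 Gb/s link: 125 MB/s


def transfer_seconds(total_gb: float, rate_mb_s: float) -> float:
    """Elapsed time implied by a transferred size and an average rate (decimal units)."""
    return total_gb * 1000 / rate_mb_s


if __name__ == "__main__":
    for label, rate in (("read", 43.46), ("write", 38.53)):
        secs = transfer_seconds(23.10, rate)
        share = rate / GBIT_LINK_MB_S
        print(f"{label}: {secs:.0f} s, {share:.0%} of one 1 Gb/s link")
```

Read works out to roughly 530 s (about 35% of one link) and write to roughly 600 s (about 31%).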
Raw tests 
NOTE: run in production under heavy load, not in a clean test environment
Resources 
http://www.gluster.org/ 
https://access.redhat.com/documentation/en-US/Red_Hat_Storage/ 
https://github.com/gluster 
http://www.redhat.com/products/storage-server/ 
http://joejulian.name/blog/category/glusterfs/ 
http://jread.us/2013/06/one-petabyte-red-hat-storage-and-glusterfs-project-overview/
Thank you!
Roberto Franchini 
franchini@celi.it

How Recreation Management Software Can Streamline Your Operations.pptx
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
 
May Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdfMay Marketo Masterclass, London MUG May 22 2024.pdf
May Marketo Masterclass, London MUG May 22 2024.pdf
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
 

GlusterFS: a scalable file system for today's and tomorrow's big data

  • 1. Roberto Franchini, franchini@celi.it, Codemotion Milano, 29/11/2014
  • 2. GlusterFS: a scalable distributed file system
  • 3. whoami(1): 15 years of experience, proud to be a programmer. Writes software for information extraction, NLP and opinion mining (@scale), and a lot of other buzzwords. Implements scalable architectures. Plays with servers. Member of the JUG-Torino coordination team. Contacts: franchini@celi.it, http://www.celi.it, http://www.blogmeter.it, github.com/robfrank, github.com/uim-celi, twitter.com/robfrankie, linkedin.com/in/robfrank
  • 4. The problem: identify a distributed and scalable file system for today's and tomorrow's Big Data.
  • 5. Once upon a time. 2008: one NFS share, "1.5TB ought to be enough for anybody". 2010: a herd of shares, "(1.5TB x N) ought to be enough for anybody". Nobody could stop the data flood: it was time for something new.
  • 6. Requirements: can be enlarged on demand; no dedicated HW; OS is preferred and trusted; no specialized API; no specialized kernel; POSIX compliance; zillions of big and small files; no NAS or SAN (€€€€€).
  • 7. Clustered scale-out general-purpose storage platform: a POSIX-y distributed file system, ...and so much more. Built on commodity systems: x86_64 Linux ++, with POSIX filesystems underneath (XFS, EXT4). No central metadata server (no SPOF). Modular architecture for scale and functionality.
  • 8. Common use cases: large-scale file server; media / content distribution network (CDN); backup / archive / disaster recovery (DR); high performance computing (HPC); Infrastructure as a Service (IaaS) storage layer; database offload (blobs); unified object store + file access.
  • 9. Features: ACL and quota support; fault tolerance; peer to peer; self-healing; fast setup; enlarge on demand; shrink on demand; snapshots; on premise, physical or virtual; on cloud.
  • 11. Architecture. Peer / node: cluster servers (glusterfs server) that run the Gluster daemons and participate in volumes. Brick: a filesystem mount point on a server, a unit of storage used as a capacity building block.
  • 12. Bricks on a node
  • 13. Architecture. Translator: logic between bricks or subvolumes that generates a subvolume with certain characteristics. Distribute, replica and stripe are special translators that generate RAID-like configurations; there are also performance translators. Volume: bricks combined and passed through translators; ultimately, what's presented to the end user.
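As a toy illustration of translator stacking (not GlusterFS code: real translators are C modules wired together by a volfile), the Python sketch below gives every layer the same read/write interface as the subvolume it wraps, so layers can be stacked and the top of the stack acts as the volume. The classes `Brick` and `ReadCache` are hypothetical; `ReadCache` merely stands in for a performance translator.

```python
from typing import Dict


class Brick:
    """Hypothetical stand-in for a brick: a plain in-memory file store."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.files: Dict[str, bytes] = {}
        self.reads = 0  # counts how many reads actually reached this brick

    def write(self, path: str, data: bytes) -> None:
        self.files[path] = data

    def read(self, path: str) -> bytes:
        self.reads += 1
        return self.files[path]


class ReadCache:
    """Hypothetical 'performance translator': same interface as its subvolume,
    but it answers repeated reads from memory instead of forwarding them."""

    def __init__(self, subvolume) -> None:
        self.subvolume = subvolume
        self.cache: Dict[str, bytes] = {}

    def write(self, path: str, data: bytes) -> None:
        self.cache.pop(path, None)        # drop any stale cached copy
        self.subvolume.write(path, data)  # pass the call down the stack

    def read(self, path: str) -> bytes:
        if path not in self.cache:
            self.cache[path] = self.subvolume.read(path)
        return self.cache[path]


# The "volume" is simply the top of the stack that the client talks to.
brick = Brick("server1:/export/brick1")
volume = ReadCache(brick)
volume.write("/docs/a.txt", b"hello")
volume.read("/docs/a.txt")
volume.read("/docs/a.txt")
print(brick.reads)  # prints 1: the second read was served by the cache layer
```

Because every layer exposes the same interface, a replicate or distribute layer could be slotted in between without the client noticing.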
  • 16. Distributed: the default configuration. Files are “evenly” spread across bricks, similar to file-level RAID 0. A server or disk failure could be catastrophic.
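A minimal sketch of why pure distribute is fragile, assuming a simplified placement rule (crc32 of the file name modulo the brick count, a hypothetical stand-in for GlusterFS's real hash and layout): every file lands on exactly one brick, so losing a brick makes exactly that brick's share of the files unavailable.

```python
import zlib
from collections import Counter
from typing import Dict, List


def place(filename: str, bricks: List[str]) -> str:
    """Pick exactly one brick for a file (simplified: crc32 modulo brick
    count, not GlusterFS's real hash or layout)."""
    return bricks[zlib.crc32(filename.encode()) % len(bricks)]


bricks = ["brick1", "brick2", "brick3", "brick4"]
files = [f"file{i:04d}.dat" for i in range(1000)]
placement: Dict[str, str] = {f: place(f, bricks) for f in files}

# Files are spread across bricks, one copy each.
per_brick = Counter(placement.values())
print(per_brick)

# Simulate losing brick2: with pure distribute there is no second copy.
lost = [f for f, b in placement.items() if b == "brick2"]
print(f"{len(lost)} of {len(files)} files unavailable")
```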
  • 18. Replicated: files are written synchronously to replica peers. Files are read synchronously, but ultimately serviced by the first responder. Similar to file-level RAID 1.
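The replicate behaviour can be sketched with threads: a write returns only after every replica has acknowledged it, while a read is sent to all replicas and the first answer wins. The `Replica` class and the latencies are hypothetical simulation values, not GlusterFS internals.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Dict, List


class Replica:
    """Hypothetical replica brick with a simulated network latency."""

    def __init__(self, name: str, latency_s: float) -> None:
        self.name = name
        self.latency_s = latency_s
        self.files: Dict[str, bytes] = {}

    def write(self, path: str, data: bytes) -> None:
        time.sleep(self.latency_s)
        self.files[path] = data

    def read(self, path: str):
        time.sleep(self.latency_s)
        return self.name, self.files[path]


def replicated_write(replicas: List[Replica], path: str, data: bytes, pool) -> None:
    # Synchronous replication: return only after EVERY replica has the data.
    futures = [pool.submit(r.write, path, data) for r in replicas]
    for f in futures:
        f.result()  # re-raises if any replica failed


def first_responder_read(replicas: List[Replica], path: str, pool):
    # Ask all replicas and use whichever answers first.
    futures = [pool.submit(r.read, path) for r in replicas]
    done, _ = wait(futures, return_when=FIRST_COMPLETED)
    return next(iter(done)).result()


replicas = [Replica("server1", 0.20), Replica("server2", 0.01)]
with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
    replicated_write(replicas, "/data/x", b"payload", pool)
    served_by, data = first_responder_read(replicas, "/data/x", pool)
print(served_by, data)  # server2 answers first in this simulation
```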
  • 20. Distributed + replicated: similar to file-level RAID 10. The most used layout.
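In GlusterFS the brick order on the `gluster volume create ... replica N ...` command line decides the layout: each group of N consecutive bricks forms a replica set, and files are then distributed across the sets. The Python sketch below (hypothetical helper names, crc32 as a stand-in hash) models that grouping and flags sets whose copies share a server, since such a set would not survive that server's failure.

```python
import zlib
from typing import List


def replica_sets(bricks: List[str], replica: int) -> List[List[str]]:
    """Group bricks in the order given: consecutive bricks form one replica set."""
    if len(bricks) % replica:
        raise ValueError("brick count must be a multiple of the replica count")
    return [bricks[i:i + replica] for i in range(0, len(bricks), replica)]


def same_server_sets(sets: List[List[str]]) -> List[List[str]]:
    # Flag sets in which two or more copies live on the same server.
    return [s for s in sets if len({b.split(":")[0] for b in s}) < len(s)]


def locate(filename: str, sets: List[List[str]]) -> List[str]:
    # Distribute picks one replica set (crc32 stand-in for the real hash);
    # replicate then stores the file on every brick of that set.
    return sets[zlib.crc32(filename.encode()) % len(sets)]


# Good ordering: alternate servers so each pair spans both machines (RAID 10-like).
good = replica_sets(["s1:/b1", "s2:/b1", "s1:/b2", "s2:/b2"], replica=2)
# Bad ordering: each pair sits on a single server.
bad = replica_sets(["s1:/b1", "s1:/b2", "s2:/b1", "s2:/b2"], replica=2)

print(good)                   # [['s1:/b1', 's2:/b1'], ['s1:/b2', 's2:/b2']]
print(same_server_sets(bad))  # [['s1:/b1', 's1:/b2'], ['s2:/b1', 's2:/b2']]
print(locate("report.pdf", good))
```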
  • 22. Striped: individual files are split among bricks (sparse files), similar to block-level RAID 0. Limited use cases: HPC pre/post-processing, or file size exceeds brick size.
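A sketch of the striping idea, assuming a tiny hypothetical block size of 4 bytes so the result is easy to read: the file is cut into fixed-size blocks assigned round-robin to bricks, and each brick keeps its blocks at their original offsets, which is why the per-brick files are sparse.

```python
from typing import Dict, List


def stripe(data: bytes, n_bricks: int, block: int) -> List[Dict[int, bytes]]:
    """Split a file round-robin into fixed-size blocks across bricks.
    Each brick keeps its blocks at their ORIGINAL offsets, so on disk the
    per-brick file would be sparse (holes where other bricks' blocks belong)."""
    bricks: List[Dict[int, bytes]] = [{} for _ in range(n_bricks)]
    for i, offset in enumerate(range(0, len(data), block)):
        bricks[i % n_bricks][offset] = data[offset:offset + block]
    return bricks


def unstripe(bricks: List[Dict[int, bytes]]) -> bytes:
    # Gather all (offset, block) pairs from every brick and concatenate in order.
    pieces = sorted((off, blk) for b in bricks for off, blk in b.items())
    return b"".join(blk for _, blk in pieces)


data = b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
bricks = stripe(data, n_bricks=3, block=4)
print(sorted(bricks[0]))  # [0, 12, 24]: blocks 'ABCD', 'MNOP', 'YZ'
assert unstripe(bricks) == data
```

Losing any one brick corrupts every file larger than one block, which is part of why the use cases are limited.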
  • 25. Components. glusterd: management daemon, one instance on each GlusterFS server, interfaced through the gluster CLI. glusterfsd: GlusterFS brick daemon, one process for each brick on each server, managed by glusterd.
  • 26. Components. glusterfs: volume service daemon, one process for each volume service (NFS server, FUSE client, self-heal, quota, ...). mount.glusterfs: FUSE native client mount extension. gluster: Gluster Console Manager (CLI).
  • 28. Clients: native. The FUSE kernel module allows the filesystem to be built and operated entirely in userspace. The mount can point at any GlusterFS server: the native client fetches the volfile from that server, then communicates directly with all nodes to access data. Recommended for high concurrency and high write performance. Load is inherently balanced across distributed volumes.
  • 29. Clients: NFS. Standard NFS v3 clients; the standard automounter is supported. Mount to any server, or use a load balancer. The GlusterFS NFS server includes the Network Lock Manager (NLM) to synchronize locks across clients. Better performance for reading many small files from a single client. Load balancing must be managed externally.
  • 30. Clients: libgfapi. Introduced with GlusterFS 3.4. A user-space library for accessing data in GlusterFS through a filesystem-like API. Runs in the application process: no FUSE, no copies, no context switches... but the same volfiles, translators, etc.
  • 31. Clients: SMB/CIFS. In GlusterFS 3.4: Samba + libgfapi. No need for a local native client mount and re-export. Significant performance improvements with FUSE removed from the equation. Must be set up on each server you wish to connect to via CIFS. CTDB is required for Samba clustering.
  • 32. Clients: HDFS. Access data within and outside of Hadoop. No HDFS name node single point of failure / bottleneck. Seamless replacement for HDFS. Scales with the massive growth of big data.
  • 34. Under the hood: the Elastic Hash Algorithm. No central metadata: no performance bottleneck, and it eliminates risk scenarios. A file's location is computed by hashing its filename. Files carry unique identifiers (GFIDs) that look similar to an md5sum.
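The elastic hash idea can be sketched as a per-directory layout that splits a 32-bit hash space into contiguous ranges, one per brick. Any client hashes the file name and picks the brick whose range contains the result, so no metadata server is consulted. In real GlusterFS the ranges are stored as extended attributes on each brick's copy of the directory and the hash function is GlusterFS's own; this sketch assumes equal-size ranges and uses `zlib.crc32` as a stand-in. GFIDs are modelled here with random 128-bit UUIDs.

```python
import bisect
import uuid
import zlib
from typing import Dict, List, Tuple

HASH_SPACE = 2 ** 32  # the layout splits a 32-bit hash space among bricks


def make_layout(bricks: List[str]) -> List[Tuple[int, int, str]]:
    """Give each brick one contiguous [start, end] slice of the hash space."""
    step = HASH_SPACE // len(bricks)
    layout = []
    for i, brick in enumerate(bricks):
        start = i * step
        end = HASH_SPACE - 1 if i == len(bricks) - 1 else start + step - 1
        layout.append((start, end, brick))
    return layout


def lookup(filename: str, layout: List[Tuple[int, int, str]]) -> str:
    # No metadata server: every client computes the same answer from the name.
    h = zlib.crc32(filename.encode())  # stand-in for GlusterFS's own hash
    starts = [start for start, _, _ in layout]
    return layout[bisect.bisect_right(starts, h) - 1][2]


layout = make_layout(["brick1", "brick2", "brick3", "brick4"])
gfids: Dict[str, uuid.UUID] = {}
for name in ["index_2014_11.tar", "segments_1", "README"]:
    gfids[name] = uuid.uuid4()  # 128-bit identifier assigned at creation time
    print(name, "->", lookup(name, layout), gfids[name])
```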
  • 35. Scalability (diagram: three Gluster servers, each with ten 3TB disks; adding servers scales out performance and availability, adding disks scales out capacity)
  • 36. Scalability: add disks to servers to increase storage size; add servers to increase bandwidth and storage size; add servers to increase availability (replica factor).
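The capacity side of these scaling rules reduces to simple arithmetic. A hedged sketch that ignores RAID parity and filesystem overhead (and does not model bandwidth): usable space is the raw disk space divided by the replica factor. The figures reuse the diagram on slide 35; the function name is hypothetical.

```python
def usable_tb(servers: int, disks_per_server: int, disk_tb: float, replica: int = 1) -> float:
    """Usable space of a Gluster-style pool, ignoring RAID and filesystem
    overhead: every file is stored `replica` times, so raw space is divided by it."""
    raw = servers * disks_per_server * disk_tb
    return raw / replica


# Three servers with ten 3 TB disks each, as in the slide 35 diagram.
print(usable_tb(3, 10, 3))             # 90.0: pure distribute
print(usable_tb(3, 12, 3))             # 108.0: add disks -> more space
print(usable_tb(4, 10, 3))             # 120.0: add a server -> more space (and bandwidth)
print(usable_tb(3, 10, 3, replica=3))  # 30.0: more replicas -> more availability, less space
```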
  • 37. What we do with GlusterFS
  • 38. What we do with GlusterFS: daily production of more than 10GB of Lucene inverted indexes stored on GlusterFS, more than 200GB/month. We search the stored indexes to extract different sets of documents for every customer. YES: we open indexes directly on the storage (it's POSIX!!!)
  • 39. 2010: first installation. Version 3.0.x; 8 (not dedicated) servers; distributed replicated; no bound on brick size (!!!!); ca. 4TB available. NOTE: stuck on 3.0.x until 2012 due to problems with the 3.1 and 3.2 series; then RH acquired Gluster (RH Storage).
  • 40. 2012: (little) cluster. New installation, version 3.3.2; 4TB available on 8 servers (DELL c5000), still not dedicated; 1 brick per server, limited to 1TB; 2TB RAID 1 on each server. Still in production.
  • 41. 2012: enlarge. New installation, upgrade to 3.3.x; 6TB available on 12 servers (still not dedicated); enlarged to 9TB on 18 servers; brick sizes both bounded AND unbounded.
  • 42. 2013: fail. 18 non-dedicated servers: too many. 18 bricks of different sizes. 2 big outages due to bricks running out of space. The cluster didn't restart after a move, but... all data were recovered (files are scattered on the bricks, so read them from there!)
  • 43. 2014: consolidate. 2 dedicated servers; 12 x 3TB SAS in RAID 6; 4 bricks per server; 28 TB available, distributed replicated; 4x1Gb bonded NIC; ca. 40 clients (FUSE) on other servers.
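The 28 TB figure can be roughly cross-checked, assuming each server's twelve disks form a single RAID 6 array (two disks' worth of parity) and that the two servers mirror each other (replica 2); both assumptions are ours, not stated on the slide.

```latex
\begin{aligned}
\text{RAID 6 per server:}\quad & (12 - 2) \times 3\ \text{TB} = 30\ \text{TB} \\
\text{replica 2 over 2 servers:}\quad & \frac{2 \times 30\ \text{TB}}{2} = 30\ \text{TB} \\
\text{in binary units:}\quad & \frac{30 \times 10^{12}\ \text{B}}{2^{40}\ \text{B/TiB}} \approx 27.3\ \text{TiB}
\end{aligned}
```

That is close to the 28 TB on the slide; the exact number depends on units, rounding and filesystem overhead.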
  • 44. Consolidate (diagram: Gluster Server 1 and Gluster Server 2, each hosting bricks 1 to 4)
  • 45. Scale up (diagram: a third Gluster Server is added and bricks are spread across three servers; Server 1: bricks 11, 12, 13, 31; Server 2: bricks 21, 22, 32, 24; Server 3: bricks 31, 32, 23, 14)
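When bricks are added to a distributed volume, as in the scale-up diagram above, the hash ranges must be recomputed, and files whose range moved have to migrate; in GlusterFS this is what `gluster volume rebalance <VOLNAME> start` does after `add-brick`. The sketch below measures the effect for a naive equal re-split of the hash space from 4 to 6 bricks, using the first 4 bytes of MD5 as a stand-in hash. GlusterFS's actual layout recomputation may move a different share of files; the point is only that growth implies data movement, hence "plan to scale".

```python
import hashlib
from typing import List

HASH_SPACE = 2 ** 32


def h32(name: str) -> int:
    # Stand-in 32-bit hash (first 4 bytes of MD5), not GlusterFS's real hash.
    return int.from_bytes(hashlib.md5(name.encode()).digest()[:4], "big")


def brick_for(name: str, bricks: List[str]) -> str:
    # Equal contiguous ranges: brick i owns [i*S/n, (i+1)*S/n).
    return bricks[h32(name) * len(bricks) // HASH_SPACE]


old = [f"brick{i}" for i in range(1, 5)]  # before scaling up
new = old + ["brick5", "brick6"]          # two bricks added
files = [f"doc{i}.idx" for i in range(10_000)]

moved = sum(brick_for(f, old) != brick_for(f, new) for f in files)
# Roughly three quarters of the files change brick with this naive re-split.
print(f"{moved / len(files):.0%} of files would change brick")
```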
  • 46. Do: dedicated servers (physical or virtual); RAID 6, or RAID 10 (with small files); multiple bricks of the same size; plan to scale.
  • 47. Do not: multi-purpose servers; bricks of different sizes; very small files; writing directly to bricks.
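Why bricks of different sizes are on the "do not" list (and what bit the 2013 cluster): if the hash layout gives every brick an equal share of the files regardless of capacity, as assumed in this sketch, the smallest bricks fill up first while the pool as a whole still has free space. Brick names and sizes are hypothetical.

```python
from typing import Dict


def fill_report(brick_tb: Dict[str, float], data_tb: float) -> Dict[str, float]:
    """Fill ratio per brick when `data_tb` is spread evenly across bricks
    (equal hash ranges assumed), regardless of each brick's size."""
    share = data_tb / len(brick_tb)
    return {name: share / size for name, size in brick_tb.items()}


mixed = {"b1": 1.0, "b2": 1.0, "b3": 2.0, "b4": 4.0}  # 8 TB in total
report = fill_report(mixed, data_tb=4.5)
print({k: round(v, 2) for k, v in report.items()})
# b1 and b2 are above 100% although the pool as a whole is only ~56% full
```

With four equal 2 TB bricks, the same 4.5 TB would leave every brick at about 56%.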
  • 48. Some raw tests. Read: total transferred file size 23.10G bytes, 43.46M bytes/sec. Write: total transferred file size 23.10G bytes, 38.53M bytes/sec.
  • 49. Raw tests. NOTE: run in production under heavy load, no clean test room.
  • 50. Resources: http://www.gluster.org/, https://access.redhat.com/documentation/en-US/Red_Hat_Storage/, https://github.com/gluster, http://www.redhat.com/products/storage-server/, http://joejulian.name/blog/category/glusterfs/, http://jread.us/2013/06/one-petabyte-red-hat-storage-and-glusterfs-project-overview/