SlideShare a Scribd company logo
1 of 53
Download to read offline
Roberto Franchini 
franchini@celi.it 
Codemotion Milano 
29/11/2014
GlusterFS 
A scalable distributed 
file system
whoami(1) 
15 years of experience, proud to be a programmer 
Writes software for information extraction, nlp, opinion mining 
(@scale ), and a lot of other buzzwords 
Implements scalable architectures 
Plays with servers 
Member of the JUG-Torino coordination team 
franchini@celi.it 
http://www.celi.it http://www.blogmeter.it 
github.com/robfrank github.com/uim-celi 
twitter.com/robfrankie linkedin.com/in/robfrank
The problem 
Identify a distributed and scalable 
file system 
for today's and tomorrow's 
Big Data
Once upon a time 
2008: One nfs share 
1,5TB ought to be enough for anybody 
2010: Herd of shares 
(1,5TB x N) ought to be enough for anybody 
Nobody couldn’t stop the data flood 
It was the time for something new
Requirements 
Can be enlarged on demand 
No dedicated HW 
OS is preferred and trusted 
No specialized API 
No specialized Kernel 
POSIX compliance 
Zilions of big and small files 
No NAS or SAN (€€€€€)
Clustered Scale-out General Purpose Storage 
Platform 
− POSIX-y Distributed File System 
− ...and so much more 
Built on commodity systems 
− x86_64 Linux ++ 
− POSIX filesystems underneath (XFS, 
EXT4) 
No central metadata Server (NO SPOF) 
Modular architecture for scale and functionality
Common 
use cases 
Large Scale File Server 
Media / Content Distribution Network (CDN) 
Backup / Archive / Disaster Recovery (DR) 
High Performance Computing (HPC) 
Infrastructure as a Service (IaaS) storage layer 
Database offload (blobs) 
Unified Object Store + File Access
Features 
ACL and Quota support 
Fault-tolerance 
Peer to peer 
Self-healing 
Fast setup up 
Enlarge on demand 
Shrink on demand 
Snapshot 
On premise phisical or virtual 
On cloud
Architecture
Architecture 
Peer / Node 
− cluster servers (glusterfs server) 
− Runs the gluster daemons and participates in volumes 
Brick 
− A filesystem mountpoint on servers 
− A unit of storage used as a capacity building block
Bricks on a node
Architecture 
Translator 
− Logic between bricks or subvolume that generate a 
subvolume with certain characteristic 
− distribute, replica, stripe are special translators to 
generate simil-RAID configuration 
− perfomance translators 
Volume 
− Bricks combined and passed through translators 
− Ultimately, what's presented to the end user
Volume
Volume types
Distributed 
The default configuration 
Files “evenly” spread across bricks 
Similar to file-level RAID 0 
Server/Disk failure could be catastrophic
Distributed
Replicated 
Files written synchronously to replica peers 
Files read synchronously, 
but ultimately serviced by the first responder 
Similar to file-level RAID 1
Replicated
Distributed + replicated 
Distribued + replicated 
Similar to file-level RAID 10 
Most used layout
Distributed replicated
Striped 
Individual files split among bricks (sparse files) 
Similar to block-level RAID 0 
Limited Use Cases 
HPC Pre/Post Processing 
File size exceeds brick size
Striped
Moving parts
Components 
glusterd 
Management daemon 
One instance on each GlusterFS server 
Interfaced through gluster CLI 
glusterfsd 
GlusterFS brick daemon 
One process for each brick on each server 
Managed by glusterd
Components 
glusterfs 
Volume service daemon 
One process for each volume service 
NFS server, FUSE client, Self-Heal, Quota, ... 
mount.glusterfs 
FUSE native client mount extension 
gluster 
Gluster Console Manager (CLI)
Clients
Clients: native 
FUSE kernel module allows the filesystem to be built and 
operated entirely in userspace 
Specify mount to any GlusterFS server 
Native Client fetches volfile from mount server, then 
communicates directly with all nodes to access data 
Recommended for high concurrency and high write 
performance 
Load is inherently balanced across distributed volumes
Clients:NFS 
Standard NFS v3 clients 
Standard automounter is supported 
Mount to any server, or use a load balancer 
GlusterFS NFS server includes Network Lock Manager 
(NLM) to synchronize locks across clients 
Better performance for reading many small files from a 
single client 
Load balancing must be managed externally
Clients: libgfapi 
Introduced with GlusterFS 3.4 
User-space library for accessing data in GlusterFS 
Filesystem-like API 
Runs in application process 
no FUSE, no copies, no context switches 
...but same volfiles, translators, etc.
Clients: SMB/CIFS 
In GlusterFS 3.4 – Samba + libgfapi 
No need for local native client mount & re-export 
Significant performance improvements with FUSE 
removed from the equation 
Must be setup on each server you wish to connect to via 
CIFS 
CTDB is required for Samba clustering
Clients: HDFS 
Access data within and outside of Hadoop 
No HDFS name node single point of failure / bottleneck 
Seamless replacement for HDFS 
Scales with the massive growth of big data
Scalability
Under the hood 
Elastic Hash Algorithm 
No central metadata 
No Performance Bottleneck 
Eliminates risk scenarios 
Location hashed intelligently on filename 
Unique identifiers (GFID), similar to md5sum
Scalability 
3TB 
3TB 
3TB 
3TB 
3TB 
3TB 
3TB 
3TB 
3TB 
3TB 
Gluster Server 
3TB 
3TB 
3TB 
3TB 
3TB 
3TB 
3TB 
3TB 
3TB 
3TB 
Gluster Server 
3TB 
3TB 
3TB 
3TB 
3TB 
3TB 
3TB 
3TB 
3TB 
3TB 
Gluster Server 
Scale out performance and availability 
Scale out capacitry
Scalability 
Add disks to servers to increase storage size 
Add servers to increase bandwidth and storage size 
Add servers to increase availability (replica factor)
What we do with glusterFS
What we do with GFS 
Daily production of more than 10GB of Lucene inverted 
indexes stored on glusterFS 
more than 200GB/month 
Search stored indexes to extract different sets of 
documents for every customers 
YES: we open indexes directly on storage 
(it's POSIX!!!)
2010: first installation 
Version 3.0.x 
8 (not dedicated) servers 
Distributed replicated 
No bound on brick size (!!!!) 
Ca 4TB avaliable 
NOTE: stuck to 3.0.x until 2012 due to problems on 3.1 and 
3.2 series, then RH acquired gluster (RH Storage)
2012: (little) cluster 
New installation, version 3.3.2 
4TB available on 8 servers (DELL c5000) 
still not dedicated 
1 brick per server limited to 1TB 
2TB-raid 1 on each server 
Still in production
2012: enlarge 
New installation, upgrade to 3.3.x 
6TB available on 12 servers (still not dedicated) 
Enlarged to 9TB on 18 servers 
Bricks size bounded AND unbounded
2013: fail 
18 not dedicated servers: too much 
18 bricks of different sizes 
2 big down due to bricks out of space 
Didn’t restart after a move 
but… 
All data were recovered 
(files are scattered on bricks, read from them!)
2014: consolidate 
2 dedicated servers 
12 x 3TB SAS raid6 
4 bricks per server 
28 TB available 
distributed replicated 
4x1Gb bonded NIC 
ca 40 clients (FUSE) (other 
servers)
Consolidate 
brick 1 
brick 2 
brick 3 
brick 4 
Gluster Server 1 
brick 1 
brick 2 
brick 3 
brick 4 
Gluster Server 2
Scale up 
brick 11 
brick 12 
brick 13 
brick 31 
Gluster Server 1 
brick 21 
brick 22 
brick 32 
brick 24 
Gluster Server 2 
brick 31 
brick 32 
brick 23 
brick 14 
Gluster Server 3
Do 
Dedicated server (phisical or virtual) 
RAID 6 or RAID 10 (with small files) 
Multiple bricks of same size 
Plan to scale
Do not 
Multi purpose server 
Bricks of different size 
Very small files 
Write to bricks
Some raw tests 
read 
Total transferred file size: 23.10G bytes 
43.46M bytes/sec 
write 
Total transferred file size: 23.10G bytes 
38.53M bytes/sec
Raw tests 
NOTE: ran in production under heavy load, no 
clean test room
Resources 
http://www.gluster.org/ 
https://access.redhat.com/documentation/en- 
US/Red_Hat_Storage/ 
https://github.com/gluster 
http://www.redhat.com/products/storage-server/ 
http://joejulian.name/blog/category/glusterfs/ 
http://jread.us/2013/06/one-petabyte-red-hat-storage-and-glusterfs- 
project-overview/
Thank you!
Roberto Franchini 
franchini@celi.it

More Related Content

What's hot

Network Mapper (NMAP)
Network Mapper (NMAP)Network Mapper (NMAP)
Network Mapper (NMAP)KHNOG
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotXiang Fu
 
Hudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesHudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesNishith Agarwal
 
Realizing Linux Containers (LXC)
Realizing Linux Containers (LXC)Realizing Linux Containers (LXC)
Realizing Linux Containers (LXC)Boden Russell
 
Using eBPF for High-Performance Networking in Cilium
Using eBPF for High-Performance Networking in CiliumUsing eBPF for High-Performance Networking in Cilium
Using eBPF for High-Performance Networking in CiliumScyllaDB
 
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsRunning Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsDatabricks
 
Apache Spark Internals
Apache Spark InternalsApache Spark Internals
Apache Spark InternalsKnoldus Inc.
 
What Linux can learn from Solaris performance and vice-versa
What Linux can learn from Solaris performance and vice-versaWhat Linux can learn from Solaris performance and vice-versa
What Linux can learn from Solaris performance and vice-versaBrendan Gregg
 
Dynamic Allocation in Spark
Dynamic Allocation in SparkDynamic Allocation in Spark
Dynamic Allocation in SparkDatabricks
 
Experience of Running Spark on Kubernetes on OpenStack for High Energy Physic...
Experience of Running Spark on Kubernetes on OpenStack for High Energy Physic...Experience of Running Spark on Kubernetes on OpenStack for High Energy Physic...
Experience of Running Spark on Kubernetes on OpenStack for High Energy Physic...Databricks
 
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...Databricks
 
Native Support of Prometheus Monitoring in Apache Spark 3.0
Native Support of Prometheus Monitoring in Apache Spark 3.0Native Support of Prometheus Monitoring in Apache Spark 3.0
Native Support of Prometheus Monitoring in Apache Spark 3.0Databricks
 
google file system
google file systemgoogle file system
google file systemdiptipan
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafkaemreakis
 

What's hot (20)

Network Mapper (NMAP)
Network Mapper (NMAP)Network Mapper (NMAP)
Network Mapper (NMAP)
 
Real-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache PinotReal-time Analytics with Trino and Apache Pinot
Real-time Analytics with Trino and Apache Pinot
 
Hudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilitiesHudi architecture, fundamentals and capabilities
Hudi architecture, fundamentals and capabilities
 
Realizing Linux Containers (LXC)
Realizing Linux Containers (LXC)Realizing Linux Containers (LXC)
Realizing Linux Containers (LXC)
 
4. linux file systems
4. linux file systems4. linux file systems
4. linux file systems
 
Distributed "Web Scale" Systems
Distributed "Web Scale" SystemsDistributed "Web Scale" Systems
Distributed "Web Scale" Systems
 
Using eBPF for High-Performance Networking in Cilium
Using eBPF for High-Performance Networking in CiliumUsing eBPF for High-Performance Networking in Cilium
Using eBPF for High-Performance Networking in Cilium
 
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and PitfallsRunning Apache Spark on Kubernetes: Best Practices and Pitfalls
Running Apache Spark on Kubernetes: Best Practices and Pitfalls
 
Apache Spark Internals
Apache Spark InternalsApache Spark Internals
Apache Spark Internals
 
What Linux can learn from Solaris performance and vice-versa
What Linux can learn from Solaris performance and vice-versaWhat Linux can learn from Solaris performance and vice-versa
What Linux can learn from Solaris performance and vice-versa
 
Dynamic Allocation in Spark
Dynamic Allocation in SparkDynamic Allocation in Spark
Dynamic Allocation in Spark
 
Experience of Running Spark on Kubernetes on OpenStack for High Energy Physic...
Experience of Running Spark on Kubernetes on OpenStack for High Energy Physic...Experience of Running Spark on Kubernetes on OpenStack for High Energy Physic...
Experience of Running Spark on Kubernetes on OpenStack for High Energy Physic...
 
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
Running Apache Spark on a High-Performance Cluster Using RDMA and NVMe Flash ...
 
Native Support of Prometheus Monitoring in Apache Spark 3.0
Native Support of Prometheus Monitoring in Apache Spark 3.0Native Support of Prometheus Monitoring in Apache Spark 3.0
Native Support of Prometheus Monitoring in Apache Spark 3.0
 
google file system
google file systemgoogle file system
google file system
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Introduction to Linux
Introduction to LinuxIntroduction to Linux
Introduction to Linux
 
Hadoop Summit Tokyo Apache NiFi Crash Course
Hadoop Summit Tokyo Apache NiFi Crash CourseHadoop Summit Tokyo Apache NiFi Crash Course
Hadoop Summit Tokyo Apache NiFi Crash Course
 
Kafka: Internals
Kafka: InternalsKafka: Internals
Kafka: Internals
 
Detailed iSCSI presentation
Detailed iSCSI presentationDetailed iSCSI presentation
Detailed iSCSI presentation
 

Viewers also liked

Codemotion Rome 2015. GlusterFS
Codemotion Rome 2015. GlusterFSCodemotion Rome 2015. GlusterFS
Codemotion Rome 2015. GlusterFSRoberto Franchini
 
What the hell is your software doing at runtime?
What the hell is your software doing at runtime?What the hell is your software doing at runtime?
What the hell is your software doing at runtime?Roberto Franchini
 
Where are yours vertexes and what are they talking about?
Where are yours vertexes and what are they talking about?Where are yours vertexes and what are they talking about?
Where are yours vertexes and what are they talking about?Roberto Franchini
 
Java application monitoring with Dropwizard Metrics and graphite
Java application monitoring with Dropwizard Metrics and graphite Java application monitoring with Dropwizard Metrics and graphite
Java application monitoring with Dropwizard Metrics and graphite Roberto Franchini
 
Java 8: le nuove-interfacce di Ezio Sperduto
Java 8: le nuove-interfacce di Ezio SperdutoJava 8: le nuove-interfacce di Ezio Sperduto
Java 8: le nuove-interfacce di Ezio SperdutoVitalij Zadneprovskij
 
Redis for duplicate detection on real time stream
Redis for duplicate detection on real time streamRedis for duplicate detection on real time stream
Redis for duplicate detection on real time streamRoberto Franchini
 
DevExperience - The Dark Side of Microservices
DevExperience - The Dark Side of MicroservicesDevExperience - The Dark Side of Microservices
DevExperience - The Dark Side of MicroservicesNicolas Fränkel
 

Viewers also liked (8)

Codemotion Rome 2015. GlusterFS
Codemotion Rome 2015. GlusterFSCodemotion Rome 2015. GlusterFS
Codemotion Rome 2015. GlusterFS
 
What the hell is your software doing at runtime?
What the hell is your software doing at runtime?What the hell is your software doing at runtime?
What the hell is your software doing at runtime?
 
Where are yours vertexes and what are they talking about?
Where are yours vertexes and what are they talking about?Where are yours vertexes and what are they talking about?
Where are yours vertexes and what are they talking about?
 
Java application monitoring with Dropwizard Metrics and graphite
Java application monitoring with Dropwizard Metrics and graphite Java application monitoring with Dropwizard Metrics and graphite
Java application monitoring with Dropwizard Metrics and graphite
 
Java 8: le nuove-interfacce di Ezio Sperduto
Java 8: le nuove-interfacce di Ezio SperdutoJava 8: le nuove-interfacce di Ezio Sperduto
Java 8: le nuove-interfacce di Ezio Sperduto
 
Redis for duplicate detection on real time stream
Redis for duplicate detection on real time streamRedis for duplicate detection on real time stream
Redis for duplicate detection on real time stream
 
Metodi asincroni in spring
Metodi asincroni in springMetodi asincroni in spring
Metodi asincroni in spring
 
DevExperience - The Dark Side of Microservices
DevExperience - The Dark Side of MicroservicesDevExperience - The Dark Side of Microservices
DevExperience - The Dark Side of Microservices
 

Similar to GlusterFs: a scalable file system for today's and tomorrow's big data

Gluster FS a filesistem for Big Data | Roberto Franchini - Codemotion Rome 2015
Gluster FS  a filesistem for Big Data | Roberto Franchini - Codemotion Rome 2015Gluster FS  a filesistem for Big Data | Roberto Franchini - Codemotion Rome 2015
Gluster FS a filesistem for Big Data | Roberto Franchini - Codemotion Rome 2015Codemotion
 
Celi @Codemotion 2014 - Roberto Franchini GlusterFS
Celi @Codemotion 2014 - Roberto Franchini GlusterFSCeli @Codemotion 2014 - Roberto Franchini GlusterFS
Celi @Codemotion 2014 - Roberto Franchini GlusterFSCELI
 
Lisa 2015-gluster fs-introduction
Lisa 2015-gluster fs-introductionLisa 2015-gluster fs-introduction
Lisa 2015-gluster fs-introductionGluster.org
 
GlusterFS Presentation FOSSCOMM2013 HUA, Athens, GR
GlusterFS Presentation FOSSCOMM2013 HUA, Athens, GRGlusterFS Presentation FOSSCOMM2013 HUA, Athens, GR
GlusterFS Presentation FOSSCOMM2013 HUA, Athens, GRTheophanis Kontogiannis
 
GlusterFS Architecture - June 30, 2011 Meetup
GlusterFS Architecture - June 30, 2011 MeetupGlusterFS Architecture - June 30, 2011 Meetup
GlusterFS Architecture - June 30, 2011 MeetupGlusterFS
 
GlusterFs Architecture & Roadmap - LinuxCon EU 2013
GlusterFs Architecture & Roadmap - LinuxCon EU 2013GlusterFs Architecture & Roadmap - LinuxCon EU 2013
GlusterFs Architecture & Roadmap - LinuxCon EU 2013Gluster.org
 
Gluster fs hadoop_fifth-elephant
Gluster fs hadoop_fifth-elephantGluster fs hadoop_fifth-elephant
Gluster fs hadoop_fifth-elephantGluster.org
 
Gluster fs architecture_&_roadmap-vijay_bellur-linuxcon_eu_2013
Gluster fs architecture_&_roadmap-vijay_bellur-linuxcon_eu_2013Gluster fs architecture_&_roadmap-vijay_bellur-linuxcon_eu_2013
Gluster fs architecture_&_roadmap-vijay_bellur-linuxcon_eu_2013Gluster.org
 
Gluster fs architecture_future_directions_tlv
Gluster fs architecture_future_directions_tlvGluster fs architecture_future_directions_tlv
Gluster fs architecture_future_directions_tlvSahina Bose
 
Glusterfs and openstack
Glusterfs  and openstackGlusterfs  and openstack
Glusterfs and openstackopenstackindia
 
Gluster intro-tdose
Gluster intro-tdoseGluster intro-tdose
Gluster intro-tdoseGluster.org
 
Gluster intro-tdose
Gluster intro-tdoseGluster intro-tdose
Gluster intro-tdoseGluster.org
 
Gpfs introandsetup
Gpfs introandsetupGpfs introandsetup
Gpfs introandsetupasihan
 
Distributed file systems (from Google)
Distributed file systems (from Google)Distributed file systems (from Google)
Distributed file systems (from Google)Sri Prasanna
 
OSDC 2010 | Use Distributed Filesystem as a Storage Tier by Fabrizio Manfred
OSDC 2010 | Use Distributed Filesystem as a Storage Tier by Fabrizio ManfredOSDC 2010 | Use Distributed Filesystem as a Storage Tier by Fabrizio Manfred
OSDC 2010 | Use Distributed Filesystem as a Storage Tier by Fabrizio ManfredNETWAYS
 
Distributed computing seminar lecture 3 - distributed file systems
Distributed computing seminar   lecture 3 - distributed file systemsDistributed computing seminar   lecture 3 - distributed file systems
Distributed computing seminar lecture 3 - distributed file systemstugrulh
 
20160401 guster-roadmap
20160401 guster-roadmap20160401 guster-roadmap
20160401 guster-roadmapGluster.org
 
20160401 Gluster-roadmap
20160401 Gluster-roadmap20160401 Gluster-roadmap
20160401 Gluster-roadmapGluster.org
 
20160401 guster-roadmap
20160401 guster-roadmap20160401 guster-roadmap
20160401 guster-roadmapGluster.org
 

Similar to GlusterFs: a scalable file system for today's and tomorrow's big data (20)

Gluster FS a filesistem for Big Data | Roberto Franchini - Codemotion Rome 2015
Gluster FS  a filesistem for Big Data | Roberto Franchini - Codemotion Rome 2015Gluster FS  a filesistem for Big Data | Roberto Franchini - Codemotion Rome 2015
Gluster FS a filesistem for Big Data | Roberto Franchini - Codemotion Rome 2015
 
Celi @Codemotion 2014 - Roberto Franchini GlusterFS
Celi @Codemotion 2014 - Roberto Franchini GlusterFSCeli @Codemotion 2014 - Roberto Franchini GlusterFS
Celi @Codemotion 2014 - Roberto Franchini GlusterFS
 
Lisa 2015-gluster fs-introduction
Lisa 2015-gluster fs-introductionLisa 2015-gluster fs-introduction
Lisa 2015-gluster fs-introduction
 
GlusterFS Presentation FOSSCOMM2013 HUA, Athens, GR
GlusterFS Presentation FOSSCOMM2013 HUA, Athens, GRGlusterFS Presentation FOSSCOMM2013 HUA, Athens, GR
GlusterFS Presentation FOSSCOMM2013 HUA, Athens, GR
 
GlusterFS Architecture - June 30, 2011 Meetup
GlusterFS Architecture - June 30, 2011 MeetupGlusterFS Architecture - June 30, 2011 Meetup
GlusterFS Architecture - June 30, 2011 Meetup
 
GlusterFs Architecture & Roadmap - LinuxCon EU 2013
GlusterFs Architecture & Roadmap - LinuxCon EU 2013GlusterFs Architecture & Roadmap - LinuxCon EU 2013
GlusterFs Architecture & Roadmap - LinuxCon EU 2013
 
Gluster fs hadoop_fifth-elephant
Gluster fs hadoop_fifth-elephantGluster fs hadoop_fifth-elephant
Gluster fs hadoop_fifth-elephant
 
Gluster fs architecture_&_roadmap-vijay_bellur-linuxcon_eu_2013
Gluster fs architecture_&_roadmap-vijay_bellur-linuxcon_eu_2013Gluster fs architecture_&_roadmap-vijay_bellur-linuxcon_eu_2013
Gluster fs architecture_&_roadmap-vijay_bellur-linuxcon_eu_2013
 
Gluster fs architecture_future_directions_tlv
Gluster fs architecture_future_directions_tlvGluster fs architecture_future_directions_tlv
Gluster fs architecture_future_directions_tlv
 
Glusterfs and openstack
Glusterfs  and openstackGlusterfs  and openstack
Glusterfs and openstack
 
Gluster intro-tdose
Gluster intro-tdoseGluster intro-tdose
Gluster intro-tdose
 
Gluster intro-tdose
Gluster intro-tdoseGluster intro-tdose
Gluster intro-tdose
 
Gpfs introandsetup
Gpfs introandsetupGpfs introandsetup
Gpfs introandsetup
 
Distributed file systems (from Google)
Distributed file systems (from Google)Distributed file systems (from Google)
Distributed file systems (from Google)
 
OSDC 2010 | Use Distributed Filesystem as a Storage Tier by Fabrizio Manfred
OSDC 2010 | Use Distributed Filesystem as a Storage Tier by Fabrizio ManfredOSDC 2010 | Use Distributed Filesystem as a Storage Tier by Fabrizio Manfred
OSDC 2010 | Use Distributed Filesystem as a Storage Tier by Fabrizio Manfred
 
Lec3 Dfs
Lec3 DfsLec3 Dfs
Lec3 Dfs
 
Distributed computing seminar lecture 3 - distributed file systems
Distributed computing seminar   lecture 3 - distributed file systemsDistributed computing seminar   lecture 3 - distributed file systems
Distributed computing seminar lecture 3 - distributed file systems
 
20160401 guster-roadmap
20160401 guster-roadmap20160401 guster-roadmap
20160401 guster-roadmap
 
20160401 Gluster-roadmap
20160401 Gluster-roadmap20160401 Gluster-roadmap
20160401 Gluster-roadmap
 
20160401 guster-roadmap
20160401 guster-roadmap20160401 guster-roadmap
20160401 guster-roadmap
 

Recently uploaded

Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024VictoriaMetrics
 
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...masabamasaba
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Hararemasabamasaba
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park masabamasaba
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Steffen Staab
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension AidPhilip Schwarz
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech studentsHimanshiGarg82
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...masabamasaba
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...masabamasaba
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in sowetomasabamasaba
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...SelfMade bd
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastPapp Krisztián
 
Harnessing ChatGPT - Elevating Productivity in Today's Agile Environment
Harnessing ChatGPT  - Elevating Productivity in Today's Agile EnvironmentHarnessing ChatGPT  - Elevating Productivity in Today's Agile Environment
Harnessing ChatGPT - Elevating Productivity in Today's Agile EnvironmentVictorSzoltysek
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...masabamasaba
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...Jittipong Loespradit
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2
 

Recently uploaded (20)

Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
Shapes for Sharing between Graph Data Spaces - and Epistemic Querying of RDF-...
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
%+27788225528 love spells in Huntington Beach Psychic Readings, Attraction sp...
 
%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto%in Soweto+277-882-255-28 abortion pills for sale in soweto
%in Soweto+277-882-255-28 abortion pills for sale in soweto
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 
Harnessing ChatGPT - Elevating Productivity in Today's Agile Environment
Harnessing ChatGPT  - Elevating Productivity in Today's Agile EnvironmentHarnessing ChatGPT  - Elevating Productivity in Today's Agile Environment
Harnessing ChatGPT - Elevating Productivity in Today's Agile Environment
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go Platformless
 

GlusterFs: a scalable file system for today's and tomorrow's big data

  • 1. Roberto Franchini franchini@celi.it Codemotion Milano 29/11/2014
  • 2. GlusterFS A scalable distributed file system
  • 3. whoami(1) 15 years of experience, proud to be a programmer Writes software for information extraction, nlp, opinion mining (@scale ), and a lot of other buzzwords Implements scalable architectures Plays with servers Member of the JUG-Torino coordination team franchini@celi.it http://www.celi.it http://www.blogmeter.it github.com/robfrank github.com/uim-celi twitter.com/robfrankie linkedin.com/in/robfrank
  • 4. The problem Identify a distributed and scalable file system for today's and tomorrow's Big Data
  • 5. Once upon a time 2008: One nfs share 1,5TB ought to be enough for anybody 2010: Herd of shares (1,5TB x N) ought to be enough for anybody Nobody couldn’t stop the data flood It was the time for something new
  • 6. Requirements Can be enlarged on demand No dedicated HW OS is preferred and trusted No specialized API No specialized Kernel POSIX compliance Zilions of big and small files No NAS or SAN (€€€€€)
  • 7. Clustered Scale-out General Purpose Storage Platform − POSIX-y Distributed File System − ...and so much more Built on commodity systems − x86_64 Linux ++ − POSIX filesystems underneath (XFS, EXT4) No central metadata Server (NO SPOF) Modular architecture for scale and functionality
  • 8. Common use cases Large Scale File Server Media / Content Distribution Network (CDN) Backup / Archive / Disaster Recovery (DR) High Performance Computing (HPC) Infrastructure as a Service (IaaS) storage layer Database offload (blobs) Unified Object Store + File Access
  • 9. Features ACL and Quota support Fault-tolerance Peer to peer Self-healing Fast setup up Enlarge on demand Shrink on demand Snapshot On premise phisical or virtual On cloud
  • 11. Architecture Peer / Node − cluster servers (glusterfs server) − Runs the gluster daemons and participates in volumes Brick − A filesystem mountpoint on servers − A unit of storage used as a capacity building block
  • 12. Bricks on a node
  • 13. Architecture Translator − Logic between bricks or subvolume that generate a subvolume with certain characteristic − distribute, replica, stripe are special translators to generate simil-RAID configuration − perfomance translators Volume − Bricks combined and passed through translators − Ultimately, what's presented to the end user
  • 16. Distributed The default configuration Files “evenly” spread across bricks Similar to file-level RAID 0 Server/Disk failure could be catastrophic
  • 18. Replicated Files written synchronously to replica peers Files read synchronously, but ultimately serviced by the first responder Similar to file-level RAID 1
  • 20. Distributed + replicated Distribued + replicated Similar to file-level RAID 10 Most used layout
  • 22. Striped Individual files split among bricks (sparse files) Similar to block-level RAID 0 Limited Use Cases HPC Pre/Post Processing File size exceeds brick size
  • 25. Components glusterd Management daemon One instance on each GlusterFS server Interfaced through gluster CLI glusterfsd GlusterFS brick daemon One process for each brick on each server Managed by glusterd
  • 26. Components glusterfs Volume service daemon One process for each volume service NFS server, FUSE client, Self-Heal, Quota, ... mount.glusterfs FUSE native client mount extension gluster Gluster Console Manager (CLI)
  • 28. Clients: native FUSE kernel module allows the filesystem to be built and operated entirely in userspace Specify mount to any GlusterFS server Native Client fetches volfile from mount server, then communicates directly with all nodes to access data Recommended for high concurrency and high write performance Load is inherently balanced across distributed volumes
  • 29. Clients:NFS Standard NFS v3 clients Standard automounter is supported Mount to any server, or use a load balancer GlusterFS NFS server includes Network Lock Manager (NLM) to synchronize locks across clients Better performance for reading many small files from a single client Load balancing must be managed externally
  • 30. Clients: libgfapi Introduced with GlusterFS 3.4 User-space library for accessing data in GlusterFS Filesystem-like API Runs in application process no FUSE, no copies, no context switches ...but same volfiles, translators, etc.
  • 31. Clients: SMB/CIFS In GlusterFS 3.4 – Samba + libgfapi No need for local native client mount & re-export Significant performance improvements with FUSE removed from the equation Must be setup on each server you wish to connect to via CIFS CTDB is required for Samba clustering
  • 32. Clients: HDFS Access data within and outside of Hadoop No HDFS name node single point of failure / bottleneck Seamless replacement for HDFS Scales with the massive growth of big data
  • 34. Under the hood Elastic Hash Algorithm No central metadata No Performance Bottleneck Eliminates risk scenarios Location hashed intelligently on filename Unique identifiers (GFID), similar to md5sum
  • 35. Scalability 3TB 3TB 3TB 3TB 3TB 3TB 3TB 3TB 3TB 3TB Gluster Server 3TB 3TB 3TB 3TB 3TB 3TB 3TB 3TB 3TB 3TB Gluster Server 3TB 3TB 3TB 3TB 3TB 3TB 3TB 3TB 3TB 3TB Gluster Server Scale out performance and availability Scale out capacitry
  • 36. Scalability Add disks to servers to increase storage size Add servers to increase bandwidth and storage size Add servers to increase availability (replica factor)
  • 37. What we do with glusterFS
  • 38. What we do with GFS Daily production of more than 10GB of Lucene inverted indexes stored on glusterFS more than 200GB/month Search stored indexes to extract different sets of documents for every customers YES: we open indexes directly on storage (it's POSIX!!!)
  • 39. 2010: first installation Version 3.0.x 8 (not dedicated) servers Distributed replicated No bound on brick size (!!!!) Ca 4TB avaliable NOTE: stuck to 3.0.x until 2012 due to problems on 3.1 and 3.2 series, then RH acquired gluster (RH Storage)
  • 40. 2012: (little) cluster New installation, version 3.3.2 4TB available on 8 servers (DELL c5000) still not dedicated 1 brick per server limited to 1TB 2TB-raid 1 on each server Still in production
  • 41. 2012: enlarge New installation, upgrade to 3.3.x 6TB available on 12 servers (still not dedicated) Enlarged to 9TB on 18 servers Bricks size bounded AND unbounded
  • 42. 2013: fail 18 not dedicated servers: too much 18 bricks of different sizes 2 big down due to bricks out of space Didn’t restart after a move but… All data were recovered (files are scattered on bricks, read from them!)
  • 43. 2014: consolidate 2 dedicated servers 12 x 3TB SAS raid6 4 bricks per server 28 TB available distributed replicated 4x1Gb bonded NIC ca 40 clients (FUSE) (other servers)
  • 44. Consolidate brick 1 brick 2 brick 3 brick 4 Gluster Server 1 brick 1 brick 2 brick 3 brick 4 Gluster Server 2
  • 45. Scale up brick 11 brick 12 brick 13 brick 31 Gluster Server 1 brick 21 brick 22 brick 32 brick 24 Gluster Server 2 brick 31 brick 32 brick 23 brick 14 Gluster Server 3
  • 46. Do Dedicated server (phisical or virtual) RAID 6 or RAID 10 (with small files) Multiple bricks of same size Plan to scale
  • 47. Do not Multi purpose server Bricks of different size Very small files Write to bricks
  • 48. Some raw tests read Total transferred file size: 23.10G bytes 43.46M bytes/sec write Total transferred file size: 23.10G bytes 38.53M bytes/sec
  • 49. Raw tests NOTE: ran in production under heavy load, no clean test room
  • 50. Resources http://www.gluster.org/ https://access.redhat.com/documentation/en- US/Red_Hat_Storage/ https://github.com/gluster http://www.redhat.com/products/storage-server/ http://joejulian.name/blog/category/glusterfs/ http://jread.us/2013/06/one-petabyte-red-hat-storage-and-glusterfs- project-overview/
  • 52.