This document provides an overview and introduction to GlusterFS for system administrators. It covers the key topics of GlusterFS technology, scaling, architecture, data distribution and redundancy features, administration tasks like adding nodes and volumes, and general use cases. The presentation is aimed at experienced sysadmins, helping them understand and deploy GlusterFS distributed file systems.
The document discusses integrating Samba with Gluster and FUSE. It notes that Samba expects POSIX behaviors that Gluster can provide, like cache coherency and POSIX locking support. FUSE presents a problem as it locks away Gluster internals. The document proposes that the Samba VFS layer could allow Gluster features to be selectively enabled or disabled per access method. Building a Samba VFS module for Gluster was discussed but example code was omitted.
GlusterFs Architecture & Roadmap - LinuxCon EU 2013 (Gluster.org)
GlusterFS is a scale-out distributed file system that aggregates storage over a network to provide a single unified namespace. It has a modular architecture and runs on commodity hardware without external metadata servers. Future directions for GlusterFS include distributed geo-replication, file snapshots, and erasure coding support. Challenges include improving scalability, supporting hard links and renames, reducing monitoring overhead, and lowering costs.
This document summarizes a presentation about software defined storage using the open source Gluster file system. It begins with an overview of storage concepts like reliability, performance, and scaling. It then discusses the history and types of storage and provides case studies of proprietary storage systems. The presentation introduces software defined storage and Gluster, describing its modular design, use in cloud computing, pros and cons. Key Gluster concepts are defined and its distributed and replicated volume types are explained. The presentation concludes with instructions for setting up and using Gluster.
Gluster fs tutorial part 2 gluster and big data - gluster for devs and sys ... (Tommy Lee)
This document provides an overview and summary of the Gluster distributed file system for system administrators. It begins with introductions and defines key Gluster concepts and components. It then discusses how Gluster provides scaling and redundancy through various volume types like distributed, replicated, and erasure coded volumes. The document outlines how data can be accessed using native Gluster clients or NFS/SMB. It also covers general Gluster administration tasks and capabilities like profiling and geo-replication.
Gluster is an open-source distributed scale-out storage system. It uses commodity hardware and has no centralized metadata server. Key concepts include bricks (storage units on servers), volumes (logical collections of bricks), and a trusted storage pool of nodes. Main volume types are distributed, replicated, distributed replicated, and striped. To set up Gluster, install packages, start services, create a storage pool, make volumes, and mount them on clients.
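The setup sequence described above maps onto a handful of gluster CLI commands. Below is a minimal sketch that drives those commands with Python's subprocess module; the host names (server1, server2), brick paths, and volume name are illustrative placeholders, and it assumes the glusterfs packages are already installed and glusterd is running on both nodes.

```python
import subprocess

def run(cmd):
    """Run a shell command and fail loudly if it errors."""
    print("+", cmd)
    subprocess.run(cmd, shell=True, check=True)

# 1. Form the trusted storage pool (run once, from server1).
run("gluster peer probe server2")

# 2. Create a 2-way replicated volume from one brick on each server.
run("gluster volume create gv0 replica 2 "
    "server1:/data/brick1/gv0 server2:/data/brick1/gv0")

# 3. Start the volume so clients can connect to it.
run("gluster volume start gv0")

# 4. On a client, mount the volume with the native FUSE client.
run("mount -t glusterfs server1:/gv0 /mnt/gluster")
```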
This document discusses integrating QEMU with GlusterFS to provide native storage access for virtual machines. Key points include:
- A new QEMU block driver allows specifying Gluster volumes directly without using FUSE, avoiding overhead.
- Benchmarks show the native integration outperforms FUSE mounting. Further improvements are planned through optimizations like zero-copy I/O.
- The libgfapi library provides direct I/O to Gluster volumes.
- Support is being added to libvirt and VDSM to specify Gluster volumes through standardized interfaces.
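As a rough illustration of the native integration, QEMU's Gluster block driver accepts gluster:// URLs in place of file paths, so an image can live on a Gluster volume without any FUSE mount. The sketch below shells out to qemu-img and qemu-system-x86_64 from Python; the host, volume, and image names are placeholders, and exact option syntax may vary between QEMU versions.

```python
import subprocess

# Placeholder URL: gluster://<server>/<volume>/<path-inside-volume>
image_url = "gluster://server1/gv0/vm1.qcow2"

# Create a 10 GiB qcow2 image directly on the Gluster volume (no FUSE mount involved).
subprocess.run(["qemu-img", "create", "-f", "qcow2", image_url, "10G"], check=True)

# Boot a VM from that image; the gluster block driver talks to the volume
# over libgfapi instead of going through a mounted filesystem.
subprocess.run([
    "qemu-system-x86_64",
    "-m", "2048",
    "-drive", f"file={image_url},format=qcow2,if=virtio",
], check=True)
```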
GlusterFS is scale-out software defined storage. It was presented at LISA15 in Washington D.C. from November 8-13, 2015. The presentation covered GlusterFS installation, configuration of trusted storage pools, creating and managing distributed, replicated, and other volume types, expanding and shrinking volumes, self-healing, and accessing data using native GlusterFS clients, NFS, and SMB/CIFS. Configuration details for CTDB and sharing volumes over SMB were also provided.
This document discusses integrating the open-source GlusterFS distributed file system with QEMU virtualization and the oVirt virtualization management platform. It provides an overview of GlusterFS and how it can be used as virtual machine image storage. It then describes integrating GlusterFS with QEMU either directly or through NFS, as well as new features in oVirt 3.1 for managing GlusterFS volumes from within oVirt. Upcoming integrations between oVirt and GlusterFS are also outlined.
The document provides an overview and future directions of Gluster distributed storage system. It discusses why Gluster is useful given increasing data volumes. It defines Gluster as a scale-out distributed storage system that aggregates storage over a network to provide a unified namespace. It outlines typical deployments and architecture, and describes various volume types like distributed, replicated, dispersed. It also covers access mechanisms, features, use cases and monitoring integration. Finally, it discusses recent releases and new features in development like data tiering, bitrot detection and sharding to improve performance and capabilities.
This document summarizes a presentation about implementing lease locks and client-side caching in GlusterFS. It discusses quick introductions to lease locks and why they should be implemented in GlusterFS. The design section outlines how lease locks would work, storing lease info on servers and using upcalls. Challenges discussed include network partitions, rebalancing, and inconsistencies. Future enhancements proposed are supporting more lease types, migration/healing, and lock recovery. Client-side caching benefits from lease locks through cache coherency and reducing response times.
The document discusses the current features and roadmap for GlusterFS. It summarizes the current stable releases, features added in recent versions, and plans for upcoming releases 3.7 and 4.0. The 3.7 release will focus on improvements to small file performance, tiering, rack awareness, trash support for undelete, and NFS Ganesha integration. The 4.0 release will aim to improve scalability and manageability with features like multiple networks support, new style replication, and REST APIs.
This document discusses how to use Wireshark to debug GlusterFS protocols. It begins by outlining some typical use cases for Wireshark in troubleshooting, development, and performance checking. It then provides details on the various protocols used in GlusterFS, including Gluster CLI, GlusterD Management, Gluster DUMP, GlusterD Friend, Gluster Portmap, GlusterFS Callback, GlusterFS Handshake, and GlusterFS. It also discusses how to capture network traces with Wireshark and browse the packets. An example use case is presented where Wireshark is used to debug Adobe InDesign crashes on a GlusterFS volume.
GlusterFS is a POSIX-compliant distributed file system that aggregates various storage bricks across commodity servers into a single global namespace. It has no single point of failure or performance bottleneck. Red Hat Storage is an enterprise implementation of GlusterFS. It uses an elastic hashing algorithm to distribute files across bricks without a centralized metadata server. Various translators and volume types provide features like replication, distribution, striping, and geo-replication. Administration involves adding peers, creating and managing distributed volumes, and manipulating bricks.
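The elastic hashing idea can be shown with a toy example: hash the file name and map the result onto a brick, so any client can compute a file's location without consulting a metadata server. This is a deliberate simplification (Gluster's DHT hashes names into 32-bit ranges stored as directory extended attributes), so treat it as an illustration of the concept rather than the real algorithm.

```python
import hashlib

BRICKS = ["server1:/brick1", "server2:/brick1", "server3:/brick1"]  # placeholder bricks

def locate(filename: str) -> str:
    """Map a file name onto a brick purely by hashing -- no metadata lookup needed."""
    # Toy stand-in for Gluster's 32-bit hash over the file name.
    h = int(hashlib.sha1(filename.encode()).hexdigest(), 16)
    # Each brick owns an equal slice of the hash space.
    return BRICKS[h % len(BRICKS)]

for name in ["report.pdf", "photo.jpg", "notes.txt"]:
    print(name, "->", locate(name))
```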
The document discusses GlusterD 2.0, a redesign of the Gluster distributed file system management daemon. Some key points:
- GlusterD 1.0 had scalability and consistency issues that limited it to hundreds of nodes. GlusterD 2.0 was rewritten from scratch in Go for better performance.
- GlusterD 2.0 uses etcd for centralized management and configuration storage. It has REST APIs and plugins for modularity.
- Components include REST interfaces, etcd backend, RPC framework, transaction system, and a flexible volume generator.
- Upgrades from Gluster 3.x to 4.x will be disruptive but provide a migration path.
Gluster is a software defined distributed storage system. It uses translators to implement storage semantics like distribution and replication across server bricks. Clients can mount volumes using FUSE or access them via NFS or REST APIs. Translators are implemented as shared libraries and define fop methods and callbacks to handle storage operations in a distributed manner across the cluster. Errors are handled by unwinding the stack and returning failure codes to clients.
Introduction to highly_availablenfs_server_on_scale-out_storage_systems_based... (Gluster.org)
This document discusses setting up a highly available NFS server on GlusterFS scale-out storage systems using NFS-Ganesha. It provides an overview of GlusterFS architecture, describes how NFS-Ganesha integrates with GlusterFS using libgfapi to provide NFS access. It also discusses how to set up an active-active clustered NFS solution using tools like Pacemaker, Corosync and shared storage to provide high availability and load balancing of the NFS service across multiple nodes.
The document outlines Gluster's roadmap, including recent improvements to versions 3.5-3.7 like bitrot detection and sharding, and plans for upcoming releases 3.8 and 4.0 such as tiering support, REST APIs, new style replication, and improving the distributed hashing translator to scale to 1000 servers. It also provides an overview of Gluster's architecture and quick start instructions.
This document discusses using GlusterFS for Hadoop. GlusterFS is an open source distributed file system that aggregates storage and provides a unified global namespace. It can be used with Hadoop as the underlying storage system instead of HDFS. Using GlusterFS offers advantages like no need for a metadata server and ability to use the same storage for both MapReduce jobs and application data. It also supports features like geo-replication and erasure coding that are useful for big data workloads.
GlusterFS is a distributed file system that shards and replicates files across multiple servers without a central metadata server. It uses modular "translators" to handle functions like replication and distribution. Some challenges GlusterFS faces include multi-tenancy, distributed quota management, efficient data rebalancing, reducing replication latency, optimizing directory traversal, and handling many small files. The speaker argues these challenges are not unique to GlusterFS and that incremental, modular improvements are preferable to monolithic solutions.
Red Hat Gluster Storage is a software-only, scale-out storage solution built on GlusterFS, a general purpose distributed file system. GlusterFS aggregates storage across commodity servers to provide high performance, scalable storage. It uses a stackable, userspace design and has features like elasticity, high availability, simple management, data replication and distribution, snapshots, and encryption. GlusterFS deployments involve forming a trusted storage pool of servers, exporting storage as bricks, and creating logical volumes from the bricks for clients to mount and access over various protocols.
The document outlines Gluster's roadmap, including recent improvements to versions 3.5-3.7 like bitrot detection and sharding, and plans for upcoming releases 3.8 and 4.0 such as tiering support, REST APIs, new style replication, and improving the distributed hashing translator. It also provides an overview of Gluster's architecture and quick start instructions.
This document summarizes GlusterFS, an open-source scale-out network filesystem. It discusses GlusterFS concepts like servers, trusted storage pools, bricks and volumes. It describes the distributed, replicated and dispersed volume types. Additional features like geo-replication, snapshots, quotas and data tiering are covered. The document provides an overview of GlusterFS architecture, components like translators and processes. It also discusses performance considerations and accessing volumes via FUSE, NFS and SMB protocols.
This document provides an introduction and overview of Gluster, an open source scale-out network-attached storage file system. It discusses what Gluster is, its architecture using distributed and replicated volumes, and provides a quick start guide for setting up a basic Gluster volume. The document also outlines several use cases for Gluster and recently added features. It concludes by describing how readers can get involved in the Gluster community.
In this session, we'll discuss new volume types in Red Hat Gluster Storage. We will talk about erasure codes and storage tiers, and how they can work together. Future directions will also be touched on, including rule based classifiers and data transformations. A rough capacity comparison sketch follows the list below.
You will learn about:
How erasure codes lower the cost of storage.
How to configure and manage an erasure coded volume.
How to tune Gluster and Linux to optimize erasure code performance.
Using erasure codes for archival workloads.
How to utilize an SSD inexpensively as a storage tier.
Gluster's erasure code and storage tiering design.
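To make the storage-cost point concrete, here is a back-of-the-envelope comparison of usable capacity for 3-way replication versus a 4+2 dispersed (erasure coded) volume built from the same raw capacity; the 60 TB figure is purely illustrative.

```python
# Rough capacity arithmetic (illustrative numbers, not a sizing guide).
raw_tb = 60  # total raw capacity across all bricks, in TB

# Replica 3 keeps three full copies of every file.
replica3_usable = raw_tb / 3       # -> 20 TB usable
# A 4+2 dispersed volume stores 4 data fragments plus 2 redundancy fragments.
disperse_usable = raw_tb * 4 / 6   # -> 40 TB usable

print(f"replica 3    : {replica3_usable:.0f} TB usable, tolerates 2 failed bricks per set of 3")
print(f"disperse 4+2 : {disperse_usable:.0f} TB usable, tolerates 2 failed bricks per set of 6")
```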
Red Hat Gluster Storage, Container Storage and CephFS Plans (Red_Hat_Storage)
At Red Hat Storage Day New York on 1/19/16, Red Hat's Sayan Saha took attendees through an overview of Red Hat Gluster Storage that included future plans for the product, Red Hat's plans for container storage, and the company's plans for CephFS.
Dustin Black - Red Hat Storage Server Administration Deep Dive (Gluster.org)
Dustin L. Black will give a live demo on administering Red Hat Storage Server from 6-7pm. The session will provide an overview of Red Hat Storage technology including GlusterFS, use cases, architecture, and functionality like volumes, layered functionality, asynchronous replication and data access methods. It will also demonstrate these concepts in a live demo.
Gluster tiering allows for the logical composition of diverse storage units like SSDs and HDDs. It uses fast storage like SSDs as a cache for slower storage like HDDs. Files are migrated between tiers based on usage patterns to optimize for access speeds. The tiering implementation in Gluster uses a metadata store and changetime recorder to track file access and make decisions about tier migrations. Integration with the Gluster distributed hash table and volume rebalancing process allows for dynamic attaching and detaching of tiers.
The document discusses open source software. It defines open source as using open specifications that anyone can use and working together in a community. Open source software allows users to use, distribute, modify, and distribute modifications of the software freely. Some advantages of open source software include lower costs, availability of source code, prevention of vendor lock-in, extensive auditing, and flexibility. The document provides examples of open source projects and tools and discusses how to get involved in open source by finding a relevant project, introducing yourself, and starting with a simple task.
This document provides an agenda and overview for a Gluster tutorial presentation. It includes sections on Gluster basics, initial setup using test drives and VMs, extra Gluster features like snapshots and quota, and tips for maintenance and troubleshooting. Hands-on examples are provided to demonstrate creating a Gluster volume across two servers and mounting it as a filesystem. Terminology around bricks, translators, and the volume file are introduced.
This document discusses developing applications and integrating with GlusterFS. It describes using the libgfapi library to access GlusterFS volumes, providing improved speed over FUSE. The document outlines the Python and C APIs for libgfapi and provides an example of using the Python API. It also discusses rapidly prototyping GlusterFS translators using the Python Glupy library and provides a simple "hello world" translator example.
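To give a flavour of the Python API mentioned above, the sketch below creates a file on a volume over libgfapi, bypassing FUSE entirely. The host name, volume name, and file path are placeholders, and the class and method names (gfapi.Volume, mount, fopen, listdir) reflect the libgfapi-python bindings as commonly documented; verify them against the version you install.

```python
from gluster import gfapi   # provided by the libgfapi-python bindings

# Placeholders: point these at a reachable Gluster server and an existing, started volume.
vol = gfapi.Volume("server1", "gv0")
vol.mount()                  # initialise the libgfapi context for this volume

# Create a file and write to it directly over Gluster's RPC protocol,
# without any FUSE mount on the client.
with vol.fopen("hello.txt", "w") as f:
    f.write("hello from libgfapi\n")

print(vol.listdir("/"))      # list the volume root to confirm the file exists
vol.umount()
```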
The document discusses plans for expanding the Gluster community and project in 2012-2013. It describes how GlusterFS was initially a standalone project but has begun integrating with other open source projects like OpenStack, Hadoop, KVM and others. It then outlines strategic plans for 2013 to have one website for multiple projects, increase integrations with leading technologies, more aggressive software releases, improved documentation, and expanded community workshops around the world to evangelize GlusterFS.
This document discusses using Wireshark to debug GlusterFS network traffic. It provides an overview of Wireshark, how to capture GlusterFS packets, identify the basic GlusterFS protocols, and build filters. Specific examples are given to identify packets from a client to a GlusterFS brick, determine which volume and server the brick is on, and filter on process ID, user ID, and RPC procedures. Statistics collection and decrypting SSL traffic are also briefly covered.
The document discusses the GlusterFS APIs and libgfapi basics. It describes how libgfapi allows manually creating a context, loading a volume file, and making individual calls like glfs_open and glfs_write. It also provides a Python example of using libgfapi to create a file. The document then covers translator basics: translators add functionality along the path between the storage bricks and the user, and operate in an environment where requests are stacked down the translator graph and responses are unwound back up.
This document summarizes deduplication in storage systems. It discusses what deduplication is, how it works by identifying and storing only unique data blocks using hashing, and why it is used to reduce storage costs and network usage. The document also covers the types of deduplication at the file and block level, where and when deduplication occurs, challenges involved, and current work on Yet Another Deduplication Library (YADL) as an open source deduplication library.
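The block-level mechanism described above can be sketched in a few lines: split data into fixed-size blocks, hash each block, and store a block's payload only the first time its hash is seen. Real deduplication engines add an on-disk index, reference counting, and collision handling; this is only the core idea.

```python
import hashlib

BLOCK_SIZE = 4096   # fixed-size blocks; real systems may use variable-size chunking
store = {}          # hash -> block payload (stand-in for the unique-block store)

def dedup_write(data: bytes):
    """Return the list of block hashes that reconstruct `data`, storing only unique blocks."""
    recipe = []
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:       # new content: store the payload once
            store[digest] = block
        recipe.append(digest)         # duplicates cost only a hash reference
    return recipe

recipe = dedup_write(b"A" * 8192 + b"B" * 4096 + b"A" * 4096)
print(f"{len(recipe)} blocks written, {len(store)} unique blocks stored")
```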
The document introduces the Disperse Translator, which allows for configurable fault tolerance in Gluster volumes using erasure codes. Key features include adjustable redundancy levels, minimized storage waste, and reduced bandwidth usage. It works by dispersing and encoding file chunks across bricks. The current implementation provides a functional disperse translator and healing processes, with future plans to add CLI support and optimize performance.
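A very small illustration of the dispersal idea, using a single XOR parity fragment (the equivalent of redundancy = 1): a chunk is split across data fragments plus one parity fragment, and any one missing fragment can be rebuilt from the survivors. The real disperse translator uses Reed-Solomon-style erasure coding and supports higher redundancy counts; this sketch only shows the principle.

```python
from functools import reduce

def xor(fragments):
    """Byte-wise XOR of equally sized fragments."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*fragments))

def disperse(chunk: bytes, data_frags: int = 2):
    """Split a chunk into data fragments plus one XOR parity fragment (redundancy = 1)."""
    size = len(chunk) // data_frags
    frags = [chunk[i * size:(i + 1) * size] for i in range(data_frags)]
    return frags + [xor(frags)]

def rebuild(frags, missing: int):
    """Any single lost fragment is the XOR of all the surviving ones."""
    return xor([f for i, f in enumerate(frags) if i != missing])

fragments = disperse(b"glusterfs-ec", data_frags=3)   # 12 bytes -> three 4-byte fragments + parity
assert rebuild(fragments, missing=1) == fragments[1]  # lose one brick, recover its fragment
print([f.hex() for f in fragments])
```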
The document summarizes the mClock algorithm for providing quality of service (QoS) guarantees for storage input/output (IO) resource allocation in virtualized environments. The mClock algorithm uses a combination of weight-based and constraint-based scheduling to dynamically allocate storage IO bandwidth proportional to weights while ensuring reservations and limits are met. It assigns real-time tags to requests based on reservations, limits, and shares to schedule requests and adjusts tags when throughput varies to maintain proportional allocation and meet guarantees.
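A rough sketch of the tag assignment at the heart of mClock: each client's request receives a reservation tag, a limit tag, and a proportional-share tag, spaced by 1/reservation, 1/limit, and 1/weight respectively; the scheduler first serves requests whose reservation tags are due (to honour minimums), then picks the smallest share tag among clients still under their limit. The structure and numbers below are illustrative, not the paper's full algorithm.

```python
import time

class Client:
    def __init__(self, reservation, limit, weight):
        self.r, self.l, self.w = reservation, limit, weight
        self.R = self.L = self.P = 0.0   # last assigned tag of each kind

    def tag_request(self, now):
        # Tags advance by 1/rate per request, but never lag behind the current time,
        # so an idle client cannot accumulate unlimited credit.
        self.R = max(self.R + 1.0 / self.r, now)   # reservation tag (minimum IOPS)
        self.L = max(self.L + 1.0 / self.l, now)   # limit tag (maximum IOPS)
        self.P = max(self.P + 1.0 / self.w, now)   # proportional-share tag (weight)
        return self.R, self.L, self.P

vm_a = Client(reservation=250, limit=1000, weight=2)   # higher weight -> more of the spare capacity
vm_b = Client(reservation=250, limit=500, weight=1)

now = time.monotonic()
print("vm_a tags:", vm_a.tag_request(now))
print("vm_b tags:", vm_b.tag_request(now))
```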
The document discusses the current features and roadmap for GlusterFS. It summarizes the current stable releases, features added in recent versions, and plans for upcoming releases 3.7 and 4.0. The 3.7 release will focus on small file performance enhancements, tiering, rack awareness, trash support, and NFS Ganesha integration. The 4.0 release will aim to improve scalability and manageability with features like multiple networks support, new style replication, and REST APIs.
The document discusses the status and history of NFS and UFO protocols. It notes that many bugs have been fixed in NFS and details UFO's origins as being based on Swift 1.4.8, its packaging in Fedora 18, and plans for future performance enhancements and integration with Keystone authentication. The document contains several sections written by Kaleb KEITHLEY covering NFS status, UFO history, and its future.
LizardFS v4.0 was recently released. It is a distributed, scalable, fault-tolerant and highly available file system that allows combining disk space from many servers into a single namespace. Key features include snapshots, QoS, data replication with standard and XOR replicas, georeplication, metadata replication, LTO library support, quotas, POSIX compliance, trash functionality, and monitoring tools. It has an architecture with separate metadata and chunk servers and can scale out by adding new servers without downtime. It is suitable for applications like archives, virtual machine storage, backups, and more due to its enterprise-class features running on commodity hardware.
This document provides an overview and summary of the GlusterFS distributed file system. It begins with an agenda that covers what GlusterFS is, why it was created, how it works, where it is used, and upcoming features. The rest of the document dives into these topics, explaining key concepts like trusted storage pools, bricks, volumes and volume types. It also covers access mechanisms, features, integration with OpenStack and oVirt, monitoring, and new capabilities planned for upcoming versions like data tiering and NFSv4 support.
Challenges with Gluster and Persistent Memory with Dan Lambright (Gluster.org)
This document discusses challenges in using persistent memory (SCM) with distributed storage systems like Gluster. It notes that SCM provides faster access than SSDs but must address latency throughout the storage stack, including network transfer times and CPU overhead. The document examines how Gluster's design amplifies lookup operations and proposes caching file metadata at clients to reduce overhead. It also suggests using SCM as a tiered cache layer and optimizing replication strategies to fully leverage the speed of SCM.
A quick presentation of Mesos and Marathon given at the 2016 Docker meetup in Rennes.
The associated code can be found at: https://github.com/Lawouach/platform-showcase-for-microservices
Continuous delivery with jenkins, docker and exoscale (Julia Mateo)
This document discusses using Docker, Jenkins, and Exoscale for continuous delivery. It defines continuous delivery and continuous deployment. Docker is presented as a way to deploy applications as containers to facilitate fast, robust deployments. The document demonstrates setting up a test environment with Jenkins, Docker plugins, and the Docker registry for continuous integration and delivery of a sample web application. It also discusses strategies for deploying to production environments like canary releasing and blue-green deployments using Docker.
This document summarizes a presentation about building a negative lookup caching translator for GlusterFS. The presentation demonstrates adding caching functionality to speed up lookups by caching previous misses. It shows the steps to hook the translator together, build it, configure it, debug it, and test its performance. Finally, it briefly introduces glupy, a new project for writing GlusterFS translators in Python, and demonstrates a Python implementation of the negative lookup cache.
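The caching idea itself is simple enough to show in plain Python: remember names that recently failed to resolve, answer repeat lookups from that memory, and invalidate the entry as soon as the name is created. This illustrates the concept only; it is not the translator's actual C or glupy interface.

```python
import errno
import os
import time

class NegativeLookupCache:
    """Remember recent lookup misses so repeated ENOENT lookups skip the backend."""

    def __init__(self, ttl=2.0):
        self.ttl = ttl
        self.misses = {}                 # path -> time the miss was recorded

    def lookup(self, path):
        cached = self.misses.get(path)
        if cached is not None and time.time() - cached < self.ttl:
            # Served from the negative cache: no call to the (slow) backend.
            raise FileNotFoundError(errno.ENOENT, "cached negative lookup", path)
        try:
            return os.stat(path)         # stand-in for forwarding the lookup downstream
        except FileNotFoundError:
            self.misses[path] = time.time()
            raise

    def create(self, path):
        self.misses.pop(path, None)      # a newly created file invalidates its negative entry
        open(path, "w").close()

cache = NegativeLookupCache()
```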
This introduces the components of the SUSE Linux Enterprise High Availability Extension product used to build highly available storage (HA-LVM/DRBD/iSCSI/NFS, cLVM, OCFS2, cluster RAID1).
This document discusses using SaltStack to manage Alienvault infrastructure. SaltStack is an open source tool for configuration management and remote execution that can control and deploy configurations to all servers. It has a simple architecture with a master server and minion clients. Custom modules, states, grains and templates can extend SaltStack's capabilities. Targeting allows applying configurations selectively based on server attributes. This enables centralized yet flexible management of an entire Alienvault deployment.
SUSE Expert Days Paris 2018 - SUSE HA Cluster Multi-Device (SUSE)
This document summarizes a presentation about SUSE Linux Enterprise High Availability Cluster Multi-Device. It discusses the main features of SUSE HA including policy driven clusters, cluster aware filesystems, and continuous data replication. It then describes the HA storage stack architecture and various options for doing HA storage including DRBD, clustered LVM2, and Cluster-MD. Cluster-MD is presented as a software-based RAID storage that provides redundancy at the device level across multiple nodes. Performance comparisons show Cluster-MD outperforming clustered LVM mirroring. Extensions to Cluster-MD are discussed including expanding the size of a Cluster-MD device.
Red Hat Enterprise Linux: Open, hyperconverged infrastructure (Red_Hat_Storage)
The next generation of IT will be built around flexible infrastructures and operational efficiencies, lowering costs and increasing overall business value in the organization.
A hyperconverged infrastructure that's built on Red Hat supported technologies--including Linux, Gluster storage, and oVirt virtualization manager--will run on commodity x86 servers using the performance of local storage, to deliver a cost-effective, modular, highly scalable, and secure hyperconverged solution.
5 Levels of High Availability: From Multi-instance to Hybrid Cloud (Rafał Leszko)
The document discusses 5 levels of high availability for applications and services, from single instance deployments to hybrid cloud. Level 1 involves deploying multiple instances within an availability zone or region. Level 2 adds deployment across availability zones for redundancy if one zone fails. Level 3 spans multiple regions for redundancy if an entire region fails. Level 4 involves deploying across multiple cloud providers. The highest level, Level 5 hybrid cloud, provides redundancy across cloud and on-premises infrastructure. Each level increases availability but also complexity and potential latency. The document analyzes tradeoffs between consistency, latency, and functionality at each level.
5 levels of high availability from multi instance to hybrid cloud (Rafał Leszko)
The document discusses 5 levels of high availability architecture for applications and services, ranging from single instance up to multi-cloud deployments. It covers Level 0 with a single instance, Level 1 using multiple instances in the same region, Level 2 spanning multiple availability zones, Level 3 across multiple regions, and Level 4 leveraging multiple cloud providers. Each level increases availability at the cost of greater complexity and potential for higher latency.
This document discusses using GlusterFS, an open source distributed file system, to store large amounts of non-structured data in a scalable way using commodity hardware. GlusterFS can scale to thousands of petabytes by clustering storage components over a network and managing data in a single global namespace. It uses a stackable translator architecture and self-healing capabilities to deliver high performance and reliability for diverse workloads. Red Hat acquired Gluster Inc. in 2011 and continues to develop and support GlusterFS as part of their storage product offerings.
This document provides an introduction and overview of Gluster, an open source scale-out network-attached storage file system. It discusses what Gluster is, its architecture using distributed and replicated volumes, a quick start guide, use cases, features, and how to get involved in the community. The presentation aims to explain the benefits and capabilities of Gluster for scalable, high performance storage.
In this day and age, data grows so fast it’s not uncommon for those of us using a relational database to reach the limits of its capacity. In this session, Kwangbock Lee explains how Samsung uses ClustrixDB to handle fast-growing data without manual database sharding. He highlights lessons learned, including a few hiccups along the way, and shares Samsung's experience migrating to ClustrixDB.
This session will cover performance-related developments in Red Hat Gluster Storage 3 and share best practices for testing, sizing, configuration, and tuning.
Join us to learn about:
Current features in Red Hat Gluster Storage, including 3-way replication, JBOD support, and thin-provisioning.
Features that are in development, including network file system (NFS) support with Ganesha, erasure coding, and cache tiering.
New performance enhancements related to remote direct memory access (RDMA), small-file performance, FUSE caching, and solid state disk (SSD) readiness.
Slides presented at Percona Live Europe Open Source Database Conference 2019, Amsterdam, 2019-10-01.
Imagine a world where all Wikipedia articles disappear due to a human error or software bug. Sounds unreal? According to some estimations, it would take an excess of hundreds of million person-hours to be written again. To prevent that scenario from ever happening, our SRE team at Wikimedia recently refactored the relational database recovery system.
In this session, we will discuss how we backup 550TB of MariaDB data without impacting the 15 billion page views per month we get. We will cover what were our initial plans to replace the old infrastructure, how we achieved recovering 2TB databases in less than 30 minutes while maintaining per-table granularity, as well as the different types of backups we implemented. Lastly, we will talk about lessons learned, what went well, how our original plans changed and future work.
How we built QuestDB Cloud, a Kubernetes-based SaaS around QuestDB... (javier ramirez)
QuestDB is a high-performance open source database. Many people told us they would like to use it as a service, without having to manage the machines. So we set out to build a solution that would let us launch QuestDB instances with fully managed provisioning, monitoring, security, and upgrades.
A few Kubernetes clusters later, we managed to launch our QuestDB Cloud offering. This talk is the story of how we got there. I will talk about tools such as Calico, Karpenter, CoreDNS, Telegraf, Prometheus, Loki, and Grafana, but also about challenges such as authentication, billing, and multi-cloud, and about what you have to say no to in order to survive in the cloud.
GlusterFs: a scalable file system for today's and tomorrow's big data (Roberto Franchini)
Roberto Franchini gave a presentation on GlusterFS, an open source, scalable distributed file system. GlusterFS allows storage to be scaled out by adding more servers with disks. It uses a flexible peer-to-peer architecture with data distributed across server bricks. Franchini discussed how his company implemented GlusterFS to store large indexes for information extraction tasks, scaling from 4TB to 28TB as storage needs grew. He provided recommendations on best practices like dedicated servers, same-sized bricks, and planning for future expansion.
GlusterFS: an open source file system for today's and tomorrow's big data - Robe... (Codemotion)
GlusterFS (www.gluster.org) is an open source distributed file system that scales to petabytes. The presentation shows the features of this file system and our experience with it, which began in 2010 with a 4 TB cluster and has grown to today's 30 TB: why it was chosen, its main features, its evolution, its failures (those too), and its future. Some features: user-space access, native protocol, NFS, SMB; replication, distribution, file striping, or a combination of these (e.g. distributed striped replicated). Within the Hadoop ecosystem it can replace HDFS.
This document discusses using GlusterFS for Hadoop. GlusterFS is an open-source distributed file system that aggregates storage and provides a unified namespace. It can be used as the storage layer for Hadoop, replacing HDFS. Using GlusterFS provides advantages like a POSIX-compliant file system, ability to use the same storage for MapReduce and application data, and features like geo-replication and erasure coding. GlusterFS also integrates with projects like Apache Spark, Ambari, and OpenStack.
This document discusses testing Kubernetes and OpenShift at scale. It describes installing large clusters of 1000+ nodes, using scalability test tools like the Kubernetes performance test repo and OpenShift SVT repo to load clusters and generate traffic. Sample results show loading clusters with thousands of pods and projects, and peaks in master node resource usage when loading and deleting hundreds of pods simultaneously.
This document provides an overview of installing and configuring a 3 node GPFS cluster. It discusses using 8 shared LUNs across the 3 servers to simulate having disks from 2 different V7000 storage arrays for redundancy. The disks will be divided into 2 failure groups, with hdisk1-4 in one failure group representing one simulated array, and hdisk5-8 in the other failure group representing the other simulated array. This is to ensure redundancy in case of failure of an entire storage array.
Performance characterization in large distributed file system with gluster fs (Neependra Khare)
This document discusses the key features and capabilities of the Gluster distributed file system including its ability to provide highly scalable storage across multiple petabytes using scale-out clusters. It can replicate data across geographic locations for disaster recovery and uses an algorithmic approach to avoid metadata bottlenecks. Gluster leverages commodity hardware and is software-only with no SAN requirements. It supports various deployment models including physical, virtual, cloud, and hybrid and uses open protocols like NFS, CIFS, and REST.
Red Hat GFS (Global File System) is a cluster file system that allows nodes in a cluster to simultaneously access a shared block storage device. It employs distributed metadata and multiple journals to operate optimally in a cluster. GFS uses a lock manager to coordinate I/O and maintain file system integrity. It provides benefits like simplified data infrastructure management, maximized storage resource use, seamless cluster scaling, and high performance access to data. GFS can be deployed with different configurations to suit various needs for performance, scalability, and cost. It provides data sharing, a consistent namespace, and features required for enterprise environments.
1049: Best and Worst Practices for Deploying IBM Connections - IBM Connect 2016 (panagenda)
Depending on deployment size, operating system, and security considerations, you have different options for configuring IBM Connections. This session shows good and bad examples of how to do it, drawn from multiple customer deployments. Christoph Stoettner describes the things he found and how you can optimize your systems. Main topics include simple (documented) tasks that should be applied, missing documentation, automated user synchronization, TDI solutions for user synchronization, performance tuning, security optimization, and planning single sign-on for mail, IBM Sametime, and SPNEGO. This is valuable information that will help you to be successful in your next IBM Connections deployment project.
A presentation from Christoph Stoettner (panagenda).
Similar to Glusterfs for sysadmins-justin_clift (20)
The document discusses the lifecycle of a Gluster volume, including creation, maintenance through software upgrades and hardware repairs, and decommissioning.
During creation, hardware must be homogeneous and the layout distributed across racks for high availability. Maintenance involves upgrading software rack by rack, serially, to avoid quorum loss, and replacing hardware by first copying data in blocks and then using replace-brick. Decommissioning hardware entirely requires first copying all of its data, in blocks, to another node before replacement, to ensure integrity with no downtime.
nfusr: a new userspace NFS client based on libnfs - Shreyas Siravara (Gluster.org)
nfusr is a userspace NFSv3 client built on top of libnfs and designed with four key goals: (1) allowing connections to multiple NFS servers for load balancing and high availability, (2) retrying spurious errors to improve reliability, (3) enabling quick iteration and easy upgrades through its modern C++ codebase, and (4) collecting client-side metrics to measure performance and reliability as experienced by users. The open source nfusr client aims to offer an improved alternative to the traditional kernel-based NFS client.
Facebook’s upstream approach to GlusterFS - David Hasson (Gluster.org)
Facebook has been using GlusterFS since 2012 and has increasingly contributed major improvements to the open source project. They launched a public GitHub branch last year and have been porting their 200+ patches to the upstream codebase. Facebook also significantly increased their testing of the GlusterFS code from 50 tests to around 250 tests to ensure quality. Their goal is to have all Facebook patches merged into the GlusterFS master branch before the 4.0 release.
The document discusses quality of service (QoS) features added to Gluster at Facebook to allow tenants sharing volumes to get a fair share of I/O performance. It describes how tenants are identified by tagging file operations with namespace IDs. A weighted fair queuing algorithm was added to the I/O threads translator to maintain separate queues per tenant and ensure each gets a share of operations proportional to their weight. Testing showed the approach can control ratios of I/O rates between high and low priority tenants. Future plans include full production rollout of these changes and exploring other queueing algorithms.
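The queuing approach described can be sketched as a classic weighted fair queue: keep one FIFO per tenant, advance each tenant's virtual finish time by 1/weight per dequeued request, and always serve the backlogged tenant with the smallest finish time. This is a generic WFQ illustration, not the io-threads translator's actual implementation.

```python
from collections import deque

class TenantWFQ:
    """Dequeue requests across tenants in proportion to their configured weights."""

    def __init__(self, weights):
        self.weights = weights                        # tenant id -> weight
        self.queues = {t: deque() for t in weights}   # one FIFO per tenant
        self.vtime = {t: 0.0 for t in weights}        # virtual finish times

    def enqueue(self, tenant, request):
        self.queues[tenant].append(request)

    def dequeue(self):
        backlogged = [t for t, q in self.queues.items() if q]
        if not backlogged:
            return None
        # Serve the backlogged tenant with the smallest virtual finish time.
        tenant = min(backlogged, key=lambda t: self.vtime[t])
        self.vtime[tenant] += 1.0 / self.weights[tenant]   # heavier weight -> slower advance
        return tenant, self.queues[tenant].popleft()

wfq = TenantWFQ({"high": 4, "low": 1})
for i in range(5):
    wfq.enqueue("high", f"hi-{i}")
    wfq.enqueue("low", f"lo-{i}")
# While both tenants are backlogged, "high" is served roughly four times as often.
print([wfq.dequeue()[0] for _ in range(10)])
```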
Richard Wareing presented on using XFS realtime subvolumes to improve GlusterFS metadata performance. Traditional solutions like page caching are limited, while dedicated metadata stores add complexity. XFS realtime subvolumes combine benefits by storing metadata on SSDs for improved performance without changing GlusterFS core. Facebook is working on kernel patches to optimize realtime allocation and integration. The presentation addressed strengths and weaknesses of GlusterFS and opportunities to improve scaling and code quality.
VDO is a Linux device mapper driver that provides data deduplication and compression. It has two kernel modules that work together to provide data efficiency at the block layer and maintain the deduplication index. VDO can increase storage efficiency and reduce costs for applications like Gluster file storage. It works by creating a VDO logical volume on a block device and then creating a file system on the logical volume. Real-world testing of VDO on container images showed a data reduction of over 80%.
Releases: What are contributors responsible for (Gluster.org)
The document discusses the responsibilities of contributors for software releases. It states that contributors should file issues on GitHub for new features and use Bugzilla to report bugs. It emphasizes that documentation, testing, reviews, and tracking bugs in the release process are important responsibilities. Contributors are encouraged to discuss ideas first, design features, write code and documentation, and thoroughly test their work before seeking reviews.
RIO Distribution: Reconstructing the onion - Shyamsundar Ranganathan (Gluster.org)
This document discusses reconstructing GlusterFS's volume graph distribution from DHT to RIO. RIO stands for Relation Inherited Object distribution and distributes inode and data objects across different subvolumes. The key points are:
1) Inodes are separated into metadata and data objects stored on different "rings" or subvolumes for improved scalability and consistency.
2) Relations between objects like parent-child inodes and file inodes to data objects determine their locations through inheritance.
3) The distribution provides better performance, scalability and rebalancing compared to the previous DHT method.
4) Changes to on-disk formats, consistency handling, and availability layers are required to fully implement the RIO design.
GlusterFS can be used with Kubernetes in several ways:
1) As a volume driver to provide persistent storage and shared access to data across containers using existing GlusterFS volumes.
2) Through local volumes which use hostPath provisioning to leverage GlusterFS mounts but are not suitable for production.
3) With Heketi which provides dynamic provisioning of GlusterFS volumes through a REST API and integration with Kubernetes.
4) Potentially through Rook which aims to integrate storage services like Ceph and GlusterFS to provide turnkey storage and currently supports Ceph.
The document discusses the strengths, weaknesses, opportunities, and threats (SWOT) of Gluster, an open source distributed file system. Some of Gluster's key strengths include its ability to provide hybrid cloud storage, its growing community support and predictable release cycles, and its integration with ecosystems like Kubernetes and OpenStack. Weaknesses include difficulties with usability, management and monitoring at large scales, lack of automation for testing and operations, and discoverability issues with documentation. Opportunities lie in deeper container/autonomic computing integration and leveraging new hardware, while threats include competition from public cloud and proprietary storage vendors.
GD2 is a new management system for Gluster 4.0, written in Go, that provides better scalability, integration, and maintainability compared to the previous version. It currently provides basic cluster and volume management through CLI commands, an HTTP API, and gRPC. Future versions will focus on stabilizing features and adding capabilities like automatic volume management, migration support, and centralized logging.
Brick multiplexing allows multiple storage bricks in GlusterFS to be managed by a single process, reducing resource usage. Performance testing showed no degradation with brick multiplexing enabled, and it allows faster scaling to support more persistent volumes. Memory usage is lower with brick multiplexing, allowing more volumes to be supported on the same hardware. Brick multiplexing improves scalability and is recommended to be left enabled.
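Brick multiplexing is toggled with a single cluster-wide option; a minimal sketch (option name as documented for Gluster 3.10 and later):
# Enable brick multiplexing for all volumes in the pool
gluster volume set all cluster.brick-multiplex on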
The document discusses different ways projects can fail their users, developers, and community. It notes bugs and poor documentation can fail users. Tests taking too long and difficult automation can fail developers. A project can fail its community by not recognizing contributors, not handling patches kindly, and not fixing documentation issues raised by users. It also discusses whether commitments around performance testing, writing tests with Glusto, and fixing static analysis errors are being kept.
This document discusses integrating Heketi functionality into Glusterd2. It provides an agenda that covers Glusterd.Next, Heketi, and potential Heketi features in Glusterd2. It describes Glusterd.Next goals like a RESTful API and treating volumes as the smallest unit. It introduces Heketi and how it implements several Glusterd.Next ideas like hiding volume creation complexity and managing multiple Gluster clusters. Potential Heketi features in Glusterd2 discussed include supporting volume creation by bricks or size and making adding new volume types easier. A demo of integration is also mentioned.
2. #whoami
● Experienced SysAdmin (Solaris and
Linux) for many, many years
● Mostly worked on Mission Critical
systems in corporate/enterprise
environments (e.g. telco, banking, insurance)
● Has been helping build Open Source
Communities after hours for many
years (PostgreSQL, OpenOffice)
● Dislikes networks being a bottleneck
(likes Infiniband. A lot :>)
● Joined Red Hat mid 2010
jclift@redhat.com
3. Agenda
● Technology Overview
● Scaling Up and Out
● A Peek at GlusterFS Logic
● Redundancy and Fault Tolerance
● Data Access
● General Administration
● Use Cases
● Common Pitfalls
5. What is GlusterFS?
● POSIX-Like Distributed File System
● No Metadata Server
● Network Attached Storage (NAS)
● Heterogeneous Commodity Hardware
● Aggregated Storage and Memory
● Standards-Based – Clients, Applications, Networks
● Flexible and Agile Scaling
● Capacity – Petabytes and beyond
● Performance – Thousands of Clients
● Single Global Namespace
6. GlusterFS vs. Traditional Solutions
● A basic NAS has limited scalability and redundancy
● Other distributed filesystems limited by metadata
● SAN is costly & complicated but high performance &
scalable
● GlusterFS
● Linear Scaling
● Minimal Overhead
● High Redundancy
● Simple and Inexpensive Deployment
8. Terminology
● Brick
● A filesystem mountpoint
● A unit of storage used as a GlusterFS building block
● Translator
● Logic between the bits and the Global Namespace
● Layered to provide GlusterFS functionality
● Volume
● Bricks combined and passed through translators
● Node / Peer
● Server running the gluster daemon and sharing
volumes
9. Foundation Components
● Private Cloud (Datacenter)
● Common Commodity x86_64 Servers
● Public Cloud
● Amazon Web Services (AWS)
● EC2 + EBS
10. Disk, LVM, and Filesystems
● Direct-Attached Storage (DAS)
-or-
● Just a Bunch Of Disks (JBOD)
● Hardware RAID
● Logical Volume Management (LVM)
● XFS, EXT3/4, BTRFS
● Extended attributes (xattr's) support required
● XFS is strongly recommended for GlusterFS
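A typical brick-preparation sketch, assuming an LVM logical volume at /dev/vg_bricks/brick1 (the 512-byte inode size leaves room for the extended attributes GlusterFS stores on every file):
# Format the brick with XFS and 512-byte inodes, then mount it
mkfs.xfs -i size=512 /dev/vg_bricks/brick1
mkdir -p /bricks/brick1
mount /dev/vg_bricks/brick1 /bricks/brick1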
11. Gluster Components
● glusterd
● Elastic volume management daemon
● Runs on all export servers
● Interfaced through gluster CLI
● glusterfsd
● GlusterFS brick daemon
● One process for each brick
● Managed by glusterd
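A quick way to see the two daemons in action (volume name is illustrative; service names are as shipped on RHEL-family systems):
# glusterd is the management daemon; each brick of a started volume gets its own glusterfsd
service glusterd start
gluster volume start my-dist-vol
ps -C glusterfsd -o pid,args    # expect one glusterfsd process per local brick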
19. Elastic Hash Algorithm
● No central metadata
● No Performance Bottleneck
● Eliminates risk scenarios
● Location hashed intelligently on path and filename
● Unique identifiers, similar to md5sum
● The “Elastic” Part
● Files assigned to virtual volumes
● Virtual volumes assigned to multiple bricks
● Volumes easily reassigned on the fly
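There is no metadata server to consult, but you can ask the client translator where the hash placed a file by reading a virtual extended attribute through a native-client mount (a sketch; paths are illustrative):
# Show which brick(s) hold this file
getfattr -n trusted.glusterfs.pathinfo -e text /mnt/gluster/somefile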
25. Geo Replication
● Asynchronous across LAN, WAN, or Internet
● Master-Slave model -- Cascading possible
● Continuous and incremental
● Data is passed between defined master and slave only
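Geo-replication is configured per volume from the master side; a minimal sketch in the 3.x syntax (host and volume names are illustrative; the double colon marks the slave as a Gluster volume rather than a plain directory):
gluster> volume geo-replication my-dist-vol slave-host::backup-vol start
gluster> volume geo-replication my-dist-vol slave-host::backup-vol status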
26. Replicated Volumes vs Geo-replication
Replicated Volumes:
● Mirrors data across clusters
● Provides high-availability
● Synchronous replication (each and every file operation is sent across all the bricks)
Geo-replication:
● Mirrors data across geographically distributed clusters
● Ensures backing up of data for disaster recovery
● Asynchronous replication (checks for changes in files periodically and syncs them on detecting differences)
33. GlusterFS Native Client (FUSE)
● FUSE kernel module allows the filesystem to be built
and operated entirely in userspace
● Specify mount to any GlusterFS node
● Native Client fetches volfile from mount server, then
communicates directly with all nodes to access data
● Recommended for high concurrency and high write
performance
● Load is inherently balanced across distributed volumes
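Mounting with the native client is a one-liner; any node in the trusted pool can serve the volfile (names are illustrative):
# Fetch the volfile from server1, then talk to all bricks directly
mount -t glusterfs server1:/my-dist-vol /mnt/gluster
# or persistently via /etc/fstab:
# server1:/my-dist-vol  /mnt/gluster  glusterfs  defaults,_netdev  0 0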
34. NFS
● Standard NFS v3 clients
● Note: Mount with vers=3 option
● Standard automounter is supported
● Mount to any node, or use a load balancer
● GlusterFS NFS server includes Network Lock Manager
(NLM) to synchronize locks across clients
● Better performance for reading many small files from a
single client
● Load balancing must be managed externally
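For example, forcing NFSv3 as noted above (server and volume names are illustrative):
# Mount the volume over NFSv3 from any node, or from a load-balancer VIP
mount -t nfs -o vers=3,mountproto=tcp server1:/my-dist-vol /mnt/nfs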
35. SMB/CIFS
● GlusterFS volume is first mounted with the Native
Client
● Redundantly on the GlusterFS peer
-or-
● On an external server
● Native mount point is then shared via Samba
● Must be setup on each node you wish to connect to via
CIFS
● Load balancing must be managed externally
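A rough sketch of that chain, assuming Samba runs on one of the Gluster peers (share name, paths, and service name are illustrative; adjust for your distribution):
# 1. Mount the volume locally with the native client
mount -t glusterfs localhost:/my-dist-vol /mnt/gluster
# 2. Export /mnt/gluster from smb.conf (e.g. a [gluster] share with path = /mnt/gluster), then restart Samba
service smb restart
# 3. Clients mount the share over CIFS
mount -t cifs //server1/gluster /mnt/share -o username=someuser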
38. Adding Nodes (peers) and Volumes
Peer Probe:
gluster> peer probe server3
gluster> peer status
Number of Peers: 2
Hostname: server2
Uuid: 5e987bda-16dd-43c2-835b-08b7d55e94e5
State: Peer in Cluster (Connected)
Hostname: server3
Uuid: 1e0ca3aa-9ef7-4f66-8f15-cbc348f29ff7
State: Peer in Cluster (Connected)
Distributed Volume:
gluster> volume create my-dist-vol server2:/brick2 server3:/brick3
gluster> volume info my-dist-vol
Volume Name: my-dist-vol
Type: Distribute
Status: Created
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: server2:/brick2
Brick2: server3:/brick3
gluster> volume start my-dist-vol
39. Distributed Striped Replicated Volume
gluster> volume create test-volume replica 2 stripe 2 transport tcp
server1:/exp1 server1:/exp2 server2:/exp3 server2:/exp4
server3:/exp5 server3:/exp6 server4:/exp7 server4:/exp8
Multiple bricks of a replicate volume are present on the same server. This setup is not optimal.
Do you still want to continue creating the volume? (y/n) y
Creation of volume test-volume has been successful. Please start the volume to access data.
[Diagram: test-volume – files distributed across stripe 2 / replica 2 brick sets]
40. Distributed Striped Replicated Volume
gluster> volume create test-volume stripe 2 replica 2 transport tcp
server1:/exp1 server2:/exp3 server1:/exp2 server2:/exp4
server3:/exp5 server4:/exp7 server3:/exp6 server4:/exp8
Creation of volume test-volume has been successful. Please start the volume to access data.
gluster> volume info test-volume
Volume Name: test-volume
Type: Distributed-Striped-Replicate
Volume ID: 8f8b8b59-d1a1-42fe-ae05-abe2537d0e2d
Status: Created
Number of Bricks: 2 x 2 x 2 = 8
Transport-type: tcp
Bricks:
Brick1: server1:/exp1
Brick2: server2:/exp3
Brick3: server1:/exp2
Brick4: server2:/exp4
Brick5: server3:/exp5
Brick6: server4:/exp7
Brick7: server3:/exp6
Brick8: server4:/exp8
42. Migrating Data / Replacing Bricks
gluster> volume replace-brick my-dist-vol server3:/brick3 server5:/brick5 start
gluster> volume replace-brick my-dist-vol server3:/brick3 server5:/brick5 status
Current File = /usr/src/linux-headers-2.6.31-14/block/Makefile
Number of files migrated = 10567
Migration complete
gluster> volume replace-brick my-dist-vol server3:/brick3 server5:/brick5 commit
43. Volume Options
Auth
gluster> volume set my-dist-vol auth.allow 192.168.1.*
gluster> volume set my-dist-vol auth.reject 10.*
NFS
gluster> volume set my-dist-vol nfs.volume-access read-only
gluster> volume set my-dist-vol nfs.disable on
Other advanced options
gluster> volume set my-dist-vol features.read-only on
gluster> volume set my-dist-vol performance.cache-size 67108864
44. Volume Top Command
gluster> volume top my-dist-vol read brick server3:/brick3 list-cnt 3
Brick: server:/export/dir1
==========Read file stats========
read call count      filename
116 /clients/client0/~dmtmp/SEED/LARGE.FIL
64 /clients/client0/~dmtmp/SEED/MEDIUM.FIL
54 /clients/client2/~dmtmp/SEED/LARGE.FIL
● Many top commands are available for analysis of
files, directories, and bricks
● Read and write performance test commands available
● Perform active dd tests and measure throughput
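The dd-style throughput tests are part of the same command family; a sketch (block size and count are illustrative):
gluster> volume top my-dist-vol read-perf bs 2048 count 1024 brick server3:/brick3 list-cnt 3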
48. Common Solutions
● Media / Content Distribution Network (CDN)
● Backup / Archive / Disaster Recovery (DR)
● Large Scale File Server
● Home directories
● High Performance Computing (HPC)
● Infrastructure as a Service (IaaS) storage layer
49. Hadoop – Map Reduce
● Access data within and outside of Hadoop
● No HDFS name node single point of failure / bottleneck
● Seamless replacement for HDFS
● Scales with the massive growth of big data
50. CIC Electronic Signature Solutions
● Challenge
● Must leverage economics of the cloud
● Storage performance in the cloud too slow
● Need to meet demanding client SLA’s
● Solution
● Red Hat Storage Software Appliance
● Amazon EC2 and Elastic Block Storage (EBS)
● Benefits
● Faster development and delivery of new products
● SLA’s met with headroom to spare
● Accelerated cloud migration
● Scale-out for rapid and simple expansion
● Data is highly available for 24/7 client access
Hybrid Cloud: Electronic Signature Solutions
● Reduced time-to-market for new products
● Meeting all client SLAs
● Accelerating move to the cloud
51. Pandora Internet Radio
● Challenge
● Explosive user & title growth
● As many as 12 file formats for each song
● ‘Hot’ content and long tail
● Solution
● Three data centers, each with a six-node GlusterFS
cluster
● Replication for high availability
● 250+ TB total capacity
● Benefits
● Easily scale capacity
● Centralized management; one administrator to manage
day-to-day operations
● No changes to application
● Higher reliability
Private Cloud: Media Serving
● 1.2 PB of audio served per week
● 13 million files
● Over 50 GB/sec peak traffic
52. Brightcove
• Over 1 PB currently in Gluster
• Separate 4 PB project in the works
Private Cloud: Media Serving
● Challenge
● Explosive customer & title growth
● Massive video in multiple locations
● Costs rising, esp. with HD formats
● Solution
● Complete scale-out based on commodity DAS/JBOD
and GlusterFS
● Replication for high availability
● 1PB total capacity
● Benefits
● Easily scale capacity
● Centralized management; one administrator to manage
day-to-day operations
● Higher reliability
● Path to multi-site
53. Pattern Energy
• Rapid, advance weather predictions
• Maximizing energy assets
• Cost savings and avoidance
High Performance Computing for Weather Prediction
● Challenge
● Need to deliver rapid advance weather predictions
● Identify wind and solar abundance in advance
● More effectively perform preventative maintenance and repair
● Solution
● 32 HP compute nodes
● Red Hat SSA for high throughput and availability
● 20TB+ total capacity
● Benefits
● Predicts solar and wind patterns 3 to 5 days in advance
● Maximize energy production and repair times
● Avoid costs of outsourcing weather predictions
● Solution has paid for itself many times over
55. Split-Brain Syndrome
● Communication lost between replicated peers
● Clients write separately to multiple copies of a file
● No automatic fix
● May be subjective which copy is right – ALL may be!
● Admin determines the “bad” copy and removes it
● Self-heal will correct the volume
● Trigger a recursive stat to initiate
● Proactive self-healing in GlusterFS 3.3
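Before the 3.3 self-heal daemon, the usual way to trigger a full heal was to stat every file through a native-client mount; a sketch of both approaches (mount point and volume name are illustrative):
# Walk the volume from a client to trigger self-heal file by file
find /mnt/gluster -noleaf -print0 | xargs --null stat > /dev/null
# In GlusterFS 3.3+ the same can be requested from the CLI
gluster volume heal my-dist-vol full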
56. Quorum Enforcement
● Disallows writes (EROFS) on non-quorum peers
● Significantly reduces files affected by split-brain
● Preferred when data integrity is the priority
● Not preferred when application integrity is the priority
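Quorum is enabled per volume; a minimal sketch (option name as in the 3.3/3.4 documentation):
# Require a majority of replica bricks to be reachable before allowing writes
gluster> volume set my-dist-vol cluster.quorum-type auto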
57. Your Storage Servers are Sacred!
● Don't touch the brick filesystems directly!
● They're Linux servers, but treat them like appliances
● Separate security protocols
● Separate access standards
● Don't let your Jr. Linux admins in!
● A well-meaning sysadmin can quickly break your
system or destroy your data
59. Do it!
● Build a test environment in VMs in just minutes!
● Get the bits:
● www.gluster.org has packages available for many
Linux distributions (CentOS, Fedora, RHEL,
Ubuntu)
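A minimal two-VM test run might look like this (package names vary slightly by distribution; hostnames and paths are illustrative):
# On both VMs
yum install glusterfs-server
service glusterd start
# On node1: form the pool, create a two-way replicated volume, start it
gluster peer probe node2
gluster volume create testvol replica 2 node1:/bricks/b1 node2:/bricks/b1
gluster volume start testvol
# On any client
mount -t glusterfs node1:/testvol /mnt/test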
60. Thank You!
● GlusterFS:
www.gluster.org
● Justin Clift - jclift@redhat.com
GlusterFS for SysAdmins
Slides Available at:
http://www.gluster.org/community/documentation/index.php/Presentations
Editor's Notes
-Dustin Black
-Red Hat Certified Architect
-Senior Technical Account Manager with Red Hat Global Support Services
-More than 10 years of systems and infrastructure experience, with focus on Linux, UNIX, and networking
-I am not a coder. I'll hack a script together, or read code from others reasonably well, but I would never presume to be a developer.
-I believe strongly in the power of openness in most all aspects of life and business. Openness only improves the interests we all share.
I hope to make the GlusterFS concepts more tangible. I want you to walk away with the confidence to start working with GlusterFS today.
-Commodity hardware: aggregated as building blocks for a clustered storage resource.
-Standards-based: No need to re-architect systems or applications, and no long-term lock-in to proprietary systems or protocols.
-Simple and inexpensive scalability.
-Scaling is non-interruptive to client access.
-Aggregated resources into unified storage volume abstracted from the hardware.
-Bricks are “stacked” to increase capacity
-Translators are “stacked” to increase functionality
-So if you're going to deploy GlusterFS, where do you start?
-Remember, for a datacenter deployment, we're talking about commodity server hardware as the foundation.
-In the public cloud space, your option right now is Amazon Web Services, with EC2 and EBS.
-RHS is the only supported high-availability option for EBS.
-XFS is the only filesystem supported with RHS.
-Extended attribute support is necessary because the file hash is stored there
-gluster console commands can be run directly, or in interactive mode. Similar to virsh, ntpq
-The native client uses fuse to build complete filesystem functionality without gluster itself having to operate in kernel space. This offers benefits in system stability and time-to-end-user for code updates.
No metadata == No Performance Bottleneck or single point of failure (compared to single metadata node) or corruption issues (compared to distributed metadata).
Hash calculation is faster than metadata retrieval
Elastic hash is the core of how Gluster scales linearly
Modular building blocks for functionality, like bricks are for storage
When you configure geo-replication, you do so on a specific master (or source) host, replicating to a specific slave (or destination) host. Geo-replication does not scale linearly since all data is passed through the master and slave nodes specifically.
Because geo-replication is configured per-volume, you can gain scalability by choosing different master and slave geo-rep nodes for each volume.
Limited read/write performance increase, but in some cases the overhead can actually cause a performance degradation
-Graphic is wrong --
-Replica 0 is exp1 and exp3
-Replica 1 is exp2 and exp4
This graphic from the RHS 2.0 beta documentation actually represents a non-optimal configuration. We'll discuss this in more detail later.
-Native client will be the best choice when you have many nodes concurrently accessing the same data
-Client access to data is naturally load-balanced because the client is aware of the volume structure and the hash algorithm.
...mount with nfsvers=3 in modern distros that default to nfs 4
The need for this seems to be a bug, and I understand it is in the process of being fixed.
NFS will be the best choice when most of the data access by one client and for small files. This is mostly due to the benefits of native NFS caching.
Load balancing will need to be accomplished by external mechanisms
-Use the GlusterFS native client first to mount the volume on the Samba server, and then share that mount point with Samba via normal methods.
-GlusterFS nodes can act as Samba servers (packages are included), or it can be an external service.
-Load balancing will need to be accomplished by external mechanisms
An inode size smaller than 512 leaves no room for extended attributes (xattr). This means that every active inode will require a separate block for these. This has both a performance hit as well as a disk space usage penalty.
-peer status command shows all other peer nodes – excludes the local node
-I understand this to be a bug that's in the process of being fixed
-Support for striped replicated volumes is added in RHS 2.0
-I'm using this example because it's straight out of the documentation, but I want to point out that this is not an optimal configuration.
-With this configuration, the replication happens between bricks on the same node.
-We should alter the order of the bricks here so that the replication is between nodes.
-Brick order is corrected to ensure replication is between bricks on different nodes.
-Replica is always processed first in building the volume, regardless of whether it's before or after stripe on the command line.
-So, a 'replica 2' will create the replica between matching pairs of bricks in the order that the bricks are passed to the command. 'replica 3' will be matching trios of bricks, and so on.
Must add and remove in multiples of the replica and stripe counts
??where is this stored, and how does it impact performance when on??
Specifying the double colon between the remote host name and destination tells geo-replication that the destination is a glusterfs volume name. With a single colon, it will treat the destination as an absolute filesystem path.
Preceding the remote host with user@ causes geo-replication to interpret the communication protocol as ssh, which is generally preferred. If you use the simple remote syntax of:
hostname:volume
It will cause the local system to mount the remote volume locally with the native client, which will result in a necessary performance degradation because of the added fuse overhead. Over a WAN, this can be significant.
Cloud-based online video platform
Cloud-based online video platform
Patch written by Jeff Darcy
Question notes:
-Vs. CEPH
-CEPH is object-based at its core, with distributed filesystem as a layered function. GlusterFS is file-based at its core, with object methods (UFO) as a layered function.
-CEPH stores underlying data in files, but outside the CEPH constructs they are meaningless. Except for striping, GlusterFS files maintain complete integrity at the brick level.
-With CEPH, you define storage resources and data architecture (replication) separately, and CEPH actively and dynamically manages the mapping of the architecture to the storage. With GlusterFS, you manually manage both the storage resources and the data architecture.