This document provides an overview of block and inode allocation in IBM Spectrum Scale. It begins with relevant basics about Spectrum Scale including the cluster architecture, node roles, and the layers from disk to namespace. It then covers the two main allocation types: map-based allocation which is used for storage pools and filesets due to their static nature, and share-based allocation which is used for quotas given their more dynamic nature and administrator-initiated changes. Map-based allocation divides resources into regions that nodes can independently manage, while share-based allocation uses a quota sharing approach.
This document discusses Spectrum Scale memory usage. It outlines Spectrum Scale basics like clusters, nodes, and filesystems. It describes the different Spectrum Scale memory pools: pagepool for data, shared segment for metadata references, and external heap for daemons. It provides information on calculating memory needs based on parameters like files to cache, stat cache size, nodes, and access patterns. Other topics covered include related Linux memory usage and out of scope memory components.
Secondary storage devices like hard disks and CD-ROMs store data using magnetic or optical methods. Hard disks use magnetic platters to store binary data as magnetic polarity, organized into tracks, sectors, cylinders, and clusters. CD-ROMs use pits and lands encoded with a binary signal to store data optically along a spiral. Both devices allow fast random access to data through logical addressing schemes despite the physical layout of the storage medium.
This document discusses swap space management. It explains that swap space uses disk space as an extension of main memory through swapping and paging. It discusses how operating systems may support multiple swap spaces on separate disks to balance load. It also notes that it is better to overestimate than underestimate swap space needs to avoid crashing the system from running out of space. The document then covers locations for swap space, including within the file system or a separate partition, and tradeoffs of each approach.
This document discusses swap space in Linux systems. It explains that swap space uses disk space as virtual memory to hold process images when physical RAM is full. Swap space can be located in a separate disk partition or within the normal file system. Linux uses a swap map data structure to track which pages are stored in swap space. The goal of swap space is to make RAM usage more efficient and prevent the system from slowing down or crashing when physical memory is exhausted.
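The swap map idea described above can be sketched as a simple slot-counter structure. This is a toy illustration of the concept, not the real Linux kernel data structure; all names and sizes here are made up.

```python
# Minimal sketch of a swap map: one usage counter per swap slot, loosely
# modeled on the idea described above (not the actual Linux implementation).

class SwapMap:
    def __init__(self, num_slots):
        self.counters = [0] * num_slots  # 0 means the slot is free

    def allocate(self):
        """Find a free slot, mark it in use, and return its index."""
        for i, count in enumerate(self.counters):
            if count == 0:
                self.counters[i] = 1
                return i
        raise MemoryError("swap space exhausted")

    def free(self, slot):
        """Release a slot so it can be reused."""
        if self.counters[slot] == 0:
            raise ValueError("slot already free")
        self.counters[slot] -= 1

sm = SwapMap(4)
a = sm.allocate()   # slot 0
b = sm.allocate()   # slot 1
sm.free(a)
c = sm.allocate()   # reuses slot 0
print(a, b, c)      # 0 1 0
```

The key property the sketch shows is reuse: freed slots are handed out again, which is why the system only crashes when swap is genuinely exhausted, not merely fragmented.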
This document summarizes key aspects of disk structure, disk scheduling, disk management, swap space management, and Windows 2000. It discusses how disks are logically structured and addressed, factors that impact disk access time like seek time and rotational latency. It covers disk formatting, partitioning, boot blocks, handling bad blocks. It explains how operating systems manage swap space, where it is located, and examples of swap space use. It provides an overview of Windows 2000, describing its design, components like the kernel, virtual memory manager, I/O manager, and environmental subsystems.
Chapter 12 discusses mass storage systems and their role in operating systems. It describes the physical structure of disks and tapes and how they are accessed. Disks are organized into logical blocks that are mapped to physical sectors. Disks connect to computers via I/O buses and controllers. RAID systems improve reliability through redundancy across multiple disks. Operating systems provide services for disk scheduling, management, and swap space. Tertiary storage uses tape drives and removable disks to archive less frequently used data in large installations.
This presentation gives an overview of physical storage technologies and the various ways of accessing storage on a computer or server. Presented at the School of Engineering and Applied Science, Ahmedabad University, as part of a Software Engineering course.
This document discusses mass storage structures including magnetic disks, solid state disks, disk structure, and disk attachment methods like host-attached storage, network-attached storage, and storage area networks. It also covers disk scheduling algorithms, disk management topics, swap space management, RAID structures, and stable storage implementation. Magnetic disks are organized into platters, tracks, cylinders, and sectors. Solid state disks use flash memory or DRAM instead of magnetic platters. Disks can be attached directly to hosts or accessed over a network. Disk scheduling algorithms aim to minimize seek times and rotational latency when servicing multiple requests. RAID improves reliability and performance, while swap space management supports efficient memory use.
This document discusses mass storage systems. It begins with an overview of disk structure, including details on disk performance characteristics like seek time and rotational latency. It then covers topics like disk scheduling algorithms, disk management in operating systems, swap space management, RAID structures, and implementing stable storage. RAID levels like mirroring and striping with parity are explained. The document provides information on technologies like solid-state disks, magnetic tape, storage arrays, and network-attached storage.
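Striping with parity, mentioned in the summary above, can be illustrated with byte-wise XOR. This is a toy sketch of the principle behind parity RAID, not a real implementation; the data blocks are made-up values.

```python
# Toy illustration of RAID-style parity: the parity block is the XOR of the
# data blocks, so any single lost block can be rebuilt from the survivors.

def parity(blocks):
    """XOR a list of equal-length byte blocks into one parity block."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

data = [b"disk", b"raid", b"xorp"]   # three equal-length "data blocks"
p = parity(data)

# Simulate losing data[1], then rebuild it from the surviving blocks + parity:
rebuilt = parity([data[0], data[2], p])
print(rebuilt)  # b'raid'
```

Because XOR is its own inverse, XOR-ing the parity block with the surviving data blocks cancels them out and leaves exactly the missing block, which is why one parity disk tolerates one disk failure.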
By combining two best-of-breed solutions, one can create a very powerful big-data-crunching solution.
Hadoop is a very popular big data solution, but its agility and data transformation capabilities are weaker than many hype-riding Hadoop vendors would have you believe.
By combining Hadoop's strengths with another very powerful open-source technology, CloverETL, we get a nice synergy of the two.
The document discusses mass storage systems, including disk structure, disk scheduling algorithms, disk management, RAID structure, disk attachment methods, stable storage implementation, and tertiary storage devices. It provides details on disk formatting, swap space management, different RAID levels, network attached storage, stable storage implementation, removable media like tapes and optical disks, operating system issues, and hierarchical storage management.
This document discusses mass storage systems and their management by operating systems. It covers disk structure, disk scheduling algorithms, disk management including partitioning and file systems, swap space management, RAID configurations, and implementing stable storage. The objectives are to describe mass storage devices, explain their performance characteristics, evaluate disk scheduling, and discuss operating system services for storage like RAID.
Useful documents for CSE engineering students, especially students of Aryabhatta Knowledge University, Bihar (A.K.U. Bihar). It covers the following topics: disk structure and disk scheduling (FCFS, SSTF, SCAN, C-SCAN).
This document discusses mass storage systems and disk structure. It covers topics such as disk formatting, mapping logical blocks to physical sectors, disk attachment methods like SCSI and Fibre Channel, and disk scheduling algorithms. It also summarizes disk management techniques including partitioning, file systems, and swap space. Additional sections cover RAID configurations, stable storage implementation, and snapshot and replication features.
Secondary storage devices are used to store and retrieve data outside of the computer's main memory. They include internal hard drives and removable media like USB drives, CDs, DVDs, and tapes. Secondary storage saves data permanently, allows portability between devices, and comes in various sizes and formats. Common types discussed are fixed internal hard drives using magnetic disks, removable optical disks like CDs and DVDs, magnetic tapes for backups, and floppy disks which were an early portable storage type but have been replaced by higher capacity devices.
The document discusses various aspects of disk storage and organization in computer systems. It describes how disks are logically organized into blocks and mapped onto physical sectors. Different disk scheduling algorithms like FCFS, SSTF, SCAN, and C-SCAN are covered that aim to optimize disk head movement and request servicing time. The document also discusses disk formatting, swap space management in operating systems, and the architecture and subsystems of the Windows 2000 operating system.
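The seek-distance difference between these scheduling algorithms can be shown with a small sketch. The request queue and starting head position below are the classic textbook numbers, used only as an illustration.

```python
# Compare total head movement (in cylinders) for FCFS vs SSTF on a
# hypothetical request queue; cylinder numbers are illustrative.

def fcfs(head, queue):
    """Service requests in arrival order."""
    total = 0
    for cyl in queue:
        total += abs(cyl - head)
        head = cyl
    return total

def sstf(head, queue):
    """Always service the pending request nearest the current head."""
    pending, total = list(queue), 0
    while pending:
        nearest = min(pending, key=lambda c: abs(c - head))
        total += abs(nearest - head)
        head = nearest
        pending.remove(nearest)
    return total

queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(fcfs(53, queue))  # 640
print(sstf(53, queue))  # 236
```

SSTF cuts total head movement by nearly two thirds here, but it can starve requests far from the head, which is why SCAN-family algorithms are often preferred in practice.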
This document discusses managing swap space in Linux. It defines swap space as a preconfigured area on disk that is used when physical RAM is full. It describes creating swap space using a dedicated partition or swap file. The steps shown create a partition or file, initialize it as swap using mkswap, enable it temporarily with swapon or permanently in fstab, and test the new swap. Managing swap space allows using disk space like RAM when memory is exhausted and aids in memory management and multiprogramming.
This document summarizes information about secondary storage devices and mass storage technologies. It discusses the physical structure and performance characteristics of magnetic disks, as well as technologies like solid state drives and tape drives. It also covers disk addressing, interfaces like SCSI and Fibre Channel, storage arrays, disk scheduling algorithms, RAID technologies, and operating system services for mass storage like swap space and journaling file systems.
An Overview of Spanner: Google's Globally Distributed Database (Benjamin Bengfort)
Spanner is a globally distributed database that provides external consistency between data centers and stores data in a schema-based, semi-relational structure. Spanner also provides a versioned view of the data that allows instantaneous snapshot isolation across any segment of the data. This versioned isolation lets Spanner serve globally consistent reads of the database at a particular time, enabling lock-free read-only transactions (and therefore no consensus communication overhead for these reads). Spanner also provides externally consistent reads and writes with timestamp-based linear execution of transactions and two-phase commit. Spanner is the first distributed database to provide global sharding and replication with strong consistency semantics.
This document discusses mass storage systems and file systems. It begins with an overview of disk structure, including disk geometry, performance characteristics, and disk scheduling algorithms. RAID structures are also introduced. The document then covers file system concepts like file attributes and operations. It describes directory structures, file sharing, and file protection methods. Overall it provides a comprehensive overview of mass storage and file system interfaces from the operating system perspective.
This document discusses different types of storage devices, categorizing them as magnetic or optical. Magnetic storage devices include floppy disks, hard disks, and magnetic tape. Optical storage devices include CD-ROM, DVD-ROM, CD-R, and CD-RW. The document explains how data is stored on magnetic disks using polarized particles and on optical disks using pits and lands that reflect light differently. It provides details on formatting disks and the areas created, capacities of different devices, and speeds of CD-ROM and DVD drives.
The document discusses different types of parallelism that can be utilized in parallel database systems: I/O parallelism to retrieve relations from multiple disks in parallel, interquery parallelism to run different queries simultaneously, intraquery parallelism to parallelize operations within a single query, and intraoperation parallelism to parallelize individual operations like sort and join. It also covers techniques for partitioning relations across disks and handling skew to balance the workload.
GlusterFS is an open-source networked file system that provides scalable, distributed storage for OpenStack. It aggregates disk storage from multiple servers into a single global namespace that is accessible from anywhere. GlusterFS uses a distributed hashing technique to spread files across servers and can be configured for replication or erasure coding for redundancy. It integrates with OpenStack components like Swift, Cinder, and Glance to provide block, object, and file-level storage services for virtual machines. Future integration plans include a Cinder-like file service (FaaS) and support for Windows and NFS shares.
Disk drives store data on platters divided into concentric tracks which are further divided into sectors. Modern hard drives typically contain multiple platters and read/write heads that can access both sides of each platter. Disks are addressed sequentially rather than by geometry to simplify access. Various disk scheduling algorithms aim to minimize seek time and rotational latency by ordering requests to reduce head movement between tracks. Swap space on disk is used to extend virtual memory.
This document discusses mass storage devices. It defines mass storage devices as storage systems with capacities of trillions of bytes that use multiple storage media units like disks, tapes, or CD-ROMs as a single storage device. It provides examples of disk arrays using multiple magnetic disks, automated tape libraries using multiple tapes, and CD-ROM jukeboxes using multiple CD-ROMs. It then discusses disk arrays and RAIDs more in depth, describing how they provide enhanced storage capacity, performance, and reliability through techniques like mirroring and striping.
A brief introduction to Hadoop distributed file system. How a file is broken into blocks, written and replicated on HDFS. How missing replicas are taken care of. How a job is launched and its status is checked. Some advantages and disadvantages of HDFS-1.x
Google Spanner - Synchronously-Replicated, Globally-Distributed, Multi-Versio... (Maciek Jozwiak)
Spanner is a globally distributed database that provides synchronous replication across data centers. It uses atomic clocks and GPS to synchronize time at a global scale and assign commit timestamps to distributed transactions. The database has a hierarchical data model and supports SQL-like queries while automatically sharding, replicating, and rebalancing data across servers.
Magnetic disks provide most secondary storage and are relatively simple. Each disk contains one or more flat, circular platters coated with magnetic material. Disks are logically divided into tracks and sectors for reading and writing data. Disks spin rapidly and have read/write heads that can move to different tracks on the platters. Disk scheduling algorithms like SSTF aim to minimize access times by prioritizing requests located near the heads' current position. Disks can be attached directly via I/O ports or over a network using NAS or SAN storage. Disk management includes formatting, handling bad blocks, and using swap space on disk as an extension of main memory.
This document discusses parallel databases and different types of parallelism that can be utilized in databases including I/O parallelism, interquery parallelism, intraquery parallelism, and interoperation parallelism. It provides details on how relations can be partitioned across multiple disks for I/O parallelism using techniques like round robin, hash partitioning, and range partitioning. It also compares these partitioning techniques in terms of supporting full relation scans, point queries, and range queries.
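The three partitioning techniques mentioned can be sketched in a few lines. The disk count, key values, and range boundaries below are illustrative choices, not values from the document.

```python
# Sketch of round-robin, hash, and range partitioning of tuples across
# N disks; each function returns the index of the disk a tuple lands on.

N = 4  # number of disks (illustrative)

def round_robin(i):
    """i-th tuple in insertion order: spreads load evenly, good for scans."""
    return i % N

def hash_partition(key):
    """Equal keys land on the same disk: good for point queries."""
    return hash(key) % N

def range_partition(key, boundaries=(10, 20, 30)):
    """Contiguous key ranges share a disk: good for range queries.
    Boundary values here are made up for the example."""
    for disk, b in enumerate(boundaries):
        if key < b:
            return disk
    return len(boundaries)  # the last partition takes the tail

print([round_robin(i) for i in range(6)])       # [0, 1, 2, 3, 0, 1]
print(range_partition(5), range_partition(25))  # 0 2
```

The trade-off the document compares falls straight out of the sketch: round robin spreads every relation scan across all disks, hashing routes a point query to exactly one disk, and range partitioning lets a range query touch only the disks whose boundaries overlap it.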
This document summarizes key concepts related to I/O structure in operating systems, including disk structure, disk scheduling, disk management, and swap-space management. It discusses how disks are logically addressed and mapped to physical sectors. It also describes different disk scheduling algorithms like SSTF, SCAN, C-SCAN and factors that influence algorithm selection. The document outlines the processes involved in low-level formatting, logical formatting, and handling bad blocks. It concludes with an overview of swap-space management in various operating systems.
This document provides an overview of various data storage technologies including RAID, DAS, NAS, and SAN. It discusses RAID levels like RAID 0, 1, 5 which provide data striping and redundancy. Direct attached storage (DAS) connects directly to servers but cannot be shared, while network attached storage (NAS) uses file sharing protocols over IP networks. Storage area networks (SAN) use dedicated storage networks like Fibre Channel and iSCSI to provide block-level access to consolidated storage. The key is choosing the right solution based on capacity, performance, scalability, availability, data protection needs, and budget.
The document summarizes mass storage systems including disk structure, disk scheduling algorithms, disk management, RAID structure, and tertiary storage devices. It discusses how disks are logically addressed and mapped to physical sectors. It describes common disk scheduling algorithms like FCFS, SSTF, SCAN, and C-SCAN and factors in selecting an algorithm. It also outlines disk formatting, partitioning, bad block handling, and swap space management in operating systems.
The document discusses mass storage systems including disk structure, disk scheduling algorithms, disk management, RAID structure, disk attachment methods, stable storage implementation, and tertiary storage devices. It provides details on how disks are logically structured and mapped, common disk scheduling algorithms like FCFS, SSTF, SCAN, and C-SCAN, and how operating systems manage disks through partitioning and formatting. It also summarizes RAID levels, approaches to stable storage, and examples of tertiary storage devices like tapes, optical disks, and removable magnetic disks.
This chapter discusses mass storage systems including disk structure, disk scheduling algorithms, RAID structure, and tertiary storage devices. Disks are addressed as logical blocks mapped to physical sectors. Common disk attachment methods are SCSI, FC, NAS, and SAN. Disk scheduling algorithms like FCFS, SSTF, SCAN, and C-SCAN aim to minimize seek times. RAID and file systems manage disks. Tertiary storage uses removable media like floppies, magnetic, and optical disks for low-cost storage.
This document discusses storage management and disk structure. It covers mass storage structure including magnetic disks, disk platters, tracks, cylinders, sectors, and read/write heads. It then discusses disk structure in operating systems and concepts like surfaces, tracks, cylinders, and read/write heads. Finally, it covers scheduling and management techniques like long term, short term, and medium term schedulers as well as RAID structures and their benefits like increased reliability and performance.
This document discusses the structure and components of disk drives and operating system disk management. It covers topics like disk formatting, logical block addressing, zones and rotational speeds on disks, and different disk scheduling algorithms like FCFS, SSTF, SCAN, C-SCAN, and LOOK. It also summarizes file systems, swap space management, and networking protocols in operating systems.
This document provides an overview of mass storage structure and operating system services for mass storage. It discusses the physical structure of magnetic disks and tapes. It also covers disk scheduling algorithms, disk management including partitioning and file systems, swap space management, RAID structures, and implementing stable storage.
This document provides an overview of mass storage structure and operating system services related to mass storage. It discusses the physical structure of magnetic disks and tapes, including disk formatting, mapping, and scheduling algorithms. It also covers disk management techniques like partitioning and bad sector handling. Additional topics include swap space management, RAID configurations, and implementing stable storage using write-ahead logging.
This document provides an overview of mass storage structure and operating system services related to mass storage. It discusses the physical structure of magnetic disks and tapes as secondary storage devices. It covers disk scheduling algorithms, disk management including partitioning and file systems, and swap space management. The document also describes RAID structures for improving reliability and I/O performance as well as concepts related to stable storage implementation.
The document discusses file systems and deadlocks. It covers key aspects of file systems like space management, file names, directories, and metadata. It also discusses different types of file systems and file operations. The document then covers deadlocks, characterizing them and describing methods to handle deadlocks through prevention, avoidance, detection, and recovery.
The document provides an overview of log structured file systems. It discusses how log structured file systems work by writing all data and metadata sequentially to a circular buffer called a log to improve write performance. It also describes how log structured file systems address issues like limited disk space through garbage collection and provide simpler crash recovery without requiring a file system check.
- HDFS Federation allows Hadoop to scale beyond the limitations of a single namespace by splitting the namespace across multiple independent namenodes. Each namenode manages its own namespace volume consisting of a namespace and block pool.
- A client-side mount table provides a virtual unified namespace by mapping namespace volumes to namenodes, hiding the federation details from users and applications.
- HDFS Federation provides wire compatibility by requiring clients to use the same version of Hadoop as the servers, and supports existing HDFS functionality like append, sticky bits, and new APIs like FileContext.
The document discusses Ceph, an open-source distributed storage system. It provides an overview of Ceph's architecture and components, how it works, and considerations for setting up a Ceph cluster. Key points include: Ceph provides unified block, file and object storage interfaces and can scale exponentially. It uses CRUSH to deterministically map data across a cluster for redundancy. Setup choices like network, storage nodes, disks, caching and placement groups impact performance and must be tuned for the workload.
This document discusses parallel and distributed database architectures. It covers topics such as parallel databases, distributed databases, client-server architectures, and different types of parallelism like I/O parallelism, interquery parallelism, and intraquery parallelism. It also describes techniques for data partitioning and load balancing in parallel databases to handle issues like skew. Examples of commercial parallel database systems like IBM DB2 and Oracle RAC are provided.
This document discusses parallel and distributed database systems. It begins by describing centralized and client-server database architectures. It then covers parallel databases, including types of parallelism like I/O, inter-query, and intra-query parallelism. Distributed databases are also introduced, focusing on distributed data storage, transactions, concurrency control, and query processing. Specific architectures like client-server, shared-nothing, and shared disk are explained. Common techniques for data partitioning and parallel query execution are outlined.
This document discusses mass storage structure and operating system services for mass storage. It covers disk structure, disk scheduling algorithms, disk management including swap space and file systems, RAID structure, and stable storage implementation. It also describes magnetic tape and compares its performance characteristics to disk drives.
The Layer Cloning File System (LCFS) has been created to encourage increased innovation in a fundamental technology that boots all containers, improving the speed of downloading, booting, tearing-down, and building containers.
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...alexjohnson7307
Predictive maintenance is a proactive approach that anticipates equipment failures before they happen. At the forefront of this innovative strategy is Artificial Intelligence (AI), which brings unprecedented precision and efficiency. AI in predictive maintenance is transforming industries by reducing downtime, minimizing costs, and enhancing productivity.
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you’re at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We’ll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfChart Kalyan
A Mix Chart displays historical data of numbers in a graphical or tabular form. The Kalyan Rajdhani Mix Chart specifically shows the results of a sequence of numbers over different periods.
Your One-Stop Shop for Python Success: Top 10 US Python Development Providersakankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
Building Production Ready Search Pipelines with Spark and MilvusZilliz
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
Taking AI to the Next Level in Manufacturing.pdfssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
6. Ideas and approaches to help build your organization's AI strategy.
Driving Business Innovation: Latest Generative AI Advancements & Success StorySafe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
Dive into the realm of operating systems (OS) with Pravash Chandra Das, a seasoned Digital Forensic Analyst, as your guide. 🚀 This comprehensive presentation illuminates the core concepts, types, and evolution of OS, essential for understanding modern computing landscapes.
Beginning with the foundational definition, Das clarifies the pivotal role of OS as system software orchestrating hardware resources, software applications, and user interactions. Through succinct descriptions, he delineates the diverse types of OS, from single-user, single-task environments like early MS-DOS iterations, to multi-user, multi-tasking systems exemplified by modern Linux distributions.
Crucial components like the kernel and shell are dissected, highlighting their indispensable functions in resource management and user interface interaction. Das elucidates how the kernel acts as the central nervous system, orchestrating process scheduling, memory allocation, and device management. Meanwhile, the shell serves as the gateway for user commands, bridging the gap between human input and machine execution. 💻
The narrative then shifts to a captivating exploration of prominent desktop OSs, Windows, macOS, and Linux. Windows, with its globally ubiquitous presence and user-friendly interface, emerges as a cornerstone in personal computing history. macOS, lauded for its sleek design and seamless integration with Apple's ecosystem, stands as a beacon of stability and creativity. Linux, an open-source marvel, offers unparalleled flexibility and security, revolutionizing the computing landscape. 🖥️
Moving to the realm of mobile devices, Das unravels the dominance of Android and iOS. Android's open-source ethos fosters a vibrant ecosystem of customization and innovation, while iOS boasts a seamless user experience and robust security infrastructure. Meanwhile, discontinued platforms like Symbian and Palm OS evoke nostalgia for their pioneering roles in the smartphone revolution.
The journey concludes with a reflection on the ever-evolving landscape of OS, underscored by the emergence of real-time operating systems (RTOS) and the persistent quest for innovation and efficiency. As technology continues to shape our world, understanding the foundations and evolution of operating systems remains paramount. Join Pravash Chandra Das on this illuminating journey through the heart of computing. 🌟
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
Skybuffer SAM4U tool for SAP license adoptionTatiana Kojar
Manage and optimize your license adoption and consumption with SAM4U, an SAP free customer software asset management tool.
SAM4U, an SAP complimentary software asset management tool for customers, delivers a detailed and well-structured overview of license inventory and usage with a user-friendly interface. We offer a hosted, cost-effective, and performance-optimized SAM4U setup in the Skybuffer Cloud environment. You retain ownership of the system and data, while we manage the ABAP 7.58 infrastructure, ensuring fixed Total Cost of Ownership (TCO) and exceptional services through the SAP Fiori interface.
5th LF Energy Power Grid Model Meet-up SlidesDanBrown980551
5th Power Grid Model Meet-up
It is with great pleasure that we extend to you an invitation to the 5th Power Grid Model Meet-up, scheduled for 6th June 2024. This event will adopt a hybrid format, allowing participants to join us either through an online Mircosoft Teams session or in person at TU/e located at Den Dolech 2, Eindhoven, Netherlands. The meet-up will be hosted by Eindhoven University of Technology (TU/e), a research university specializing in engineering science & technology.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
-Insightful presentations covering two practical applications of the Power Grid Model.
-An update on the latest advancements in Power Grid -Model technology during the first and second quarters of 2024.
-An interactive brainstorming session to discuss and propose new feature requests.
-An opportunity to connect with fellow Power Grid Model enthusiasts and users.
Fueling AI with Great Data with Airbyte WebinarZilliz
This talk will focus on how to collect data from a variety of sources, leveraging this data for RAG and other GenAI use cases, and finally charting your course to productionalization.
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
HCL Notes and Domino License Cost Reduction in the World of DLAU
Give or take a block
1. Give or take
Block and inode allocation in Spectrum Scale
Tomer Perry
Spectrum Scale Development
<tomp@il.ibm.com>
(Image: Louise Bourgeois, "Give or Take", 2002)
4. Some relevant Scale Basics
"The Cluster"
• Cluster – A group of operating system instances, or nodes, on which an instance of Spectrum Scale is deployed
• Network Shared Disk (NSD) – Any block storage (disk, partition, or LUN) given to Spectrum Scale for its use
• NSD server – A node making an NSD available to other nodes over the network
• GPFS filesystem – Combination of data and metadata which is deployed over NSDs and managed by the cluster
• Client – A node running Scale software and accessing a filesystem
[Diagram: groups of client nodes accessing a filesystem over the network through NSD servers and their NSDs]
5. Some relevant Scale Basics
MultiCluster
• A cluster can share some or all of its filesystems with other clusters
• Such a cluster can be referred to as a "Storage Cluster"
• A cluster that doesn't have any local filesystems can be referred to as a "Client Cluster"
• A client cluster can connect to 31 clusters (outbound)
• A storage cluster can be mounted by an unlimited number of client clusters (inbound) – 16383, really
• Client cluster nodes "join" the storage cluster upon mount
[Diagram: many client clusters mounting filesystems from a single storage cluster]
6. Some relevant Scale Basics
Node Roles
• While the general concept in Scale is "all nodes were made equal", some have special roles:
– Config Servers: hold the cluster configuration files; 4.1 onward supports CCR (quorum); not in the data path
– Cluster Manager: one of the quorum nodes; manages leases, failures and recoveries; selects filesystem managers (mm*mgr -c)
– Filesystem Manager: one of the "manager" nodes; one per filesystem; manages filesystem configuration (disks), space allocation and quota management
– Token Manager/s: multiple per filesystem; each manages a portion of the tokens for each filesystem, based on inode number
7. Some relevant Scale Basics
From disk to namespace
• A Scale filesystem (GPFS) can be described as loosely coupled layers
[Diagram: layer stack – disks/LUNs/vdisks at the bottom, exposed as NSDs, used as FS disks, grouped into storage pools, forming a filesystem that contains filesets]
8. Some relevant Scale Basics
From disk to namespace
• A Scale filesystem (GPFS) can be described as loosely coupled layers
• Different layers have different properties:
– Disk/LUN/vdisk: size, media type, device
– NSD: name, servers
– FS disk: type, pool, failure group
– Filesystem: block size, replication, log size, inodes…
– Fileset: inodes, quotas, AFM, parent...
9. Some relevant Scale Basics
From disk to namespace
• A Scale filesystem (GPFS) can be described as loosely coupled layers
• Different layers have different properties
• The layers can be divided into "namespace virtualization" (filesystem and filesets) and "storage virtualization" (disks, NSDs, FS disks and storage pools)
11. Resource allocation
In distributed file systems
• One of the main challenges in distributed software in general, and distributed filesystems in particular, is how to efficiently manage resources while maintaining scalability
• Spectrum Scale's approach was always "scalability through autonomy" (tokens, metanode, lease protocol etc.)
• In the context of this session, we will limit the discussion to space allocation and management: subblocks and inodes (essentially, inode allocation is actually "metadata space management")
• A common approach, taking the CAP theorem into account, is to use eventual consistency in order to minimize the performance impact while still providing reasonable functionality
12. Resource allocation
One size doesn't fit all
• Although there are many similarities between subblock allocation and quotas, there are substantial differences which eventually lead us to use different mechanisms for each:

Type             | Managed entity                               | Change rate                                    | Allocation method
Space Allocation | Storage pools, filesets (inodes)             | Adding disks, adding filesets (static)         | Map based
Quotas           | Users/Groups/Filesets, Users/Groups@Filesets | Administrator initiated (dynamic)              | Share based
15. Allocation map based management
• Due to the relatively "static" nature of filesystem space management, it is possible to use the allocation-map-based approach
• One of the main advantages of this approach is that, with proper planning, it is relatively easy to allow each node to manage its own allocations independently
• The basic approach is to divide the managed resource into "regions"
• When a node needs resources to be allocated, the filesystem manager assigns a region to that node
• From then on, the node can use all the resources in that region
• The number of regions is affected by the "-n" filesystem/pool creation parameter (while one can change the -n parameter using mmchfs, it will only apply to pools created after the change)
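The region mechanism above can be sketched in a few lines of Python. This is a toy model for illustration only: the class and method names are invented, and real Spectrum Scale allocation maps are far more involved. The point is simply that once the filesystem manager hands a region to a node, that node allocates from it with no further coordination.

```python
# Toy sketch of region-based allocation (illustrative only; names and
# sizes are invented, not Spectrum Scale internals).
class AllocationMap:
    def __init__(self, total_blocks, n_regions):
        self.region_size = total_blocks // n_regions
        self.free = [self.region_size] * n_regions   # free blocks per region
        self.owner = [None] * n_regions              # node currently holding each region

    def assign_region(self, node):
        """Filesystem manager hands an unowned, non-empty region to a node."""
        for r, owner in enumerate(self.owner):
            if owner is None and self.free[r] > 0:
                self.owner[r] = node
                return r
        return None

class Node:
    def __init__(self, name, fs_map):
        self.name, self.fs_map, self.region = name, fs_map, None

    def allocate(self, nblocks):
        """Allocate locally from the owned region; contact the manager
        only when no region is held or the region is exhausted."""
        if self.region is None or self.fs_map.free[self.region] < nblocks:
            self.region = self.fs_map.assign_region(self.name)
            if self.region is None:
                raise RuntimeError("no space")
        self.fs_map.free[self.region] -= nblocks
        return self.region

fs = AllocationMap(total_blocks=1000, n_regions=10)  # n_regions plays the "-n" role
a, b = Node("nodeA", fs), Node("nodeB", fs)
a.allocate(5); b.allocate(5)
assert a.region != b.region   # nodes allocate independently in distinct regions
```

Note how the only cross-node interaction is `assign_region`; everything else is local, which is the "scalability through autonomy" idea from slide 11.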
16. Block allocation map
• Divided into a fixed number of n equal-size "regions"
– Each region contains bits representing blocks on all disks
– Different nodes can use different regions and still stripe across all disks (many more regions than nodes) – based on the -n parameter at storage pool creation
[Diagram: regions 0 through n-1, each holding block bits for disks 1–3, from block 0 to the last block]
17. Block allocation map
• When adding more disks:
– The allocation map file is extended to provide room for the additional bits
– Each allocation region now consists of two separately stored segments
[Diagram: each region 0..n-1 now has segment 0 covering blocks on disks 1–3 and segment 1 covering blocks on the new disk 4]
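The segment idea can be illustrated with a small sketch. The layout below is invented (real GPFS bit layout differs); it only shows the two properties the slides describe: existing bits never move when disks are added, and every region still sees blocks on all disks, so striping is preserved.

```python
# Illustrative sketch only: each region holds one bit per block, split
# into segments that are appended as the storage pool grows.
def make_segment(disks, blocks_per_disk, n_regions):
    """Stripe the blocks of `disks` round-robin across regions."""
    segment = [[] for _ in range(n_regions)]
    for d in disks:
        for b in range(blocks_per_disk):
            segment[(d + b) % n_regions].append((d, b))
    return segment

n_regions = 4
regions = [[] for _ in range(n_regions)]

# Initial creation: segment 0 covers disks 1-3.
seg0 = make_segment([1, 2, 3], blocks_per_disk=8, n_regions=n_regions)
for r in range(n_regions):
    regions[r].append(seg0[r])

# Disk addition: a new segment for disk 4 is appended to every region,
# so the existing segment-0 bits stay exactly where they were.
seg1 = make_segment([4], blocks_per_disk=8, n_regions=n_regions)
for r in range(n_regions):
    regions[r].append(seg1[r])

assert all(len(regions[r]) == 2 for r in range(n_regions))  # two segments each
# every region still covers blocks on all four disks, preserving striping
assert all({d for seg in regions[r] for (d, b) in seg} == {1, 2, 3, 4}
           for r in range(n_regions))
```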
18. Inode Allocation map
• The inode allocation map follows the same principle as the block allocation map
• That said, there are differences due to the way inodes are used:
– Inodes have different "allocation states" (free/used/created/deleted)
– The use of multiple inode spaces (independent filesets) might imply non-contiguous allocation, allocation "holes" and other properties that make management a bit more complex
– Inode use is usually more "latency sensitive" – it thus requires function-shipping methods in order to maintain good performance (file creates etc.)
– Inode expansion might be manual (preallocation) or automatic (upon file creation)
• Inode space expansion logic is outside the scope of this session
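The four inode states named above can be modeled in a few lines. The state names come from the slide; the transitions and helper functions below are an illustrative guess, not the on-disk format or the actual expansion logic.

```python
from enum import Enum

# Toy model of per-inode allocation state (states from the slide;
# transitions are an illustrative sketch, not GPFS internals).
class InodeState(Enum):
    FREE = 0        # bit exists in the map, inode space not yet expanded here
    CREATED = 1     # inode space preallocated/expanded, ready for use
    USED = 2        # holds a live file
    DELETED = 3     # file removed, inode eligible for reuse

imap = {i: InodeState.FREE for i in range(8)}

def expand(imap, inodes):
    """Manual preallocation or automatic expansion: FREE -> CREATED."""
    for i in inodes:
        imap[i] = InodeState.CREATED

def create_file(imap):
    """File creation picks a CREATED or DELETED inode and marks it USED."""
    for i, s in imap.items():
        if s in (InodeState.CREATED, InodeState.DELETED):
            imap[i] = InodeState.USED
            return i
    raise RuntimeError("no usable inodes: the inode space must be expanded")

expand(imap, range(4))      # expand part of the space, leaving a "hole" of FREE inodes
ino = create_file(imap)
print(ino, imap[ino])       # 0 InodeState.USED
```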
19. Allocation map
• So what should I use for -n on filesystem creation?
• Try to estimate the number of nodes that "might" use the filesystem in the future
• Don't just "use a large number" – the overhead of managing many segments/regions might be costly
• As long as the number is not "way off", it should be OK
21. Quotas 101
• While the main goal of this session is to explain how quotas work in Scale, we should still define some basic quota-related terminology
• "A disk quota is a limit set by a system administrator that restricts certain aspects of file system usage on modern operating systems. The function of using disk quotas is to allocate limited disk space in a reasonable way" (source: Wikipedia)
• There are two main types of quota: disk quotas and file quotas. The former limits the maximum size a "quota entity" can use, while the latter defines the number of file system elements that a "quota entity" can use (a "quota entity" might be a user, a group or a fileset)
• Other common terms are:
– Soft quota: a.k.a. the warning level. When used with a grace period, it implies a real limit
– Hard quota: the effective limit. A quota entity can't use more than that limit
– Grace period: a time frame during which the soft quota can be exceeded before it becomes a hard limit
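The soft/hard/grace terminology can be summed up in a short sketch. This is a generic illustration of the semantics defined above, not Scale's enforcement code; the class and values are invented.

```python
# Toy quota check (terminology from the slide; the logic is a simplified
# sketch of soft limit, hard limit and grace period, not GPFS code).
class Quota:
    def __init__(self, soft, hard, grace_seconds):
        self.soft, self.hard, self.grace = soft, hard, grace_seconds
        self.used = 0
        self.grace_start = None   # set when usage first exceeds the soft limit

    def charge(self, amount, now):
        if self.used + amount > self.hard:
            raise PermissionError("hard quota exceeded")
        if self.grace_start is not None and now - self.grace_start > self.grace:
            raise PermissionError("grace period expired: soft limit is now a hard limit")
        self.used += amount
        if self.used > self.soft and self.grace_start is None:
            self.grace_start = now   # warning level crossed: start the grace clock

q = Quota(soft=100, hard=150, grace_seconds=7 * 24 * 3600)
q.charge(120, now=0)          # above soft limit: allowed, grace clock starts
q.charge(20, now=3600)        # still under hard limit and within grace
try:
    q.charge(20, now=3600)    # 160 > hard limit: refused
except PermissionError as e:
    print(e)                  # hard quota exceeded
```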
22. Spectrum Scale quotas
• The dynamic nature of quotas makes it extremely difficult to balance flexibility against performance impact
• Thus, an eventual consistency approach was chosen – using "quota shares"
• Basically, the quota manager (part of the filesystem manager role) assigns shares to requesting clients. A client can use its share unless it is revoked (more details to come)
• Since the quota manager doesn't know how much of a given share is actually being used, a new term was introduced: in_doubt
• The in_doubt value represents the shares that are "out there". It is a factor of the share size and the number of nodes mounting the filesystem
• In order to get exact quota usage, the quota manager needs to contact all mounting nodes in order to revoke the current shares they have
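The in_doubt bookkeeping described above can be sketched as follows. This is a simplified illustration of the concept; the class, method names and numbers are invented, not the actual quota-server implementation.

```python
# Sketch of the in_doubt idea (simplified; not actual quota-server logic).
class QuotaServer:
    def __init__(self, limit):
        self.limit = limit
        self.confirmed_used = 0          # usage reported back by clients
        self.outstanding = {}            # node -> share handed out, fate unknown

    @property
    def in_doubt(self):
        # shares that are "out there": may or may not have been consumed
        return sum(self.outstanding.values())

    def grant_share(self, node, size):
        # be conservative: count in_doubt as if it were already used
        if self.confirmed_used + self.in_doubt + size > self.limit:
            return 0
        self.outstanding[node] = self.outstanding.get(node, 0) + size
        return size

    def revoke_all(self, actual_usage_per_node):
        """Contact every mounting node to learn real usage: in_doubt drops to 0."""
        for node, used in actual_usage_per_node.items():
            self.confirmed_used += used
        self.outstanding.clear()

srv = QuotaServer(limit=100)
srv.grant_share("node1", 20)
srv.grant_share("node2", 20)
print(srv.in_doubt)                      # 40: granted but unconfirmed
srv.revoke_all({"node1": 5, "node2": 18})
print(srv.confirmed_used, srv.in_doubt)  # 23 0: exact usage only after revoking
```

The cost of exactness is visible in `revoke_all`: getting a precise number means a round trip to every mounting node, which is why the design tolerates an in_doubt value in normal operation.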
23. Spectrum Scale quotas
”The share”
• Since we don’t want a client to contact the quota server for each request, we use the
concept of shares (block and file shares)
• Every time a node wants to allocate a resource, it contacts the server and receives a
share of resources
• In the past, the share size was static (20) but configurable, so one had to balance
performance impact against functionality (what if in_doubt is bigger than the
quota…)
• Nowadays, dynamic shares are used. The share starts at qMinBlockShare/qMinFileShare
(* blocks/inodes) and might change based on usage patterns
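A dynamic share policy could look roughly like the sketch below. The growth/shrink heuristic (doubling when a share is consumed quickly, shrinking when it sits idle) and all constants here are assumptions for illustration; the real Spectrum Scale heuristic is not documented:

```python
# Hypothetical dynamic share sizing: start at the configured minimum and
# adapt to how fast the node burns through its share.
Q_MIN_BLOCK_SHARE = 20        # illustrative minimum share, in blocks
MAX_SHARE = 20 * 1024         # illustrative cap

def next_share_size(current: int, seconds_to_consume: float) -> int:
    if seconds_to_consume < 1.0:        # consumed almost instantly: grow
        return min(current * 2, MAX_SHARE)
    if seconds_to_consume > 60.0:       # barely touched: shrink toward minimum
        return max(current // 2, Q_MIN_BLOCK_SHARE)
    return current                      # steady state: keep the current size
```

The point of adapting is exactly the balance mentioned above: large shares cut server round-trips for busy nodes, while small shares keep in_doubt from dwarfing the quota on idle ones.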
24. Spectrum Scale quotas
”The share flow”
• The quota client node calculates how many blocks/inodes it should request based
on historical data and sends the request to the quota server. It might be extremely
aggressive about it…
• The quota server considers the request from a much wider perspective: the number of
nodes, the remaining quota (minus in_doubt), the grace period, etc.
• The quota server replies to the client request with the granted resources (blocks or
inodes)
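The two halves of the flow can be sketched as a pair of functions. This is an illustrative model of the idea (aggressive client, conservative server that treats in_doubt as already used); the function names, parameters, and policy are assumptions, not the real protocol:

```python
# Hypothetical share-request flow between a quota client and the quota server.

def client_request(recent_alloc_rate: float, horizon_s: float, minimum: int) -> int:
    """Client side: ask for enough blocks to cover the next `horizon_s`
    seconds at the recently observed allocation rate. May be aggressive."""
    return max(int(recent_alloc_rate * horizon_s), minimum)

def server_grant(requested: int, hard_limit: int, used: int, in_doubt: int) -> int:
    """Server side: never hand out more than what remains of the quota once
    outstanding (in_doubt) shares are counted as if they were used."""
    remaining = hard_limit - used - in_doubt
    return max(0, min(requested, remaining))
```

For example, a client writing 50 blocks/s that plans 10 s ahead asks for 500 blocks, but with a 10,000-block hard limit, 5,000 blocks used, and 4,500 in doubt, the server only grants 500 at most; once in_doubt eats the remainder, grants drop to zero and revokes begin.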
25. Spectrum Scale quotas
”The share flow” - under pressure
• It is also important to understand what happens when we get “close to the
limit”. The in_doubt value is always taken into account as if it were already used
– Soft limit exceeded:
● The grace counter kicks in
● The SoftQuotaExceeded callback is executed (if configured)
– Grace period ends:
● Quota exceeded error
– Getting close to the hard limit:
● The quota server revokes shares from all clients for each request… performance impact
26. Spectrum Scale quotas
tips and tricks
• Sometimes, balancing the impact of quotas can be challenging. For example,
imagine an application whose threads are all being “throttled” due to quota revokes,
slowing down access to other quota objects that are not “close to the edge” (who
said Ganesha?)
• In those cases, it might be better to “fail fast”
• There are two major ways to achieve that:
– Use soft quotas only, with a short grace period (1 sec, for example). In this case,
shortly after the soft quota is exceeded, the app will get a quota exceeded
error. It will use “a little more than the limit”
– Use the (undocumented) qRevokeWatchThreshold parameter. Every time
we hit HL - (Usage + inDoubt) < qRevokeWatchThreshold * qMinShare we need to do
a revoke. We add this revoke to the watch list and won’t do another revoke
to that object for the next 60 sec. So, if another share request comes in – we