This document discusses storage virtualization techniques. It covers what can be virtualized (file system and block levels), where virtualization can occur (host-based, network-based, storage-based), and how virtualization is implemented (in-band and out-of-band). Examples of storage virtualization include logical volume management (LVM) on Linux hosts, SAN volume controllers, and virtualization features in disk arrays. Key benefits are improved manageability, availability, scalability and security of storage resources.
2. Agenda
• Overview
  Introduction
  What to virtualize
  Where to virtualize
  How to virtualize
• Case study
  On Linux systems
  • RAID
  • LVM
  • NFS
  In distributed systems
  • Vastsky
  • Lustre
  • Ceph
  • HDFS
3. Overview
• Introduction
• What to virtualize?
  Block, file system
• Where to virtualize?
  Host-based, network-based, storage-based
• How to virtualize?
  In-band, out-of-band
5. Introduction
• Common storage architectures:
  DAS - Direct Attached Storage
  • Storage devices are directly attached to a server or workstation, without a storage network in between.
  NAS - Network Attached Storage
  • File-level computer data storage connected to a computer network, providing data access to heterogeneous clients.
  SAN - Storage Area Network
  • Attaches remote storage devices to servers in such a way that the devices appear as locally attached to the operating system.
6. Introduction
• Desirable properties of storage virtualization:
  Manageability
  • Storage resources should be easy to configure and deploy.
  Availability
  • Storage hardware failures should not affect applications.
  Scalability
  • Storage resources should easily scale up and down.
  Security
  • Storage resources should be securely isolated.
7. Introduction
• Storage concepts and techniques
  Storage resource mapping table
  Redundant data
  Multi-path
  Data sharing
  Tiering
8. Concept and Technique
• Storage resource mapping table
  Maintain tables that map storage resources to targets.
  Dynamically modify table entries for thin provisioning.
  Use tables to isolate different storage address spaces.
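To make the mapping-table idea concrete, here is a minimal Python sketch of a thin-provisioned volume (all names and sizes are illustrative, not from the slides): physical extents are taken from the pool only when a virtual extent is first written.

```python
# Minimal thin-provisioning sketch: the mapping table is filled lazily,
# so unwritten virtual extents consume no physical space.

class ThinVolume:
    def __init__(self, virtual_extents, physical_pool):
        self.table = {}                          # virtual extent -> physical extent
        self.virtual_extents = virtual_extents
        self.free = list(range(physical_pool))   # free physical extents

    def write(self, vext, data):
        if vext not in self.table:               # allocate on first write
            if not self.free:
                raise RuntimeError("physical pool exhausted")
            self.table[vext] = self.free.pop(0)
        return self.table[vext]                  # physical extent receiving the data

    def read(self, vext):
        # Unmapped extents would read back as zeros in a real implementation.
        return self.table.get(vext)

vol = ThinVolume(virtual_extents=1000, physical_pool=10)
print(vol.write(42, b"hello"))   # -> 0, the first physical extent allocated
print(vol.read(7))               # -> None: never written, no space consumed
```

Per-volume tables like this also give each consumer its own isolated address space, as the slide notes.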
9. Concept and Technique
• Redundant data
  Maintain replicas to provide high availability.
  Use RAID techniques to improve performance and availability.
10. Concept and Technique
• Multi-path
  A fault-tolerance and performance enhancement technique.
  There is more than one physical path between the host and storage devices, through the buses, controllers, switches, and bridge devices connecting them.
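A toy Python sketch of the failover side of multi-pathing (path names and the round-robin policy are assumptions for illustration): the host keeps several paths to the same device and skips paths marked as failed.

```python
# Toy multi-path selector: rotate across paths, skipping failed ones.
class MultiPath:
    def __init__(self, paths):
        self.paths = paths            # e.g. ["hba0:ctrl0", "hba1:ctrl1"]
        self.failed = set()
        self.next = 0

    def pick_path(self):
        for _ in range(len(self.paths)):
            p = self.paths[self.next % len(self.paths)]
            self.next += 1
            if p not in self.failed:
                return p
        raise IOError("no healthy path to device")

mp = MultiPath(["hba0:ctrl0", "hba1:ctrl1"])
print(mp.pick_path())            # hba0:ctrl0
mp.failed.add("hba0:ctrl0")      # simulate a path failure
print(mp.pick_path())            # fails over to hba1:ctrl1
```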
11. Concept and Technique
• Data sharing
  Use data de-duplication techniques to eliminate duplicated data.
  Saves storage space and improves its utilization.
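A minimal sketch of content-addressed de-duplication in Python: blocks are keyed by a hash of their contents, so identical blocks are stored once and shared by reference.

```python
import hashlib

# Content-addressed block store: duplicate blocks collapse to one entry.
store = {}                        # sha256 digest -> block bytes

def put_block(data: bytes) -> str:
    key = hashlib.sha256(data).hexdigest()
    store.setdefault(key, data)   # duplicate content reuses the entry
    return key                    # caller keeps only the reference

refs = [put_block(b"AAAA"), put_block(b"BBBB"), put_block(b"AAAA")]
print(len(refs), "references,", len(store), "blocks stored")   # 3 references, 2 blocks
```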
12. Concept and Technique
• Tiering
  Automatically migrate data across storage resources with different properties, according to the significance or access frequency of the data.
  Example: iMac Fusion Drive
  (Diagram: storage policies applied per access group.)
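A small Python sketch of an access-frequency tiering policy (file names, counts, and the threshold are invented for illustration): hot data is placed on the fast tier, cold data on the slow tier.

```python
# Toy tiering policy: frequently accessed data goes to the fast tier.
access_count = {"db.ibd": 9500, "video.mkv": 12, "logs.tar": 3}
HOT_THRESHOLD = 100               # accesses per day; the policy knob

def place(name):
    return "ssd" if access_count[name] >= HOT_THRESHOLD else "hdd"

for f in access_count:
    print(f, "->", place(f))
# db.ibd -> ssd, video.mkv -> hdd, logs.tar -> hdd
```

A real tiering engine migrates data in the background as these counters change, which is what the Fusion Drive example does between flash and disk.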
14. What To Virtualize
• Layers that can be virtualized
  File system
  • Provides a compatible system call interface to user space applications.
  Block device
  • Provides a compatible block device interface to the file system.
  • Accessed through interfaces such as SCSI, SAS, ATA, SATA, etc.
  (Diagram: in user space, the application; in kernel space, the system call interface, file system, block interface, and device driver; below them, the storage device.)
15. File System Level
• Data and Files
  What is data?
  • Data is information that has been converted to a machine-readable, digital binary format.
  • Control information indicates how data should be processed.
  • Applications may embed control information in user data for formatting or presentation.
  • Data and its associated control information are organized into discrete units as files or records.
  What is a file?
  • Files are the common containers for user data, application code, and operating system executables and parameters.
16. File System Level
• About files
  Metadata
  • The control information for file management is known as metadata.
  • File metadata includes file attributes and pointers to the location of the file's data content.
  • File metadata may be segregated from a file's data content.
  • Metadata on file ownership and permissions is used in file access.
  • File timestamp metadata facilitates automated processes such as backup and life cycle management.
  Different file systems
  • In Unix systems, file metadata is contained in the i-node structure.
  • In Windows systems, file metadata is contained in records of file attributes.
17. File System Level
• File system
  What is a file system?
  • A file system is a software layer responsible for organizing and policing the creation, modification, and deletion of files.
  • File systems provide a hierarchical organization of files into directories and subdirectories.
  • The B-tree algorithm facilitates more rapid search and retrieval of files by name.
  • File system integrity is maintained through duplication of master tables, change logs, and immediate writes of file changes.
  Different file systems
  • In Unix, the superblock contains information on the current state of the file system and its resources.
  • In Windows NTFS, the master file table contains information on all file entries and status.
18. File System Level
• File system level virtualization
  The file system maintains metadata (an i-node) for each file.
  Translates file access requests to the underlying file system.
  Sometimes divides large files into small sub-files (chunks) for parallel access, which improves performance.
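A minimal Python sketch of the chunking step (the 4 MiB chunk size is an arbitrary assumption): a file's bytes are split into fixed-size sub-files that can then be placed on, and fetched from, different devices in parallel.

```python
CHUNK = 4 * 1024 * 1024          # 4 MiB chunks; an illustrative choice

def split(data: bytes, chunk=CHUNK):
    """Divide a file's contents into fixed-size sub-files (chunks)."""
    return [data[i:i + chunk] for i in range(0, len(data), chunk)]

chunks = split(b"x" * (10 * 1024 * 1024))     # a 10 MiB file
print([len(c) for c in chunks])               # [4194304, 4194304, 2097152]
# Each chunk can now live on a different storage device.
```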
19. Block Device Level
• Block level data
  The file system block
  • The atomic unit of file system management is the file system block.
  • A file's data may span multiple file system blocks.
  • A file system block is composed of a consecutive range of disk block addresses.
  Data on disk
  • Disk drives read and write data to media through cylinder, head, and sector geometry.
  • Microcode on a disk translates between disk block numbers and cylinder/head/sector locations.
  • This translation is an elementary form of virtualization.
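The classical cylinder/head/sector-to-block translation the slide refers to is simple arithmetic; here it is as a small Python sketch (the 16-head, 63-sectors-per-track geometry is just a common example):

```python
# Classical CHS -> LBA translation, the elementary virtualization above.
# hpc = heads per cylinder, spt = sectors per track.
def chs_to_lba(c, h, s, hpc=16, spt=63):
    return (c * hpc + h) * spt + (s - 1)   # sectors are numbered from 1

print(chs_to_lba(0, 0, 1))   # 0 -> the first sector of the disk
print(chs_to_lba(1, 0, 1))   # 1008, with 16 heads * 63 sectors per track
```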
20. Block Device Level
• Block device interface
  SCSI (Small Computer System Interface)
  • The exchange of data blocks between the host system and storage is governed by the SCSI protocol.
  • The SCSI protocol is implemented in a client/server model.
  • The SCSI protocol is responsible for block exchange but does not define how data blocks will be placed on disk.
  • Multiple instances of SCSI client/server sessions may run concurrently between a server and storage.
21. Block Device Level
• Logical unit and logical volume
  Logical unit
  • The SCSI command processing entity within the storage target represents a logical unit (LU) and is assigned a logical unit number (LUN) for identification by the host platform.
  • LUN assignment can be manipulated through LUN mapping, which substitutes virtual LUN numbers for actual ones.
  Logical volume
  • A volume represents the storage capacity of one or more disk drives.
  • Logical volume management may sit between the file system and the device drivers that control system I/O.
  • Volume management is responsible for creating and maintaining metadata about storage capacity.
  • Volumes are an archetypal form of storage virtualization.
22. Block Device Level
• Data block level virtualization
  LUN & LBA
  • A single block of information is addressed using a logical unit number (LUN) and an offset within that LUN, known as the logical block address (LBA).
  Apply address space remapping
  • The address space mapping is between a logical disk and a logical unit presented by one or more storage controllers.
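A minimal Python sketch of block-level address space remapping (controller names, extent size, and the table contents are invented for illustration): a virtual (LUN, LBA) pair is looked up in a map and translated to a physical location.

```python
# Sketch of address space remapping: virtual (LUN, LBA) -> physical location.
remap = {
    # (virtual LUN, extent index) -> (controller, physical LUN, base LBA)
    (0, 0): ("ctrl-A", 3, 0),
    (0, 1): ("ctrl-B", 7, 0),
}
EXTENT = 2048                     # blocks per extent; an assumed granularity

def translate(vlun, lba):
    ctrl, plun, base = remap[(vlun, lba // EXTENT)]
    return ctrl, plun, base + lba % EXTENT

print(translate(0, 100))    # ('ctrl-A', 3, 100)
print(translate(0, 2100))   # ('ctrl-B', 7, 52): crosses into the next extent
```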
24. Where To Virtualize
• Storage interconnection
  The path to storage
  • The storage interconnection provides the data path between servers and storage.
  • The storage interconnection is composed of both hardware and software components.
  • Operating systems provide drivers for I/O to storage assets.
  • Storage connectivity for hosts is provided by host bus adapters (HBAs) or network interface cards (NICs).
25. Where To Virtualize
• Storage interconnection protocols
  Fibre Channel
  • Usually for high performance requirements.
  • Supports point-to-point, arbitrated loop, and fabric interconnects.
  • Device discovery is provided by the simple name server (SNS).
  • Fibre Channel fabrics are self-configuring via fabric protocols.
  iSCSI (Internet SCSI)
  • For moderate performance requirements.
  • Encapsulates SCSI commands, status, and data in TCP/IP.
  • Device discovery is provided by the Internet Storage Name Service (iSNS).
  • iSCSI servers can be integrated into Fibre Channel SANs through IP storage routers.
26. Where To Virtualize
• Abstraction of physical storage
  Physical to virtual
  • The cylinder, head, and sector geometry of individual disks is virtualized into logical block addresses (LBAs).
  • For storage networks, the physical storage system is identified by a network address / LUN pair.
  • Combining RAID and JBOD assets to create a virtualized mirror must accommodate performance differences.
  Metadata integrity
  • Storage metadata integrity requires redundancy for failover or load balancing.
  • Virtualization intelligence may need to interface with upper layer applications to ensure data consistency.
27. Where To Virtualize
• Different approaches:
  Host-based approach
  • Implemented as software running on host systems.
  Network-based approach
  • Implemented on network devices.
  Storage-based approach
  • Implemented on the storage target subsystem.
28. Host-based Virtualization
• Host-based approach
  File level
  • Run a virtualized file system on the host to map files into data blocks, which are distributed among several storage devices.
  Block level
  • Run logical volume management software on the host to intercept I/O requests and redirect them to storage devices.
  Provides services
  • Software RAID
  (Diagram: a file is divided into sub-files 1-3, which map to blocks spread across multiple storage devices.)
29. Host-based Virtualization
• Important issues
  Storage metadata servers
  • Storage metadata may be shared by multiple servers.
  • Shared metadata enables a SAN file system view for multiple servers.
  • Provides virtual-to-real logical block address mapping for clients.
  • A distributed SAN file system requires file locking mechanisms to preserve data integrity.
  Host-based storage APIs
  • May be implemented by the operating system to provide a common interface to disparate virtualized resources.
  • Microsoft's virtual disk service (VDS) provides a management interface for dynamic generation of virtualized storage.
30. Host-based Virtualization
• A typical example: LVM
  • A software layer between the file system and the disk driver.
  • Executed by the host CPU.
  • Lacks hardware assist for functions such as software RAID.
  • Independence from vendor-specific storage architectures.
  • Dynamic capacity allocation to expand or shrink volumes.
  • Supports alternate pathing for high availability.
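A conceptual Python sketch of the LVM idea (device names and sizes are illustrative; the 4 MiB extent matches LVM's default, but this is not LVM's actual code): physical volumes are carved into fixed-size physical extents, and a logical volume is simply an ordered list of extents that may span several disks.

```python
# Conceptual LVM sketch: logical volumes are built from physical extents
# (PEs) pooled across the physical volumes (PVs) of a volume group (VG).
PE_SIZE = 4 * 2 ** 20                    # 4 MiB per extent

class VolumeGroup:
    def __init__(self):
        self.free = []                   # (pv_name, pe_index) pairs

    def add_pv(self, name, size_bytes):
        self.free += [(name, i) for i in range(size_bytes // PE_SIZE)]

    def create_lv(self, n_extents):
        lv, self.free = self.free[:n_extents], self.free[n_extents:]
        return lv                        # logical extent i -> physical PE

vg = VolumeGroup()
vg.add_pv("/dev/sdb", 64 * 2 ** 20)      # contributes 16 extents
vg.add_pv("/dev/sdc", 64 * 2 ** 20)      # contributes 16 more
lv = vg.create_lv(20)                    # spans both disks transparently
print(lv[0], lv[19])                     # ('/dev/sdb', 0) ('/dev/sdc', 3)
```

Growing or shrinking a volume is then just appending or releasing extents, which is what makes dynamic capacity allocation possible.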
31. Host-based Virtualization
• Host-based implementation
  Pros
  • No additional hardware or infrastructure requirements
  • Simple to design and implement
  • Improves storage utilization
  Cons
  • Storage utilization is optimized only on a per-host basis
  • The software implementation is dependent on each operating system
  • Consumes host CPU cycles for virtualization
  Examples
  • LVM, NFS
32. Network-based Virtualization
• Network-based approach
  File level
  • File level virtualization is seldom implemented on network devices.
  Block level
  • Run software on dedicated appliances or intelligent switches and routers.
  Provides services
  • Multi-path
  • Storage pooling
  (Diagram: blocks pooled across multiple storage devices behind the network layer.)
33. Network-based Virtualization
• Requirements of the storage network
  Intelligent services
  • Logon services
  • Simple name server
  • Change notification
  • Network address assignment
  • Zoning
  The fabric switch should provide
  • Connectivity for all storage transactions
  • Interoperability between disparate servers, operating systems, and target devices
34. Network-based Virtualization
• Techniques for fabric switch virtualization
  Hosted on departmental switches
  • A PC engine provisioned as an option blade.
  Data center directors
  • Should be able to preserve the five-nines availability characteristic of director-class switches.
  • Dedicated virtualization ASICs provide high-performance frame processing and block address mapping.
  • Interoperability between different implementations will become a priority.
35. Network-based Virtualization
• Interoperability issue
  FAIS (Fabric Application Interface Standard)
  • Defines a set of standard APIs to integrate applications and switches.
  • FAIS separates control information and data paths.
  • The control path processor (CPP) supports the FAIS APIs and the upper layer storage virtualization application.
  • The data path controller (DPC) executes the virtualized SCSI I/Os under the management of one or more CPPs.
36. Network-based Virtualization
• Network-based implementation
  Pros
  • True heterogeneous storage virtualization
  • No modification of the host or storage system is needed
  • Multi-path techniques improve access performance
  Cons
  • Complex interoperability matrices, limited by vendor support
  • Difficult to implement fast metadata updates in switch devices
  • Usually requires purpose-built network equipment (e.g., Fibre Channel)
  Examples
  • IBM SVC (SAN Volume Controller), EMC Invista
37. Storage-based Virtualization
• Storage-based approach
  File level
  • Run software on the storage device to provide file-based data storage services to hosts through the network.
  Block level
  • Embeds the technology in the target storage devices.
  Provides services
  • Storage pooling
  • Replication and RAID
  • Data sharing and tiering
  (Diagram: sub-files 1-3 with backup copies, and blocks replicated across devices.)
38. Storage-based Virtualization
• Array-based virtualization
  Storage controller
  • Provides basic disk virtualization in the form of RAID management, mirroring, and LUN mapping or masking.
  • Can allocate a single LUN to multiple servers.
  • Offers Fibre Channel, iSCSI, and SCSI protocols.
  Cache memory
  • Enhances performance.
  Storage assets coordination
  • Coordination between multiple storage systems is necessary to ensure high availability.
39. Storage-based Virtualization
• Data replication
  Array-based data replication
  • Referred to as disk-to-disk replication.
  • Requires that a storage controller function concurrently as both an initiator and a target.
  Synchronous vs. asynchronous
  • Synchronous data replication ensures that a write operation to a secondary disk array is completed before the primary array acknowledges task completion to the server.
  • Asynchronous data replication provides write completion by the primary array, although the transaction may still be pending to the secondary array.
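A minimal Python sketch of the acknowledgement difference between the two modes (the queue/thread machinery is an illustrative stand-in for the array-to-array link, not any vendor's implementation):

```python
import queue, threading

# Sync vs. async replication: when does the host get its "ack"?
secondary = []
pending = queue.Queue()

def write_sync(block):
    secondary.append(block)        # replicate to the secondary first ...
    return "ack"                   # ... then acknowledge the server

def write_async(block):
    pending.put(block)             # queue for later transfer ...
    return "ack"                   # ... and acknowledge immediately

def replicator():                  # background drain of the async queue
    while True:
        secondary.append(pending.get())
        pending.task_done()

threading.Thread(target=replicator, daemon=True).start()
print(write_sync(b"A"))            # ack implies the remote copy exists
print(write_async(b"B"))           # ack may precede the remote copy
pending.join()                     # eventually both blocks are replicated
```

The trade-off follows directly: synchronous mode adds the round-trip to every write, while asynchronous mode risks losing the pending transactions if the primary fails.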
41. Storage-based Virtualization
• Other features
  Point-in-time copy (snapshot) - sketched after this slide
  • Provides point-in-time copies of an entire storage volume.
  • Snapshot copies may be written to secondary storage arrays.
  • Provides an efficient means to quickly recover a known good volume state in the event of data corruption from the host.
  Distributed modular virtualization
  • Decoupling storage controller logic from physical disk banks provides flexibility for supporting heterogeneous disk assets and facilitates distributed virtualization intelligence.
  • Accommodates classes of storage services and data lifecycle management.
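The point-in-time copy above is commonly implemented as copy-on-write; here is a minimal Python sketch of that mechanism (block contents and names are invented): the snapshot shares blocks with the live volume until a block is overwritten, at which point the old version is preserved.

```python
# Copy-on-write snapshot sketch: preserve a block's old contents the
# first time it is overwritten after the snapshot was taken.
volume = {0: b"boot", 1: b"data-v1"}
snapshot_delta = {}                              # old blocks, saved on first overwrite

def snap_write(block, data):
    snapshot_delta.setdefault(block, volume[block])   # preserve the old copy once
    volume[block] = data

def snap_read(block):                            # the point-in-time view
    return snapshot_delta.get(block, volume[block])

snap_write(1, b"data-v2")
print(volume[1], snap_read(1))                   # b'data-v2' b'data-v1'
```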
42. Storage-based Virtualization
• Distributed Modular Virtualization
  Decoupling storage controller intelligence and virtualization engines from physical disk banks facilitates multi-protocol block data access and accommodation of a broad range of disk architectures.
43. Storage-based Virtualization
• Storage-based implementation
  Pros
  • Provides most of the benefits of storage virtualization
  • Adds little extra latency to individual I/Os
  Cons
  • Storage utilization is optimized only across the connected controllers
  • Replication and data migration are only possible across the connected controllers and the same vendor's devices
  Examples
  • Disk array products
45. In-band Virtualization
• Implementation methods: in-band
  • Also known as symmetric; the virtualization devices actually sit in the data path between the host and storage.
  • Hosts perform I/O to the virtualized device and never interact with the actual storage device.
  Pros
  • Easy to implement
  Cons
  • Poor scalability; the device can become a bottleneck
  (Diagram: both control messages and data flow through the in-band virtualization device.)
46. Out-of-band Virtualization
• Implementation methods :
Out-of-band
• Also known as asymmetric:
the virtualization devices are
sometimes called metadata
servers.
• Requires additional software on the
host, which first asks the metadata
server for the location of the actual
data, then accesses it directly.
Pros
• Scalability & performance
Cons
• Hard to implement
(Figure: control messages go to the metadata server, while data flows
directly between hosts and storage.)
47. Other Virtualization Services
Pooling Heterogeneous Storage Assets
In a virtualized storage pool, virtual assets may be
dynamically resized and allocated to servers by
drawing on the total storage capacity of the SAN.
Heterogeneous Mirroring
Heterogeneous mirroring offers more flexible options
than conventional mirroring, including three-way
mirroring within storage capacity carved from
different storage systems.
48. Other Virtualization Services
Heterogeneous Data Replication
Heterogeneous data replication enables duplication of storage data
between otherwise incompatible storage systems.
49. Summary
• Storage virtualization techniques :
Virtualization layer
• File level and block level
Virtualization location
• Host-based, network-based and storage-based
Virtualization method
• In-band and out-of-band
• Storage virtualization services
Storage pooling and sharing
Data replication and mirroring
Snapshot and multi-pathing
51. STORAGE VIRTUALIZATION
ON LINUX SYSTEM
• Case study: virtualization on a Linux system
• Block-based
• Redundant Array of Independent Disks (RAID)
• Logical Volume Management (LVM)
• File-based
• Network File System (NFS)
52. RAID
• RAID (redundant array of independent disks)
• Originally: redundant array of inexpensive disks
RAID schemes provide different balances among the key goals:
• Reliability
• Availability
• Performance
• Capacity
53. RAID level
• The most used:
RAID0
• block-level striping without parity or mirroring
RAID1
• mirroring without parity or striping
RAID1+0
• also written RAID 10; mirroring combined with striping
RAID2
RAID3
RAID4
RAID5
• block-level striping with distributed parity
RAID5+0
• also written RAID 50; distributed parity combined with striping
RAID6
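On a Linux host, these levels can be assembled in software with mdadm; a hedged sketch, assuming spare disks /dev/sdb through /dev/sdh (all device names are placeholders):

    mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/sdb /dev/sdc           # RAID 0: striping
    mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdd /dev/sde           # RAID 1: mirroring
    mdadm --create /dev/md2 --level=5 --raid-devices=3 /dev/sdf /dev/sdg /dev/sdh  # RAID 5: striping with parity
    cat /proc/mdstat    # inspect array state and rebuild progress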
54. RAID 0
• RAID 0: Block-level striping
without parity or mirroring
It has no (or zero) redundancy.
It provides improved performance
and additional storage capacity.
It has no fault tolerance. Any drive
failure destroys the array, and the
likelihood of failure increases with
more drives in the array.
figure from: http://storage-system.fujitsu.com/jp/term/raid/
55. RAID 1
• RAID 1: Mirroring without parity or
striping
Data is written identically to two drives, thereby
producing a "mirrored set".
A read request can be serviced by whichever of the
two drives offers the lower seek time plus
rotational latency.
A write request updates both drives; write
performance depends on the slower of the two.
At least two drives are required to constitute such
an array.
The array continues to operate as long as at least
one drive is functioning.
• Space efficiency
1/N (with N = 2, i.e., 50%)
• Fault tolerance
N − 1 drive failures (with N = 2, one failure)
figure from: http://storage-system.fujitsu.com/jp/term/raid/
56. RAID 5
• RAID5: Block-level striping with
distributed parity
distributes parity across the member disks
requires at least 3 disks
• Space efficiency
1 − 1/N
• Fault tolerance
1 drive failure
figure from: http://storage-system.fujitsu.com/jp/term/raid/
60. Logical Volume Management
• The LVM project is implemented in two components:
In user space
• Management utilities and configuration tools
Ex. lvm, dmsetup
• A programming interface with a well-designed library
Ex. libdevmapper.h
In kernel space
• Implements the device-mapper framework
• Provides different mapped-device targets
Ex. linear, stripe, mirror, etc.
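A typical run of the user-space tools named above, assuming two spare disks (all names and sizes are examples):

    pvcreate /dev/sdb /dev/sdc                      # initialize disks as physical volumes
    vgcreate vg0 /dev/sdb /dev/sdc                  # pool them into one volume group
    lvcreate --name data --size 10G vg0             # carve out a logical volume
    mkfs.ext4 /dev/vg0/data                         # put a file system on it
    lvextend --size +5G --resizefs /dev/vg0/data    # grow it later, online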
65. Logical Volume Management
• The file system in the operating system invokes a set of block-device
system calls.
• The device-mapper framework remaps each request according to the
operation functions loaded for the chosen target.
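The same kernel machinery can be driven directly with dmsetup, bypassing the lvm tools; a sketch of the linear target, assuming /dev/sdb is free (sizes are in 512-byte sectors):

    # map sectors 0..2097151 (1 GiB) of a new device onto /dev/sdb starting at sector 0
    dmsetup create demo --table '0 2097152 linear /dev/sdb 0'
    dmsetup table demo     # show the loaded mapping
    dmsetup remove demo    # tear the mapped device down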
71. Network File System
• What is NFS ?
NFS is a POSIX-compliant distributed file system.
• Works in a distributed server-client model
NFS builds on the Remote Procedure Call (RPC) system.
The Network File System is an open standard defined in RFCs.
• Some features :
Shared POSIX file system
Standard module in the Linux kernel
Good performance
72. Network File System
• Dynamic ports and how they are handled
In NFSv3, each service listens on a randomly assigned TCP port.
NFS uses RPC (Remote Procedure Call) portmapping to discover the port
of each service.
73. Network File System
• Consistency and concurrency in NFS
lockd offers write locks to handle concurrent updates.
statd tracks server and client state so that lock state stays
consistent across crashes and reboots.
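A minimal end-to-end NFS setup tying these pieces together, assuming a server at 192.168.1.10 exporting /srv/share (addresses and paths are examples):

    # server side
    echo '/srv/share 192.168.1.0/24(rw,sync,no_subtree_check)' >> /etc/exports
    exportfs -ra    # reload the export table
    rpcinfo -p      # list RPC services and the ports the portmapper assigned them
    # client side
    mount -t nfs 192.168.1.10:/srv/share /mnt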
74. STORAGE VIRTUALIZATION
IN DISTRIBUTED SYSTEM
• Case-study, virtualization in distributed system
• Block-based
• VastSky
• File-based
• Lustre
• Object-based
• Ceph
• HDFS
75. VastSky
• Overview
VastSky is a Linux-based cluster storage system which provides logical
volumes to users by aggregating disks over a network.
• Three kinds of servers
storage manager
• Maintains a database which describes the physical and logical resources in the system,
• e.g., creating and attaching logical volumes.
head servers
• Running user applications or virtual machines which actually use VastSky logical
volumes.
storage servers
• Storage servers have physical disks which are used to store user data.
• These disks are exported over the network (via iSCSI) and used to provide
logical volumes on the head servers.
77. VastSky
• Logical Volume
a set of several mirrored disks
several physical disk chunks on different servers
(Figure: a logical volume drawn from a storage pool across Storage Servers
1-4; its three mirrored disks are distributed across three different
servers.)
78. VastSky
• Redundancy
VastSky mirrors user data to three storage servers by default and
all of them are updated synchronously.
VastSky can be configured to use two networks (e.g. two
independent ethernet segments) for redundancy.
• Fault detection
The storage manager periodically checks that each head server and
storage server is responsive.
• Recovery
On a failure, the storage manager attempts to reconfigure mirrors
by allocating new extents from other disks automatically.
80. VastSky
• Scalability
Most cluster file systems and storage systems that rely on a
metadata control node have a scalability problem.
VastSky avoids this problem: once a logical volume is set up, all
I/O operations are handled by standard Linux drivers without any
storage-manager interaction.
81. VastSky
• Load Balance
With VastSky's approach, the load is equalized across the
physical disks, so the aggregate I/O bandwidth of all of them
is utilized.
(Figure: chunks D1-D3 of a logical volume spread evenly across Storage
Servers 1-4 in the storage pool.)
82. Lustre File System
• What is Lustre ?
Lustre is a POSIX-compliant global, distributed, parallel filesystem.
Lustre is licensed under GPL.
• Some features :
Parallel shared POSIX file system
Scalable
• High performance
• Petabytes of storage
Coherent
• Single namespace
• Strict concurrency control
Heterogeneous networking
High availability
83. Lustre File System
• Lustre components :
Metadata Server (MDS)
• The MDS serves the metadata stored in one or more MDTs.
Metadata Target (MDT)
• The MDT stores metadata (such as filenames and permissions) for an MDS.
Object Storage Servers (OSS)
• The OSS provides file I/O service, and network request handling for one or
more local OSTs.
Object Storage Target (OST)
• The OST stores file data as data objects; one OSS serves one or more OSTs.
• Lustre network :
Supports several network types
• Infiniband, TCP/IP on Ethernet, Myrinet, Quadrics, …etc.
Take advantage of remote direct memory access (RDMA)
• Improve throughput and reduce CPU usage
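On a mounted Lustre client, striping across OSTs is controlled per file or directory with the lfs tool; a sketch assuming a client mount at /mnt/lustre (paths are examples, and option letters vary slightly between Lustre versions):

    lfs setstripe -c 4 -S 1M /mnt/lustre/data   # stripe new files over 4 OSTs in 1 MiB chunks
    lfs getstripe /mnt/lustre/data              # show the layout actually applied
    lfs df -h                                   # free space per MDT and OST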
85. Lustre File System
• Lustre in HPC
Lustre is the leading HPC file system
• 15 of Top 30
• Demonstrated scalability and performance
• Systems with over 1,000 nodes
• 190 GB/sec IO
• 26,000 clients
Examples
• Titan supercomputer at Oak Ridge National Laboratory
– TOP500: #1, November 2012
• System at Lawrence Livermore National Laboratory (LLNL)
• Texas Advanced Computing Center (TACC)
86. Ceph
• Overview
Ceph is a free software distributed file system.
Ceph's main goals are to be POSIX-compatible and completely
distributed, without a single point of failure.
The data is seamlessly replicated, making it fault tolerant.
• Release
On July 3, 2012, the Ceph development team released Argonaut, the
first release of Ceph with long-term support.
87. Ceph
• Introduction
Ceph is a distributed file system that provides excellent
performance, reliability and scalability.
Object-based storage.
Ceph separates data and metadata operations by eliminating file
allocation tables and replacing them with generating functions.
Ceph utilizes a highly adaptive, distributed metadata cluster,
improving scalability.
Clients use object-based storage devices (OSDs) to access data
directly, giving high performance.
89. Ceph
• Goal
Scalability
• Storage capacity, throughput, client performance. Emphasis on HPC.
Reliability
• Failures are the norm rather than the exception, so the system must have
fault detection and recovery mechanisms.
Performance
• Load balancing under dynamic workloads.
90. Ceph
• Three main components
Clients : Near-POSIX file system interface.
Cluster of OSDs : Store all data and metadata.
Metadata server cluster : Manages the namespace (file names).
91. Three Fundamental Design
1. Separating Data and Metadata
Separation of file metadata management from the storage of file data.
Metadata operations are collectively managed by a metadata server
cluster.
Clients can access OSDs directly to get data, using the metadata.
Ceph removed data allocation lists entirely.
CRUSH assigns objects to storage devices.
93. Separating Data and Metadata
• CRUSH (Controlled Replication Under Scalable Hashing)
CRUSH is a scalable pseudo-random data distribution function
designed for distributed object-based storage systems.
It defines some simple hash functions.
These hash functions efficiently map data objects to storage
devices without relying on a central directory.
Advantages
• Because hash functions are used, a client can calculate an object's
location directly.
94. Separating Data and Metadata
• CRUSH(x) → (osd1, osd2, osd3)
Inputs
• x is the placement group
• Hierarchical cluster map
• Placement rules
Outputs a list of OSDs
• Advantages
Anyone can calculate object location
Cluster map infrequently updated
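On a running Ceph cluster, this client-side calculation can be asked for explicitly; a sketch assuming a pool named rbd and an object named myobject (both names are examples):

    ceph osd map rbd myobject   # prints the PG the object hashes to and the
                                # ordered list of OSDs CRUSH selects (primary first)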
95. Separating Data and Metadata
• Data Distribution with CRUSH
To avoid imbalance (idle or empty OSDs) and load asymmetries
(hot data on a newly added device), new data must be
distributed randomly.
Using a simple hash function, Ceph maps objects to placement
groups (PGs); PGs are then assigned to OSDs by CRUSH.
96. Dynamic Distributed Metadata
Management
2. Dynamic Distributed Metadata Management
Ceph utilizes a metadata cluster architecture based on dynamic
subtree partitioning (for workload balance).
Dynamic subtree partitioning
• Most file systems use static subtree partitioning, which leads to
imbalanced workloads (even though a simple hash function suffices
to locate a directory).
• Ceph's MDS cluster is based on dynamic subtree partitioning, which
balances the workload.
98. Client
• Client Operation
File I/O and Capabilities
1. The client sends an open request for a file to the MDS.
2. The MDS translates the file name into an inode (inode number, file
owner, mode, size, …).
3. If the capability check passes, the MDS returns the inode number,
and the client maps the file data into objects with CRUSH.
4. The client then accesses the OSDs directly.
99. Client
• Client Synchronization
When multiple clients (a mix of readers and writers) use the same
file, Ceph revokes any caching of reads and buffering of writes,
forcing the clients' I/O for that file to go synchronously through
the OSDs.
• Traditional: update serialization. → Bad performance
• Ceph: HPC (high-performance computing) workloads can read and
write different parts of the same file (different objects).
→ increased performance
100. Metadata
• Dynamically Distributed Metadata
MDSs use journaling
• Repetitive metadata updates are absorbed in memory.
• The on-disk layout is optimized for read access.
Each MDS has its own journal (usually a circular log in a dedicated
area of the file system); when an MDS fails, another node can
quickly recover by replaying that journal.
Inodes are embedded directly within directories.
Each directory's content is written to the OSD cluster using the
same striping and distribution strategy as metadata journals and
file data.
101. Replica
• Replication
Data is replicated in terms of PGs.
Clients send all writes to the first non-failed OSD in an object’s PG
(the primary), which assigns a new version number for the object
and PG; and forwards the write to any additional replica OSDs.
102. Failure detection
• Failure detection
When an OSD does not respond → it is marked "down",
and its duties pass to the next OSD in the placement group.
If the first OSD does not recover → it is marked "out",
and another OSD joins the placement group.
103. Recovery
• Recovery and Cluster Updates
If OSD1 crashes → it is marked "down",
and OSD2 takes over as primary.
If OSD1 recovers → it is marked "up";
OSD2, on receiving the update request, sends the new version of
the data to OSD1.
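These state transitions can be observed (and forced) with the standard CLI on a live cluster; a sketch (the OSD id 3 is an example):

    ceph osd tree    # shows each OSD's up/down state and its position in the CRUSH map
    ceph -s          # cluster status, including PGs that are degraded or recovering
    ceph osd out 3   # mark osd.3 out so its PGs re-replicate to other OSDs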
104. HDFS
• Overview
HDFS (Hadoop Distributed File System).
Modeled on the Google File System.
A scalable distributed file system for large data analysis.
Based on commodity hardware, with high fault tolerance.
The primary storage used by Hadoop applications.
(Figure: cloud applications run over MapReduce and HBase, which run
on the Hadoop Distributed File System (HDFS), on a cluster of
machines.)
105. Hadoop
• Introduction
An Apache project
A distributed computing platform
A software framework that lets one easily write and run
applications that process vast amounts of data
• From three papers
SOSP 2003 : “The Google File System”
OSDI 2004 : “MapReduce: Simplified Data Processing on Large
Clusters”
OSDI 2006 : “Bigtable: A Distributed Storage System for Structured
Data”
106. Hadoop Features
• Efficiency
Process in parallel on the nodes where the data is located
• Robustness
Automatically maintains multiple copies of data and automatically
re-deploys computing tasks based on failures
• Cost Efficiency
Distribute the data and processing across clusters of commodity
computers
• Scalability
Reliably store and process massive data
108. HDFS Architecture
• NameNode
The file content is split into blocks (default 128 MB, 3 replicas).
Files and directories are represented on the NameNode by inodes
(permissions, modification and access times, namespace and disk
space quotas).
The namespace is a hierarchy of files and directories.
The NameNode maintains the namespace tree and the mapping of file
blocks to DataNodes.
Three components
• Image: the inode data and the list of blocks, kept in memory.
• Checkpoint: the persistent record of the image, kept on disk.
• Journal: the modification log of the image, kept on disk.
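The block size and replica count mentioned above are per-cluster configuration; on a deployed cluster they can be inspected with the hdfs CLI:

    hdfs getconf -confKey dfs.blocksize     # default block size (134217728 bytes = 128 MB)
    hdfs getconf -confKey dfs.replication   # default replication factor (3)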
109. HDFS Architecture
• Image and Journal
At NameNode startup:
1. Load the latest checkpoint.
2. Replay the journal.
• CheckpointNode
The CheckpointNode periodically combines the existing checkpoint
and journal to create a new checkpoint and an empty journal.
It downloads the checkpoint and journal from the NameNode and
returns a new checkpoint and an empty journal.
• BackupNode
The BackupNode continuously follows the journal to keep itself at
the NameNode's latest state.
If the NameNode fails, the BackupNode can serve until the NameNode
recovers.
110. HDFS Architecture
• DataNode
Each block replica on a DataNode is represented by two files:
• the data itself
• the block's metadata (checksum, generation stamp)
At DataNode startup, the NameNode performs a handshake to:
• verify the namespace ID
• verify the software version
A new DataNode receives the namespace ID.
After the handshake, the DataNode registers and receives its
storage ID.
A DataNode identifies the block replicas in its possession to the
NameNode by sending a block report (block ID, generation stamp),
once per hour.
Heartbeats are sent every 3 seconds.
112. HDFS Client
• File Write
HDFS implements a single-writer, multiple-reader model.
The writing client periodically renews its lease via heartbeats:
• Soft limit: if the client fails to renew the lease, another client
can preempt it.
• Hard limit: if the client fails to renew the lease, it is presumed
to have quit, and HDFS closes the file on its behalf.
Writes flow through a pipeline of DataNodes; each full packet
buffer is pushed down the pipeline.
113. HDFS Client
• File Read
When a client opens a file to read, it fetches the list of blocks and
the locations of each block replica from the NameNode.
It reads from the nearest replica first; if that fails, it reads
from the next nearest replica.
114. REPLICA MANAGEMENT
• Block Placement
When a DataNode registers to the NameNode, the NameNode runs
a configured script to decide which rack the node belongs to.
The default HDFS block placement policy provides a tradeoff
between minimizing the write cost and maximizing data reliability.
The default HDFS replica placement policy:
• No DataNode contains more than one replica of any block.
• No rack contains more than two replicas of the same block.
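The effect of this placement policy can be inspected on a live cluster; a sketch, assuming a file /data/big.log (the path and replication factor are examples):

    hdfs dfs -D dfs.replication=2 -put big.log /data/big.log   # write with a non-default replica count
    hdfs fsck /data/big.log -files -blocks -locations          # list each block and the DataNodes holding its replicas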