STORAGE SYSTEM ARCHITECTURE
Instructor
Mr. S.Christalin Nelson
AP(SG)/CIT
At a Glance
• Storage System Environment
• Host Environment
• Connectivity
• Disk Storage
– Physical Disks
– RAID
• Intelligent Disk Storage
18-Feb-20 2 of 162
Storage System Environment
• Storage has evolved from single internal disks to storage
systems.
• A storage system environment (group of components) provides
storage and handles R/W data requests and data transmission
• Components of Storage system Environment
– Host: Interact with OS & applications that require data
– Connectivity: Carries R/W commands and the data between
the host and the storage devices
– Storage: Devices where the data is stored
• Storage environment has evolved along with changes in
computing models.
18-Feb-20 3 of 162
Module – 1/5
Host Environment
Host Environment
• Host?
– Computers on which applications that require storage I/O reside
– Range from laptops to clusters of servers
18-Feb-20
Examples: Laptop, Server, Group of Servers, Mainframe
5 of 162
Physical Components of Host (1/12)
• Physical components
– CPU, Internal Memory & Disk Devices, IO Devices
– Bus: The physical components interact with one another
through a Bus
18-Feb-20
[Diagram: CPU, storage, and I/O devices interconnected by a bus]
6 of 162
Physical Components of Host (2/12)
18-Feb-20
• CPU – Components
– ALU
– Control Unit
– Register
– Level-1 Cache
7 of 162
Physical Components of Host (3/12)
• CPU technology now means systems typically come with dual-core,
quad-core, or higher core counts on a single chip instead of the
traditional one core per chip.
• The cores together still slot into a single socket as before, and a
single heat sink and fan can keep everything at the right
temperature.
18-Feb-20 8 of 162
Physical Components of Host (4/12)
18-Feb-20
• Storage
– Memory Modules
• Semiconductor memory, High speed data access, Expensive
• Example: RAM, ROM
– Storage Devices
• Magnetic or Optical media, Low speed data access, Cheaper
• Example:
– Magnetic: Tape, Floppy Disk, Hard Disk
– Optical: CD, DVD, VCD
9 of 162
18-Feb-20 10 of 162
Physical Components of Host (6/12)
• Internal
– Processor registers – fastest access (usually 1 CPU cycle)
– Cache
• L0 Micro operations cache – 6 kB
• L1 Instruction cache – 131 kB
• L1 Data cache – 131 kB, 751 GB/s
• L2 Instruction & data (shared) – 1 MB, 215 GB/sec
• L3 Shared cache – 6 MB, 100 GB/s
• L4 Shared cache – 134 MB, 40 GB/s
• Main memory (Primary)
– RAM – GBs, 10 GB/s
• Online Mass Storage
– Disk storage (Secondary) – TBs, 2000 MB/s (2017)
18-Feb-20
KiB & MiB?
1 KiB/s = 2^10 bytes per sec
1 MiB/s = 2^20 bytes per sec
11 of 162
Physical Components of Host (7/12)
• Offline Mass Storage
– Nearline storage (Tertiary) – EBs, 160 MB/s(2013)
• MAID
– Offline storage
• Floppy Disk, Optical Disk, Flash Memory, Magnetic Tape
• Online vs Nearline vs Offline storage
– Online storage is immediately available for I/O.
– Nearline storage is not immediately available, but can be made
online quickly without human intervention.
– Offline storage is not immediately available, and requires
some human intervention to bring online.
• Tiered Storage
– The lower levels of the hierarchy from disks downwards.
18-Feb-20 12 of 162
Physical Components of Host (8/12)
• Modern programming languages mainly assume 2 levels of
memory: main memory & disk storage, though in assembly
language and inline assemblers in languages such as C,
registers can be directly accessed.
• Taking optimal advantage of the memory hierarchy requires
the cooperation of programmers, hardware, and compilers
(as well as underlying support from OS):
– Programmers are responsible for moving data between disk
and memory through file I/O.
– Hardware is responsible for moving data between memory
and caches.
– Optimizing compilers are responsible for generating code that,
when executed, will cause the hardware to use caches and
registers efficiently.
18-Feb-20 13 of 162
Physical Components of Host (9/12)
• Storage Hierarchy
18-Feb-20
– From fastest and most expensive to slowest and cheapest:
CPU registers → L1 cache → L2 cache → RAM → Magnetic disk → Optical disk → Tape
14 of 162
Physical Components of Host (10/12)
• RAM types
– SRAM (Static RAM) – Bipolar, MOSFET, BiMOS
– DRAM (Dynamic RAM) – DRAM, PSRAM, VRAM, FRAM, QLC
– SDRAM (Synchronous DRAM) – SDR, RDRAM, DDR, eDRAM,
DDR2/3/4, LPDDR2/3/4/5
– SGRAM (Synchronous Graphics RAM)
– HBM (High Bandwidth Memory)
18-Feb-20 15 of 162
Physical Components of Host (11/12)
• Storage
18-Feb-20
[Diagram: memory viewed as an address/content table – addresses 0 to n
holding Data 0 to Data n – backed by disk]
16 of 162
Physical Components of Host (12/12)
• I/O Devices
– Human interface
• Keyboard, Mouse, Monitor
– Computer-computer interface
• Network Interface Card (NIC)
– Computer-peripheral interface
• USB (Universal Serial Bus) port
• Host Bus Adapter (HBA)
18-Feb-20 17 of 162
Logical Components of Host (1/10)
• Logical components
– Software applications, Protocols, OS, File systems, Database
18-Feb-20
Host
Applications
Volume Management
DBMS Mgmt Utilities
File System
Multi-pathing Software
Device Drivers
HBA HBA HBA
OS
18 of 162
Logical Components of Host (2/10)
• Applications
– Provide a point of interaction either between the user and the
host or another system and the host
– Most applications have storage requirements (short or long-
term depending upon the application)
• Operating system
– Controls interaction between applications & storage systems
– Monitors and responds to user actions and the environment
– Organizes and controls the hardware components
– Connects hardware components to the application program
layer and the users
– Manages system activities such as storage and communication
18-Feb-20 19 of 162
Logical Components of Host (3/10)
• Device drivers
– Allow the OS to be aware of and use a standard interface to
access and control a specific device (i.e., printer, speakers,
mouse, keyboard, video, storage devices, etc.)
– Provide appropriate protocols to host to allow device access
• File System (and Files)
– Provides a logical structure for data and methods for accessing
that data
• Hosts work with data stored in File System blocks
• File system converts the user logical structures into host
accessible blocks
18-Feb-20 20 of 162
Logical Components of Host (4/10)
• File system block
– Smallest ‘container’ allocated to a file’s data
– Each block is a contiguous area of physical disk capacity
– Block size depends on the type of files being stored and accessed
• Block size is fixed (pre-defined by the OS) during storage system
configuration.
• Larger files will span multiple file system blocks (may not
necessarily be contiguous on a physical disk)
18-Feb-20 21 of 162
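As an illustration of how block size affects allocation, here is a minimal sketch (assuming a hypothetical 4 KiB block size; the function name is illustrative, not from any particular file system):

```python
import math

def blocks_needed(file_size_bytes: int, block_size: int = 4096) -> int:
    """Number of file system blocks a file occupies; even a tiny file
    consumes one whole block, so small files waste space in large blocks."""
    return max(1, math.ceil(file_size_bytes / block_size))

# A 10,000-byte file on a 4 KiB-block file system:
print(blocks_needed(10_000))                   # 3 blocks (12,288 bytes allocated)
print(blocks_needed(10_000) * 4096 - 10_000)   # 2,288 bytes unused in the last block
```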
Logical Components of Host (5/10)
• In multi-user, multi-tasking environments, file systems
manage shared storage resources using
– Directories, paths and structures
• Identify file locations
– Volume Managers
• Hide the complexity of physical disk structures
– File locking capabilities
• Control access and data flow to and from file locations when
used by potentially competing users or applications
– Databases and data management components such as
• Large, shared relational databases
• Management of shared data storage
18-Feb-20 22 of 162
Logical Components of Host (6/10)
• Linear File Structure
– The number of files on a system can be extensive and could
quickly get out of hand
• Hierarchical Structure with Directories
– Also called Folders in the Windows environment
– Hold files as well as other directories
– Hold information about files that they contain (Metadata)
18-Feb-20 23 of 162
Logical Components of Host (7/10)
• Metadata (Information or Data about the file)
– Examples: In UNIX (UFS)
• File type and permissions
• Number of links
• Owner and group IDs
• Number of bytes in the file
• Last file access & Last file modification
– Example: In Windows (NTFS)
• Time stamp and link count
• File name
• Access rights
• File data
• Index information & Volume information
18-Feb-20 24 of 162
Logical Components of Host (8/10)
• Journaling & Logging
– Non-Journaling File system
• Uses many separate writes to update data and metadata; if the
system crashes during a write, data or metadata can be lost
– Journaling File System
• Improves data integrity, system restart time (vs. non-journaling
file systems)
• Before operations are made to the file system, they are written
to a separate area called a log or journal
– May hold all data to be written (Physical Journal)
– May hold only metadata (Logical Journal)
• Disadvantage – slower than other file systems
– Each file system update requires at least one extra write – to the log
18-Feb-20 25 of 162
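To make the write ordering concrete, here is a toy sketch of a physically journaled update (hypothetical classes and method names, not any real file system's API): the intent is logged first, then applied in place, then the log entry is retired, which is why every update costs at least one extra write.

```python
class JournaledFS:
    """Toy model of a physical journal: log first, then update in place."""
    def __init__(self):
        self.journal = []   # sequential log/journal area
        self.blocks = {}    # the file system's "in-place" blocks

    def write_block(self, block_no: int, data: bytes):
        entry = {"block": block_no, "data": data}
        self.journal.append(entry)    # 1. the extra write: record intent in the journal
        # A crash here is safe: recovery simply replays the journal
        self.blocks[block_no] = data  # 2. apply the update to its home location
        self.journal.remove(entry)    # 3. retire (checkpoint) the journal entry

    def recover(self):
        """After a restart, replay any entries still sitting in the journal."""
        for entry in list(self.journal):
            self.blocks[entry["block"]] = entry["data"]
            self.journal.remove(entry)
```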
Logical Components of Host (9/10)
• Volume Management
– Optional intermediate layer between file system and physical
disks
– Aggregates several smaller disks to form a larger virtual disk
• Virtual Disks are only visible to higher level programs and
applications
– Optimizes access to storage
– Simplifies the management of storage resources
18-Feb-20 26 of 162
Logical Components of Host (10/10)
• Host Bus Adapter (HBA)
– Add-on card (or) a chip on the motherboard of the host
– Ports connect the Host to the storage devices
– Has processing capability to handle some storage commands,
thereby reducing the burden on the host CPU
– Multiple HBAs can be installed in a host (used for redundancy and
multi-pathing, discussed next)
18-Feb-20 27 of 162
Improving Data Availability at Host
• Hosts can be configured to provide uninterrupted access to
critical data through
– Redundancy [Implemented using multiple HBAs]
– Multi-path software [Server resident]
• Utilizes available HBAs on the server to provide
redundant/multiple communication paths between host and
storage devices
• It provides assured uninterrupted data transfers even in the
event of a path failure and may also provide automatic load
balancing
– Clustering [Redundant host systems connected together]
• Cluster members can be configured to transparently take over
each others’ workload, with minimal or no impact to the user
• If one host in the cluster fails, its functions will be assumed by
surviving member(s).
18-Feb-20 28 of 162
File Movement to/from Storage (Example)
18-Feb-20
Example flow: a teacher configures/manages course file(s) → the file
system maps the files to file system blocks → the LVM maps file
system blocks to logical extents → logical extents reside in disk
physical extents, consisting of disk sectors → disk sectors are
managed by the disk storage subsystem.
Note: File system blocks are mapped to disk sectors by the OS in the
absence of an LVM.
29 of 162
Module – 2/5
Connectivity
Parts of Storage Environment
• Connectivity
– Interconnection between hosts (or) between a host and peripheral
(storage) devices
– Physical components of Connectivity include
• Bus, Port, Cables (Uses Optical and Copper media)
• Connectors and plugs
• Adapters
– Host Bus Adapter (HBA) – enables devices to connect to a host’s
internal bus system
• NIC – enables simple network attachments to a host
• Switches/hubs - Manage traffic within a network
– Logical components of Connectivity
• Communication protocols, Device Drivers
18-Feb-20 31 of 162
Physical Components – Host with Internal
Storage
18-Feb-20 32 of 162
Bus Technology (1/4)
• Bus?
– Collection of paths that facilitate data transmission from one
part of the computer to another
– Physical components communicate across a bus by sending
packages (packets) of data between the devices in Serial or
Parallel Paths.
• Serial communication: Bits travel one behind the other along a
single path
• Parallel communication: Bits move along multiple paths
simultaneously.
18-Feb-20 33 of 162
Bus Technology (2/4)
• Serial/Parallel Paths
18-Feb-20
Serial Uni-directional
Serial Bi-directional
Parallel
34 of 162
Bus Technology (3/4)
• Types of buses in a computer system
– System Bus
• Carries data from Processor to Memory
– Local or I/O Bus
• Carries data to/from Peripheral devices (such as storage devices)
• Provides a high-speed pathway that connects directly to
processor
18-Feb-20 35 of 162
Bus Technology (4/4)
• Bus Properties
– Bus width (bits)
• Amount of data that can be transmitted at a time
• E.g. n-bit bus can transmit n-bits of data
– Bus speed (MHz)
• Every bus has an associated clock speed which determines how
fast data can be transferred
• Applications can run faster when bus speeds are higher.
– Throughput (Mbps)
18-Feb-20 36 of 162
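Bus width and speed combine into a rough peak throughput figure. A small worked sketch (the 32-bit / 33 MHz case is the classic PCI figure quoted a few slides later; the helper function is illustrative):

```python
def peak_throughput_mb_s(bus_width_bits: int, clock_mhz: float,
                         transfers_per_cycle: int = 1) -> float:
    """Peak throughput (MB/s) = width in bytes x clock (MHz) x transfers per cycle."""
    return (bus_width_bits / 8) * clock_mhz * transfers_per_cycle

print(round(peak_throughput_mb_s(32, 33.33)))   # ~133 MB/s, classic 32-bit/33 MHz PCI
print(round(peak_throughput_mb_s(64, 133)))     # ~1064 MB/s for a 64-bit bus at 133 MHz
```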
Connectivity Protocols (1/2)
• Protocol
– Defined format for communication that allows the sending and
receiving devices to agree on what is being communicated.
– Communication between hardware or software components
• Different Connectivity Models
– Tightly connected entities
– Directly attached entities
– Network connected entities
18-Feb-20 37 of 162
Connectivity Protocols (2/2)
• Tightly connected entities
– E.g. Central Processor to RAM, or storage buffers to controllers
– Use standard Bus technology (System bus or I/O – Local Bus)
• Directly attached entities
– Devices connected at moderate distances – such as host to
printer or host to storage (JBOD or DAS)
• Network connected entities
– E.g. Networked hosts, NAS or SAN
18-Feb-20 38 of 162
Communication Protocols
[Diagram: host stack – applications, operating system, SCSI or
IDE/ATA device drivers, PCI bus]
18-Feb-20
• Protocols for local I/O bus and for connections to an internal
disk system include
– PCI (Peripheral Component Interconnect)
– IDE/ATA (Integrated Device Electronics / Advanced Technology
Attachment)
– SCSI (Small Computer System Interface)
39 of 162
Bus Technology – PCI (1/2)
• PCI defines the local bus system within a computer
• The specification standardizes how PCI expansion cards, such
as network cards or modems, install themselves and
exchange information with the CPU.
• PCI includes
– An interconnection between microprocessor and attached
devices, in which expansion slots are spaced closely for high-
speed operation
– Plug and Play functionality
– 32/64 bit simplex data (1992-2002) to duplex data (2019)
– Throughput is 133 MB/s (1992) to 128 GB/s (2019)
18-Feb-20 40 of 162
Bus Technology – PCI (2/2)
• PCI Express is an enhanced PCI bus with increased
Bandwidth
18-Feb-20 41 of 162
Bus Technology - IDE/ATA
• Most popular interface used with modern hard disks
• Good performance at low cost
• Desktop and laptop systems
• Inexpensive storage interconnect
18-Feb-20 42 of 162
Bus Technology - SCSI
• 2nd most popular hard disk interface protocol in PCs today
• Higher cost than IDE/ATA
• Supports multiple simultaneous data access
• Currently both parallel and serial forms
• Used primarily in “higher end” environments such as with
servers
• Note
– The SCSI HBA (referred to as a controller) can be implemented as an
onboard interface (or) an 'add-in' card plugged into the system I/O
bus.
18-Feb-20 43 of 162
SCSI Model (1/2)
• Initiator – SCSI device that starts a communication
• Target – SCSI device that services a request
• If Initiator is a host, it will release communication connection
and continue processing other events while target executes
the command. The host will await an interrupt signal from
the storage device to complete the transaction
44 of 162
SCSI Model (2/2)
18-Feb-20
• Initiator ID – uniquely identifies an initiator that is used as
an “originating address”
• Target ID – uniquely identifies a target. Used as address for
exchanging commands and status information with initiators
• Logical Unit Numbers (LUNs) – identify a specific Logical Unit in a
target. A Logical Unit can comprise more than a single disk
45 of 162
SCSI Addressing
• Initiator ID
– Original initiator ID number (0 to 15)
– Used to send responses back to initiator from storage device
• Target ID
– Value for a specific storage device (0 to 15)
– An address that is set on the interface of the device such as a
disk, tape or CDROM
• LUN
– A number that reflects the actual address of the device, as
seen by the target
18-Feb-20
Initiator ID Target ID LUN
46 of 162
Disk Identifier - Addressing
• The logical device name used by a host for a disk drive takes the
form cn tn dn
– cn identifies the controller (initiator/HBA), tn the target (the
peripheral device controller), and dn the device (LUN)
– dn is usually d0 for most SCSI disks because there is only one
disk attached to the target controller. In intelligent storage
systems, discussed later, each target may address many LUNs
– Example: controller c0 → target t0 → LUNs d0, d1, d2
47 of 162
SCSI – Pros & Cons
• Pros:
– Fast transfer speeds (up to
320 MB/s for parallel SCSI)
– Reliable, durable
components
– Can connect many devices
with a single bus, more
than just HDs
– SCSI host adapter cards can
be put in almost any system
– Full backward
compatibility
18-Feb-20
• Cons:
– Configuration and setup
specific to one computer
– Unlike IDE, few BIOS
support the standard
– Overwhelming number of
variations in the standard,
hardware, and connectors
– No common software
interfaces and protocol
48 of 162
SCSI vs. IDE/ATA
18-Feb-20
Feature: IDE/ATA vs. SCSI
– Expandability: Low vs. Very good
– Configuration & setup: Easier vs. Complex and expensive
– Device type support: Less support vs. Larger support
– Cost: Cheap vs. Expensive
– Performance:
(1) Max. interface data transfer rate for multiple devices: Low DR vs. High DR
(2) Device mixing issues: Significant performance hit vs. No issues related with different operational speeds
(3) Device performance: Supports only one device at a time vs. Supports multiple devices simultaneously
– Connectivity: Internal storage vs. Internal and external storage
– Speed (MB/s): 100/133/150 vs. 320
49 of 162
Physical Components (Host with External Storage)
• Hosts with external storage are usually large enterprise servers
18-Feb-20
[Diagram: host (CPU, bus, HBA) with its port connected by a cable to
the port of an external disk]
50 of 162
Fiber Channel
• Offers high-speed interconnection used
in networked storage to connect
servers to shared storage devices
• FC refers to hardware components &
storage protocol that communicates
across the channel elements
• Fiber Channel components
– HBAs
– Hubs & Switches
– Cables
– Disks
18-Feb-20
[Diagram: host stack (applications, DBMS/management utilities, file
system, LVM, multipathing software, device drivers, HBAs) connected
to Fiber Channel storage arrays]
51 of 162
External Storage Interfaces – A Comparison
• SCSI
– Limited distance
– Limited device count
– Usually limited to single initiator
– Single-ported drives
• Fiber Channel
– Greater distance
– High device count in SANs
– Multiple initiators
– Dual-ported drives
• Note
– SCSI can be used for internal storage in hosts.
– FC is almost never used internally.
18-Feb-20 52 of 162
Fiber Channel Connectivity (1/2)
• When computing environments require high speed
connectivity, they use sophisticated equipment to connect
hosts to storage devices
• Physical connectivity components in networked storage
environments include:
– HBA (Host-side interface) – Host Bus Adapters connect the
host to the storage devices
– Optical cables – fiber optic cables to increase distance, and
reduce cable bulk
– Switches – used to control access to multiple attached devices
– Directors – sophisticated switches with high availability
components
– Bridges – connections to different parts of a network
18-Feb-20 53 of 162
Fiber Channel Connectivity (2/2)
18-Feb-20
[Diagram: hosts connected through switches to storage]
54 of 162
Module – 3/5
Physical Disks (Storage)
Parts of Storage Environment - Storage
• Physical components of storage include
– Physical devices that hold the data (i.e., disk, tape, optical
drives, etc.)
– Components that make the devices operate (i.e., power
supplies, fans)
– The enclosures that hold the equipment (e.g., racks)
• Logical components of storage include
– Protocols
– Flow algorithms
18-Feb-20 56 of 162
Disk Drive Components
• The Components of Disk Drive include
– Platters
– Spindle
– R/W Heads
– Actuator Arm Assembly
– Controller
18-Feb-20 57 of 162
Disk Drive Components: Platters (1/2)
• The Head Disk Assembly (HDA) is a sealed case which
contains a series of rotating platters
• Attributes of a Platter
– It is a rigid, round disk coated with magnetically sensitive
material
– Data is stored (encoded) as 0/1 by polarizing magnetic areas,
or domains, on the disk surface
– Data can be R/W on both surfaces of a platter
– The no. of platters on a drive is specific to the particular drive
– A platter’s storage capacity varies across drives and technology
• Note: The drive’s capacity is determined by the no. of platters,
the amount of data which can be stored on each platter, and how
efficiently data is written to the platter
18-Feb-20 58 of 162
Disk Drive Components: Platters (2/2)
18-Feb-20
[Figure: platter surfaces storing data as binary bit patterns]
59 of 162
Disk Drive Components: Spindle (1/2)
• Connects multiple disk platters to a motor which rotates at a
constant speed
– The spindle rotates continuously until power is removed from
the spindle motor
– Many hard drive failures occur when the spindle motor fails
– Disk platters spin at speeds of several thousand revolutions
per minute
• Note: These speeds will increase as technologies improve,
though there is a physical limit to the extent to which they can
improve
18-Feb-20 60 of 162
Disk Drive Components: Spindle (2/2)
18-Feb-20
Spindle
Platters
61 of 162
Disk Drive Components: R/W Heads (1/3)
• Most drives have two Read/Write heads per platter [One for
each surface of the platter]
• Data R/W is a magnetic process using read/write heads
– Data Read – Detection of magnetic polarization on the platter
surface
– Data Write – Change the magnetic polarization on the platter
surface
• Head flying height
– Height of microscopic air gap between the R/W heads and the
platter
18-Feb-20 62 of 162
Disk Drive Components: R/W Heads (2/3)
• Landing Zone
– Special area on the surface of the platter near the spindle where
the R/W heads rest when the spindle rotation has stopped
• Logic on the disk drive ensures that the heads are moved to the
landing zone before they touch the surface
– The landing zone is coated with a lubricant to reduce
head/platter friction.
• Head Crash
– Occurs when the drive malfunctions and a R/W head
accidentally touches platter’s surface outside of landing zone
– When a head crash occurs, the magnetic coating on the platter
gets scratched and damage may also occur to the R/W head
– A head crash generally results in data loss
18-Feb-20 63 of 162
Disk Drive Components: R/W Heads (3/3)
18-Feb-20 64 of 162
Disk Drive Components: Actuator Arm Assembly (1/2)
• The R/W heads for all of the platters in the drive are
attached to one actuator arm assembly and move across the
platter simultaneously
– Note: There are two R/W heads per platter, one for each
surface
• It positions the R/W head at a location on the platter where
data needs to be written or read
18-Feb-20 65 of 162
Disk Drive Components: Actuator Arm Assembly (2/2)
18-Feb-20
Actuator
Spindle
Actuator
R/W Head
R/W Head
66 of 162
[Figure: bottom view of a disk drive showing the HDA, controller
board, interface, and power connector]
Disk Drive Components: Controller
• It is a PCB, mounted at the bottom of the disk drive
• It contains a microprocessor (as well as some internal
memory, circuitry, and firmware) that controls:
– Power to the spindle motor and control of motor speed
– Communication of the drive with the CPU on the host system
– R/W by moving the actuator arm, and switching between R/W
heads
– Optimization of data access
18-Feb-20 67 of 162
Physical Disk Structures: Tracks
• A track is a concentric ring around the spindle on which data is recorded
• Track density
– How tightly the tracks are packed on a platter
• Track numbering
– Numbered from the outer edge of the platter starting at Track-
0 (zero)
18-Feb-20
Sector
Track
Platter
68 of 162
Physical Disk Structures: Sectors (1/4)
• Smallest individually-addressable unit of storage in a track, which
typically holds 512 B of user data
• Format operation
– Performed by the manufacturer to write the track and sector
structure onto the platter. Drive manufacturers generally advertise
the formatted capacity.
– Sector stores user data and other information
• Other Information (Sector no., head/platter no., track no.) aids
the controller in locating data on the drive
• No. of sectors per track is based upon the specific drive
– 1st PC hard disks typically held 17 sectors per track. Today's
hard disks can have much larger no. of sectors in a single track
– There can be 1000s of tracks on a platter depending on the
drive size
18-Feb-20 69 of 162
Physical Disk Structures: Sectors (2/4)
• Platter Geometry
– Since a platter is made up of concentric tracks, the outer tracks
can hold more data than the inner ones because they are
physically longer than the inner tracks
– Older disk drives had the same number of sectors in the outer
tracks as in the inner tracks
• With the same sector count per track, data density is very low on
the outer tracks – an inefficient use of the available space
• Zoned-bit recording is a good alternative that makes efficient use
of the available space.
18-Feb-20 70 of 162
Physical Disk Structures: Sectors (3/4)
• Zoned-Bit Recording
– Group the tracks into zones based upon their distance from
the center of the disk
– Each zone is assigned an appropriate number of sectors per
track
• A zone near the center of the platter has fewer sectors per track
than a zone on the outer edge
• Tracks within a given zone have the same number of sectors.
• Outside tracks have more sectors than inside tracks
– Zones are numbered, with the outermost zone being Zone 0.
– Note
• The media transfer rate drops as the zones move closer to the
center of the platter, meaning that performance is better on the
zones created on the outside of the drive.
18-Feb-20 71 of 162
Physical Disk Structures: Sectors (4/4)
18-Feb-20
Platter Without Zones
Sector
Track
Platter With Zones
72 of 162
Physical Disk Structures: Cylinders
• A cylinder is the set of tracks at the same position on both
surfaces of each of the drive’s platters
• Often the location of the drive heads is referred to by cylinder
number rather than by track number
18-Feb-20
Cylinder
Tracks, Cylinders and Sectors
73 of 162
Physical Disk Structures (contd.)
• Physical addressing
– Addresses made up of Cylinder, Head and Sector number (CHS)
to refer to specific locations on the disk
• Drawback: the host must be aware of the geometry of each disk used
• Logical Block Addressing (LBA) is a good alternative to CHS
18-Feb-20 74 of 162
Physical Disk Structures (contd.)
• Logical Block Addressing (1/3)
– Traditional method for accessing peripherals on SCSI, FC, and
newer ATA disks.
– Simplifies addressing by a using a linear address for accessing
physical blocks of data.
• Host only needs to know the size of disk drive (number of blocks)
– Disk controller translates/maps address from LBA to CHS
– Block numbering starts at the beginning of a cylinder and
continues until the end of that cylinder
• Logical blocks are mapped to physical sectors on a 1:1 basis.
• Each block will have its own unique address
18-Feb-20 75 of 162
Physical Disk Structures (contd.)
• Logical Block Addressing (2/3)
– E.g. The true capacity of the 500 GB drive is 465.7 GB, which is
in excess of 976,000,000 blocks. Each block will have its own
unique address.
• As in next slide, the drive shows 8 sectors per track, 8 heads, and
4 cylinders => Total of 256 blocks (8 x 8 x 4). The illustration on
the right shows the block numbering, which will range from 0 to
255.
18-Feb-20 76 of 162
Physical Disk Structures (contd.)
• Logical Block Addressing (3/3)
18-Feb-20
[Figure: the same location addressed two ways – Physical Address =
CHS (Cylinder, Head and Sector number); Logical Block Address =
Block #. Block numbering (0, 8 on the lower surface, 16, 32, 48, ...)
proceeds surface by surface through the cylinder]
77 of 162
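A minimal sketch of the CHS-to-LBA mapping for the 8-sector, 8-head, 4-cylinder illustration above (the standard linear formula; the constants come from the slide's example, not from a real drive):

```python
HEADS_PER_CYL = 8        # from the slide's illustration
SECTORS_PER_TRACK = 8    # sectors are numbered 1..8
CYLINDERS = 4

def chs_to_lba(c: int, h: int, s: int) -> int:
    """Standard CHS -> LBA mapping (sector numbers are 1-based)."""
    return (c * HEADS_PER_CYL + h) * SECTORS_PER_TRACK + (s - 1)

def lba_to_chs(lba: int):
    c, rem = divmod(lba, HEADS_PER_CYL * SECTORS_PER_TRACK)
    h, s0 = divmod(rem, SECTORS_PER_TRACK)
    return c, h, s0 + 1

print(chs_to_lba(0, 0, 1))              # block 0, the very first sector
print(chs_to_lba(CYLINDERS - 1, 7, 8))  # block 255, the last of 8 x 8 x 4 = 256 blocks
print(lba_to_chs(16))                   # (0, 2, 1): block 16 starts the third surface
```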
Physical Disk Structures (contd.)
• What the Host sees?
– Disk Partitioning
• Partitioning divides the disk into logical containers (known as
volumes), each of which can be used for a particular purpose
• Partitions define the disk layout and partition size impacts disk
space utilization
– Partitions are generally created when the hard disk is initially set up
on the host
• Partitions are created from groups of contiguous cylinders
– A large physical drive could be partitioned into multiple Logical
Volumes (LV) of smaller capacity
– Several small physical drives can be concatenated together by a
volume manager and presented as one logical volume.
• The host file-system accesses logical volumes, with no knowledge
of the physical structure
18-Feb-20 78 of 162
Physical Disk Structures (contd.)
• What the Host sees?
18-Feb-20
[Figure: a physical drive partitioned into multiple logical volumes
(A, B, C, D) vs. several drives concatenated into one logical
volume (A)]
79 of 162
Disk Drive Performance (1/8)
• Seek time
– Time taken to position the R/W heads radially across the platter
(measured in ms)
– Seek time specifications
• Full Stroke - Time taken to move across the entire width of the disk,
from the innermost track to the outermost
• Average – Time taken to move from one random track to another,
typically one-third of the full stroke
– Typical range in modern disks: 3 to 15 ms
• Track-to-Track – Time taken to move between adjacent tracks
– Seek time has more impact on reads of random tracks on the
disk rather than on adjacent tracks
18-Feb-20 80 of 162
Disk Drive Performance (2/8)
• Seek Time (contd.)
– Seek time can be improved by short-stroking the drive
• Write data only to a subset (inner or outer tracks) of the available
cylinders and treat the drive as though it has a lower capacity
• E.g. 500 GB drive is set up to use only the first 40% of the
cylinders, and is treated as a 200 GB drive
18-Feb-20 81 of 162
Disk Drive Performance (3/8)
• Rotational Speed/Latency
– Actuator moves R/W head over the platter to a particular track,
while the platter spins to position the particular sector in the
track under the R/W head
– Time taken by the platter to rotate and position the data under
the R/W head (measured in ms)
– It depends on the rotation speed of the spindle and is ½ the time
taken for a full rotation
– Rotational latency has more of an impact on reads/writes of
random sectors on the disk than on adjacent sectors
– E.g.: Rotational latency value
• 5.5ms for 5400 rpm drive
• 2.0ms for 15000 rpm drive
18-Feb-20 82 of 162
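The quoted latency values follow directly from the half-rotation rule; a quick check (illustrative helper; rounding explains the slide's 5.5 ms vs. the computed 5.6 ms):

```python
def avg_rotational_latency_ms(rpm: int) -> float:
    """Average rotational latency = half of one full rotation, in milliseconds."""
    return (60_000 / rpm) / 2

print(round(avg_rotational_latency_ms(5_400), 1))   # ~5.6 ms for a 5400 rpm drive
print(round(avg_rotational_latency_ms(15_000), 1))  # 2.0 ms for a 15000 rpm drive
```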
Disk Drive Performance (4/8)
• Command Queuing
– Time is wasted if commands are processed as they are
received and the R/W head passes over data that will be
needed one or two requests later
– Drive manufacturers include logic that analyzes where data is
stored on the platter relative to data access requests. Requests
are then reordered to make best use of the data’s layout on
the disk (physical level of the disk)
– Also known as Multiple Command Reordering/Optimization,
Command Queuing and Reordering, Native Command Queuing
or Tagged Command Queuing
– Command queuing can also be performed by the storage
system that uses the disk
18-Feb-20 83 of 162
Disk Drive Performance (5/8)
• Command Queuing (contd.)
18-Feb-20
[Figure: four requests arriving in order 1, 2, 3, 4 – without command
queuing they are serviced in arrival order (1, 2, 3, 4); with command
queuing they are reordered (e.g., 1, 3, 2, 4) to match the data
layout on the disk]
84 of 162
Disk Drive Performance (6/8)
• Data Transfer Rate
– Data path during a Read from the drive (a Write follows the
reverse path):
• Disk platters -> Heads -> Drive's internal buffer -> Through the
interface (HBA) to the rest of the system
– Rate of data transferred (in MBps) by the drive to the HBA
– Internal transfer rate
• Rate of data transferred from Disk surface to the R/W heads on a
single track of one surface of the disk
• Few factors (E.g. Seek time) influence sustained internal DTR
• Internal DTR will almost always be lower than External DTR
– External transfer rate
• Rate of data transferred through the interface
• Generally advertised speed of interface (E.g. 133 MBps for ATA/133)
• Sustained external DTR will be lower than the interface speed
18-Feb-20 85 of 162
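Seek time, rotational latency, and transfer rate combine into a per-I/O service time (and hence an IOPS estimate). A rough sketch with illustrative numbers that are not taken from the slides:

```python
def io_service_time_ms(avg_seek_ms: float, rpm: int,
                       transfer_mb_s: float, io_kb: float) -> float:
    """Rough time to service one random I/O: seek + rotational latency + transfer."""
    rotational_ms = (60_000 / rpm) / 2
    transfer_ms = (io_kb / 1024) / transfer_mb_s * 1000
    return avg_seek_ms + rotational_ms + transfer_ms

# Illustrative 15k rpm drive: 4 ms average seek, 200 MB/s sustained, 8 KiB random I/O
t = io_service_time_ms(4.0, 15_000, 200, 8)
print(round(t, 2), "ms per I/O  ->", round(1000 / t), "IOPS")   # ~6 ms -> ~166 IOPS
```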
Disk Drive Performance (7/8)
• Data Transfer Rate (contd.)
18-Feb-20
[Diagram: internal transfer rate is measured between the platters
and the drive's internal buffer; external transfer rate is measured
between the buffer/interface and the HBA]
86 of 162
Disk Drive Performance (8/8)
• Drive Reliability
– Measured with Mean Time Between Failure (MTBF)
• Amount of time that one can anticipate a device to work before
an incapacitating malfunction occurs (associated with Service Life
of the drive)
• It is based on averages and therefore used merely to provide
estimates. MTBF is measured in hours (E.g. 750,000 hours)
• It is based on an aggregate analysis of a huge number of drives
• It is a statistical method developed by the US Military as a way of
estimating maintenance levels required by various devices.
• MTBF is tested by artificially aging the drives by subjecting them
to stressful environments such as high temperatures, high
humidity, fluctuating voltages, etc.
18-Feb-20 87 of 162
Module – 4/5
RAID Arrays
Introduction (1/2)
• Disk Array
– Collection of Disk Drives for increased capacity, but with no
added intelligence
• RAID (Redundant Array of Independent Disks)
– Disk array + Controller (added intelligence)
18-Feb-20
RAID
Controller
RAID Array
Host
89 of 162
Introduction (2/2)
• RAID arrays enable you to
– Increase capacity
– Provide higher availability or life expectancy (in case of drive
failure measured with MTBF)
– Increase I/O performance (through parallel access)
– Streamlined management of storage devices
• Note:
– Traditionally, RAID stood for Redundant Array of Inexpensive Disks –
an alternative to storing data on large and expensive disk drives
(called SLED, or Single Large Expensive Disk)
18-Feb-20 90 of 162
RAID Components (1/3)
• Sub-enclosures or Physical arrays
– Hold a fixed number of physical disks, power supply, and other
supporting hardware
• Logical Arrays or RAID set
– Logical Association of subset/group of disks within RAID array
• Several physical disks can be concatenated to make large logical
volumes (e.g., for databases)
• Single physical disk can be divided to create smaller areas (e.g.,
for logging)
– OS may view it as if they were regular Disk Volumes
– Simplify management of a huge number of disks
18-Feb-20 91 of 162
RAID Components (2/3)
18-Feb-20
RAID
Controller
Logical
Array
Logical
Array
Physical
Array
RAID Array
Host
92 of 162
RAID Components (3/3)
• No. of Logical & Physical Arrays
– Depends entirely on RAID level(s) & specific vendor
implementation
– Mostly in 1:1 ratio. However, you could have 1:N or N:1 ratios
• Array management software implemented in RAID systems
handles:
– Management and control of disk aggregations (e.g. volume
management)
– Translation of I/O requests between logical & physical arrays
– Error correction when disk failures occur
18-Feb-20 93 of 162
Data Organization: Strips & Stripes (1/2)
• Strips
– Contiguously addressed blocks inside each disk of a RAID set
• Stripes
– Set of aligned strips that spans across all disks within RAID set
18-Feb-20
Stripe 1
Stripe 2
Stripe 3
Strips
94 of 162
Data Organization: Strips & Stripes (2/2)
• Strip size or Stripe depth
– Describes the number of blocks in a strip, i.e., the maximum amount
of data R/W in a single disk of the RAID set before the next disk is
accessed
– Data access may start from the beginning of the strip
– All strips in a stripe have the same number of blocks
– Decreasing strip size means that data is broken into smaller
pieces when spread across the disks
• Stripe size
– Describes number of data blocks in a stripe
– Stripe Size = Strip size x No. of data disks
• Stripe width
– Refers to the number of data strips in a stripe (OR) number of
data disks in a stripe
18-Feb-20 95 of 162
RAID Performance: Striping (1/2)
• Striping distributes data across the disks in the array and
permits use of multiple independent disks for multiple and
concurrent R/W
• R/W large amount of data
– Write: 1st piece is sent to 1st drive, 2nd piece to 2nd drive, etc.
– Read: Pieces are put back together again
• Based on RAID level & vendor-specific implementation,
striping can occur at block (or block multiple) or byte level
• Notes on striping
– Higher stripe width – higher no. of drives – better performance
– Striping is transparent to the OS of host (handled by controller)
18-Feb-20 96 of 162
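Because striping is handled entirely by the controller, it boils down to an address calculation. A minimal sketch of the round-robin mapping for a plain striped (RAID 0-style) layout, with illustrative parameters:

```python
def stripe_map(logical_block: int, num_disks: int, strip_blocks: int):
    """Map a logical block to (disk index, block offset on that disk) when
    strips of `strip_blocks` blocks rotate round-robin across `num_disks` disks."""
    stripe_no, within_stripe = divmod(logical_block, num_disks * strip_blocks)
    disk, offset_in_strip = divmod(within_stripe, strip_blocks)
    return disk, stripe_no * strip_blocks + offset_in_strip

# 4 data disks, 2-block strips: blocks 0-1 land on disk 0, 2-3 on disk 1, and so on
for lb in range(10):
    print(lb, stripe_map(lb, num_disks=4, strip_blocks=2))
```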
RAID Performance: Striping (2/2)
18-Feb-20
Logical Array
LUN
(Logical Unit No.)
RAID
Controller
Host
97 of 162
RAID Redundancy: Mirroring (1/2)
• Redundancy improves fault tolerance
• Mirroring uses multiple drives that hold identical copies of
the data (usually 2 drives)
– Every write to a data disk is also a write to mirror disk(s),
containing the same data
– If a disk fails, RAID controller uses the mirror drive for data
recovery & continuous operation. Data on a replaced drive is
rebuilt from the mirrored drive
18-Feb-20 RAID Array
Mirrored
Disk
RAID
Controller
Host
98 of 162
RAID Redundancy: Mirroring (2/2)
• Mirroring is transparent to the attached host
• Benefits
– Fast recovery from a failure
– Improved read performance
• Drawbacks
– Degrades write performance because each block of host data
is written to multiple disks
– High cost of data protection due to the need for multiple disks
18-Feb-20 99 of 162
RAID Redundancy: Parity (1/4)
• Parity is a redundancy check mechanism that also ensures
data protection
• Like striping, parity is generally a function of the RAID
controller and is transparent to the host
• Parity can be thought of as the updated sum of data on the
other disks in the RAID set
– Each time data is updated, the parity will be updated as well,
so that it always reflects the current sum of the data on the
other disks
• Parity information can either be
– Stored on a separate, dedicated drive
– Distributed with the data across all the drives in the array
18-Feb-20 100 of 162
RAID Redundancy: Parity (2/4)
18-Feb-20
[Figure: blocks 0–11 striped across four data disks (stripe 0 = blocks
0–3, stripe 1 = blocks 4–7, stripe 2 = blocks 8–11), with a parity
value per stripe stored on a dedicated parity disk]
101 of 162
RAID Redundancy: Parity (3/4)
• Parity is calculated on a per stripe basis
• On disk failure
– Value of its data is recalculated by using parity information and
data on the surviving disks
• Request for data by host from failed disk requires that data to be
recalculated before it can be sent. This recalculation is time-
consuming, and will decrease the performance of the RAID set
– Note: Hot Spare Drives provide a way to minimize the disruption
caused by a disk failure
• On parity disk failure
– Value of its data (parity) is recalculated by using data disks and
then saved when failed disk is replaced with a new disk
18-Feb-20 102 of 162
RAID Redundancy: Parity (4/4)
18-Feb-20
[Figure: four data drives holding 5, 3, 4, 2 and a parity drive
holding their sum, 14.
Parity calculation: 5 + 3 + 4 + 2 = 14
If the drive holding 4 fails: 5 + 3 + ? + 2 = 14, so ? = 14 − 5 − 3 − 2 = 4]
103 of 162
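The figure above illustrates parity with an arithmetic sum; real controllers typically use a bitwise XOR, which behaves the same way (any one lost value can be recovered from the rest). A small sketch reusing the 5, 3, 4, 2 example:

```python
from functools import reduce

def parity(strips):
    """XOR the data strips together, byte by byte, to form the parity strip."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))

data = [b"\x05", b"\x03", b"\x04", b"\x02"]   # the figure's 5, 3, 4, 2
p = parity(data)

# The drive holding 0x04 fails: XOR the survivors with the parity to rebuild it
rebuilt = parity([data[0], data[1], data[3], p])
print(p.hex(), rebuilt.hex())   # parity byte, and the recovered value 04
```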
RAID Levels
• There are some standard RAID configuration levels, each of
which has benefits in terms of performance, capacity, data
protection, etc.
• Commonly used levels or combinations of levels
– RAID 0 – Striped Array with No Fault Tolerance
– RAID 1 – Disk Mirroring
– RAID 3 – Parallel Access Array with Dedicated Parity Disk
– RAID 4 – Striped Array with Independent Disks and a
Dedicated Parity Disk
– RAID 5 – Striped Array with Independent Disks and Distributed
Parity
– Combinations of levels [RAID 1+0, RAID 0+1, etc.]
18-Feb-20 104 of 162
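One way to compare the levels just listed is by usable capacity as a fraction of raw capacity. A rough helper under the usual single-parity / single-mirror assumptions (illustrative only; real arrays also reserve space for spares and metadata):

```python
def usable_fraction(level: str, n_disks: int) -> float:
    """Approximate usable capacity / raw capacity for common RAID layouts."""
    if level == "0":
        return 1.0                       # striping only, no redundancy
    if level in ("1", "1+0", "0+1"):
        return 0.5                       # every block is mirrored once
    if level in ("3", "4", "5"):
        return (n_disks - 1) / n_disks   # one disk's worth of parity per RAID set
    raise ValueError(f"unsupported level: {level}")

print(usable_fraction("1", 2))   # 0.5 -> 100% overhead, as the RAID 1 slides note
print(usable_fraction("5", 5))   # 0.8 -> a 5-disk RAID 5 set keeps 80% of raw capacity
```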
RAID 0 - Striping (1/2)
• Stripes the data across drives in array without generating
redundant data
• Performance: Better than JBOD because it uses Striping
– Performance is further improved when data is striped across
multiple controllers with only one drive per controller
• Throughput: Very high when I/O sizes are small
• Data Protection
– No Parity or Mirroring (Hence no fault tolerance)
– Extremely difficult to recover data
• Applications
– Those that need high bandwidth or high throughput but where
data is not critical (E.g. Temporary storage or spool areas)
18-Feb-20 105 of 162
RAID 0 - Striping (2/2)
18-Feb-20
RAID
Controller
[Figure: host blocks 0–4 distributed (striped) across the disks in the array]
Host
106 of 162
RAID 1 – Mirroring & Fault Tol. (1/3)
• Uses mirroring to improve fault tolerance
– Every write to a data disk is also a write to the mirror disk(s)
– This is transparent to the host
– If a disk fails, the disk array controller uses the mirror drive for
data recovery and continuous operation
• A RAID 1 group consists of 2 (typically) or more disk modules
• Benefits
– High data availability
– High Throughput or I/O rate (small block size)
• Drawbacks
– Total no. of disks in array equals 2 times the data (usable) disks
• i.e. Overhead cost = 100%, Usable storage capacity = 50%
18-Feb-20 107 of 162
RAID 1 – Mirroring & Fault Tol. (2/3)
• Performance
– Improved Read performance but degrades Write performance
• Data Protection
– Improved fault tolerance over RAID 0
• Cost
– Expensive due to extra capacity required to duplicate data
• Disks: At least two disks
• Maintenance: Low complexity
• Applications
– Those that need High availability (E.g. Accounting, Payroll,
Finance)
18-Feb-20 108 of 162
RAID 1 – Mirroring & Fault Tol. (3/3)
18-Feb-20
RAID
Controller
[Figure: host blocks 0 and 1 each written to both a data disk and its mirror]
Host
109 of 162
RAID 0+1 – Striping & Mirroring (1/3)
• Combines speed of RAID 0 with redundancy of RAID 1
• RAID 0+1 is implemented as a mirrored array whose basic
elements are RAID 0 stripes
• Benefits
– Medium data availability
– High Throughput or I/O rate (small block size)
– Ability to withstand multiple drive failures as long as they
occur on the same stripe
• Drawbacks
– Total no. of disks in array equals two times the data disks, with
overhead cost equaling 100%
18-Feb-20 110 of 162
RAID 0+1 – Striping & Mirroring (2/3)
• Data Protection: Medium reliability
• Disks: Even no. of disks (Minimum 4 disks to allow striping)
• Cost: Very expensive because of the high overhead
• Performance
– High I/O rates
– Writes are slower than Reads because of mirroring
• Applications
– Imaging
– General file server
18-Feb-20 111 of 162
RAID 0+1 – Striping & Mirroring (3/3)
18-Feb-20
RAID
Controller
[Figure: host blocks 0–3 striped across one set of disks and mirrored to a second, identical striped set]
Host
112 of 162
RAID 1+0 – Mirroring & Striping (1/3)
• RAID 1+0 (or RAID 10, RAID 1/0, or RAID A) also combines
the speed of RAID 0 with the redundancy of RAID 1
• RAID 1+0 is implemented as a striped array whose individual
elements are RAID 1 arrays - mirrors
• Benefits (almost similar to RAID 0+1)
– High data availability
– High Throughput or I/O rate (small block size)
– Ability to withstand multiple drive failures as long as they
occur on different mirrors
• Drawbacks (almost similar to RAID 0+1)
– Total no. of disks in arrays equals two times the data disks,
with overhead cost equaling 100%
18-Feb-20 113 of 162
RAID 1+0 – Mirroring & Striping (2/3)
• Data Protection: High reliability
• Disks: Even no. of disks (Minimum 4 disks to allow striping)
• Cost: Very expensive because of the high overhead
• Performance
– High I/O rates achieved using multiple stripe segments
– Writes are slower than Reads because they are mirrored
• Applications
– Databases requiring high I/O rates with random data
– Applications requiring maximum data availability
18-Feb-20 114 of 162
RAID 1+0 – Mirroring & Striping (3/3)
18-Feb-20
RAID
Controller
[Figure: host blocks 0–3 written to mirrored pairs of disks, with the pairs striped together]
Host
115 of 162
RAID 0+1 vs. RAID 1+0
• Benefits are identical under normal operations
• Basic Element: Mirrored pair (RAID 1+0), Stripe (RAID 0+1)
• At drive failure the rebuild operations are very different
– In RAID 1+0 rebuild only the mirror. i.e. The disk array controller
copies data from one surviving disk to the replacement disk
– In RAID 0+1 rebuild entire stripe. i.e. The disk array controller
copies data from each disk in the healthy stripe to equivalent
disk in the failed stripe
• Note 1: As the stripe has no protection (RAID 0), the entire stripe
is faulted even if a single drive in it fails
• Note 2: This causes increased and unneeded I/O load on backend &
also makes the RAID set more vulnerable to a second disk failure
• RAID 0+1 is less common & a poorer solution
18-Feb-20 116 of 162
RAID 3 (1/3)
• Parallel Access Array with Dedicated Parity Disk
• RAID 3 stripes data for high performance and uses parity for
improved fault tolerance
– Data is striped across all the disks but one in the array
– Parity information is stored on a dedicated drive, so that data
can be reconstructed if a drive fails
• R/W data to all disks in parallel
– There are no partial writes that update one out of many strips
in a stripe
• Benefits
– Total no. of disks is less than in a mirrored solution
– Good throughput/bandwidth on large data transfers
18-Feb-20 117 of 162
RAID 3 (2/3)
• Drawbacks
– Poor efficiency in handling small data blocks (not well suited to
transaction processing applications)
– Data is lost if multiple drives fail within the same RAID 3 Group
• Performance
– High data R/W transfer rate. Disk failure has a significant
impact on throughput. Rebuilds are slow.
• Data Protection: Use of parity for improved fault tolerance
• Striping: Byte level to multiple block level depending on
vendor implementation
• Applications
– Those which need large sequential data accesses (e.g. Medical
and geographic imaging)
18-Feb-20 118 of 162
RAID 3 (3/3)
18-Feb-20
RAID
Controller
[Figure: host blocks 0–3 written in parallel across the data disks,
with parity P(0–3) generated and written to the dedicated parity disk]
119 of 162
RAID 4 (1/3)
• Striped with Independent Disks & a Dedicated Parity Disk
• RAID Level 4 stripes data for high performance and uses
parity for improved fault tolerance (same as RAID 3)
– Data is striped across all the disks but one in the array
– Parity information is stored on a dedicated disk so that data
can be reconstructed if a drive fails.
• Data disks are independently accessible, and multiple R/W
can occur simultaneously
• Benefits
– Total no. of disks is less than in a mirrored solution
– Good read throughput & reasonable write throughput
18-Feb-20 120 of 162
RAID 4 (2/3)
• Drawbacks (same as RAID 3)
– Dedicated parity drive can be a bottleneck when handling
small data writes (not well suited to transaction processing
applications)
– Data is lost if multiple drives fail within the same RAID 4 Group
• Performance
– High data read transfer rate. Poor to medium write transfer
rate. Disk failure has a significant impact on throughput
• Data Protection: Use of parity for improved fault tolerance
• Striping: Usually at the block (or block multiple) level
• Applications: General purpose file storage
• Note: RAID 4 is much less commonly used than RAID 5
18-Feb-20 121 of 162
RAID 4 (3/3)
18-Feb-20
RAID
Controller
[Figure: host blocks 0–7 striped across independently accessible data
disks (blocks 0–3 and 4–7), with parity P(0–3) and P(4–7) written to
the dedicated parity disk]
122 of 162
RAID 5 (1/4)
• Striped Array with Independent Disks and Distributed Parity
• RAID 5 performs independent R/W operations
• No dedicated parity drive (data and parity information is
distributed across all drives in the group)
• Benefits
• Most versatile RAID level
• A transfer rate greater than that of a single drive but with a
high overall I/O rate
• Good for parallel processing (multi-tasking) applications or
environments
• Cost savings due to the use of parity over mirroring
18-Feb-20 123 of 162
RAID 5 (2/4)
• Drawbacks
• Slower transfer rate than RAID 3
• Small writes are slow, because they require a read-modify-write
(RMW) operation
• There is degradation in performance in recovery/reconstruction
• Data loss if multiple drives within the same group fails
• Performance
• Good aggregate transfer rate (High read data transfer rate,
medium write data transfer rate)
• Low ratio of parity disks to data disks
18-Feb-20 124 of 162
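The small-write (read-modify-write) penalty mentioned above comes from the parity update: for each small write the controller reads the old data and old parity, XORs them with the new data, and writes both back – four disk I/Os for one host write. A minimal sketch reusing the earlier arithmetic example:

```python
def rmw_new_parity(old_data: int, old_parity: int, new_data: int) -> int:
    """RAID 5 small-write parity update: new_parity = old_parity ^ old_data ^ new_data.
    Costs 2 reads (old data, old parity) + 2 writes (new data, new parity)."""
    return old_parity ^ old_data ^ new_data

old_parity = 5 ^ 3 ^ 4 ^ 2                # parity over the strips 5, 3, 4, 2
new_parity = rmw_new_parity(old_data=4, old_parity=old_parity, new_data=9)
assert new_parity == 5 ^ 3 ^ 9 ^ 2        # matches recomputing parity from scratch
print(new_parity)
```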
RAID 5 (3/4)
• Data Protection
• Single disk failure puts volume in degraded mode
• Difficult to rebuild (as compared to RAID level 1)
• Disks
• 5-disk and 9-disk groups are popular. Most implementations
allow other RAID set sizes
• Striping: Block or multiple-block level
• Applications
• File and application servers, database servers, WWW, email,
and News servers
18-Feb-20 125 of 162
RAID 5 (4/4)
18-Feb-20
RAID Controller
[Figure: host blocks 0–7 striped across the disks, with the parity
strips P(0–3) and P(4–7) distributed across different disks instead
of a dedicated parity disk]
126 of 162
RAID Implementations (1/2)
• Hardware RAID
– Implemented by intelligent storage systems external to the
host (or Host has intelligent controllers that offload RAID
management functions from the host)
• Software RAID
– Describes RAID that is managed by the host CPU
• Disadvantage
– It uses host CPU cycles that would be better utilized to process
application data
– Many host CPUs and OS do not perform I/O functions very
efficiently, so the host is ill-suited for the task
– Often looks attractive initially because it does not require the
purchase of additional hardware. The initial cost savings are soon
exceeded by the expense of using a costly server to perform I/O
operations that it performs inefficiently at best
18-Feb-20 127 of 162
RAID Implementations (2/2)
• Hardware (usually a specialized disk controller card)
– Controls all drives that are attached to it
– Performs all RAID-related functions including volume
management
– Array(s) appear to the host operating system as a regular disk
drive
– Dedicated cache to improve performance
– Generally provides some type of administrative software
• Software (Generally runs as part of OS)
– Volume management is performed by the server
– Provides more flexibility for hardware, which can reduce cost
– Performance is dependent on CPU load & server performance
– Has limited functionality
18-Feb-20 128 of 162
Hot Spares (1/3)
• A hot spare is an idle component (often a drive) in a RAID array
that becomes a temporary replacement for a failed component
• For example:
– The hot spare takes the failed drive’s identity in the array
– Data recovery takes place based on the RAID implementation
(whether Parity OR Mirroring was used)
– The failed drive is replaced with a new drive at some time later
– One of the following occurs:
• The hot spare replaces the failed drive permanently (a new hot
spare must then be configured on the system)
• When the new drive is added to the system, data from the hot
spare is copied to the new drive (The hot spare returns to its idle
state, ready to replace the next failed drive)
18-Feb-20 129 of 162
Hot Spares (2/3)
• Note: The hot spare drive needs to be large enough to
accommodate the data from the failed drive
• Hot spare replacement can be Automatic or User initiated
– Automatic: When a disk’s recoverable error rates exceed a
predetermined threshold, the disk subsystem tries to copy
data from the failing disk to a spare one. If this task completes
before the damaged disk fails, the subsystem switches to the
spare and marks the failing disk unusable. (If not it uses parity
or the mirrored disk to recover the data, as appropriate).
– User initiated: This gives the administrator control when to
rebuild (e.g., rebuild overnight so as not to degrade system
performance). However, the system is vulnerable to another
failure because the hot spare is now unavailable. Some
systems implement multiple hot spares to improve availability.
18-Feb-20 130 of 162
Hot Spares (3/3)
18-Feb-20
RAID
Controller
131 of 162
Hot Swap (1/2)
• Like hot spares, hot swaps enable a system to recover
quickly in the event of a failure. With a hot swap the user
can replace the failed hardware (such as a controller)
without having to shut down the system
• Note
– A warm swap occurs when the system needs to be shut down,
but power does not need to be removed in order to replace the
failed component
– A cold swap occurs when the power must be removed as well
– Some systems have the ability to auto-swap without user
intervention
18-Feb-20 132 of 162
Hot Swap (2/2)
18-Feb-20
RAID
Controller
RAID
Controller
RAID
Controller
133 of 162
Module – 5/5
Intelligent Disk Storage Systems
Intelligent Storage System?
• A disk storage system distributes data over several devices
and manages data access
• vs. Individual storage devices
– Increased capacity
– Improved performance
– Easier data management
– Better data availability
– More robust backup/restore capabilities
– Improved flexibility and scalability
• Categories of Arrays
– Monolithic (Integrated) Storage Systems
– Modular Storage Systems
18-Feb-20 135 of 162
Monolithic (Integrated) Storage Systems (1/3)
• Aimed at the Enterprise level, centralizing data in a powerful
system with hundreds of drives
• Also called: Integrated arrays or Enterprise arrays or Cache
centric arrays
• The system is contained within a single or interconnected
frame (for expansion) and can scale to support increases in
connectivity, performance, and capacity as required
• Can handle large amounts of concurrent I/Os on very large
data applications
• Limitations
– High upfront costs limiting their applicability to only the most
mission critical applications
– Take up a large amount of space in the data center
18-Feb-20 136 of 162
Monolithic (Integrated) Storage Systems (2/3)
18-Feb-20
[Diagram: monolithic array with FC ports, port processors, a large
cache, and RAID controllers]
137 of 162
Monolithic (Integrated) Storage Systems (3/3)
• Characteristics
– Large storage capacity
– Large Cache to store IOs before writing to disk
– Redundancy (improves data protection and availability)
– More robust and fault tolerant due to many built-in features
– Connect to mainframes or very powerful open systems hosts
– Multiple front-end ports (connectivity to multiple servers)
– Multiple back-end FC/SCSI RAID controllers (manage disk
processing)
– Expensive
18-Feb-20 138 of 162
Modular Storage Systems (1/3)
• Aimed at small companies/department level
• Also called: Midrange or Departmental storage systems
• Provide storage to a smaller number of Windows or Unix
servers than the larger Integrated storage systems
• Typically designed with two controllers, each of which
contains host interfaces, cache, RAID processors, and disk
drive interfaces.
18-Feb-20 139 of 162
Modular Storage Systems (2/3)
18-Feb-20
[Diagram: a rack of servers connected through FC switches to a
control module with disks plus additional disk modules; the modular
array contains two controllers (A and B), each with host interfaces,
cache, and a RAID controller]
140 of 162
Modular Storage Systems (3/3)
• Characteristics
– Smaller disk capacity
– Less global cache
– Limited redundancy and connectivity
– Can start with a smaller number of disks and scale as needed
– Performance can degrade as capacity increases
– Fewer front-end ports for connection to servers
– Cannot connect to mainframes
– Usually have separate controllers from the disk array
– Takes up less floor space and costs less
18-Feb-20 141 of 162
Elements of Intelligent Storage Systems
• Intelligent storage systems are organized into the following
areas:
– Front End
– Cache
– Back End
– Physical disks
18-Feb-20 142 of 162
Elements of Intelligent Storage Systems
18-Feb-20
[Diagram: host connectivity → front end → cache → back end →
physical disks]
143 of 162
Intelligent Storage System: Front-end (1/3)
18-Feb-20
Note: Include redundancy in the channels to and from the ports.
[Diagram: the front end – ports and controllers – sits between host
connectivity and the cache, ahead of the back end and physical disks]
144 of 162
Intelligent Storage System: Front-end (2/3)
• Provides communication between storage system and host
• Main parts
– Ports & Controllers
• Storage Ports
– External interfaces for connectivity to host
– Each port has processing logic responsible for executing
appropriate transport protocol for storage connections.
• E.g. SCSI, FC, or iSCSI
– To maintain data availability, the front end of the storage
systems generally have multiple ports.
• Provides redundancy in case of a failure
• Balance the load when the system is experiencing heavy use.
• Mid-range storage system: ranges from 1-8 (Typically 4)
• Large monolithic array: about 64 or 128
18-Feb-20 145 of 162
Intelligent Storage System: Front-end (3/3)
• Controllers
– Sit behind the storage ports and route data to the cache via the
internal data bus
– Send an acknowledgement message back to the host as soon as
the cache receives the data
18-Feb-20 146 of 162
Front-End Command Queuing (1/3)
18-Feb-20
[Figure: four requests arriving at the front end in order 1, 2, 3, 4 –
without command queuing they are issued to disk in arrival order;
with command queuing they are reordered (e.g., 1, 3, 2, 4) to suit
the data layout]
147 of 162
Front-End Command Queuing (2/3)
• Processes multiple concurrent commands based on disk data
organization, regardless of the order in which the commands
were received
• Command queuing software
– Reorders commands and assigns a tag to each command so that it
can be identified and executed efficiently
– Some disk drives (SCSI & FC disks) are intelligent enough to
manage their own command queuing
• Intelligent storage systems may make use of this native disk
intelligence, and may supplement it with queuing performed by
the controller
• Queue Depth Setting
– Defines number of outstanding requests that are active at the
same time in the queue
– Many manufactures have configurable queue depths
18-Feb-20 148 of 162
Front-End Command Queuing (3/3)
• Common Command queuing algorithms
– FIFO
• Commands are executed in the order in which they arrive
• Limitation: Identical to having no queuing – Inefficient
– Seek Time Optimization
• Faster than FIFO
• Optimizing seek times only, without regard for rotational latency,
will not normally produce the best results
– E.g. consider two requests on cylinders that are very close to
each other but in very different places within the track; a
third request a few cylinders further away may be closer
overall once rotation is taken into account, and would be the
better one to service next
– Access Time Optimization
• Combines seek time optimization with an analysis of rotational
latency for optimal performance
18-Feb-20 149 of 162
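The Python sketch below contrasts FIFO ordering with a simple seek-time optimization (shortest seek first). It deliberately ignores rotational latency, which is exactly the limitation that access-time optimization addresses. The request format (target cylinder numbers) and the starting head position are assumptions made for illustration.

# Requests are target cylinder numbers; the head starts at cylinder 0.
def fifo(requests):
    return list(requests)                    # serve strictly in arrival order

def seek_time_optimized(requests, head=0):
    pending, order = list(requests), []
    while pending:
        nearest = min(pending, key=lambda cyl: abs(cyl - head))
        pending.remove(nearest)
        order.append(nearest)
        head = nearest                       # head is now at that cylinder
    return order

requests = [98, 183, 37, 122]                # arrival order
print(fifo(requests))                        # [98, 183, 37, 122]
print(seek_time_optimized(requests))         # [37, 98, 122, 183]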
Intelligent Storage System: Cache
• Cache is a high speed memory
– Improves system performance by isolating hosts from the
mechanical delays of physical disks (seek time and rotational
latency), keeping response times well under a millisecond
– Improves performance of R/W
18-Feb-20
[Diagram: the cache sits between the front end and the back end of the intelligent storage system, in the path from host connectivity to the physical disks]
150 of 162
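A minimal sketch of how a read cache hides that mechanical delay: a hit is served from memory, a miss falls through to the back end. The LRU policy and capacity used here are illustrative assumptions, not a description of any particular array's cache algorithm.

# Cache hits avoid the seek and rotational latency of a physical disk read.
from collections import OrderedDict

class ReadCache:
    def __init__(self, capacity: int, read_from_disk):
        self.capacity = capacity
        self.read_from_disk = read_from_disk  # back-end read callback
        self.blocks = OrderedDict()           # block number -> data, in LRU order

    def read(self, block: int) -> bytes:
        if block in self.blocks:              # hit: served from memory
            self.blocks.move_to_end(block)
            return self.blocks[block]
        data = self.read_from_disk(block)     # miss: go to the physical disk
        self.blocks[block] = data
        if len(self.blocks) > self.capacity:  # evict the least recently used block
            self.blocks.popitem(last=False)
        return data

cache = ReadCache(capacity=2, read_from_disk=lambda b: f"data-{b}".encode())
cache.read(1); cache.read(2)
print(cache.read(1))                          # second read of block 1 is a cache hit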
Intelligent Storage System: Back End (1/3)
18-Feb-20
[Diagram: the back end, made up of controllers and ports, connects the cache of the intelligent storage system to the physical disks]
151 of 162
Intelligent Storage System: Back-End (2/3)
• Data from Cache gets transferred through I/O bus to back
end, where it is routed to the correct drive
• The disk controller provides communication with the disks for
R/W operations
– Manages data transfer between I/O bus and disks
– Handles device addressing, translating logical blocks into
physical locations on the disk
– Provides additional and limited temporary storage for data
– Provides error detection and correction – often in conjunction
with similar features on the disks
– Allows multiple devices to communicate to HBA on the host
– Facilitates performance enhancement
18-Feb-20 152 of 162
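A small sketch of the logical-to-physical translation mentioned above, assuming a simple fixed geometry with no zoned-bit recording: a logical block address (LBA) is mapped to a cylinder/head/sector (CHS) triple. The geometry values are illustrative only.

# Map a logical block address (LBA) to cylinder/head/sector (CHS),
# assuming a fixed number of heads and sectors per track.
def lba_to_chs(lba: int, heads: int, sectors_per_track: int):
    cylinder = lba // (heads * sectors_per_track)
    head = (lba // sectors_per_track) % heads
    sector = (lba % sectors_per_track) + 1    # sectors are conventionally 1-based
    return cylinder, head, sector

# Example geometry: 8 heads, 8 sectors per track
print(lba_to_chs(0, 8, 8))      # (0, 0, 1)
print(lba_to_chs(8, 8, 8))      # (0, 1, 1) - next surface, same cylinder
print(lba_to_chs(255, 8, 8))    # (3, 7, 8) - last block of the fourth cylinder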
Intelligent Storage System: Back-End (3/3)
• Disk controllers
– Implemented as hardware with firmware that communicates
with disks via disk interface, sending commands to initiate
R/W process on disks
– The design of the controller is vendor specific
• Multiple Disk Controllers
– Provide maximum data protection and availability (with
alternative path in case of a failure)
• Reliability is enhanced if the disks used are dual-ported; each
disk port can connect to a separate controller. Having more than
one port on each controller will provide additional protection in
the event of certain types of failure
– Facilitate load balancing
18-Feb-20 153 of 162
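A minimal, hypothetical sketch of what multiple back-end controllers provide: round-robin load balancing across healthy controllers, and failover to the survivor when one is marked failed. The controller names and the selection policy are assumptions for illustration, not a specific product's behaviour.

# Two controllers, A and B: requests alternate between them until one fails,
# after which the survivor takes all the traffic.
import itertools

class BackEnd:
    def __init__(self, controllers=("A", "B")):
        self.healthy = set(controllers)
        self._round_robin = itertools.cycle(controllers)

    def fail(self, controller: str):
        self.healthy.discard(controller)       # mark a controller as failed

    def pick_controller(self) -> str:
        for _ in range(2):                     # with two controllers, one retry suffices
            candidate = next(self._round_robin)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy disk controller available")

back_end = BackEnd()
print([back_end.pick_controller() for _ in range(4)])  # ['A', 'B', 'A', 'B']
back_end.fail("A")
print(back_end.pick_controller())                      # 'B' - failover path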
Intelligent Storage System: Physical Disks (1/2)
18-Feb-20
[Diagram: the physical disks sit behind the back end of the intelligent storage system, which connects them through the cache and front end to host connectivity]
154 of 162
Intelligent Storage System: Physical Disks (2/2)
• Physical disks are where the storage actually takes place
• Drives are connected to the controller with either SCSI (SCSI
interface over copper cables) or FC (over optical or copper
cables)
• This could be a single disk drive or a more complex RAID set
– ATA drives are used when a storage system is used in
environments where performance is not critical
• Connection: Parallel ATA (PATA) or serial ATA (SATA) copper cables
– Mixture of SCSI or FC drives and ATA drives
• Higher performing drives are used for application data storage
• Slower ATA drives are used for backup and archiving
18-Feb-20 155 of 162
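A small, hypothetical sketch of the mixed-drive arrangement described above: performance-critical application data is directed to the higher-performing FC pool, while backup and archive data lands on the slower ATA pool. The pool names and workload labels are illustrative only.

# Illustrative drive pools: FC for performance-critical data, ATA for backup.
DRIVE_POOLS = {
    "fc_pool":  {"media": "FC",  "use": "application data"},
    "ata_pool": {"media": "ATA", "use": "backup and archiving"},
}

def choose_pool(workload: str) -> str:
    # performance-critical workloads go to FC, everything else to ATA
    if workload in ("database", "oltp", "application"):
        return "fc_pool"
    return "ata_pool"

print(choose_pool("database"))   # fc_pool
print(choose_pool("backup"))     # ata_pool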
I/O Example: Read Requests
[Diagram: path of a read request through the intelligent storage system: host connectivity, front end, cache, back end, physical disks]
18-Feb-20 156 of 162
I/O Example: Write Requests
[Diagram: path of a write request through the intelligent storage system: host connectivity, front end, cache, back end, physical disks]
18-Feb-20 157 of 162
What the Host Sees
[Diagram: two hosts each see LUN 0, LUN 1, and LUN 2 presented by the intelligent storage system; the LUNs are served from the cache, back end, and physical disks rather than from individual drives]
18-Feb-20 158 of 162
The Host and Logical Device Names
[Diagram: the volume manager on one host addresses the presented LUNs as /dev/rdsk/c1t1d0 and /dev/rdsk/c1t1d1, while another host sees them as .PhysicalDrive0; each device name maps to a LUN (LUN 0, LUN 1, LUN 2) exported by the intelligent storage system]
18-Feb-20 159 of 162
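A small sketch that pulls the controller, target, and disk/LUN numbers out of a Solaris-style cNtNdN device name such as the /dev/rdsk paths shown above. The parsing is illustrative, not an OS API, and it assumes the d component corresponds to the LUN presented by the target.

# Extract controller, target, and LUN from a cNtNdN device name.
import re

def parse_device_name(path: str) -> dict:
    match = re.search(r"c(\d+)t(\d+)d(\d+)", path)
    if not match:
        raise ValueError(f"not a cNtNdN device name: {path}")
    controller, target, lun = map(int, match.groups())
    return {"controller": controller, "target": target, "lun": lun}

print(parse_device_name("/dev/rdsk/c1t1d0"))   # {'controller': 1, 'target': 1, 'lun': 0}
print(parse_device_name("/dev/rdsk/c1t1d1"))   # {'controller': 1, 'target': 1, 'lun': 1}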
Disk Organization in a Storage System
[Diagram: LUN 0 and LUN 1 are presented to two hosts; inside the intelligent storage system each LUN is spread across several physical disks managed through the back end and cache]
18-Feb-20 160 of 162
References
• Information Storage and Management, EMC Education Services
(e-book)
18-Feb-20 161 of 162
STORAGE SYSTEM ARCHITECTURE OVERVIEW

More Related Content

What's hot

Data Modeling - Entity Relationship Diagrams-1.pdf
Data Modeling - Entity Relationship Diagrams-1.pdfData Modeling - Entity Relationship Diagrams-1.pdf
Data Modeling - Entity Relationship Diagrams-1.pdfChristalin Nelson
 
Basics of storage Technology
Basics of storage TechnologyBasics of storage Technology
Basics of storage TechnologyLopamudra Das
 
Data Modeling - Enhanced ER diagrams & Mapping.pdf
Data Modeling - Enhanced ER diagrams & Mapping.pdfData Modeling - Enhanced ER diagrams & Mapping.pdf
Data Modeling - Enhanced ER diagrams & Mapping.pdfChristalin Nelson
 
PARALLEL FILE SYSTEM FOR LINUX CLUSTERS
PARALLEL FILE SYSTEM FOR LINUX CLUSTERSPARALLEL FILE SYSTEM FOR LINUX CLUSTERS
PARALLEL FILE SYSTEM FOR LINUX CLUSTERSRaheemUnnisa1
 
Information Storage and Management notes ssmeena
Information Storage and Management notes ssmeena Information Storage and Management notes ssmeena
Information Storage and Management notes ssmeena ssmeena7
 
Trends in Database Management
Trends in Database ManagementTrends in Database Management
Trends in Database ManagementMarlon Jamera
 
Relational_Algebra_Calculus Operations.pdf
Relational_Algebra_Calculus Operations.pdfRelational_Algebra_Calculus Operations.pdf
Relational_Algebra_Calculus Operations.pdfChristalin Nelson
 
Exadata SMART Monitoring - OEM 13c
Exadata SMART Monitoring - OEM 13cExadata SMART Monitoring - OEM 13c
Exadata SMART Monitoring - OEM 13cAlfredo Krieg
 
Introduction to Unix operating system Chapter 1-PPT Mrs.Sowmya Jyothi
Introduction to Unix operating system Chapter 1-PPT Mrs.Sowmya JyothiIntroduction to Unix operating system Chapter 1-PPT Mrs.Sowmya Jyothi
Introduction to Unix operating system Chapter 1-PPT Mrs.Sowmya JyothiSowmya Jyothi
 
Storage Technology Overview
Storage Technology OverviewStorage Technology Overview
Storage Technology Overviewnomathjobs
 
Database replication
Database replicationDatabase replication
Database replicationArslan111
 
IBM Cloud Object Storage System (powered by Cleversafe) and its Applications
IBM Cloud Object Storage System (powered by Cleversafe) and its ApplicationsIBM Cloud Object Storage System (powered by Cleversafe) and its Applications
IBM Cloud Object Storage System (powered by Cleversafe) and its ApplicationsTony Pearson
 
Real time-system
Real time-systemReal time-system
Real time-systemysush
 

What's hot (20)

Data Modeling - Entity Relationship Diagrams-1.pdf
Data Modeling - Entity Relationship Diagrams-1.pdfData Modeling - Entity Relationship Diagrams-1.pdf
Data Modeling - Entity Relationship Diagrams-1.pdf
 
Basics of storage Technology
Basics of storage TechnologyBasics of storage Technology
Basics of storage Technology
 
Data Modeling - Enhanced ER diagrams & Mapping.pdf
Data Modeling - Enhanced ER diagrams & Mapping.pdfData Modeling - Enhanced ER diagrams & Mapping.pdf
Data Modeling - Enhanced ER diagrams & Mapping.pdf
 
Raid
RaidRaid
Raid
 
PARALLEL FILE SYSTEM FOR LINUX CLUSTERS
PARALLEL FILE SYSTEM FOR LINUX CLUSTERSPARALLEL FILE SYSTEM FOR LINUX CLUSTERS
PARALLEL FILE SYSTEM FOR LINUX CLUSTERS
 
Storage Basics
Storage BasicsStorage Basics
Storage Basics
 
Information Storage and Management notes ssmeena
Information Storage and Management notes ssmeena Information Storage and Management notes ssmeena
Information Storage and Management notes ssmeena
 
Trends in Database Management
Trends in Database ManagementTrends in Database Management
Trends in Database Management
 
DAS RAID NAS SAN
DAS RAID NAS SANDAS RAID NAS SAN
DAS RAID NAS SAN
 
Storage
StorageStorage
Storage
 
Relational_Algebra_Calculus Operations.pdf
Relational_Algebra_Calculus Operations.pdfRelational_Algebra_Calculus Operations.pdf
Relational_Algebra_Calculus Operations.pdf
 
Exadata SMART Monitoring - OEM 13c
Exadata SMART Monitoring - OEM 13cExadata SMART Monitoring - OEM 13c
Exadata SMART Monitoring - OEM 13c
 
Chapter 12
Chapter 12Chapter 12
Chapter 12
 
RAID LEVELS
RAID LEVELSRAID LEVELS
RAID LEVELS
 
Introduction to Unix operating system Chapter 1-PPT Mrs.Sowmya Jyothi
Introduction to Unix operating system Chapter 1-PPT Mrs.Sowmya JyothiIntroduction to Unix operating system Chapter 1-PPT Mrs.Sowmya Jyothi
Introduction to Unix operating system Chapter 1-PPT Mrs.Sowmya Jyothi
 
Storage Technology Overview
Storage Technology OverviewStorage Technology Overview
Storage Technology Overview
 
Database replication
Database replicationDatabase replication
Database replication
 
IBM Cloud Object Storage System (powered by Cleversafe) and its Applications
IBM Cloud Object Storage System (powered by Cleversafe) and its ApplicationsIBM Cloud Object Storage System (powered by Cleversafe) and its Applications
IBM Cloud Object Storage System (powered by Cleversafe) and its Applications
 
Process Management
Process ManagementProcess Management
Process Management
 
Real time-system
Real time-systemReal time-system
Real time-system
 

Similar to STORAGE SYSTEM ARCHITECTURE OVERVIEW

Understanding Computers: Today and Tomorrow, 13th Edition Chapter 3 - Storage
Understanding Computers: Today and Tomorrow, 13th Edition Chapter 3 - StorageUnderstanding Computers: Today and Tomorrow, 13th Edition Chapter 3 - Storage
Understanding Computers: Today and Tomorrow, 13th Edition Chapter 3 - Storageyaminohime
 
Lecture 8 comp forensics 03 10-18 file system
Lecture 8 comp forensics 03 10-18 file systemLecture 8 comp forensics 03 10-18 file system
Lecture 8 comp forensics 03 10-18 file systemAlchemist095
 
19IS305_U4_LP10_LM10-22-23.pdf
19IS305_U4_LP10_LM10-22-23.pdf19IS305_U4_LP10_LM10-22-23.pdf
19IS305_U4_LP10_LM10-22-23.pdfJESUNPK
 
PC hardware components ppt slide_week2.ppt
PC hardware components ppt slide_week2.pptPC hardware components ppt slide_week2.ppt
PC hardware components ppt slide_week2.pptvimala elumalai
 
Conceptual framework storage devices (2)
Conceptual framework   storage devices (2)Conceptual framework   storage devices (2)
Conceptual framework storage devices (2)Rajendra Sharma
 
UNIT 4-UNDERSTANDING VIRTUAL MEMORY.pptx
UNIT 4-UNDERSTANDING VIRTUAL MEMORY.pptxUNIT 4-UNDERSTANDING VIRTUAL MEMORY.pptx
UNIT 4-UNDERSTANDING VIRTUAL MEMORY.pptxLeahRachael
 
Introduction to Storage technologies
Introduction to Storage technologiesIntroduction to Storage technologies
Introduction to Storage technologiesKaivalya Shah
 
Advanced Storage Area Network
Advanced Storage Area NetworkAdvanced Storage Area Network
Advanced Storage Area NetworkSoumee Maschatak
 
Linux kernel architecture
Linux kernel architectureLinux kernel architecture
Linux kernel architectureSHAJANA BASHEER
 
I/O System and Case study
I/O System and Case studyI/O System and Case study
I/O System and Case studyLavanya G
 
IS740 Chapter 03
IS740 Chapter 03IS740 Chapter 03
IS740 Chapter 03iDocs
 

Similar to STORAGE SYSTEM ARCHITECTURE OVERVIEW (20)

Secondary storage devices
Secondary storage devicesSecondary storage devices
Secondary storage devices
 
Understanding Computers: Today and Tomorrow, 13th Edition Chapter 3 - Storage
Understanding Computers: Today and Tomorrow, 13th Edition Chapter 3 - StorageUnderstanding Computers: Today and Tomorrow, 13th Edition Chapter 3 - Storage
Understanding Computers: Today and Tomorrow, 13th Edition Chapter 3 - Storage
 
Unit 4 DBMS.ppt
Unit 4 DBMS.pptUnit 4 DBMS.ppt
Unit 4 DBMS.ppt
 
Storage Devices
Storage DevicesStorage Devices
Storage Devices
 
Lecture 8 comp forensics 03 10-18 file system
Lecture 8 comp forensics 03 10-18 file systemLecture 8 comp forensics 03 10-18 file system
Lecture 8 comp forensics 03 10-18 file system
 
19IS305_U4_LP10_LM10-22-23.pdf
19IS305_U4_LP10_LM10-22-23.pdf19IS305_U4_LP10_LM10-22-23.pdf
19IS305_U4_LP10_LM10-22-23.pdf
 
PC hardware components ppt slide_week2.ppt
PC hardware components ppt slide_week2.pptPC hardware components ppt slide_week2.ppt
PC hardware components ppt slide_week2.ppt
 
Conceptual framework storage devices (2)
Conceptual framework   storage devices (2)Conceptual framework   storage devices (2)
Conceptual framework storage devices (2)
 
UNIT 4-UNDERSTANDING VIRTUAL MEMORY.pptx
UNIT 4-UNDERSTANDING VIRTUAL MEMORY.pptxUNIT 4-UNDERSTANDING VIRTUAL MEMORY.pptx
UNIT 4-UNDERSTANDING VIRTUAL MEMORY.pptx
 
Introduction to Storage technologies
Introduction to Storage technologiesIntroduction to Storage technologies
Introduction to Storage technologies
 
Ioppt
IopptIoppt
Ioppt
 
9781111306366 ppt ch7
9781111306366 ppt ch79781111306366 ppt ch7
9781111306366 ppt ch7
 
09. storage-part-1
09. storage-part-109. storage-part-1
09. storage-part-1
 
Advanced Storage Area Network
Advanced Storage Area NetworkAdvanced Storage Area Network
Advanced Storage Area Network
 
Thiru
ThiruThiru
Thiru
 
Storage Managment
Storage ManagmentStorage Managment
Storage Managment
 
Os7
Os7Os7
Os7
 
Linux kernel architecture
Linux kernel architectureLinux kernel architecture
Linux kernel architecture
 
I/O System and Case study
I/O System and Case studyI/O System and Case study
I/O System and Case study
 
IS740 Chapter 03
IS740 Chapter 03IS740 Chapter 03
IS740 Chapter 03
 

More from Christalin Nelson (18)

Packages and Subpackages in Java
Packages and Subpackages in JavaPackages and Subpackages in Java
Packages and Subpackages in Java
 
Bitwise complement operator
Bitwise complement operatorBitwise complement operator
Bitwise complement operator
 
Advanced Data Structures - Vol.2
Advanced Data Structures - Vol.2Advanced Data Structures - Vol.2
Advanced Data Structures - Vol.2
 
Deadlocks
DeadlocksDeadlocks
Deadlocks
 
CPU Scheduling
CPU SchedulingCPU Scheduling
CPU Scheduling
 
Process Synchronization
Process SynchronizationProcess Synchronization
Process Synchronization
 
Applications of Stack
Applications of StackApplications of Stack
Applications of Stack
 
Data Storage and Information Management
Data Storage and Information ManagementData Storage and Information Management
Data Storage and Information Management
 
Application Middleware Overview
Application Middleware OverviewApplication Middleware Overview
Application Middleware Overview
 
Network security
Network securityNetwork security
Network security
 
Directory services
Directory servicesDirectory services
Directory services
 
System overview
System overviewSystem overview
System overview
 
Storage overview
Storage overviewStorage overview
Storage overview
 
Sql commands
Sql commandsSql commands
Sql commands
 
Computer Fundamentals-2
Computer Fundamentals-2Computer Fundamentals-2
Computer Fundamentals-2
 
Computer Fundamentals - 1
Computer Fundamentals - 1Computer Fundamentals - 1
Computer Fundamentals - 1
 
Advanced data structures vol. 1
Advanced data structures   vol. 1Advanced data structures   vol. 1
Advanced data structures vol. 1
 
Programming in c++
Programming in c++Programming in c++
Programming in c++
 

Recently uploaded

call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesFatimaKhan178732
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...M56BOOKSTORE PRODUCT/SERVICE
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsKarinaGenton
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfUmakantAnnand
 

Recently uploaded (20)

call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Separation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and ActinidesSeparation of Lanthanides/ Lanthanides and Actinides
Separation of Lanthanides/ Lanthanides and Actinides
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its Characteristics
 
Concept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.CompdfConcept of Vouching. B.Com(Hons) /B.Compdf
Concept of Vouching. B.Com(Hons) /B.Compdf
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 

STORAGE SYSTEM ARCHITECTURE OVERVIEW

  • 1. STORAGE SYSTEM ARCHITECTURE Instructor Mr. S.Christalin Nelson AP(SG)/CIT
  • 2. At a Glance • Storage System Environment • Host Environment • Connectivity • Disk Storage – Physical Disks – RAID • Intelligent Disk Storage 18-Feb-20 2 of 162
  • 3. Storage System Environment • Storage has evolved from single internal disks to storage systems. • Storage system environment (group of components) provide storage, handle R/W data requests & data transmission • Components of Storage system Environment – Host: Interact with OS & applications that require data – Connectivity: Carries R/W commands and the data between the host and the storage devices – Storage: Devices where the data is stored • Storage environment has evolved along with changes in computing models. 18-Feb-20 3 of 162
  • 4. Module – 1/5 Host Environment
  • 5. Host Environment • Host? – Computers on which the applications for I/O reside – Laptops to Cluster of Servers 18-Feb-20 Laptop Server Group of Servers Mainframe 5 of 162
  • 6. Physical Components of Host (1/12) • Physical components – CPU, Internal Memory & Disk Devices, IO Devices – Bus: The physical components interact with one another through a Bus 18-Feb-20 Bus I/O Devices CPU Storage 6 of 162
  • 7. Physical Components of Host (2/12) 18-Feb-20 • CPU – Components – ALU – Control Unit – Register – Level-1 Cache 7 of 162
  • 8. Physical Components of Host (3/12) • CPU technology now mean systems typically come at least Dual Core, Quad Core or more processors (on one single chip) instead of the traditional one core per chip. • The total number of Cores can slot into a socket as before and a single heat sink and fan can keep everything to the right temperature. 18-Feb-20 8 of 162
  • 9. Physical Components of Host (4/12) 18-Feb-20 • Storage – Memory Modules • Semiconductor memory, High speed data access, Expensive • Example: RAM, ROM – Storage Devices • Magnetic or Optical media, Low speed data access, Cheaper • Example: – Magnetic: Tape, Floppy Disk, Hard Disk – Optical: CD, DVD, VCD 9 of 162
  • 11. Physical Components of Host (6/12) • Internal – Processor registers – fastest access (usually 1 CPU cycle) – Cache • L0 Micro operations cache – 6 kB • L1 Instruction cache – 131 kB • L1 Data cache – 131 kB, 751 GB/s • L2 Instruction & data (shared) – 1 MB, 215 GB/sec • L3 Shared cache – 6 MB, 100 GB/s • L4 Shared cache – 134 MB, 40 GB/s • Main memory (Primary) – RAM – GBs, 10 GB/s • Online Mass Storage – Disk storage (Secondary) – TBs, 2000 MB/s (2017) 18-Feb-20 KiB & MiB ? 1 KiB/s = 210 byte per sec 1 MiB/s = 220 byte per sec 11 of 162
  • 12. Physical Components of Host (7/12) • Offline Mass Storage – Nearline storage (Tertiary) – EBs, 160 MB/s(2013) • MAID – Offline storage • Floppy Disk, Optical Disk, Flash Memory, Magnetic Tape • Online vs Nearline vs Offline storage – Online storage is immediately available for I/O. – Nearline storage is not immediately available, but can be made online quickly without human intervention. – Offline storage is not immediately available, and requires some human intervention to bring online. • Tiered Storage – The lower levels of the hierarchy from disks downwards. 18-Feb-20 12 of 162
  • 13. Physical Components of Host (8/12) • Modern programming languages mainly assume 2 levels of memory: main memory & disk storage, though in assembly language and inline assemblers in languages such as C, registers can be directly accessed. • Taking optimal advantage of the memory hierarchy requires the cooperation of programmers, hardware, and compilers (as well as underlying support from OS): – Programmers are responsible for moving data between disk and memory through file I/O. – Hardware is responsible for moving data between memory and caches. – Optimizing compilers are responsible for generating code that, when executed, will cause the hardware to use caches and registers efficiently. 18-Feb-20 13 of 162
  • 14. Physical Components of Host (9/12) • Storage Hierarchy 18-Feb-20 Speed Slow Fast Cost HighLow Tape Optical disk Magnetic disk RAM L2 cache L1 cache CPU registers 14 of 162
  • 15. Physical Components of Host (10/12) • RAM types – SRAM (Static RAM) – Bipolar, MOSFET, BiMOS – DRAM (Dynamic RAM) – DRAM, PSRAM, VRAM, FRAM, QLC – SDRAM (Synchronous DRAM) – SDR, RDRAM, DDR, eDRAM, DDR2/3/4, LPDDR2/3/4/5 – SGRAM (Synchronous Graphics RAM) – HBM (High Bandwidth Memory) 18-Feb-20 15 of 162
  • 16. Physical Components of Host (11/12) • Storage 18-Feb-20 … 0 1 2 3 n Data 0 Data n Data 2 Data 3 Data 1 Address Content Disk Memory 16 of 162
  • 17. Physical Components of Host (12/12) • I/O Devices – Human interface • Keyboard, Mouse, Monitor – Computer-computer interface • Network Interface Card (NIC) – Computer-peripheral interface • USB (Universal Serial Bus) port • Host Bus Adapter (HBA) 18-Feb-20 17 of 162
  • 18. Logical Components of Host (1/10) • Logical components – Software applications, Protocols, OS, File systems, Database 18-Feb-20 Host Applications Volume Management DBMS Mgmt Utilities File System Multi-pathing Software Device Drivers HBA HBA HBA OS 18 of 162
  • 19. Logical Components of Host (2/10) • Applications – Provide a point of interaction either between the user and the host or another system and the host – Most applications have storage requirements (short or long- term depending upon the application) • Operating system – Controls interaction between applications & storage systems – Monitors and responds to user actions and the environment – Organizes and controls the hardware components – Connects hardware components to the application program layer and the users – Manages system activities such as storage and communication 18-Feb-20 19 of 162
  • 20. Logical Components of Host (3/10) • Device drivers – Allow the OS to be aware of and use a standard interface to access and control a specific device (i.e., printer, speakers, mouse, keyboard, video, storage devices, etc.) – Provide appropriate protocols to host to allow device access • File System (and Files) – Provides a logical structure for data and methods for accessing that data • Hosts work with data stored in File System blocks • File system converts the user logical structures into host accessible blocks 18-Feb-20 20 of 162
  • 21. Logical Components of Host (4/10) • File system block – Smallest ‘container’ allocated to a file’s data – Each block is a contiguous area of physical disk capacity – Block Size depend on type of files being stored and accessed • Block size is Fixed or pre-defined by OS during storage system configuration. • Larger files will span multiple file system blocks (may not necessarily be contiguous on a physical disk) 18-Feb-20 21 of 162
  • 22. Logical Components of Host (5/10) • In multi-user, multi-tasking environments, file systems manage shared storage resources using – Directories, paths and structures • Identify file locations – Volume Managers • Hide the complexity of physical disk structures – File locking capabilities • Control access and data flow to and from file locations when used by potentially competing users or applications – Databases and data management components such as • Large, shared relational databases • Management of shared data storage 18-Feb-20 22 of 162
  • 23. Logical Components of Host (6/10) • Linear File Structure – The number of files on a system can be extensive and could quickly get out of hand • Hierarchical Structure with Directories – Also called as Folders in the Windows environment – Hold files as well as other directories – Hold information about files that they contain (Metadata) 18-Feb-20 23 of 162
  • 24. Logical Components of Host (7/10) • Metadata (Information or Data about the file) – Examples: In UNIX (UFS) • File type and permissions • Number of links • Owner and group IDs • Number of bytes in the file • Last file access & Last file modification – Example: In Windows (NTFS) • Time stamp and link count • File name • Access rights • File data • Index information & Volume information 18-Feb-20 24 of 162
  • 25. Logical Components of Host (8/10) • Journaling & Logging – Non-Journaling File system • At Write when System crashes, data or metadata can be lost – Use many separate writes to update their data and metadata – Journaling File System • Improves data integrity, system restart time (vs. non-journaling file systems) • Before operations are made to the file system, they are written to a separate area called a log or journal – May hold all data to be written (Physical Journal) – May hold only metadata (Logical Journal) • Disadvantage – slower than other file systems – Each file system update requires at least one extra write – to the log 18-Feb-20 25 of 162
  • 26. Logical Components of Host (9/10) • Volume Management – Optional intermediate layer between file system and physical disks – Aggregates several smaller disks to form a larger virtual disk • Virtual Disks are only visible to higher level programs and applications – Optimizes access to storage – Simplifies the management of storage resources 18-Feb-20 26 of 162
  • 27. Logical Components of Host (10/10) • Host Bus Adapter (HBA) – Add-on card (or) a chip on the motherboard of the host – Ports connect the Host to the storage devices – Has processing capability to handle some storage commands, thereby reducing the burden on the host CPU – Multiple HBAs 18-Feb-20 27 of 162
  • 28. Improving Data Availability at Host • Hosts can be configured to provide uninterrupted access to critical data through – Redundancy [Implemented using multiple HBAs] – Multi-path software [Server resident] • Utilizes available HBAs on the server to provide redundant/multiple communication paths between host and storage devices • It provides assured uninterrupted data transfers even in the event of a path failure and may also provide automatic load balancing – Clustering [Redundant host systems connected together] • Cluster members can be configured to transparently take over each others’ workload, with minimal or no impact to the user • If one host in the cluster fails, its functions will be assumed by surviving member(s). 18-Feb-20 28 of 162
  • 29. File Movement to/from Storage (Example) 18-Feb-20 Teacher Configures / Manages File System Files Mapped by File System to Course File(s) Reside in File System Blocks Disk Physical Extents Consisting of LVM Logical Extents Residing inMapped by LVM to Disk Sectors Managed by Disk Storage Subsystem File system blocks are mapped to Disk Sectors by OS in the absence of LVM 29 of 162
  • 31. Parts of Storage Environment • Based on Connectivity – Between hosts (or) between a host and peripheral (storage) devices – Physical components of Connectivity include • Bus, Port, Cables (Uses Optical and Copper media) • Connectors and plugs • Adapters – Host Bus Adapter (HBA) – enables devices to connect to a host’s internal bus system • NIC – enables simple network attachments to a host • Switches/hubs - Manage traffic within a network – Logical components of Connectivity • Communication protocols, Device Drivers 18-Feb-20 31 of 162
  • 32. Physical Components – Host with Internal Storage 18-Feb-20 32 of 162
  • 33. Bus Technology (1/4) • Bus? – Collection of paths that facilitate data transmission from one part of the computer to another – Physical components communicate across a bus by sending packages (packets) of data between the devices in Serial or Parallel Paths. • Serial communication: Bits travel one behind the other • Parallel communication: Bits can move along redundant paths simultaneously. 18-Feb-20 33 of 162
  • 34. Bus Technology (2/4) • Serial/Parallel Paths 18-Feb-20 Serial Uni-directional Serial Bi-directional Parallel 34 of 162
  • 35. Bus Technology (3/4) • Types of buses in a computer system – System Bus • Carries data from Processor to Memory – Local or I/O Bus • Carries data to/from Peripheral devices (such as storage devices) • Provides a high-speed pathway that connects directly to processor 18-Feb-20 35 of 162
  • 36. Bus Technology (4/4) • Bus Properties – Bus width (bits) • Amount of data that can be transmitted at a time • E.g. n-bit bus can transmit n-bits of data – Bus speed (MHz) • Every bus has an associated clock speed which determines how fast data can be transferred • Applications can run faster when bus speed are higher. – Throughput (Mbps) 18-Feb-20 36 of 162
  • 37. Connectivity Protocols (1/2) • Protocol – Defined format for communication that allows the sending and receiving devices to agree on what is being communicated. – Communication between hardware or software components • Different Connectivity Models Tightly Connected Entities Directly Attached Entities Network Connected Entities 18-Feb-20 37 of 162
  • 38. Connectivity Protocols (2/2) • Tightly connected entities – E.g. Central Processor to RAM, or storage buffers to controllers – Use standard Bus technology (System bus or I/O – Local Bus) • Directly attached entities – Devices connected at moderate distances – such as host to printer or host to storage (JBOD or DAS) • Network connected entities – E.g. Networked hosts, NAS or SAN 18-Feb-20 38 of 162
  • 39. Communication Protocols Host Apps Operating System PCI SCSI or IDE/ATA Device Drivers 18-Feb-20 • Protocols for local I/O bus and for connections to an internal disk system include – PCI (Peripheral Component Interconnection) – IDE/ATA (Integrated Device Electronics / Advanced Technology Attachment) – SCSI (Small Computers System Interface) 39 of 162
  • 40. Bus Technology – PCI (1/2) • PCI defines the local bus system within a computer • The specification standardizes how PCI expansion cards, such as network cards or modems, install themselves and exchange information with the CPU. • PCI includes – An interconnection between microprocessor and attached devices, in which expansion slots are spaced closely for high- speed operation – Plug and Play functionality – 32/64 bit simplex data (1992-2002) to duplex data (2019) – Throughput is 133 MB/s (1992) to 128 GB/s (2019) 18-Feb-20 40 of 162
  • 41. Bus Technology – PCI (2/2) • PCI Express is an enhanced PCI bus with increased Bandwidth 18-Feb-20 41 of 162
  • 42. Bus Technology - IDE/ATA • Most popular interface used with modern hard disks • Good performance at low cost • Desktop and laptop systems • Inexpensive storage interconnect 18-Feb-20 42 of 162
  • 43. Bus Technology - SCSI • 2nd most popular hard disk interface protocol in PCs today • Higher cost than IDE/ATA • Supports multiple simultaneous data access • Currently both parallel and serial forms • Used primarily in “higher end” environments such as with servers • Note – SCSI HBA (ref. to as controller) can be implemented as an onboard interface (or) ‘add in’ card plugged into system I/O bus. 18-Feb-20 43 of 162
  • 44. SCSI Model (1/2) Target Initiator 18-Feb-20 SCSI device that starts a communication SCSI device that services a request • If Initiator is a host, it will release communication connection and continue processing other events while target executes the command. The host will await an interrupt signal from the storage device to complete the transaction 44 of 162
  • 45. SCSI Model (2/2) Target IDInitiator ID LUNs 18-Feb-20 • Initiator ID – uniquely identifies an initiator that is used as an “originating address” • Target ID – uniquely identifies a target. Used as address for exchanging commands and status information with initiators • Logical Unit Numbers (LUNs) – identifies a specific Logical Unit in a target. Logical Units can be more than a single disk 45 of 162
  • 46. SCSI Addressing • Initiator ID – Original initiator ID number (0 to 15) – Used to send responses back to initiator from storage device • Target ID – Value for a specific storage device (0 to 15) – An address that is set on the interface of the device such as a disk, tape or CDROM • LUN – A number that reflects the actual address of the device, as seen by the target 18-Feb-20 Initiator ID Target ID LUN 46 of 162
  • 47. Disk Identifier - Addressing • If logical device name used by a host for a disk drive is cn|tn|dn – dn is usually d0 for most SCSI disks because there is only one disk attached to the target controller. In intelligent storage systems, discussed later, each target may address many LUNs 18-Feb-20 c0- Controller Initiator, HBA Peripheral Controller t0 Target LUNs d0 d1 d2 47 of 162
  • 48. SCSI – Pros & Cons • Pros: – Fast transfer speeds (up to 320 MB/s for parallel SCSI) – Reliable, durable components – Can connect many devices with a single bus, more than just HDs – SCSI host adapter cards can be put in almost any system – Fully backward compatibility 18-Feb-20 • Cons: – Configuration and setup specific to one computer – Unlike IDE, few BIOS support the standard – Overwhelming number of variations in the standard, hardware, and connectors – No common software interfaces and protocol 48 of 162
  • 49. SCSI vs. IDE/ATA 18-Feb-20 Feature IDE/ATA SCSI Expandability Low Very Good Configuration & Setup Easier Complex and Expensive Device Type Support Less Support Larger Support Cost Cheap Expensive Performance [1. Max. Interface Data transfer rate for multiple devices, 2. Device Mixing Issues & 3. Device Performance] (1) Low DR (2) Significant Performance Hit (3) Supports only one device at a time (1) High DR (2) No issues related with diff. operational speed (3) Supports multiple devices simultaneously Connectivity Internal Storage Internal and External Storage Speed (MB/s) 100/133/150 320 49 of 162
  • 50. Physical Components (Host with External Storage) • Hosts with external storage are usually large enterprise servers 18-Feb-20 Bus Disk Cable Host Port Port HBA CPU 50 of 162
  • 51. Fiber Channel • Offers high-speed interconnection used in networked storage to connect servers to shared storage devices • FC refers to hardware components & storage protocol that communicates across the channel elements • Fiber Channel components – HBAs – Hubs & Switches – Cables – Disks 18-Feb-20 Fiber Channel Storage Arrays Host Apps DBMS Mgmt Utils File System LVM Multipathing Software Device Drivers HBA HBA HBA 51 of 162
  • 52. External Storage Interfaces – A Comparison • SCSI – Limited distance – Limited device count – Usually limited to single initiator – Single-ported drives • Fiber Channel – Greater distance – High device count in SANs – Multiple initiators – Dual-ported drives • Note – SCSI can be used for internal storage in hosts. – FC is almost never used internally.18-Feb-20 52 of 162
  • 53. Fiber Channel Connectivity (1/2) • When computing environments require high speed connectivity, they use sophisticated equipment to connect hosts to storage devices • Physical connectivity components in networked storage environments include: – HBA (Host-side interface) – Host Bus Adapters connect the host to the storage devices – Optical cables – fiber optic cables to increase distance, and reduce cable bulk – Switches – used to control access to multiple attached devices – Directors – sophisticated switches with high availability components – Bridges – connections to different parts of a network 18-Feb-20 53 of 162
  • 54. Fiber Channel Connectivity (2/2) 18-Feb-20 Switches Storage Hosts 54 of 162
  • 55. Module – 3/5 Physical Disks (Storage)
  • 56. Parts of Storage Environment - Storage • Physical components of storage include – Physical devices that hold the data (i.e., disk, tape, optical drives, etc.) – Components that make the devices operate (i.e., power supplies, fans) – The enclosures that hold the equipment (e.g., racks) • Logical components of storage include – Protocols – Flow algorithms 18-Feb-20 56 of 162
  • 57. Disk Drive Components • The Components of Disk Drive include – Platters – Spindle – R/W Heads – Actuator Arm Assemble – Controller 18-Feb-20 57 of 162
  • 58. Disk Drive Components: Platters (1/2) • The Head Disk Assembly (HDA) is a sealed case which contains a series of rotating platters • Attributes of a Platter – It is a rigid, round disk coated with magnetically sensitive material – Data is stored (encoded) as 0/1 by polarizing magnetic areas, or domains, on the disk surface – Data can be R/W on both surfaces of a platter – The no. of platters on a drive is specific to the particular drive – A platter’s storage capacity varies across drives and technology • Note: The drive’s capacity is determined by the no. of platters, the amount of data which can be stored on each platter, and how efficiently data is written to the platter 18-Feb-20 58 of 162
  • 59. Disk Drive Components: Platters (2/2) 18-Feb-20 00110100111010101010 00110100111010101010 10110101011010101010 01010100111010101010 59 of 162
  • 60. Disk Drive Components: Spindle (1/2) • Connects multiple disk platters to a motor which rotates at a constant speed – The spindle rotates continuously until power is removed from the spindle motor – Many hard drive failures occur when the spindle motor fails – Disk platters spin at speeds of several thousand revolutions per minute • Note: These speeds will increase as technologies improve, though there is a physical limit to the extent to which they can improve 18-Feb-20 60 of 162
  • 61. Disk Drive Components: Spindle (2/2) 18-Feb-20 Spindle Platters 61 of 162
  • 62. Disk Drive Components: R/W Heads (1/3) • Most drives have two Read/Write heads per platter [One for each surface of the platter] • Data R/W is a magnetic process using read/write heads – Data Read – Detection of magnetic polarization on the platter surface – Data Write – Change the magnetic polarization on the platter surface • Head flying height – Height of microscopic air gap between the R/W heads and the platter 18-Feb-20 62 of 162
  • 63. Disk Drive Components: R/W Heads (2/3) • Landing Zone – Special area on the surface of the platter near the spindle were the R/W heads rests when the spindle rotation has stopped • Logic on the disk drive ensures that the heads are moved to the landing zone before they touch the surface – The landing zone is coated with a lubricant to reduce head/platter friction. • Head Crash – Occurs when the drive malfunctions and a R/W head accidentally touches platter’s surface outside of landing zone – When a head crash occurs, the magnetic coating on the platter gets scratched and damage may also occur to the R/W head – A head crash generally results in data loss 18-Feb-20 63 of 162
  • 64. Disk Drive Components: R/W Heads (3/3) 18-Feb-20 64 of 162
  • 65. Disk Drive Components: Actuator Arm Assembly (1/2) • The R/W heads for all of the platters in the drive are attached to one actuator arm assembly and move across the platter simultaneously – Note: There are two R/W heads per platter, one for each surface • It positions the R/W head at a location on the platter where data needs to be written or read 18-Feb-20 65 of 162
  • 66. Disk Drive Components: Actuator Arm Assembly (2/2) 18-Feb-20 Actuator Spindle Actuator R/W Head R/W Head 66 of 162
  • 67. Bottom View of Disk Drive HDA Controller Interface Power Connector Disk Drive Components: Controller • It is a PCB, mounted at the bottom of the disk drive • It contains a microprocessor (as well as some internal memory, circuitry, and firmware) that controls: – Power to the spindle motor and control of motor speed – Communication of the drive with the CPU on the host system – R/W by moving the actuator arm, and switching between R/W heads – Optimization of data access 18-Feb-20 67 of 162
  • 68. Physical Disk Structures: Tracks • It is a concentric ring around spindle which record data • Track density – How tightly the tracks are packed on a platter • Track numbering – Numbered from the outer edge of the platter starting at Track- 0 (zero) 18-Feb-20 Sector Track Platter 68 of 162
  • 69. Physical Disk Structures: Sectors (1/4) • Smallest individually-addressable unit of storage in a track which typically hold 512B of user data • Format operation – Done by manufacturer to writes the track and sector structure on the platter. Drive manufacturers generally advertise the formatted capacity. – Sector stores user data and other information • Other Information (Sector no., head/platter no., track no.) aids the controller in locating data on the drive • No. of sectors per track is based upon the specific drive – 1st PC hard disks typically held 17 sectors per track. Today's hard disks can have much larger no. of sectors in a single track – There can be 1000s of tracks on a platter depending on the drive size 18-Feb-20 69 of 162
  • 70. Physical Disk Structures: Sectors (2/4) • Platter Geometry – Since a platter is made up of concentric tracks, the outer tracks can hold more data than the inner ones because they are physically longer than the inner tracks – Older disk drives had the same number of sectors in the outer tracks as in the inner tracks • Data density is very low on the outer tracks. This was an inefficient use of the available space • Zone-Bit Recording serves a good alternative for efficient use of available space. 18-Feb-20 70 of 162
  • 71. Physical Disk Structures: Sectors (3/4) • Zoned-Bit Recording – Group the tracks into zones based upon their distance from the center of the disk – Each zone is assigned an appropriate number of sectors per track • A zone near the center of the platter has fewer sectors per track than a zone on the outer edge • Tracks within a given zone have the same number of sectors. • Outside tracks have more sectors than inside tracks – Zones are numbered, with the outermost zone being Zone 0. – Note • The media transfer rate drops as the zones move closer to the center of the platter, meaning that performance is better on the zones created on the outside of the drive. 18-Feb-20 71 of 162
  • 72. Physical Disk Structures: Sectors (4/4) 18-Feb-20 Platter Without Zones Sector Track Platter With Zones 72 of 162
  • 73. Physical Disk Structures: Cylinders • A cylinder is the set of identical tracks on both surfaces of each of the drive’s platters • Often the location of drive heads are referred to by cylinder number rather than by track number 18-Feb-20 Cylinder Tracks, Cylinders and Sectors 73 of 162
  • 74. Physical Disk Structures (contd.) • Physical addressing – Addresses made up of Cylinder, Head and Sector number (CHS) to refer to specific locations on the disk • Host should be aware of the geometry of each disk used • Logical Block Addressing (LBA) with CHS is a good alternative 18-Feb-20 74 of 162
  • 75. Physical Disk Structures (contd.) • Logical Block Addressing (1/3) – Traditional method for accessing peripherals on SCSI, FC, and newer ATA disks. – Simplifies addressing by a using a linear address for accessing physical blocks of data. • Host only needs to know the size of disk drive (number of blocks) – Disk controller translates/maps address from LBA to CHS – Block numbering starts at the beginning of a cylinder and continues until the end of that cylinder • Logical blocks are mapped to physical sectors on a 1:1 basis. • Each block will have its own unique address 18-Feb-20 75 of 162
  • 76. Physical Disk Structures (contd.) • Logical Block Addressing (2/3) – E.g. The true capacity of the 500 GB drive is 465.7 GB, which is in excess of 976,000,000 blocks. Each block will have its own unique address. • As in next slide, the drive shows 8 sectors per track, 8 heads, and 4 cylinders => Total of 256 blocks (8 x 8 x 4). The illustration on the right shows the block numbering, which will range from 0 to 255. 18-Feb-20 76 of 162
  • 77. Physical Disk Structures (contd.) • Logical Block Addressing (3/3) 18-Feb-20 Physical Address = CHS (Cylinder, Head and Sector number) Logical Block Address = Block # Sector Cylinder Head Block 0 Block 16 Block 32 Block 48 Block 8 (lower surface) 77 of 162
  • 78. Physical Disk Structures (contd.) • What the Host sees? – Disk Partitioning • Partitioning divides the disk into logical containers (known as volumes), each of which can be used for a particular purpose • Partitions define the disk layout and partition size impacts disk space utilization – Partitions are generally created when the hard disk is initially set up on the host • Partitions are created from groups of contiguous cylinders – A large physical drive could be partitioned into multiple Logical Volumes (LV) of smaller capacity – Several small physical drives can be concatenated together by a volume manager and presented as one logical volume. • The host file-system accesses logical volumes, with no knowledge of the physical structure 18-Feb-20 78 of 162
  • 79. Physical Disk Structures (contd.) • What the Host sees? 18-Feb-20 A One Logical VolumeMultiple Logical Volumes A B C D 79 of 162
  • 80. Disk Drive Performance (1/8) • Seek time – Time taken to position the R/W heads radially across the platter (measured in ms) – Seek time specifications • Full Stroke - Time taken to move across the entire width of the disk, from the innermost track to the outermost • Average – Time taken to move from one random track to another, – Full stroke/3 – Typical range in modern disks: 3 to 15 ms • Track-to-Track – Time taken to move between adjacent tracks – Seek time has more impact on reads of random tracks on the disk rather than on adjacent tracks 18-Feb-20 80 of 162
  • 81. Disk Drive Performance (2/8) • Seek Time (contd.) – Seek time can be improved by short-stroking the drive • Write data only to a subset (inner or outer tracks) of the available cylinders and treat the drive as though it has a lower capacity • E.g. 500 GB drive is set up to use only the first 40% of the cylinders, and is treated as a 200 GB drive 18-Feb-20 81 of 162
  • 82. Disk Drive Performance (3/8) • Rotational Speed/Latency – Actuator moves R/W head over the platter to a particular track, while the platter spins to position the particular sector in the track under the R/W head – Time taken by the platter to rotate and position the data under the R/W head (measured in ms) – It is dependent upon the rotation speed of the spindle and is – ½ the time taken for a full rotation – Rotational latency has more of impact on reads/writes of random sectors on the disk rather than on adjacent sectors – E.g.: Rotational latency value • 5.5ms for 5400 rpm drive • 2.0ms for 15000 rpm drive 18-Feb-20 82 of 162
  • 83. Disk Drive Performance (4/8) • Command Queuing – Time is wasted if commands are processed as they are received and the R/W head passes over data that will be needed one or two requests later – Drive manufacturers include logic that analyzes where data is stored on the platter relative to data access requests. Requests are then reordered to make best use of the data’s layout on the disk (physical level of the disk) – Also known as Multiple Command Reordering/Optimization, Command Queuing and Reordering, Native Command Queuing or Tagged Command Queuing – Command queuing can also be performed by the storage system that uses the disk 18-Feb-20 83 of 162
  • 84. Disk Drive Performance (5/8) • Command Queuing (contd.) 18-Feb-20 Request 1 Request 2 Request 3 Request 4 1234 Request 1 Request 2 Request 3 Request 4 1324 Without Command Queuing With Command Queuing 1 2 3 4 1 2 3 4 84 of 162
  • 85. Disk Drive Performance (6/8) • Data Transfer Rate – Data transfer during Read from the drive (Write operation?) • Disk platters -> Heads -> Drive's internal buffer -> Through the interface (HBA) to the rest of the system – Rate of data transferred (in MBps) by the drive to the HBA – Internal transfer rate • Rate of data transferred from Disk surface to the R/W heads on a single track of one surface of the disk • Few factors (E.g. Seek time) influence sustained internal DTR • Internal DTR will almost always be lower than External DTR – External transfer rate • Rate of data transferred through the interface • Generally advertised speed of interface (E.g. 133 MBps for ATA/133) • Sustained external DTR will be lower than the interface speed 18-Feb-20 85 of 162
  • 86. Disk Drive Performance (7/8) • Data Transfer Rate (contd.) 18-Feb-20 Interface BufferHBA Disk Drive Internal transfer rate measured here External transfer rate measured here 86 of 162
  • 87. Disk Drive Performance (8/8) • Drive Reliability – Measured with Mean Time Between Failure (MTBF) • Amount of time that one can anticipate a device to work before an incapacitating malfunction occurs (associated with Service Life of the drive) • It is based on averages and therefore used merely to provide estimates. MTBF is measured in hours (E.g. 750,000 hours) • It is based on an aggregate analysis of a huge number of drives • It is a statistical method developed by the US Military as a way of estimating maintenance levels required by various devices. • MTBF is tested by artificially aging the drives by subjecting them to stressful environments such as high temperatures, high humidity, fluctuating voltages, etc. 18-Feb-20 87 of 162
  • 89. Introduction (1/2) • Disk Array – Collection of Disk Drives for increased capacity, but with no added intelligence • RAID (Redundant Array of Independent Disks) – Disk array + Controller (added intelligence) 18-Feb-20 RAID Controller RAID Array Host 89 of 162
  • 90. Introduction (2/2) • RAID arrays enables you to – Increase capacity – Provide higher availability or life expectancy (in case of drive failure measured with MTBF) – Increase I/O performance (through parallel access) – Streamlined management of storage devices • Note: – Traditionally RAID (Redundant Array of Inexpensive Disks) - Data was stored on large & expensive disk drives (called SLED, or Single Large Expensive Disk) 18-Feb-20 90 of 162
  • 91. RAID Components (1/3) • Sub-enclosures or Physical arrays – Hold a fixed number of physical disks, power supply, and other supporting hardware • Logical Arrays or RAID set – Logical Association of subset/group of disks within RAID array • Several physical disks can be concatenated to make large logical volumes (e.g., for databases) • Single physical disk can be divided to create smaller areas (e.g., for logging) – OS may view it as if they were regular Disk Volumes – Simplify management of a huge number of disks 18-Feb-20 91 of 162
  • 93. RAID Components (3/3) • No. of Logical & Physical Arrays – Depends entirely on RAID level(s) & specific vendor implementation – Mostly in 1:1 ratio. However, you could have 1:N or N:1 ratios • Array management software implemented in RAID systems handles: – Management and control of disk aggregations (e.g. volume management) – Translation of I/O requests between logical & physical arrays – Error correction when disk failures occur 18-Feb-20 93 of 162
  • 94. Data Organization: Strips & Stripes (1/2) • Strips – Contiguously addressed blocks inside each disk of a RAID set • Stripes – Set of aligned strips that spans across all disks within RAID set 18-Feb-20 Stripe 1 Stripe 2 Stripe 3 Strips 94 of 162
  • 95. Data Organization: Strips & Stripes (2/2) • Strip size or Stripe depth – Describes no. of blocks in a strip & Max. amount of data R/W in a single disk of the RAID set before next disk is accessed – Data access may start from the beginning of the strip – All strips in a stripe have the same number of blocks – Decreasing strip size means that data is broken into smaller pieces when spread across the disks • Stripe size – Describes number of data blocks in a stripe – Stripe Size = Strip size x No. of data disks • Stripe width – Refers to the number of data strips in a stripe (OR) number of data disks in a stripe 18-Feb-20 95 of 162
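The strip/stripe relationships above are simple arithmetic; a minimal sketch with assumed numbers (the block size, strip depth, and disk count are illustrative only):

```python
# Stripe geometry from the definitions above (all values are assumptions).
block_size_kb = 4          # size of one logical block
strip_blocks  = 16         # blocks per strip (stripe depth)
data_disks    = 4          # stripe width (number of data disks)

strip_size_kb  = strip_blocks * block_size_kb       # 64 KB written to one disk before moving on
stripe_size_kb = strip_size_kb * data_disks         # 256 KB spread across the whole stripe
print(strip_size_kb, stripe_size_kb)
```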
  • 96. RAID Performance: Striping (1/2) • Striping distributes data across the disks in the array and permits use of multiple independent disks for multiple and concurrent R/W • R/W large amount of data – Write: 1st piece is sent to 1st drive, 2nd piece to 2nd drive, etc. – Read: Pieces are put back together again • Based on RAID level & vendor-specific implementation, striping can occur at block (or block multiple) or byte level • Notes on striping – Higher stripe width – higher no. of drives – better performance – Striping is transparent to the OS of host (handled by controller) 18-Feb-20 96 of 162
  • 97. RAID Performance: Striping (2/2) 18-Feb-20 Logical Array LUN (Logical Unit No.) RAID Controller Host 97 of 162
  • 98. RAID Redundancy: Mirroring (1/2) • Redundancy improves fault tolerance • Mirroring uses multiple drives that hold identical copies of the data (usually 2 drives) – Every write to a data disk is also a write to mirror disk(s), containing the same data – If a disk fails, RAID controller uses the mirror drive for data recovery & continuous operation. Data on a replaced drive is rebuilt from the mirrored drive 18-Feb-20 RAID Array Mirrored Disk RAID Controller Host 98 of 162
  • 99. RAID Redundancy: Mirroring (2/2) • Mirroring is transparent to the attached host • Benefits – Fast recovery from a failure – Improved read performance • Drawbacks – Degrades write performance because each block of host data is written to multiple disks – High cost of data protection due to the need for multiple disks 18-Feb-20 99 of 162
  • 100. RAID Redundancy: Parity (1/4) • Parity is a redundancy check mechanism that also ensures data protection • Like striping, parity is generally a function of the RAID controller and is transparent to the host • Parity can be thought of as the running sum of the data on the other disks in the RAID set – Each time data is updated, the parity is updated as well, so that it always reflects the current sum of the data on the other disks • Parity information can either be – Stored on a separate, dedicated drive – Distributed with the data across all the drives in the array 18-Feb-20 100 of 162
  • 101. RAID Redundancy: Parity (2/4) [Diagram: the host writes stripes of blocks 0–3, 4–7, and 8–11 through the RAID controller; the blocks are distributed across four data disks while a dedicated parity disk holds the parity for each stripe] 18-Feb-20 101 of 162
  • 102. RAID Redundancy: Parity (3/4) • Parity is calculated on a per stripe basis • On disk failure – Value of its data is recalculated by using parity information and data on the surviving disks • Request for data by host from failed disk requires that data to be recalculated before it can be sent. This recalculation is time-consuming, and will decrease the performance of the RAID set – Note: Hot Spare Drives provide a way to minimize the disruption caused by a disk failure • On parity disk failure – Value of its data (parity) is recalculated by using data disks and then saved when failed disk is replaced with a new disk 18-Feb-20 102 of 162
  • 103. RAID Redundancy: Parity (4/4) [Diagram: a RAID array with four data disks holding 5, 3, 4, 2 and a parity disk holding their sum, 14] – Parity: 5 + 3 + 4 + 2 = 14 – The middle drive fails: 5 + 3 + ? + 2 = 14 – Recover: ? = 14 – 5 – 3 – 2 = 4 18-Feb-20 103 of 162
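The slide illustrates the idea with an arithmetic sum; actual controllers typically compute parity as a bitwise XOR across the strips, which can be "subtracted" out the same way. A minimal sketch using toy one-byte strips (variable names are my own):

```python
from functools import reduce

# Parity strip = XOR of the corresponding data strips.
data_strips = [b"\x05", b"\x03", b"\x04", b"\x02"]
parity = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*data_strips))

# The third disk "fails": XOR the parity with the surviving strips to rebuild it.
survivors = [data_strips[0], data_strips[1], data_strips[3], parity]
rebuilt = bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*survivors))
assert rebuilt == data_strips[2]
```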
  • 104. RAID Levels • There are some standard RAID configuration levels, each of which has benefits in terms of performance, capacity, data protection, etc. • Commonly used levels or combinations of levels – RAID 0 – Striped Array with No Fault Tolerance – RAID 1 – Disk Mirroring – RAID 3 – Parallel Access Array with Dedicated Parity Disk – RAID 4 – Striped Array with Independent Disks and a Dedicated Parity Disk – RAID 5 – Striped Array with Independent Disks and Distributed Parity – Combinations of levels [RAID 1+0, RAID 0+1, etc.] 18-Feb-20 104 of 162
  • 105. RAID 0 - Striping (1/2) • Stripes the data across drives in array without generating redundant data • Performance: Better than JBOD because it uses Striping – Performance is further improved when data is striped across multiple controllers with only one drive per controller • Throughput: Very high when I/O sizes are small • Data Protection – No Parity or Mirroring (Hence no fault tolerance) – Extremely difficult to recover data • Applications – Those that need high bandwidth or high throughput but where data is not critical (E.g. Temporary storage or spool areas) 18-Feb-20 105 of 162
  • 106. RAID 0 - Striping (2/2) [Diagram: the host sends blocks 0–4 to the RAID controller, which stripes them across the disks of the array with no redundant copy] 18-Feb-20 106 of 162
  • 107. RAID 1 – Mirroring & Fault Tol. (1/3) • Uses mirroring to improve fault tolerance – Every write to a data disk is also a write to the mirror disk(s) – This is transparent to the host – If a disk fails, the disk array controller uses the mirror drive for data recovery and continuous operation • A RAID 1 group consists of 2 (typically) or more disk modules • Benefits – High data availability – High Throughput or I/O rate (small block size) • Drawbacks – Total no. of disks in array equals 2 times the data (usable) disks • i.e. Overhead cost = 100%, Usable storage capacity = 50% 18-Feb-20 107 of 162
  • 108. RAID 1 – Mirroring & Fault Tol. (2/3) • Performance – Improved Read performance but degrades Write performance • Data Protection – Improved fault tolerance over RAID 0 • Cost – Expensive due to extra capacity required to duplicate data • Disks: At least two disks • Maintenance: Low complexity • Applications – Those that need High availability (E.g. Accounting, Payroll, Finance) 18-Feb-20 108 of 162
  • 109. RAID 1 – Mirroring & Fault Tol. (3/3) [Diagram: the host sends blocks 0 and 1 to the RAID controller, which writes each block to both the data disk and its mirror] 18-Feb-20 109 of 162
  • 110. RAID 0+1 – Striping & Mirroring (1/3) • Combines speed of RAID 0 with redundancy of RAID 1 • RAID 0+1 is implemented as a mirrored array whose basic elements are RAID 0 stripes • Benefits – Medium data availability – High Throughput or I/O rate (small block size) – Ability to withstand multiple drive failures as long as they occur on the same stripe • Drawbacks – Total no. of disks in array equals two times the data disks, with overhead cost equaling 100% 18-Feb-20 110 of 162
  • 111. RAID 0+1 – Striping & Mirroring (2/3) • Data Protection: Medium reliability • Disks: Even no. of disks (Minimum 4 disks to allow striping) • Cost: Very expensive because of the high overhead • Performance – High I/O rates – Writes are slower than Reads because of mirroring • Applications – Imaging – General file server 18-Feb-20 111 of 162
  • 112. RAID 0+1 – Striping & Mirroring (3/3) [Diagram: host blocks 0–3 are striped across one set of disks, and that entire stripe set is mirrored to a second set of disks] 18-Feb-20 112 of 162
  • 113. RAID 1+0 – Mirroring & Striping (1/3) • RAID 1+0 (also written RAID 10 or RAID 1/0) likewise combines the speed of RAID 0 with the redundancy of RAID 1 • RAID 1+0 is implemented as a striped array whose individual elements are RAID 1 arrays - mirrors • Benefits (almost similar to RAID 0+1) – High data availability – High Throughput or I/O rate (small block size) – Ability to withstand multiple drive failures as long as they occur on different mirrors • Drawbacks (almost similar to RAID 0+1) – Total no. of disks in array equals two times the data disks, with overhead cost equaling 100% 18-Feb-20 113 of 162
  • 114. RAID 1+0 – Mirroring & Striping (2/3) • Data Protection: High reliability • Disks: Even no. of disks (Minimum 4 disks to allow striping) • Cost: Very expensive because of the high overhead • Performance – High I/O rates achieved using multiple stripe segments – Writes are slower than Reads because they are mirrored • Applications – Databases requiring high I/O rates with random data – Applications requiring maximum data availability 18-Feb-20 114 of 162
  • 115. RAID 1+0 – Mirroring & Striping (3/3) [Diagram: host blocks 0–3 are striped across mirrored pairs; each block is written to both disks of its pair] 18-Feb-20 115 of 162
  • 116. RAID 0+1 vs. RAID 1+0 • Benefits are identical under normal operations • Basic Element: Mirrored pair (RAID 1+0), Stripe (RAID 0+1) • At drive failure the rebuild operations are very different – In RAID 1+0 rebuild only the mirror. i.e. The disk array controller copies data from one surviving disk to the replacement disk – In RAID 0+1 rebuild entire stripe. i.e. The disk array controller copies data from each disk in the healthy stripe to equivalent disk in the failed stripe • Note 1: As the stripe has no protection (RAID 0) the entire stripe is faulted even if single drive in it fails • Note 2: This causes increased and unneeded I/O load on backend & also makes the RAID set more vulnerable to a second disk failure • RAID 0+1 is less common & a poorer solution 18-Feb-20 116 of 162
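The rebuild difference also shows up in how many two-disk failures each layout can survive. A hedged sketch assuming six disks (three mirrored pairs for RAID 1+0, two three-disk stripes mirrored against each other for RAID 0+1); the layout and function names are illustrative only:

```python
from itertools import combinations

disks = range(6)

# RAID 1+0: three mirrored pairs, striped together.
# Data survives as long as no pair loses both of its members.
pairs = [(0, 1), (2, 3), (4, 5)]
def survives_1_0(failed):
    return all(not set(p) <= set(failed) for p in pairs)

# RAID 0+1: two three-disk stripes mirrored against each other.
# Any failure faults its whole stripe, so data survives only while one stripe is untouched.
stripes = [(0, 1, 2), (3, 4, 5)]
def survives_0_1(failed):
    return any(not (set(s) & set(failed)) for s in stripes)

two_disk_failures = list(combinations(disks, 2))
print(sum(survives_1_0(f) for f in two_disk_failures), "of", len(two_disk_failures))  # 12 of 15
print(sum(survives_0_1(f) for f in two_disk_failures), "of", len(two_disk_failures))  # 6 of 15
```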
  • 117. RAID 3 (1/3) • Parallel Access Array with Dedicated Parity Disk • RAID 3 stripes data for high performance and uses parity for improved fault tolerance – Data is striped across all the disks but one in the array – Parity information is stored on a dedicated drive, so that data can be reconstructed if a drive fails • R/W data to all disks in parallel – There are no partial writes that update one out of many strips in a stripe • Benefits – Total no. of disks is less than in a mirrored solution – Good throughput/bandwidth on large data transfers 18-Feb-20 117 of 162
  • 118. RAID 3 (2/3) • Drawbacks – Poor efficiency in handling small data blocks (not well suited to transaction processing applications) – Data is lost if multiple drives fail within the same RAID 3 Group • Performance – High data R/W transfer rate. Disk failure has a significant impact on throughput. Rebuilds are slow. • Data Protection: Use of parity for improved fault tolerance • Striping: Byte level to multiple block level depending on vendor implementation • Applications – Those which need large sequential data accesses (e.g. Medical and geographic imaging) 18-Feb-20 118 of 162
  • 119. RAID 3 (3/3) [Diagram: host blocks 0–3 are striped in parallel across the data disks; the controller generates parity P(0–3) and writes it to the dedicated parity disk] 18-Feb-20 119 of 162
  • 120. RAID 4 (1/3) • Striped with Independent Disks & a Dedicated Parity Disk • RAID Level 4 stripes data for high performance and uses parity for improved fault tolerance (same as RAID 3) – Data is striped across all the disks but one in the array – Parity information is stored on a dedicated disk so that data can be reconstructed if a drive fails. • Data disks are independently accessible, and multiple R/W can occur simultaneously • Benefits – Total no. of disks is less than in a mirrored solution – Good read throughput & reasonable write throughput 18-Feb-20 120 of 162
  • 121. RAID 4 (2/3) • Drawbacks (same as RAID 3) – Dedicated parity drive can be a bottleneck when handling small data writes (not well suited to transaction processing applications) – Data is lost if multiple drives fail within the same RAID 4 Group • Performance – High data read transfer rate. Poor to medium write transfer rate. Disk failure has a significant impact on throughput • Data Protection: Use of parity for improved fault tolerance • Striping: Usually at the block (or block multiple) level • Applications: General purpose file storage • Note: RAID 4 is much less commonly used than RAID 5 18-Feb-20 121 of 162
  • 122. RAID 4 (3/3) [Diagram: blocks 0–3 and 4–7 are striped across independently accessible data disks; the controller generates parity P(0–3) and P(4–7) per stripe and writes it to the dedicated parity disk] 18-Feb-20 122 of 162
  • 123. RAID 5 (1/4) • Striped Array with Independent Disks and Distributed Parity • RAID 5 performs independent R/W operations • No dedicated parity drive (data and parity information is distributed across all drives in the group) • Benefits • Most versatile RAID level • A transfer rate greater than that of a single drive but with a high overall I/O rate • Good for parallel processing (multi-tasking) applications or environments • Cost savings due to the use of parity over mirroring 18-Feb-20 123 of 162
  • 124. RAID 5 (2/4) • Drawbacks • Slower transfer rate than RAID 3 • Small writes are slow because they require a read-modify-write (RMW) operation (see the sketch below) • There is degradation in performance during recovery/reconstruction • Data loss if multiple drives within the same group fail • Performance • Good aggregate transfer rate (high read data transfer rate, medium write data transfer rate) • Low ratio of parity disks to data disks 18-Feb-20 124 of 162
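The RMW penalty comes from the classic small-write parity update: read the old data and old parity, then write the new data and a recomputed parity. A minimal sketch assuming XOR parity (toy one-byte values; the function name is my own):

```python
def small_write_parity(old_data: int, new_data: int, old_parity: int) -> int:
    # XOR the old data out of the parity, then XOR the new data in.
    return old_parity ^ old_data ^ new_data

# Parity currently covers the toy strips 0x05, 0x03, 0x04, 0x02.
old_parity = 0x05 ^ 0x03 ^ 0x04 ^ 0x02
new_parity = small_write_parity(old_data=0x04, new_data=0x07, old_parity=old_parity)
assert new_parity == 0x05 ^ 0x03 ^ 0x07 ^ 0x02   # parity now reflects the updated strip
```

One small write therefore costs two reads (old data, old parity) and two writes (new data, new parity), which is why the slide flags small writes as slow.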
  • 125. RAID 5 (3/4) • Data Protection • Single disk failure puts volume in degraded mode • Difficult to rebuild (as compared to RAID level 1) • Disks • 5-disk and 9-disk groups are popular. Most implementations allow other RAID set sizes • Striping: Block or multiple-block level • Applications • File and application servers, database servers, WWW, email, and News servers 18-Feb-20 125 of 162
  • 126. RAID 5 (4/4) [Diagram: blocks 0–3 and 4–7 are striped across the disks; parity P(0–3) and P(4–7) is generated per stripe and distributed across different disks instead of a dedicated parity disk] 18-Feb-20 126 of 162
  • 127. RAID Implementations (1/2) • Hardware RAID – Implemented by intelligent storage systems external to the host (or the host has intelligent controllers that offload RAID management functions from the host CPU) • Software RAID – RAID that is managed by the host CPU • Disadvantages – It uses host CPU cycles that would be better utilized to process application data – Many host CPUs and OSs do not perform I/O functions very efficiently, so the host is ill-suited for the task – It often looks attractive initially because it does not require the purchase of additional hardware, but the initial cost savings are soon exceeded by the expense of using a costly server to perform I/O operations that it performs inefficiently at best 18-Feb-20 127 of 162
  • 128. RAID Implementations (2/2) • Hardware (usually a specialized disk controller card) – Controls all drives that are attached to it – Performs all RAID-related functions including volume management – Array(s) appear to the host operating system as a regular disk drive – Dedicated cache to improve performance – Generally provides some type of administrative software • Software (generally runs as part of the OS) – Volume management is performed by the server – Provides more flexibility in hardware choice, which can reduce cost – Performance is dependent on CPU load & server performance – Has limited functionality 18-Feb-20 128 of 162
  • 129. Hot Spares (1/3) • A hot spare is an idle component (often a drive) in a RAID array that becomes a temporary replacement for a failed component • For example: – The hot spare takes on the failed drive’s identity in the array – Data recovery takes place based on the RAID implementation (whether parity or mirroring was used) – The failed drive is replaced with a new drive some time later – One of the following occurs: • The hot spare replaces the new drive permanently (a new hot spare must be configured on the system) • When the new drive is added to the system, data from the hot spare is copied to the new drive (the hot spare returns to its idle state, ready to replace the next failed drive) 18-Feb-20 129 of 162
  • 130. Hot Spares (2/3) • Note: The hot spare drive needs to be large enough to accommodate the data from the failed drive • Hot spare replacement can be Automatic or User initiated – Automatic: When a disk’s recoverable error rates exceed a predetermined threshold, the disk subsystem tries to copy data from the failing disk to a spare one. If this task completes before the damaged disk fails, the subsystem switches to the spare and marks the failing disk unusable. (If not it uses parity or the mirrored disk to recover the data, as appropriate). – User initiated: This gives the administrator control when to rebuild (e.g., rebuild overnight so as not to degrade system performance). However, the system is vulnerable to another failure because the hot spare is now unavailable. Some systems implement multiple hot spares to improve availability. 18-Feb-20 130 of 162
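A hedged sketch of the automatic policy described above: once a drive's recoverable-error count crosses a threshold, copy its data to the spare while the drive is still readable, otherwise fall back to a parity or mirror rebuild. The threshold value and function name are assumptions, not taken from any specific product:

```python
ERROR_THRESHOLD = 50   # assumed recoverable-error limit before proactive sparing

def hot_spare_action(recoverable_errors: int, failing_disk_readable: bool) -> str:
    if recoverable_errors <= ERROR_THRESHOLD:
        return "keep monitoring"
    if failing_disk_readable:
        return "copy data from the failing disk to the hot spare, then mark the disk unusable"
    return "rebuild the hot spare from parity or the mirror copy"

print(hot_spare_action(recoverable_errors=80, failing_disk_readable=True))
```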
  • 132. Hot Swap (1/2) • Like hot spares, hot swaps enable a system to recover quickly in the event of a failure. With a hot swap the user can replace the failed hardware (such as a controller) without having to shut down the system • Note – A warm swap occurs when the system needs to be shut down, but power does not need to be removed in order to replace the failed component – A cold swap occurs when the power must be removed as well – Some systems have the ability to auto-swap without user intervention 18-Feb-20 132 of 162
  • 134. Module – 5/5 Intelligent Disk Storage Systems
  • 135. Intelligent Storage System? • A disk storage system distributes data over several devices and manages data access • vs. Individual storage devices – Increased capacity – Improved performance – Easier data management – Better data availability – More robust backup/restore capabilities – Improved flexibility and scalability • Categories of Arrays – Monolithic (Integrated) Storage Systems – Modular Storage Systems 18-Feb-20 135 of 162
  • 136. Monolithic (Integrated) Storage Systems (1/3) • Aimed at the Enterprise level, centralizing data in a powerful system with hundreds of drives • Also called: Integrated arrays or Enterprise arrays or Cache centric arrays • The system is contained within a single or interconnected frame (for expansion) and can scale to support increases in connectivity, performance, and capacity as required • Can handle large amounts of concurrent I/Os on very large data applications • Limitations – High upfront costs limiting their applicability to only the most mission critical applications – Take up a large amount of space in the data center 18-Feb-20 136 of 162
  • 137. Monolithic (Integrated) Storage Systems (2/3) [Diagram: a monolithic array with multiple FC ports, port processors, a large global cache, and back-end RAID controllers] 18-Feb-20 137 of 162
  • 138. Monolithic (Integrated) Storage Systems (3/3) • Characteristics – Large storage capacity – Large Cache to store IOs before writing to disk – Redundancy (improves data protection and availability) – More robust and fault tolerant due to many built-in features – Connect to mainframes or very powerful open systems hosts – Multiple front-end ports (connectivity to multiple servers) – Multiple back-end FC/SCSI RAID controllers (manage disk processing) – Expensive 18-Feb-20 138 of 162
  • 139. Modular Storage Systems (1/3) • Aimed at small companies/department level • Also called: Midrange or Departmental storage systems • Provide storage to a smaller number of Windows or Unix servers than the larger Integrated storage systems • Typically designed with two controllers, each of which contains host interfaces, cache, RAID processors, and disk drive interfaces. 18-Feb-20 139 of 162
  • 140. Modular Storage Systems (2/3) [Diagram: rack servers connect through FC switches to a modular array made of disk modules plus a control module with disks; the control module holds two controllers, A and B, each with host interfaces, cache, and a RAID controller] 18-Feb-20 140 of 162
  • 141. Modular Storage Systems (3/3) • Characteristics – Smaller disk capacity – Less global cache – Limited redundancy and connectivity – Can start with a smaller number of disks and scale as needed – Performance can degrade as capacity increases – Fewer front-end ports for connection to servers – Cannot connect to mainframes – Usually have separate controllers from the disk array – Takes up less floor space and costs less 18-Feb-20 141 of 162
  • 142. Elements of Intelligent Storage Systems • Intelligent storage systems are organized into the following areas: – Front End – Cache – Back End – Physical disks 18-Feb-20 142 of 162
  • 143. Elements of Intelligent Storage Systems [Diagram: host connectivity enters at the Front End, which exchanges data with the Cache; the Back End sits between the Cache and the Physical Disks] 18-Feb-20 143 of 162
  • 144. Intelligent Storage System: Front-end (1/3) [Diagram: front-end ports and controllers sit between host connectivity and the cache] – Note: Include redundancy in the channels to and from the ports 18-Feb-20 144 of 162
  • 145. Intelligent Storage System: Front-end (2/3) • Provides communication between the storage system and the host • Main parts – Ports & Controllers • Storage Ports – External interfaces for connectivity to the host – Each port has processing logic responsible for executing the appropriate transport protocol for storage connections • E.g. SCSI, FC, or iSCSI – To maintain data availability, the front end of a storage system generally has multiple ports • Provides redundancy in case of a failure • Balances the load when the system is experiencing heavy use • Mid-range storage system: ranges from 1-8 (typically 4) • Large monolithic array: about 64 or 128 18-Feb-20 145 of 162
  • 146. Intelligent Storage System: Front-end (3/3) • Controllers – Available behind the storage ports to route data to the cache via the internal data bus – Sends an acknowledgement message back to host as soon as the cache receives the data 18-Feb-20 146 of 162
  • 147. Front-End Command Queuing (1/3) [Diagram: the front end receives requests 1, 2, 3, 4; without command queuing they are serviced in arrival order (1, 2, 3, 4), with command queuing they are reordered (1, 3, 2, 4)] 18-Feb-20 147 of 162
  • 148. Front-End Command Queuing (2/3) • Processes multiple concurrent commands based on disk data organization, regardless of the order in which the commands were received • Command queuing software – Reorders commands and assigns a tag to each command to identify it when it is executed (efficiently) – Some disk drives (SCSI & FC disks) are intelligent enough to manage their own command queuing • Intelligent storage systems may make use of this native disk intelligence, and may supplement it with queuing performed by the controller • Queue Depth Setting – Defines the number of outstanding requests that can be active at the same time in the queue – Many manufacturers offer configurable queue depths 18-Feb-20 148 of 162
  • 149. Front-End Command Queuing (3/3) • Common Command queuing algorithms – FIFO • Commands are executed in the order in which they arrive • Limitation: Identical to having no queuing – inefficient – Seek Time Optimization (see the sketch below) • Faster than FIFO • Optimizing seek times only, without regard for rotational latency, will not normally produce the best results – E.g. Consider two requests on cylinders very close to each other but in very different places within the track; a third request a few cylinders further away may be closer overall to the first request and could be serviced sooner – Access Time Optimization • Combines seek time optimization with an analysis of rotational latency for optimal performance 18-Feb-20 149 of 162
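One simple form of seek-time optimization is to always service the queued request whose cylinder is closest to the current head position (a shortest-seek-first policy). This is only an illustrative sketch of that idea, not a vendor's actual algorithm; access-time optimization would additionally weigh rotational position:

```python
def seek_time_order(requests, head_cylinder):
    """Greedy shortest-seek-first: repeatedly pick the request closest to the head."""
    pending, ordered = list(requests), []
    while pending:
        nxt = min(pending, key=lambda cyl: abs(cyl - head_cylinder))
        pending.remove(nxt)
        ordered.append(nxt)
        head_cylinder = nxt
    return ordered

queue = [98, 183, 37, 122, 14, 124, 65, 67]      # cylinder numbers, in arrival (FIFO) order
print(seek_time_order(queue, head_cylinder=53))  # [65, 67, 37, 14, 98, 122, 124, 183]
```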
  • 150. Intelligent Storage System: Cache • Cache is a high-speed memory – Improves system performance by isolating hosts from the mechanical delays associated with physical disks (seek times and rotational latency) and minimizes delay (< 1 ms) – Improves performance of R/W [Diagram: the cache sits between the front end and the back end, ahead of the physical disks] 18-Feb-20 150 of 162
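The benefit can be expressed as an effective access time weighted by the cache hit ratio. The numbers below are illustrative assumptions only (sub-millisecond cache hits versus a few milliseconds of mechanical disk latency):

```python
def effective_latency_ms(hit_ratio: float, cache_ms: float, disk_ms: float) -> float:
    # Weighted average of cache hits and cache misses that must go to disk.
    return hit_ratio * cache_ms + (1 - hit_ratio) * disk_ms

print(effective_latency_ms(hit_ratio=0.9, cache_ms=0.1, disk_ms=8.0))   # ~0.89 ms
```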
  • 151. Intelligent Storage System: Back End (1/3) [Diagram: back-end ports and controllers sit between the cache and the physical disks] 18-Feb-20 151 of 162
  • 152. Intelligent Storage System: Back-End (2/3) • Data from the cache is transferred through the I/O bus to the back end, where it is routed to the correct drive • Disk controllers provide communication with the disks for R/W operations – Manage data transfer between the I/O bus and the disks – Handle device addressing, translating logical blocks into physical locations on the disk (see the sketch below) – Provide additional, limited temporary storage for data – Provide error detection and correction – often in conjunction with similar features on the disks – Allow multiple devices to communicate with the HBA on the host – Facilitate performance enhancement 18-Feb-20 152 of 162
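For the address-translation point above, the classic textbook mapping from a logical block address (LBA) to a cylinder/head/sector location looks like the following. Real drives use zoned recording and internal remapping, so this is illustrative only and the geometry values are assumptions:

```python
def lba_to_chs(lba: int, heads_per_cylinder: int, sectors_per_track: int):
    cylinder = lba // (heads_per_cylinder * sectors_per_track)
    head     = (lba // sectors_per_track) % heads_per_cylinder
    sector   = (lba % sectors_per_track) + 1      # sectors are conventionally numbered from 1
    return cylinder, head, sector

print(lba_to_chs(lba=5000, heads_per_cylinder=16, sectors_per_track=63))   # (4, 15, 24)
```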
  • 153. Intelligent Storage System: Back-End (3/3) • Disk controllers – Implemented as hardware with firmware that communicates with disks via disk interface, sending commands to initiate R/W process on disks – The design of the controller is vendor specific • Multiple Disk Controllers – Provide maximum data protection and availability (with alternative path in case of a failure) • Reliability is enhanced if the disks used are dual-ported; each disk port can connect to a separate controller. Having more than one port on each controller will provide additional protection in the event of certain types of failure – Facilitate load balancing 18-Feb-20 153 of 162
  • 154. Intelligent Storage System: Physical Disks (1/2) [Diagram: the physical disks sit behind the back end, with host connectivity, the front end, and the cache in front of them] 18-Feb-20 154 of 162
  • 155. Intelligent Storage System: Physical Disks (2/2) • Physical disks are where the storage actually takes place • Drives are connected to the controller with either a SCSI interface (copper cables) or FC (optical or copper cables) • This could be a single disk drive or a more complex RAID set – ATA drives are used when a storage system is deployed in environments where performance is not critical • Connection: Parallel ATA (PATA) or Serial ATA (SATA) copper cables – Mixture of SCSI or FC drives and ATA drives • Higher performing drives are used for application data storage • Slower ATA drives are used for backup and archiving 18-Feb-20 155 of 162
  • 156. I/O Example: Read Requests [Diagram: read request path from the host through host connectivity, the front end, the cache, and the back end to the physical disks] 18-Feb-20 156 of 162
  • 157. I/O Example: Write Requests [Diagram: write request path from the host through host connectivity, the front end, the cache, and the back end to the physical disks] 18-Feb-20 157 of 162
  • 158. What the Host Sees [Diagram: two hosts each see LUN 0, LUN 1, and LUN 2 presented by the intelligent storage system; behind the cache and back end these LUNs map onto the physical disks] 18-Feb-20 158 of 162
  • 159. The Host and Logical Device Names [Diagram: each host's volume manager addresses the presented LUNs through its own logical device names, such as /dev/rdsk/c1t1d0, /dev/rdsk/c1t1d1, and \\.\PhysicalDrive0] 18-Feb-20 159 of 162
  • 160. Disk Organization in a Storage System [Diagram: LUN 0 and LUN 1 presented to the hosts are built from portions of several physical disks behind the cache and back end] 18-Feb-20 160 of 162
  • 161. References • Information Storage & Management, EMC Education Services (E-Book) 18-Feb-20 161 of 162