The document provides an overview of storage system architecture, including:
- The storage system environment consists of hosts, connectivity components, and storage devices.
- Host environments include physical components like CPUs, memory, and I/O devices, as well as logical components like applications, operating systems, file systems, and device drivers.
- Connectivity carries read/write commands and data between hosts and storage, using buses, ports, cables and protocols.
2. At a Glance
• Storage System Environment
• Host Environment
• Connectivity
• Disk Storage
– Physical Disks
– RAID
• Intelligent Disk Storage
3. Storage System Environment
• Storage has evolved from single internal disks to storage
systems.
• Storage system environment (a group of components) provides
storage and handles R/W data requests & data transmission
• Components of the Storage System Environment
– Host: Runs the OS & applications that require data
– Connectivity: Carries R/W commands and the data between
the host and the storage devices
– Storage: Devices where the data is stored
• Storage environment has evolved along with changes in
computing models.
5. Host Environment
• Host?
– Computers on which the applications for I/O reside
– Laptops to Cluster of Servers
[Figure: Example hosts: laptop, server, group of servers, mainframe]
6. Physical Components of Host (1/12)
• Physical components
– CPU, Internal Memory & Disk Devices, IO Devices
– Bus: The physical components interact with one another
through a Bus
[Figure: CPU, storage, and I/O devices interconnected by a bus]
7. Physical Components of Host (2/12)
• CPU – Components
– ALU
– Control Unit
– Register
– Level-1 Cache
8. Physical Components of Host (3/12)
• CPU technology now means systems typically come with at least
dual-core or quad-core processors (multiple cores on a single
chip) instead of the traditional one core per chip.
• All of the cores slot into a single socket as before, and a
single heat sink and fan can keep everything at the right
temperature.
9. Physical Components of Host (4/12)
• Storage
– Memory Modules
• Semiconductor memory, High speed data access, Expensive
• Example: RAM, ROM
– Storage Devices
• Magnetic or Optical media, Low speed data access, Cheaper
• Example:
– Magnetic: Tape, Floppy Disk, Hard Disk
– Optical: CD, DVD, VCD
12. Physical Components of Host (7/12)
• Offline Mass Storage
– Nearline storage (Tertiary): EBs, 160 MB/s (2013)
• MAID
– Offline storage
• Floppy Disk, Optical Disk, Flash Memory, Magnetic Tape
• Online vs Nearline vs Offline storage
– Online storage is immediately available for I/O.
– Nearline storage is not immediately available, but can be made
online quickly without human intervention.
– Offline storage is not immediately available, and requires
some human intervention to bring online.
• Tiered Storage
– The lower levels of the hierarchy from disks downwards.
13. Physical Components of Host (8/12)
• Modern programming languages mainly assume 2 levels of
memory: main memory & disk storage, though in assembly
language and inline assemblers in languages such as C,
registers can be directly accessed.
• Taking optimal advantage of the memory hierarchy requires
the cooperation of programmers, hardware, and compilers
(as well as underlying support from OS):
– Programmers are responsible for moving data between disk
and memory through file I/O.
– Hardware is responsible for moving data between memory
and caches.
– Optimizing compilers are responsible for generating code that,
when executed, will cause the hardware to use caches and
registers efficiently.
14. Physical Components of Host (9/12)
• Storage Hierarchy
[Figure: Storage hierarchy, from fast and expensive to slow and cheap: CPU registers, L1 cache, L2 cache, RAM, magnetic disk, optical disk, tape]
16. Physical Components of Host (11/12)
• Storage
[Figure: Storage as an array of addressed locations: addresses 0..n in memory hold Data 0..Data n, backed by disk]
17. Physical Components of Host (12/12)
• I/O Devices
– Human interface
• Keyboard, Mouse, Monitor
– Computer-computer interface
• Network Interface Card (NIC)
– Computer-peripheral interface
• USB (Universal Serial Bus) port
• Host Bus Adapter (HBA)
18. Logical Components of Host (1/10)
• Logical components
– Software applications, Protocols, OS, File systems, Database
[Figure: Logical host stack: Applications, DBMS and Management Utilities, File System, Volume Management, Multi-pathing Software, Device Drivers, and HBAs running on the OS]
19. Logical Components of Host (2/10)
• Applications
– Provide a point of interaction either between the user and the
host or another system and the host
– Most applications have storage requirements (short or long-
term depending upon the application)
• Operating system
– Controls interaction between applications & storage systems
– Monitors and responds to user actions and the environment
– Organizes and controls the hardware components
– Connects hardware components to the application program
layer and the users
– Manages system activities such as storage and communication
20. Logical Components of Host (3/10)
• Device drivers
– Allow the OS to be aware of and use a standard interface to
access and control a specific device (i.e., printer, speakers,
mouse, keyboard, video, storage devices, etc.)
– Provide appropriate protocols to host to allow device access
• File System (and Files)
– Provides a logical structure for data and methods for accessing
that data
• Hosts work with data stored in File System blocks
• File system converts the user logical structures into host
accessible blocks
21. Logical Components of Host (4/10)
• File system block
– Smallest ‘container’ allocated to a file’s data
– Each block is a contiguous area of physical disk capacity
– Block size depends on the type of files being stored and accessed
• Block size is fixed (pre-defined by the OS) during storage system
configuration.
• Larger files will span multiple file system blocks (may not
necessarily be contiguous on a physical disk)
22. Logical Components of Host (5/10)
• In multi-user, multi-tasking environments, file systems
manage shared storage resources using
– Directories, paths and structures
• Identify file locations
– Volume Managers
• Hide the complexity of physical disk structures
– File locking capabilities
• Control access and data flow to and from file locations when
used by potentially competing users or applications
– Databases and data management components such as
• Large, shared relational databases
• Management of shared data storage
23. Logical Components of Host (6/10)
• Linear File Structure
– The number of files on a system can be extensive and could
quickly get out of hand
• Hierarchical Structure with Directories
– Also called Folders in the Windows environment
– Hold files as well as other directories
– Hold information about files that they contain (Metadata)
24. Logical Components of Host (7/10)
• Metadata (Information or Data about the file)
– Examples: In UNIX (UFS)
• File type and permissions
• Number of links
• Owner and group IDs
• Number of bytes in the file
• Last file access & Last file modification
– Example: In Windows (NTFS)
• Time stamp and link count
• File name
• Access rights
• File data
• Index information & Volume information
25. Logical Components of Host (8/10)
• Journaling & Logging
– Non-Journaling File System
• Uses many separate writes to update data and metadata; if the
system crashes during a write, data or metadata can be lost
– Journaling File System
• Improves data integrity, system restart time (vs. non-journaling
file systems)
• Before operations are made to the file system, they are written
to a separate area called a log or journal
– May hold all data to be written (Physical Journal)
– May hold only metadata (Logical Journal)
• Disadvantage – slower than other file systems
– Each file system update requires at least one extra write – to the log
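A minimal sketch (illustrative, in-memory Python) of the journaling idea above: the update is written to the log first and only then applied in place, so that after a crash, replaying committed log entries restores consistency. All names here are hypothetical, not from the slides.

```python
# Illustrative sketch of a physical journal (all data written to the log).
journal: list[tuple[str, bytes]] = []   # (block_id, data) entries
disk: dict[str, bytes] = {}

def journaled_write(block_id: str, data: bytes) -> None:
    journal.append((block_id, data))    # 1. the extra write, to the log
    disk[block_id] = data               # 2. the in-place update

def replay() -> None:
    """On restart, re-apply logged entries so interrupted writes are redone."""
    for block_id, data in journal:
        disk[block_id] = data

journaled_write("inode-7", b"metadata")
replay()                                # idempotent: safe on every restart
print(disk["inode-7"])                  # b'metadata'
```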
26. Logical Components of Host (9/10)
• Volume Management
– Optional intermediate layer between file system and physical
disks
– Aggregates several smaller disks to form a larger virtual disk
• Virtual Disks are only visible to higher level programs and
applications
– Optimizes access to storage
– Simplifies the management of storage resources
27. Logical Components of Host (10/10)
• Host Bus Adapter (HBA)
– Add-on card (or) a chip on the motherboard of the host
– Ports connect the Host to the storage devices
– Has processing capability to handle some storage commands,
thereby reducing the burden on the host CPU
– A host may have multiple HBAs (used for redundancy, as described next)
28. Improving Data Availability at Host
• Hosts can be configured to provide uninterrupted access to
critical data through
– Redundancy [Implemented using multiple HBAs]
– Multi-path software [Server resident]
• Utilizes available HBAs on the server to provide
redundant/multiple communication paths between host and
storage devices
• It provides assured uninterrupted data transfers even in the
event of a path failure and may also provide automatic load
balancing
– Clustering [Redundant host systems connected together]
• Cluster members can be configured to transparently take over
each others’ workload, with minimal or no impact to the user
• If one host in the cluster fails, its functions will be assumed by
surviving member(s).
29. File Movement to/from Storage (Example)
[Figure: A teacher configures/manages course file(s); the file system maps the files to file system blocks; the LVM maps the blocks to logical extents consisting of physical extents on disk, which map to disk sectors managed by the disk storage subsystem]
• Note: File system blocks are mapped to disk sectors by the OS in the absence of an LVM
31. Parts of Storage Environment
• Connectivity
– Between hosts, or between a host and peripheral (storage)
devices
– Physical components of Connectivity include
• Bus, Port, Cables (Uses Optical and Copper media)
• Connectors and plugs
• Adapters
– Host Bus Adapter (HBA) – enables devices to connect to a host’s
internal bus system
• NIC – enables simple network attachments to a host
• Switches/hubs - Manage traffic within a network
– Logical components of Connectivity
• Communication protocols, Device Drivers
33. Bus Technology (1/4)
• Bus?
– Collection of paths that facilitate data transmission from one
part of the computer to another
– Physical components communicate across a bus by sending
packages (packets) of data between the devices in Serial or
Parallel Paths.
• Serial communication: Bits travel one behind the other
• Parallel communication: Bits move along multiple parallel paths
simultaneously.
34. Bus Technology (2/4)
• Serial/Parallel Paths
[Figure: Serial unidirectional, serial bidirectional, and parallel paths]
35. Bus Technology (3/4)
• Types of buses in a computer system
– System Bus
• Carries data between the processor and memory
– Local or I/O Bus
• Carries data to/from Peripheral devices (such as storage devices)
• Provides a high-speed pathway that connects directly to
processor
36. Bus Technology (4/4)
• Bus Properties
– Bus width (bits)
• Amount of data that can be transmitted at a time
• E.g. n-bit bus can transmit n-bits of data
– Bus speed (MHz)
• Every bus has an associated clock speed which determines how
fast data can be transferred
• Applications can run faster when bus speeds are higher.
– Throughput (MB/s)
• Effective rate of data transfer; for a parallel bus it is roughly
bus width × bus speed (see the sketch below)
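A minimal sketch of that relationship in Python. The formula (width/8 × clock × transfers per cycle) is the standard peak-rate calculation, assumed here rather than stated on the slides:

```python
# Peak throughput of a parallel bus: bytes per transfer x transfers per second.
def peak_throughput_mb_per_s(width_bits: int, clock_mhz: float,
                             transfers_per_cycle: int = 1) -> float:
    bytes_per_transfer = width_bits / 8
    return bytes_per_transfer * clock_mhz * transfers_per_cycle

# Classic 32-bit, 33 MHz PCI: 4 B x 33.33 MHz = ~133 MB/s, matching the
# 1992 PCI figure quoted later in this deck.
print(peak_throughput_mb_per_s(32, 33.33))
```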
37. Connectivity Protocols (1/2)
• Protocol
– Defined format for communication that allows the sending and
receiving devices to agree on what is being communicated.
– Communication between hardware or software components
• Different Connectivity Models
[Figure: Three connectivity models: tightly connected entities, directly attached entities, network connected entities]
38. Connectivity Protocols (2/2)
• Tightly connected entities
– E.g. Central Processor to RAM, or storage buffers to controllers
– Use standard Bus technology (System bus or I/O – Local Bus)
• Directly attached entities
– Devices connected at moderate distances – such as host to
printer or host to storage (JBOD or DAS)
• Network connected entities
– E.g. Networked hosts, NAS or SAN
39. Communication Protocols
[Figure: Host stack: applications and OS over SCSI or IDE/ATA device drivers on a PCI bus]
• Protocols for local I/O bus and for connections to an internal
disk system include
– PCI (Peripheral Component Interconnect)
– IDE/ATA (Integrated Drive Electronics / Advanced Technology
Attachment)
– SCSI (Small Computer System Interface)
40. Bus Technology – PCI (1/2)
• PCI defines the local bus system within a computer
• The specification standardizes how PCI expansion cards, such
as network cards or modems, install themselves and
exchange information with the CPU.
• PCI includes
– An interconnection between microprocessor and attached
devices, in which expansion slots are spaced closely for high-
speed operation
– Plug and Play functionality
– 32/64-bit simplex data transfers (1992-2002), evolving to duplex
transfers (2019)
– Throughput from 133 MB/s (1992) to 128 GB/s (2019)
41. Bus Technology – PCI (2/2)
• PCI Express is an enhanced PCI bus with increased
Bandwidth
42. Bus Technology - IDE/ATA
• Most popular interface used with modern hard disks
• Good performance at low cost
• Desktop and laptop systems
• Inexpensive storage interconnect
43. Bus Technology - SCSI
• 2nd most popular hard disk interface protocol in PCs today
• Higher cost than IDE/ATA
• Supports multiple simultaneous data access
• Currently both parallel and serial forms
• Used primarily in “higher end” environments such as with
servers
• Note
– SCSI HBA (ref. to as controller) can be implemented as an
onboard interface (or) ‘add in’ card plugged into system I/O
bus.
44. SCSI Model (1/2)
• Initiator: the SCSI device that starts a communication
• Target: the SCSI device that services a request
[Figure: Initiator and target exchanging commands]
• If Initiator is a host, it will release communication connection
and continue processing other events while target executes
the command. The host will await an interrupt signal from
the storage device to complete the transaction
45. SCSI Model (2/2)
[Figure: Initiator ID, Target ID, and LUNs on a SCSI connection]
• Initiator ID – uniquely identifies an initiator that is used as
an “originating address”
• Target ID – uniquely identifies a target. Used as address for
exchanging commands and status information with initiators
• Logical Unit Numbers (LUNs) – identifies a specific Logical
Unit in a target. Logical Units can be more than a single disk
46. SCSI Addressing
• Initiator ID
– Original initiator ID number (0 to 15)
– Used to send responses back to initiator from storage device
• Target ID
– Value for a specific storage device (0 to 15)
– An address that is set on the interface of the device such as a
disk, tape or CDROM
• LUN
– A number that reflects the actual address of the device, as
seen by the target
[Figure: SCSI address format: Initiator ID | Target ID | LUN]
47. Disk Identifier - Addressing
• The logical device name used by a host for a disk drive has the
form cntndn (controller, target, disk; e.g. c0t0d0)
– dn is usually d0 for most SCSI disks because there is only one
disk attached to the target controller. In intelligent storage
systems, discussed later, each target may address many LUNs
[Figure: Controller c0 (initiator, HBA) connects to target t0 (peripheral controller), which presents LUNs d0, d1, d2]
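As an illustration of this naming scheme, here is a hypothetical helper (not from the slides) that splits a cNtNdN name into its parts:

```python
import re

def parse_device_name(name: str) -> dict:
    """Split e.g. 'c1t1d0' into controller, target and disk numbers."""
    m = re.fullmatch(r"c(\d+)t(\d+)d(\d+)", name)
    if m is None:
        raise ValueError(f"not a cNtNdN device name: {name!r}")
    controller, target, disk = map(int, m.groups())
    return {"controller": controller, "target": target, "disk": disk}

print(parse_device_name("c0t0d0"))  # {'controller': 0, 'target': 0, 'disk': 0}
```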
48. SCSI – Pros & Cons
• Pros:
– Fast transfer speeds (up to 320 MB/s for parallel SCSI)
– Reliable, durable components
– Can connect many devices with a single bus, more than just HDs
– SCSI host adapter cards can be put in almost any system
– Full backward compatibility
• Cons:
– Configuration and setup specific to one computer
– Unlike IDE, few BIOSes support the standard
– Overwhelming number of variations in the standard, hardware, and connectors
– No common software interfaces and protocols
49. SCSI vs. IDE/ATA
Feature | IDE/ATA | SCSI
Expandability | Low | Very good
Configuration & Setup | Easier | Complex and expensive
Device Type Support | Less support | Larger support
Cost | Cheap | Expensive
Performance:
(1) Max. interface data transfer rate for multiple devices | Low DR | High DR
(2) Device mixing issues | Significant performance hit | No issues related to different operational speeds
(3) Device performance | Supports only one device at a time | Supports multiple devices simultaneously
Connectivity | Internal storage | Internal and external storage
Speed (MB/s) | 100/133/150 | 320
50. Physical Components (Host with External Storage)
• Hosts with external storage are usually large enterprise servers
[Figure: Host with external storage: CPU and HBA on the host's internal bus; the HBA port connects by cable to a port on the external disk]
51. Fibre Channel
• Offers high-speed interconnection used
in networked storage to connect
servers to shared storage devices
• FC refers both to the hardware components and to the storage
protocol that communicates across them
• Fibre Channel components
– HBAs
– Hubs & Switches
– Cables
– Disks
[Figure: Host logical stack (Apps, DBMS and Management Utilities, File System, LVM, Multipathing Software, Device Drivers) with multiple HBAs connecting to Fibre Channel storage arrays]
52. External Storage Interfaces – A Comparison
• SCSI
– Limited distance
– Limited device count
– Usually limited to single initiator
– Single-ported drives
• Fibre Channel
– Greater distance
– High device count in SANs
– Multiple initiators
– Dual-ported drives
• Note
– SCSI can be used for internal storage in hosts.
– FC is almost never used internally.
53. Fibre Channel Connectivity (1/2)
• When computing environments require high speed
connectivity, they use sophisticated equipment to connect
hosts to storage devices
• Physical connectivity components in networked storage
environments include:
– HBA (Host-side interface) – Host Bus Adapters connect the
host to the storage devices
– Optical cables – fiber optic cables to increase distance, and
reduce cable bulk
– Switches – used to control access to multiple attached devices
– Directors – sophisticated switches with high availability
components
– Bridges – connections to different parts of a network
56. Parts of Storage Environment - Storage
• Physical components of storage include
– Physical devices that hold the data (i.e., disk, tape, optical
drives, etc.)
– Components that make the devices operate (i.e., power
supplies, fans)
– The enclosures that hold the equipment (e.g., racks)
• Logical components of storage include
– Protocols
– Flow algorithms
57. Disk Drive Components
• The Components of Disk Drive include
– Platters
– Spindle
– R/W Heads
– Actuator Arm Assembly
– Controller
58. Disk Drive Components: Platters (1/2)
• The Head Disk Assembly (HDA) is a sealed case which
contains a series of rotating platters
• Attributes of a Platter
– It is a rigid, round disk coated with magnetically sensitive
material
– Data is stored (encoded) as 0/1 by polarizing magnetic areas,
or domains, on the disk surface
– Data can be R/W on both surfaces of a platter
– The no. of platters on a drive is specific to the particular drive
– A platter’s storage capacity varies across drives and technology
• Note: The drive’s capacity is determined by the no. of platters,
the amount of data which can be stored on each platter, and how
efficiently data is written to the platter
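A minimal worked sketch of that note; the geometry values below are purely illustrative assumptions, not from the slides:

```python
# Capacity = platters x 2 surfaces x tracks per surface
#            x average sectors per track x bytes per sector.
def drive_capacity_bytes(platters: int, tracks_per_surface: int,
                         avg_sectors_per_track: int,
                         bytes_per_sector: int = 512) -> int:
    surfaces = platters * 2            # data is R/W on both surfaces
    return (surfaces * tracks_per_surface
            * avg_sectors_per_track * bytes_per_sector)

# Hypothetical drive: 4 platters, 100,000 tracks/surface, 1,200 sectors/track
print(drive_capacity_bytes(4, 100_000, 1_200) / 10**9)  # ~491.5 GB
```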
59. Disk Drive Components: Platters (2/2)
[Figure: Platter surfaces storing data as patterns of binary 0s and 1s in magnetized areas]
60. Disk Drive Components: Spindle (1/2)
• Connects multiple disk platters to a motor which rotates at a
constant speed
– The spindle rotates continuously until power is removed from
the spindle motor
– Many hard drive failures occur when the spindle motor fails
– Disk platters spin at speeds of several thousand revolutions
per minute
• Note: These speeds will increase as technologies improve,
though there is a physical limit to the extent to which they can
improve
62. Disk Drive Components: R/W Heads (1/3)
• Most drives have two Read/Write heads per platter [One for
each surface of the platter]
• Data R/W is a magnetic process using read/write heads
– Data Read – Detection of magnetic polarization on the platter
surface
– Data Write – Change the magnetic polarization on the platter
surface
• Head flying height
– Height of microscopic air gap between the R/W heads and the
platter
63. Disk Drive Components: R/W Heads (2/3)
• Landing Zone
– Special area on the surface of the platter near the spindle where
the R/W heads rest when the spindle rotation has stopped
• Logic on the disk drive ensures that the heads are moved to the
landing zone before they touch the surface
– The landing zone is coated with a lubricant to reduce
head/platter friction.
• Head Crash
– Occurs when the drive malfunctions and a R/W head
accidentally touches platter’s surface outside of landing zone
– When a head crash occurs, the magnetic coating on the platter
gets scratched and damage may also occur to the R/W head
– A head crash generally results in data loss
65. Disk Drive Components: Actuator Arm Assembly (1/2)
• The R/W heads for all of the platters in the drive are
attached to one actuator arm assembly and move across the
platters simultaneously
– Note: There are two R/W heads per platter, one for each
surface
• It positions the R/W head at a location on the platter where
data needs to be written or read
66. Disk Drive Components: Actuator Arm Assembly (2/2)
[Figure: Actuator arms positioning R/W heads over the platters on the spindle]
67. Disk Drive Components: Controller
[Figure: Bottom view of disk drive: HDA, controller, interface, power connector]
• It is a PCB, mounted at the bottom of the disk drive
• It contains a microprocessor (as well as some internal
memory, circuitry, and firmware) that controls:
– Power to the spindle motor and control of motor speed
– Communication of the drive with the CPU on the host system
– R/W by moving the actuator arm, and switching between R/W
heads
– Optimization of data access
68. Physical Disk Structures: Tracks
• A track is a concentric ring around the spindle on which data is recorded
• Track density
– How tightly the tracks are packed on a platter
• Track numbering
– Numbered from the outer edge of the platter, starting at
Track 0 (zero)
[Figure: Platter showing a sector within a track]
69. Physical Disk Structures: Sectors (1/4)
• The smallest individually-addressable unit of storage in a track,
typically holding 512 B of user data
• Format operation
– Done by the manufacturer to write the track and sector structure
on the platter. Drive manufacturers generally advertise the
formatted capacity.
– Sector stores user data and other information
• Other Information (Sector no., head/platter no., track no.) aids
the controller in locating data on the drive
• No. of sectors per track is based upon the specific drive
– The first PC hard disks typically held 17 sectors per track. Today's
hard disks can have a much larger number of sectors in a single track
– There can be 1000s of tracks on a platter depending on the
drive size
70. Physical Disk Structures: Sectors (2/4)
• Platter Geometry
– Since a platter is made up of concentric tracks, the outer tracks
can hold more data than the inner ones because they are
physically longer than the inner tracks
– Older disk drives had the same number of sectors in the outer
tracks as in the inner tracks
• Data density is very low on the outer tracks. This was an
inefficient use of the available space
• Zoned-Bit Recording is a good alternative for efficient use of
available space.
71. Physical Disk Structures: Sectors (3/4)
• Zoned-Bit Recording
– Group the tracks into zones based upon their distance from
the center of the disk
– Each zone is assigned an appropriate number of sectors per
track
• A zone near the center of the platter has fewer sectors per track
than a zone on the outer edge
• Tracks within a given zone have the same number of sectors.
• Outside tracks have more sectors than inside tracks
– Zones are numbered, with the outermost zone being Zone 0.
– Note
• The media transfer rate drops as the zones move closer to the
center of the platter, meaning that performance is better on the
zones created on the outside of the drive.
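A minimal sketch of that note with an assumed zone layout; the RPM and sectors-per-track values are illustrative, not from the slides:

```python
# Media transfer rate per zone: bytes on one track x revolutions per second.
RPM = 7_200
BYTES_PER_SECTOR = 512

# Hypothetical zones (zone number, sectors per track); Zone 0 is outermost.
zones = [(0, 1_400), (1, 1_200), (2, 1_000), (3, 800)]

for zone, sectors_per_track in zones:
    bytes_per_rev = sectors_per_track * BYTES_PER_SECTOR
    rate_mb_s = bytes_per_rev * (RPM / 60) / 10**6
    print(f"Zone {zone}: {rate_mb_s:.1f} MB/s")
# Zone 0 (outermost) is the fastest, as the note above says.
```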
72. Physical Disk Structures: Sectors (4/4)
[Figure: Platter without zones vs. platter with zones; with zones, outer tracks carry more sectors]
73. Physical Disk Structures: Cylinders
• A cylinder is the set of identical tracks
on both surfaces of each of the drive’s
platters
• Often the location of drive heads are
referred to by cylinder number rather
than by track number
[Figure: Tracks, cylinders, and sectors: a cylinder is the same track position on all platter surfaces]
74. Physical Disk Structures (contd.)
• Physical addressing
– Addresses made up of Cylinder, Head and Sector number (CHS)
to refer to specific locations on the disk
• The host must be aware of the geometry of each disk used
• Logical Block Addressing (LBA) is a good alternative to CHS
75. Physical Disk Structures (contd.)
• Logical Block Addressing (1/3)
– Standard method for addressing blocks on SCSI, FC, and
newer ATA disks
– Simplifies addressing by using a linear address for accessing
physical blocks of data
• Host only needs to know the size of disk drive (number of blocks)
– Disk controller translates/maps address from LBA to CHS
– Block numbering starts at the beginning of a cylinder and
continues until the end of that cylinder
• Logical blocks are mapped to physical sectors on a 1:1 basis.
• Each block will have its own unique address
76. Physical Disk Structures (contd.)
• Logical Block Addressing (2/3)
– E.g. A 500 GB drive holds 500 × 10^9 bytes (a true capacity of
465.7 GiB); at 512 bytes per block that is in excess of
976,000,000 blocks. Each block will have its own unique address.
• As in next slide, the drive shows 8 sectors per track, 8 heads, and
4 cylinders => Total of 256 blocks (8 x 8 x 4). The illustration on
the right shows the block numbering, which will range from 0 to
255.
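A minimal sketch of the mapping the next slide illustrates, using its geometry (8 sectors per track, 8 heads, 4 cylinders, so 256 blocks numbered 0 to 255); the conversion formula is the conventional one, assumed rather than quoted from the slides:

```python
# CHS <-> LBA for a drive with 8 heads and 8 sectors per track.
# Physical sector numbers conventionally start at 1; block numbers at 0.
HEADS = 8
SECTORS_PER_TRACK = 8

def chs_to_lba(cylinder: int, head: int, sector: int) -> int:
    return (cylinder * HEADS + head) * SECTORS_PER_TRACK + (sector - 1)

def lba_to_chs(lba: int) -> tuple[int, int, int]:
    cylinder, rem = divmod(lba, HEADS * SECTORS_PER_TRACK)
    head, sector0 = divmod(rem, SECTORS_PER_TRACK)
    return cylinder, head, sector0 + 1

print(chs_to_lba(0, 0, 1))  # 0: the first block
print(chs_to_lba(3, 7, 8))  # 255: the last of the 256 blocks
print(lba_to_chs(8))        # (0, 1, 1): block 8 is on the next surface
```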
77. Physical Disk Structures (contd.)
• Logical Block Addressing (3/3)
[Figure: Left: physical address = CHS (cylinder, head, and sector number). Right: logical block address = block #, with Blocks 0, 16, 32, 48 down one surface and Block 8 on the lower surface]
78. Physical Disk Structures (contd.)
• What the Host sees?
– Disk Partitioning
• Partitioning divides the disk into logical containers (known as
volumes), each of which can be used for a particular purpose
• Partitions define the disk layout and partition size impacts disk
space utilization
– Partitions are generally created when the hard disk is initially set up
on the host
• Partitions are created from groups of contiguous cylinders
– A large physical drive could be partitioned into multiple Logical
Volumes (LV) of smaller capacity
– Several small physical drives can be concatenated together by a
volume manager and presented as one logical volume.
• The host file-system accesses logical volumes, with no knowledge
of the physical structure
79. Physical Disk Structures (contd.)
• What the Host sees?
[Figure: What the host sees: one physical drive partitioned into multiple logical volumes (A, B, C, D), or several physical drives concatenated into one logical volume (A)]
80. Disk Drive Performance (1/8)
• Seek time
– Time taken to position the R/W heads radially across the platter
(measured in ms)
– Seek time specifications
• Full Stroke - Time taken to move across the entire width of the disk,
from the innermost track to the outermost
• Average – Time taken to move from one random track to another;
roughly the full-stroke time divided by 3
– Typical range in modern disks: 3 to 15 ms
• Track-to-Track – Time taken to move between adjacent tracks
– Seek time has more impact on reads of random tracks on the
disk rather than on adjacent tracks
81. Disk Drive Performance (2/8)
• Seek Time (contd.)
– Seek time can be improved by short-stroking the drive
• Write data only to a subset (inner or outer tracks) of the available
cylinders and treat the drive as though it has a lower capacity
• E.g. 500 GB drive is set up to use only the first 40% of the
cylinders, and is treated as a 200 GB drive
82. Disk Drive Performance (3/8)
• Rotational Speed/Latency
– Actuator moves R/W head over the platter to a particular track,
while the platter spins to position the particular sector in the
track under the R/W head
– Time taken by the platter to rotate and position the data under
the R/W head (measured in ms)
– It depends on the rotational speed of the spindle and averages
half the time taken for a full rotation
– Rotational latency has more of an impact on reads/writes of
random sectors on the disk than on adjacent sectors
– E.g.: Rotational latency value
• 5.5ms for 5400 rpm drive
• 2.0ms for 15000 rpm drive
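A minimal sketch checking those example values (average latency = half of one revolution):

```python
# Average rotational latency: half the time of one full revolution.
def avg_rotational_latency_ms(rpm: int) -> float:
    ms_per_revolution = 60_000 / rpm
    return ms_per_revolution / 2

print(f"{avg_rotational_latency_ms(5_400):.1f} ms")   # 5.6 ms (~5.5 ms above)
print(f"{avg_rotational_latency_ms(15_000):.1f} ms")  # 2.0 ms
```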
83. Disk Drive Performance (4/8)
• Command Queuing
– Time is wasted if commands are processed as they are
received and the R/W head passes over data that will be
needed one or two requests later
– Drive manufacturers include logic that analyzes where data is
stored on the platter relative to data access requests. Requests
are then reordered to make best use of the data’s layout on
the disk (physical level of the disk)
– Also known as Multiple Command Reordering/Optimization,
Command Queuing and Reordering, Native Command Queuing
or Tagged Command Queuing
– Command queuing can also be performed by the storage
system that uses the disk
85. Disk Drive Performance (6/8)
• Data Transfer Rate
– Data path during a Read from the drive (a Write reverses this path)
• Disk platters -> Heads -> Drive's internal buffer -> Through the
interface (HBA) to the rest of the system
– Rate of data transferred (in MBps) by the drive to the HBA
– Internal transfer rate
• Rate of data transferred from Disk surface to the R/W heads on a
single track of one surface of the disk
• A few factors (e.g. seek time) reduce the sustained internal DTR
• Internal DTR will almost always be lower than External DTR
– External transfer rate
• Rate of data transferred through the interface
• Generally advertised speed of interface (E.g. 133 MBps for ATA/133)
• Sustained external DTR will be lower than the interface speed
86. Disk Drive Performance (7/8)
• Data Transfer Rate (contd.)
[Figure: Internal transfer rate is measured between the platters and the drive's buffer; external transfer rate is measured at the interface between the drive and the HBA]
87. Disk Drive Performance (8/8)
• Drive Reliability
– Measured with Mean Time Between Failure (MTBF)
• Amount of time that one can anticipate a device to work before
an incapacitating malfunction occurs (associated with Service Life
of the drive)
• It is based on averages and therefore used merely to provide
estimates. MTBF is measured in hours (E.g. 750,000 hours)
• It is based on an aggregate analysis of a huge number of drives
• It is a statistical method developed by the US Military as a way of
estimating maintenance levels required by various devices.
• MTBF is tested by artificially aging the drives by subjecting them
to stressful environments such as high temperatures, high
humidity, fluctuating voltages, etc.
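A minimal sketch of how an MTBF figure is commonly read (a standard population-level approximation, assumed here rather than taken from the slides): across a large population, the expected annual failure rate is roughly hours-per-year divided by MTBF:

```python
HOURS_PER_YEAR = 24 * 365  # 8,760

def annual_failure_rate(mtbf_hours: float) -> float:
    # Fraction of a large drive population expected to fail per year.
    return HOURS_PER_YEAR / mtbf_hours

# Using the 750,000-hour MTBF quoted above:
print(f"{annual_failure_rate(750_000):.2%} of drives per year")  # ~1.17%
```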
89. Introduction (1/2)
• Disk Array
– Collection of Disk Drives for increased capacity, but with no
added intelligence
• RAID (Redundant Array of Independent Disks)
– Disk array + Controller (added intelligence)
[Figure: Host connected to a RAID array: the RAID controller sits in front of the disks]
90. Introduction (2/2)
• RAID arrays enable you to
– Increase capacity
– Provide higher availability or life expectancy (in case of drive
failure, measured with MTBF)
– Increase I/O performance (through parallel access)
– Streamline management of storage devices
• Note:
– RAID traditionally stood for Redundant Array of Inexpensive
Disks, coined in contrast to storing data on a Single Large
Expensive Disk (SLED)
91. RAID Components (1/3)
• Sub-enclosures or Physical arrays
– Hold a fixed number of physical disks, power supply, and other
supporting hardware
• Logical Arrays or RAID set
– Logical Association of subset/group of disks within RAID array
• Several physical disks can be concatenated to make large logical
volumes (e.g., for databases)
• Single physical disk can be divided to create smaller areas (e.g.,
for logging)
– OS may view it as if they were regular Disk Volumes
– Simplify management of a huge number of disks
93. RAID Components (3/3)
• No. of Logical & Physical Arrays
– Depends entirely on RAID level(s) & specific vendor
implementation
– Mostly in 1:1 ratio. However, you could have 1:N or N:1 ratios
• Array management software implemented in RAID systems
handles:
– Management and control of disk aggregations (e.g. volume
management)
– Translation of I/O requests between logical & physical arrays
– Error correction when disk failures occur
94. Data Organization: Strips & Stripes (1/2)
• Strips
– Contiguously addressed blocks inside each disk of a RAID set
• Stripes
– Set of aligned strips that spans across all disks within RAID set
[Figure: RAID set: aligned strips on each disk form Stripe 1, Stripe 2, and Stripe 3 across the disks]
95. Data Organization: Strips & Stripes (2/2)
• Strip size or Stripe depth
– Describes no. of blocks in a strip & Max. amount of data R/W
in a single disk of the RAID set before next disk is accessed
– Data access may start from the beginning of the strip
– All strips in a stripe have the same number of blocks
– Decreasing strip size means that data is broken into smaller
pieces when spread across the disks
• Stripe size
– Describes number of data blocks in a stripe
– Stripe Size = Strip size x No. of data disks
• Stripe width
– Refers to the number of data strips in a stripe (OR) number of
data disks in a stripe
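A minimal worked example of the formula above (the numbers are illustrative assumptions):

```python
strip_size = 128   # blocks per strip (stripe depth)
data_disks = 4     # stripe width (number of data strips/disks in a stripe)

stripe_size = strip_size * data_disks
print(stripe_size)  # 512 blocks per stripe
# With 512-byte blocks: 64 KiB per strip and 256 KiB per stripe.
```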
96. RAID Performance: Striping (1/2)
• Striping distributes data across the disks in the array and
permits use of multiple independent disks for multiple and
concurrent R/W
• R/W large amount of data
– Write: 1st piece is sent to 1st drive, 2nd piece to 2nd drive, etc.
– Read: Pieces are put back together again
• Based on RAID level & vendor-specific implementation,
striping can occur at block (or block multiple) or byte level
• Notes on striping
– Higher stripe width – higher no. of drives – better performance
– Striping is transparent to the OS of host (handled by controller)
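A minimal sketch of block-level striping as described above: a round-robin mapping from a logical block number to the disk and offset that hold it (the strip size and disk count are assumptions):

```python
STRIP_SIZE = 128   # blocks per strip (illustrative)
DATA_DISKS = 4

def locate(block: int) -> tuple[int, int]:
    """Return (disk, block offset on that disk) for a logical block."""
    strip_no, offset = divmod(block, STRIP_SIZE)
    disk = strip_no % DATA_DISKS              # round-robin across disks
    strip_on_disk = strip_no // DATA_DISKS
    return disk, strip_on_disk * STRIP_SIZE + offset

for b in (0, 127, 128, 512):
    print(b, locate(b))
# 0 and 127 land on disk 0; 128 starts the next strip on disk 1;
# 512 wraps around to disk 0 in the second stripe.
```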
97. RAID Performance: Striping (2/2)
[Figure: Host addresses a LUN (Logical Unit No.); the RAID controller stripes the LUN across the disks of the logical array]
98. RAID Redundancy: Mirroring (1/2)
• Redundancy improves fault tolerance
• Mirroring uses multiple drives that hold identical copies of
the data (usually 2 drives)
– Every write to a data disk is also a write to mirror disk(s),
containing the same data
– If a disk fails, RAID controller uses the mirror drive for data
recovery & continuous operation. Data on a replaced drive is
rebuilt from the mirrored drive
[Figure: Host writes pass through the RAID controller to a data disk and its mirrored disk within the RAID array]
99. RAID Redundancy: Mirroring (2/2)
• Mirroring is transparent to the attached host
• Benefits
– Fast recovery from a failure
– Improved read performance
• Drawbacks
– Degrades write performance because each block of host data
is written to multiple disks
– High cost of data protection due to the need for multiple disks
100. RAID Redundancy: Parity (1/4)
• Parity is a redundancy check mechanism that also ensures
data protection
• Like striping, parity is generally a function of the RAID
controller and is transparent to the host
• Parity can be thought of as the updated sum of data on the
other disks in the RAID set
– Each time data is updated, the parity will be updated as well,
so that it always reflects the current sum of the data on the
other disks
• Parity information can either be
– Stored on a separate, dedicated drive
– Distributed with the data across all the drives in the array
102. RAID Redundancy: Parity (3/4)
• Parity is calculated on a per stripe basis
• On disk failure
– Value of its data is recalculated by using parity information and
data on the surviving disks
• A host request for data on the failed disk requires that data to be
recalculated before it can be sent. This recalculation is time-
consuming, and will decrease the performance of the RAID set
– Note: Hot Spare Drives provide a way to minimize the disruption
caused by a disk failure
• On parity disk failure
– Value of its data (parity) is recalculated by using data disks and
then saved when failed disk is replaced with a new disk
103. RAID Redundancy: Parity (4/4)
[Figure: RAID array of four data disks (holding 5, 3, 4, 2) and one parity disk (holding 14)]
• Worked example: 5 + 3 + 4 + 2 = 14. If the middle drive fails:
5 + 3 + ? + 2 = 14, so ? = 14 - 5 - 3 - 2 = 4
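A minimal sketch of the recovery shown above, using the slide's arithmetic-sum parity; note that real RAID implementations use bitwise XOR, which recovers the missing value the same way:

```python
data = [5, 3, 4, 2]
parity = sum(data)                      # 14, stored on the parity disk

failed = 2                              # the drive holding '4' fails
survivors = [d for i, d in enumerate(data) if i != failed]
print(parity - sum(survivors))          # 4: the lost value, recovered

# XOR variant (what real RAID uses): parity = d0 ^ d1 ^ d2 ^ d3
from functools import reduce
from operator import xor
xparity = reduce(xor, data)
print(reduce(xor, survivors, xparity))  # 4 again
```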
104. RAID Levels
• There are some standard RAID configuration levels, each of
which has benefits in terms of performance, capacity, data
protection, etc.
• Commonly used levels or combinations of levels
– RAID 0 – Striped Array with No Fault Tolerance
– RAID 1 – Disk Mirroring
– RAID 3 – Parallel Access Array with Dedicated Parity Disk
– RAID 4 – Striped Array with Independent Disks and a
Dedicated Parity Disk
– RAID 5 – Striped Array with Independent Disks and Distributed
Parity
– Combinations of levels [RAID 1+0, RAID 0+1, etc.]
105. RAID 0 - Striping (1/2)
• Stripes the data across drives in array without generating
redundant data
• Performance: Better than JBOD because it uses Striping
– Performance is further improved when data is striped across
multiple controllers with only one drive per controller
• Throughput: Very high when I/O sizes are small
• Data Protection
– No Parity or Mirroring (Hence no fault tolerance)
– Extremely difficult to recover data
• Applications
– Those that need high bandwidth or high throughput but where
data is not critical (E.g. Temporary storage or spool areas)
107. RAID 1 – Mirroring & Fault Tol. (1/3)
• Uses mirroring to improve fault tolerance
– Every write to a data disk is also a write to the mirror disk(s)
– This is transparent to the host
– If a disk fails, the disk array controller uses the mirror drive for
data recovery and continuous operation
• A RAID 1 group consists of 2 (typically) or more disk modules
• Benefits
– High data availability
– High Throughput or I/O rate (small block size)
• Drawbacks
– Total no. of disks in array equals 2 times the data (usable) disks
• i.e. Overhead cost = 100%, Usable storage capacity = 50%
108. RAID 1 – Mirroring & Fault Tol. (2/3)
• Performance
– Improved Read performance but degrades Write performance
• Data Protection
– Improved fault tolerance over RAID 0
• Cost
– Expensive due to extra capacity required to duplicate data
• Disks: At least two disks
• Maintenance: Low complexity
• Applications
– Those that need High availability (E.g. Accounting, Payroll,
Finance)
110. RAID 0+1 – Striping & Mirroring (1/3)
• Combines speed of RAID 0 with redundancy of RAID 1
• RAID 0+1 is implemented as a mirrored array whose basic
elements are RAID 0 stripes
• Benefits
– Medium data availability
– High Throughput or I/O rate (small block size)
– Ability to withstand multiple drive failures as long as they
occur on the same stripe
• Drawbacks
– Total no. of disks in array equals two times the data disks, with
overhead cost equaling 100%
111. RAID 0+1 – Striping & Mirroring (2/3)
• Data Protection: Medium reliability
• Disks: Even no. of disks (Minimum 4 disks to allow striping)
• Cost: Very expensive because of the high overhead
• Performance
– High I/O rates
– Writes are slower than Reads because of mirroring
• Applications
– Imaging
– General file server
113. RAID 1+0 – Mirroring & Striping (1/3)
• RAID 1+0 (or RAID 10, RAID 1/0, or RAID A) also combines
the speed of RAID 0 with the redundancy of RAID 1
• RAID 1+0 is implemented as a striped array whose individual
elements are RAID 1 arrays - mirrors
• Benefits (almost similar to RAID 0+1)
– High data availability
– High Throughput or I/O rate (small block size)
– Ability to withstand multiple drive failures as long as they
occur on different mirrors
• Drawbacks (almost similar to RAID 0+1)
– Total no. of disks in arrays equals two times the data disks,
with overhead cost equaling 100%
114. RAID 1+0 – Mirroring & Striping (2/3)
• Data Protection: High reliability
• Disks: Even no. of disks (Minimum 4 disks to allow striping)
• Cost: Very expensive because of the high overhead
• Performance
– High I/O rates achieved using multiple stripe segments
– Writes are slower than Reads because they are mirrored
• Applications
– Databases requiring high I/O rates with random data
– Applications requiring maximum data availability
116. RAID 0+1 vs. RAID 1+0
• Benefits are identical under normal operations
• Basic Element: Mirrored pair (RAID 1+0), Stripe (RAID 0+1)
• At drive failure the rebuild operations are very different
– In RAID 1+0 rebuild only the mirror. i.e. The disk array controller
copies data from one surviving disk to the replacement disk
– In RAID 0+1 rebuild entire stripe. i.e. The disk array controller
copies data from each disk in the healthy stripe to equivalent
disk in the failed stripe
• Note 1: As the stripe has no protection (RAID 0), the entire stripe is
faulted even if a single drive in it fails
• Note 2: This causes increased and unneeded I/O load on the backend and
also makes the RAID set more vulnerable to a second disk failure
• RAID 0+1 is less common & a poorer solution
117. RAID 3 (1/3)
• Parallel Access Array with Dedicated Parity Disk
• RAID 3 stripes data for high performance and uses parity for
improved fault tolerance
– Data is striped across all the disks but one in the array
– Parity information is stored on a dedicated drive, so that data
can be reconstructed if a drive fails
• R/W data to all disks in parallel
– There are no partial writes that update one out of many strips
in a stripe
• Benefits
– Total no. of disks is less than in a mirrored solution
– Good throughput/bandwidth on large data transfers
118. RAID 3 (2/3)
• Drawbacks
– Poor efficiency in handling small data blocks (not well suited to
transaction processing applications)
– Data is lost if multiple drives fail within the same RAID 3 Group
• Performance
– High data R/W transfer rate. Disk failure has a significant
impact on throughput. Rebuilds are slow.
• Data Protection: Use of parity for improved fault tolerance
• Striping: Byte level to multiple block level depending on
vendor implementation
• Applications
– Those which need large sequential data accesses (e.g. Medical
and geographic imaging)
120. RAID 4 (1/3)
• Striped with Independent Disks & a Dedicated Parity Disk
• RAID Level 4 stripes data for high performance and uses
parity for improved fault tolerance (same as RAID 3)
– Data is striped across all the disks but one in the array
– Parity information is stored on a dedicated disk so that data
can be reconstructed if a drive fails.
• Data disks are independently accessible, and multiple R/W
can occur simultaneously
• Benefits
– Total no. of disks is less than in a mirrored solution
– Good read throughput & reasonable write throughput
121. RAID 4 (2/3)
• Drawbacks (same as RAID 3)
– Dedicated parity drive can be a bottleneck when handling
small data writes (not well suited to transaction processing
applications)
– Data is lost if multiple drives fail within the same RAID 4 Group
• Performance
– High data read transfer rate. Poor to medium write transfer
rate. Disk failure has a significant impact on throughput
• Data Protection: Use of parity for improved fault tolerance
• Striping: Usually at the block (or block multiple) level
• Applications: General purpose file storage
• Note: RAID 4 is much less commonly used than RAID 5
123. RAID 5 (1/4)
• Striped Array with Independent Disks and Distributed Parity
• RAID 5 performs independent R/W operations
• No dedicated parity drive (data and parity information is
distributed across all drives in the group)
• Benefits
– Most versatile RAID level
– A transfer rate greater than that of a single drive but with a
high overall I/O rate
– Good for parallel processing (multi-tasking) applications or
environments
– Cost savings due to the use of parity over mirroring
124. RAID 5 (2/4)
• Drawbacks
– Slower transfer rate than RAID 3
– Small writes are slow, because they require a read-modify-write
(RMW) operation
– There is degradation in performance during recovery/reconstruction
– Data loss if multiple drives within the same group fail
• Performance
– Good aggregate transfer rate (high read data transfer rate,
medium write data transfer rate)
– Low ratio of parity disks to data disks
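A minimal sketch of why small writes are slow (the RMW drawback above): updating one strip needs the old data and old parity read back before the new parity can be written, so one small host write costs two reads plus two writes. The values are illustrative:

```python
def rmw_parity(old_parity: int, old_data: int, new_data: int) -> int:
    # New parity = old parity XOR old data XOR new data.
    return old_parity ^ old_data ^ new_data

# Stripe of three data strips plus parity (XOR of the data):
d = [0b1010, 0b0110, 0b0001]
p = d[0] ^ d[1] ^ d[2]

new_value = 0b1111                 # small write touching only strip 1
p = rmw_parity(p, d[1], new_value) # requires reading d[1] and p first
d[1] = new_value
assert p == d[0] ^ d[1] ^ d[2]     # parity remains consistent
```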
125. RAID 5 (3/4)
• Data Protection
– Single disk failure puts volume in degraded mode
– Difficult to rebuild (as compared to RAID level 1)
• Disks
– 5-disk and 9-disk groups are popular. Most implementations
allow other RAID set sizes
• Striping: Block or multiple-block level
• Applications
– File and application servers, database servers, WWW, email,
and News servers
127. RAID Implementations (1/2)
• Hardware RAID
– Implemented by intelligent storage systems external to the
host (or Host has intelligent controllers that offload RAID
management functions from the host)
• Software RAID
– Describes RAID that is managed by the host CPU
• Disadvantages
– It uses host CPU cycles that would be better utilized to process
application data
– Many host CPUs and OS do not perform I/O functions very
efficiently, so the host is ill-suited for the task
– Often looks attractive initially because it does not require the
purchase of additional hardware. The initial cost savings are soon
exceeded by the expense of using a costly server to perform I/O
operations that it performs inefficiently at best
128. RAID Implementations (2/2)
• Hardware (usually a specialized disk controller card)
– Controls all drives that are attached to it
– Performs all RAID-related functions including volume
management
– Array(s) appear to the host operating system as a regular disk
drive
– Dedicated cache to improve performance
– Generally provides some type of administrative software
• Software (Generally runs as part of OS)
– Volume management is performed by the server
– Provides more flexibility for hardware, which can reduce cost
– Performance is dependent on CPU load & server performance
– Has limited functionality
129. Hot Spares (1/3)
• A hot spare is an idle component (often a drive) in a RAID array
that becomes a temporary replacement for a failed component
• For example:
– The hot spare takes the failed drive’s identity in the array
– Data recovery takes place based on the RAID implementation
(whether Parity OR Mirroring was used)
– The failed drive is replaced with a new drive at some time later
– One of the following occurs:
• The hot spare becomes the permanent replacement (a new hot
spare must be configured on the system)
• When the new drive is added to the system, data from the hot
spare is copied to the new drive (The hot spare returns to its idle
state, ready to replace the next failed drive)
130. Hot Spares (2/3)
• Note: The hot spare drive needs to be large enough to
accommodate the data from the failed drive
• Hot spare replacement can be Automatic or User initiated
– Automatic: When a disk’s recoverable error rates exceed a
predetermined threshold, the disk subsystem tries to copy
data from the failing disk to a spare one. If this task completes
before the damaged disk fails, the subsystem switches to the
spare and marks the failing disk unusable. (If not it uses parity
or the mirrored disk to recover the data, as appropriate).
– User initiated: This gives the administrator control when to
rebuild (e.g., rebuild overnight so as not to degrade system
performance). However, the system is vulnerable to another
failure because the hot spare is now unavailable. Some
systems implement multiple hot spares to improve availability.
132. Hot Swap (1/2)
• Like hot spares, hot swaps enable a system to recover
quickly in the event of a failure. With a hot swap the user
can replace the failed hardware (such as a controller)
without having to shut down the system
• Note
– A warm swap occurs when the system needs to be shut down,
but power does not need to be removed in order to replace the
failed component
– A cold swap occurs when the power must be removed as well
– Some systems have the ability to auto-swap without user
intervention
135. Intelligent Storage System?
• A disk storage system distributes data over several devices
and manages data access
• vs. Individual storage devices
– Increased capacity
– Improved performance
– Easier data management
– Better data availability
– More robust backup/restore capabilities
– Improved flexibility and scalability
• Categories of Arrays
– Monolithic (Integrated) Storage Systems
– Modular Storage Systems
136. Monolithic (Integrated) Storage Systems (1/3)
• Aimed at the Enterprise level, centralizing data in a powerful
system with hundreds of drives
• Also called: Integrated arrays or Enterprise arrays or Cache
centric arrays
• The system is contained within a single or interconnected
frame (for expansion) and can scale to support increases in
connectivity, performance, and capacity as required
• Can handle large amounts of concurrent I/Os on very large
data applications
• Limitations
– High upfront costs limiting their applicability to only the most
mission critical applications
– Take up a large amount of space in the data center
137. Monolithic (Integrated) Storage Systems (2/3)
[Figure: Monolithic array: FC ports, port processors, cache, and RAID controllers within a single frame]
138. Monolithic (Integrated) Storage Systems (3/3)
• Characteristics
– Large storage capacity
– Large Cache to store IOs before writing to disk
– Redundancy (improves data protection and availability)
– More robust and fault tolerant due to many built-in features
– Connect to mainframes or very powerful open systems hosts
– Multiple front-end ports (connectivity to multiple servers)
– Multiple back-end FC/SCSI RAID controllers (manage disk
processing)
– Expensive
139. Modular Storage Systems (1/3)
• Aimed at small companies/department level
• Also called: Midrange or Departmental storage systems
• Provide storage to a smaller number of Windows or Unix
servers than the larger Integrated storage systems
• Typically designed with two controllers, each of which
contains host interfaces, cache, RAID processors, and disk
drive interfaces.
140. Modular Storage Systems (2/3)
[Figure: Modular system: a rack of servers connects through FC switches to a control module (with disks) and additional disk modules; the control module holds two controllers, A and B, each with its own host interface, cache, and RAID controller]
141. Modular Storage Systems (3/3)
• Characteristics
– Smaller disk capacity
– Less global cache
– Limited redundancy and connectivity
– Can start with a smaller number of disks and scale as needed
– Performance can degrade as capacity increases
– Fewer front-end ports for connection to servers
– Cannot connect to mainframes
– Usually have separate controllers from the disk array
– Takes up less floor space and costs less
142. Elements of Intelligent Storage Systems
• Intelligent storage systems are organized into the following
areas:
– Front End
– Cache
– Back End
– Physical disks
143. Elements of Intelligent Storage Systems
[Figure: Intelligent storage system: host connectivity feeds the front end, cache sits between front end and back end, and the back end connects to the physical disks]
144. Intelligent Storage System: Front-end (1/3)
18-Feb-20
Note: Include redundancy in the channels to and from the ports.
Intelligent Storage System
Ports
Host Connectivity
Controllers
Front-End Back-End
Cache
Physical Disks
144 of 162
145. Intelligent Storage System: Front-end (2/3)
• Provides communication between storage system and host
• Main parts
– Ports & Controllers
• Storage Ports
– External interfaces for connectivity to host
– Each port has processing logic responsible for executing
appropriate transport protocol for storage connections.
• E.g. SCSI, FC, or iSCSI
– To maintain data availability, the front end of the storage
systems generally have multiple ports.
• Provides redundancy in case of a failure
• Balance the load when the system is experiencing heavy use.
• Mid-range storage system: ranges from 1-8 (Typically 4)
• Large monolithic array: about 64 or 128
146. Intelligent Storage System: Front-end (3/3)
• Controllers
– Sit behind the storage ports and route data to the cache via
the internal data bus
– Send an acknowledgement message back to the host as soon as
the cache receives the data
147. Front-End Command Queuing (1/3)
[Figure: Without command queuing, the front end services requests 1, 2, 3, 4 in arrival order; with command queuing it reorders them (e.g. 1, 3, 2, 4) to suit the data's layout on disk]
148. Front-End Command Queuing (2/3)
• Processes multiple concurrent commands based on disk data
organization, regardless of the order in which the commands
were received
• Command queuing software
– Reorders commands and assigns a tag to each command so it can
be identified (and executed) efficiently
– Some disk drives (SCSI & FC disks) are intelligent enough to
manage their own command queuing
• Intelligent storage systems may make use of this native disk
intelligence, and may supplement it with queuing performed by
the controller
• Queue Depth Setting
– Defines number of outstanding requests that are active at the
same time in the queue
– Many manufacturers have configurable queue depths
149. Front-End Command Queuing (3/3)
• Common Command queuing algorithms
– FIFO
• Commands are executed in the order in which they arrive
• Limitation: Identical to having no queuing – Inefficient
– Seek Time Optimization
• Faster than FIFO
• Optimizing seek times only, without regard for rotational latency,
will not normally produce the best results
– E.g. Consider two requests on cylinders that are very close to each
other, but in very different places within the track. Meanwhile,
there might be a third sector that is a few cylinders further away
but much closer overall to the location of the first request which
could be considered
– Access Time Optimization
• Combines seek time optimization with an analysis of rotational
latency for optimal performance
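A minimal sketch of seek-time optimization: a greedy shortest-seek-first reorder of queued cylinder requests. Access-time optimization would additionally weigh rotational position, which this sketch deliberately ignores; the queue values are illustrative:

```python
def seek_optimize(head: int, requests: list[int]) -> list[int]:
    """Service requests in order of nearest cylinder to the current head."""
    pending = list(requests)
    order = []
    while pending:
        nearest = min(pending, key=lambda cyl: abs(cyl - head))
        pending.remove(nearest)
        order.append(nearest)
        head = nearest                 # the head moves to the serviced track
    return order

queue = [95, 10, 12, 90]               # cylinders, in arrival (FIFO) order
print(seek_optimize(50, queue))        # [12, 10, 90, 95] vs FIFO's 95,10,12,90
```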
150. Intelligent Storage System: Cache
• Cache is a high speed memory
– Improves system performance by isolating hosts from
mechanical delays associated with physical disks (due to seek
times and rotational latency) and minimizes delay (< ms)
– Improves performance of R/W
[Figure: Cache sits between the front end and the back end, in front of the physical disks]
151. Intelligent Storage System: Back End (1/3)
[Figure: Back end: controllers and ports between the cache and the physical disks]
152. Intelligent Storage System: Back-End (2/3)
• Data from Cache gets transferred through I/O bus to back
end, where it is routed to the correct drive
• Disk controllers provide communication with the disks for R/W
operations
– Manages data transfer between I/O bus and disks
– Handles device addressing, translating logical blocks into
physical locations on the disk
– Provides additional and limited temporary storage for data
– Provides error detection and correction – often in conjunction
with similar features on the disks
– Allows multiple devices to communicate to HBA on the host
– Facilitates performance enhancement
153. Intelligent Storage System: Back-End (3/3)
• Disk controllers
– Implemented as hardware with firmware that communicates
with disks via disk interface, sending commands to initiate
R/W process on disks
– The design of the controller is vendor specific
• Multiple Disk Controllers
– Provide maximum data protection and availability (with
alternative path in case of a failure)
• Reliability is enhanced if the disks used are dual-ported; each
disk port can connect to a separate controller. Having more than
one port on each controller will provide additional protection in
the event of certain types of failure
– Facilitate load balancing
154. Intelligent Storage System: Physical Disks (1/2)
[Figure: Physical disks sit behind the back end of the intelligent storage system]
155. Intelligent Storage System: Physical Disks (2/2)
• Physical disks are where the storage actually takes place
• Drives are connected to the controller with either SCSI (copper
cabling) or FC (optical or copper cabling)
• This could be a single disk drive or a more complex RAID set
– ATA drives are used when a storage system is used in
environments where performance is not critical
• Connection: Parallel ATA (PATA) or serial ATA (SATA) copper cables
– Mixture of SCSI or FC drives and ATA drives
• Higher performing drives are used for application data storage
• Slower ATA drives are used for backup and archiving
156. I/O Example: Read Requests
[Figure: Read request flow through host connectivity, front end, cache, back end, and physical disks]
157. I/O Example: Write Requests
[Figure: Write request flow through host connectivity, front end, cache, back end, and physical disks]
158. What the Host Sees
[Figure: Each host sees its own set of LUNs (LUN 0, LUN 1, LUN 2); the LUNs map through cache and the back end onto the physical disks]
159. The Host and Logical Device Names
[Figure: LUNs appear to hosts as logical device names: e.g. /dev/rdsk/c1t1d0 and /dev/rdsk/c1t1d1 under a UNIX volume manager, or .PhysicalDrive0 on Windows]
160. Disk Organization in a Storage System
[Figure: LUN 0 and LUN 1 are carved from the physical disks behind the back end and presented to the hosts]