A Telemarketer's Guide
By Fiaz Khan
Introduction – What does Data Storage mean?
Storage Mediums
Network Storage – The Basics?
Direct Attached Storage (DAS)
Network Attached Storage (NAS)
Storage Area Network (SAN)
What is RAID?
Tiered Storage
Backup Storage
Storage Glossary
References
What does Data Storage mean?
Storage is the place where data is held in an electromagnetic or optical form
for access by a computer processor. There are two general usages.
1) Storage is the set of devices, and the data on them, connected to the
computer through input/output operations - that is, hard disk and tape
systems and other forms of storage that don't include computer memory
and other in-computer storage. For the enterprise, the options for this
kind of storage are of much greater variety and expense than those related
to memory. This meaning is more common in cross-vertical sectors than
the second.
2) In a more formal usage, storage has been divided into:
Primary storage, which holds data in memory (sometimes called
random access memory or RAM) and other "built-in" devices such as
the processor's L1 cache, and
Secondary storage, which holds data on hard disks, tapes, and other
devices requiring input/output operations.
1) Primary storage, also known as main storage or memory, is the main area in
a computer in which data is stored for quick access by the computer's
processor. On today's smaller computers, especially personal computers and
workstations, the term random access memory (RAM) - or just memory - is used
instead of primary or main storage, and the hard disk, diskette, CD, and DVD
collectively describe secondary storage or auxiliary storage. (Simple!)
2) Primary storage also means storage for data that is in active use, in
contrast to storage that is used for backup purposes.
Secondary storage, sometimes called auxiliary storage, is all data storage that
is not currently in a computer's primary storage or memory. An additional
synonym (see Storage Mediums below) is external storage.
In a personal computer, secondary storage typically consists of storage on the
hard disk and on any removable media, if present, such as a CD or DVD.
Storage Mediums
A storage medium is any technology (including devices and materials) used to
place, keep, and retrieve data. A medium is an element used in
communicating a message; on a storage medium, the "messages" - in the
form of data - are suspended for use when needed. The plural form of this
term is storage media.
Although the term storage also covers primary storage (memory), a storage
medium usually means a place to hold secondary storage such as that on a
hard disk or tape.
Storage media can be arranged for access in many ways. Some well-known
arrangements include:
A redundant array of independent disks (RAID)
A storage area network (SAN)
Network Storage – The Basics?
'What is network storage?' and 'Why do we use it?'
In basic terms, network storage is simply about storing data using a method
by which it can be made available to clients on the network. Over the years,
the storage of data has evolved through various phases. This evolution has
been driven partly by the changing ways in which we use technology, and in
part by the exponential increase in the volume of data we need to store. It has
also been driven by new technologies, which allow us to store and manage
data in a more effective manner.
In the days of mainframes, data was stored physically separate from the
actual processing unit, but was still only accessible through the processing
units. As PC based servers became more commonplace, storage devices
went 'inside the box' or in external boxes that were connected directly to the
system. Each of these approaches was valid in its time, but as our need to
store increasing volumes of data and our need to make it more accessible
grew, other alternatives were needed. This is where network storage was born!
The next pages will give you a rundown of some of the basic terminology that
you would need to effectively discuss and approach prospects professionally
(In other words…sound like you know what the hell you are on about!).
Direct Attached Storage (DAS)
Direct attached storage is the term used to describe a storage device that is
directly attached to a host system.
Network Attached Storage (NAS)
Network Attached Storage, or NAS, is a data storage mechanism that uses
special devices connected directly to the network media. These devices are
assigned an IP address and can then be accessed by clients via a server that
acts as a gateway to the data or in some cases allows the device to be
accessed directly by the clients without an intermediary.
The beauty of the NAS structure is that it means that in an environment with
many servers running different operating systems, storage of data can be
centralised, as can the security, management, and backup of the data. An
increasing number of companies already make use of NAS technology.
One of the big advantages of NAS is expandability: if you need more
storage space, you add another NAS device and expand the available storage.
NAS also brings an extra level of fault tolerance to the network. In a DAS
environment, a server going down means that the data that server holds
is no longer available. With NAS, the data is still available on the network and
accessible by clients. Fault-tolerant measures such as RAID can be used to
make sure that the NAS device does not become a point of failure.
Network-attached storage (NAS) is hard disk storage that is set up with its
own network address rather than being attached to the department computer
that is serving applications to a network's workstation users. By removing
storage access and its management from the department server, both
application programming and files can be served faster because they are not
competing for the same processor resources. The network-attached storage
device is attached to a local area network (typically, an Ethernet network) and
assigned an IP address. File requests are mapped by the main server to the
NAS file server.
NAS software can usually handle a number of network protocols, including
Microsoft's Internetwork Packet Exchange and NetBEUI, Novell's Netware
Internetwork Packet Exchange, and Sun Microsystems' Network File System.
Configuration, including the setting of user access priorities, is usually
possible using a Web browser.
Network-attached storage consists of hard disk storage, including multi-disk
RAID systems, and software for configuring and mapping file locations to the
network-attached device. Network-attached storage can be a step toward, and
can be included as part of, a more sophisticated storage system known as a
storage area network (SAN).
Storage Area Network (SAN)
A SAN is a network of storage devices that are connected to each other and
to a server, or cluster of servers, which acts as an access point to the SAN. In
some configurations a SAN is also connected to the network. SANs use
special switches as a mechanism to connect the devices. These switches,
which look a lot like a normal Ethernet networking switch, act as the
connectivity point for SANs, making it possible for devices to communicate
with each other on a separate network.
A storage area network can use existing communication technology such as
IBM's optical fiber ESCON or it may use the newer Fibre Channel technology.
Some SAN system integrators liken it to the common storage bus (flow of data)
in a personal computer that is shared by different kinds of storage devices
such as a hard disk or a CD-ROM player.
SANs support disk mirroring, backup and restore, archival and retrieval of
archived data, data migration from one storage device to another, and the
sharing of data among different servers in a network. SANs can incorporate
sub networks with network-attached storage (NAS) systems.
Many IT organizations today are scratching their heads debating whether the
advantages of implementing a SAN solution justify the associated costs.
Others are trying to get a handle on today's storage options and whether SAN
is simply Network Attached Storage spelled backwards.
To completely understand the basic purpose and function of a SAN we would
need to examine its role in modern network environments. We will also look at
how SANs meet the network storage needs of today's organizations.
In very basic terms, a SAN can be anything from two servers on a network
accessing a central pool of storage devices to several thousand servers
accessing many millions of megabytes of storage. Conceptually, a SAN can
be thought of as a separate network of storage devices physically removed
from, but still connected to, the network. SANs evolved from the concept of
taking storage devices, and therefore storage traffic, off the LAN and creating
a separate back-end network designed specifically for data.
SANs represent the evolution of data storage technology to this point.
Traditionally, on client server systems, data was stored on devices either
inside or directly attached to the server. Next in the evolutionary scale came
Network Attached Storage (NAS) which took the storage devices away from
the server and connected them directly to the network. SANs take the
principle one step further by allowing storage devices to exist on their own
separate network and communicate directly with each other over very fast
media. Users can gain access to these storage devices through server
systems which are connected to both the LAN and the SAN.
This is in contrast to the use of a traditional LAN for providing a connection for
server-storage, a strategy that limits overall network bandwidth. SANs
address the bandwidth bottlenecks associated with LAN based server storage
and the scalability limitations found with SCSI bus based implementations.
SANs provide modular scalability, high-availability, increased fault tolerance
and centralized storage management. These advantages have led to an
increase in the popularity of SANs as they are quite simply better suited to
address the data storage needs of today's data intensive network
environments.
Other developments are coming through that will change the way that we use
and access network storage. One advance expected to make a large
contribution to the growing success of network storage in general is iSCSI.
iSCSI is a technology that allows data to be transported to and from storage
devices over an IP network. What it actually does is serialize the data from a
SCSI connection. Using iSCSI, the concept of network storage can be taken
anywhere that IP can go, which as the Internet proves, is basically anywhere.
Technologies like Fibre Channel and iSCSI are a big factor in how fast people
are able to afford and implement network storage solutions.
What is RAID?
RAID (redundant array of independent disks; originally redundant array of
inexpensive disks) is a way of storing the same data in different places (thus,
redundantly) on multiple hard disks. By placing data on multiple disks, I/O
(input/output) operations can overlap in a balanced way, improving
performance. Since using more disks lowers the mean time between failures
(MTBF) of the array as a whole, storing data redundantly is what preserves
fault tolerance.
A RAID appears to the operating system to be a single logical hard disk. RAID
employs the technique of disk striping, which involves partitioning each drive's
storage space into units ranging from a sector (512 bytes) up to several
megabytes. The stripes of all the disks are interleaved and addressed in
order.
In a single-user system where large records, such as medical or other
scientific images, are stored, the stripes are typically set up to be small
(perhaps 512 bytes) so that a single record spans all disks and can be
accessed quickly by reading all disks at the same time.
In a multi-user system, better performance requires establishing a stripe wide
enough to hold the typical or maximum size record. This allows overlapped
disk I/O across drives.
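To make striping concrete, here is a minimal Python sketch of how stripe units
could be interleaved across the disks in an array. It assumes the simplest
round-robin layout with no parity; the function name locate_stripe_unit and the
four-disk example are illustrative assumptions, not taken from any RAID product.

```python
def locate_stripe_unit(unit_number: int, num_disks: int) -> tuple[int, int]:
    """Map a logical stripe unit to (disk index, row on that disk).

    Assumes plain round-robin interleaving: unit 0 goes to disk 0,
    unit 1 to disk 1, and so on, wrapping back to disk 0.
    """
    if num_disks < 1:
        raise ValueError("an array needs at least one disk")
    if unit_number < 0:
        raise ValueError("stripe unit numbers start at 0")
    return unit_number % num_disks, unit_number // num_disks


# Lay out the first eight stripe units over a hypothetical four-disk array.
layout = [locate_stripe_unit(unit, 4) for unit in range(8)]
for unit, (disk, row) in enumerate(layout):
    print(f"stripe unit {unit} -> disk {disk}, row {row}")
```

With four disks, units 0 to 3 land on disks 0 to 3 in the first row and unit 4
wraps back to disk 0 in the next row, which is why one large read can keep every
disk busy at the same time.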
There are at least nine types of RAID plus a non-redundant array (RAID-0):
RAID-0: This technique has striping but no redundancy of data. It offers
the best performance but no fault-tolerance.
RAID-1: This type is also known as disk mirroring and consists of at
least two drives that duplicate the storage of data. There is no striping.
Read performance is improved since either disk can be read at the
same time. Write performance is the same as for single disk storage.
RAID-1 provides the best performance and the best fault-tolerance in a
multi-user system.
RAID-2: This type uses striping across disks with some disks storing
error checking and correcting (ECC) information. It has no advantage
over RAID-3.
RAID-3: This type uses striping and dedicates one drive to storing
parity information. The embedded error checking (ECC) information is
used to detect errors. Data recovery is accomplished by calculating the
exclusive OR (XOR) of the information recorded on the other drives.
Since an I/O operation addresses all drives at the same time, RAID-3
cannot overlap I/O. For this reason, RAID-3 is best for single-user
systems with long record applications.
RAID-4: This type uses large stripes, which means you can read
records from any single drive. This allows you to take advantage of
overlapped I/O for read operations. Since all write operations have to
update the parity drive, no I/O overlapping is possible. RAID-4 offers no
advantage over RAID-5.
RAID-5: This type includes a rotating parity array, thus addressing the
write limitation in RAID-4. Thus, all read and write operations can be
overlapped. RAID-5 stores parity information but not redundant data
(but parity information can be used to reconstruct data). RAID-5
requires at least three and usually five disks for the array. It's best for
multi-user systems in which performance is not critical or which do few
write operations.
RAID-6: This type is similar to RAID-5 but includes a second parity
scheme that is distributed across different drives and thus offers
extremely high fault- and drive-failure tolerance.
RAID-7: This type includes a real-time embedded operating system as
a controller, caching via a high-speed bus, and other characteristics of
a stand-alone computer. One vendor offers this system.
RAID-10: Combining RAID-0 and RAID-1 is often referred to as RAID-
10, which offers higher performance than RAID-1 but at much higher
cost. There are two subtypes: In RAID-0+1, data is organized as
stripes across multiple disks, and then the striped disk sets are
mirrored. In RAID-1+0, the data is mirrored and the mirrors are striped.
RAID-50 (or RAID-5+0): This type consists of a series of RAID-5
groups striped in RAID-0 fashion to improve RAID-5 performance
without reducing data protection.
RAID-53 (or RAID-5+3): This type uses striping (in RAID-0 style) for
RAID-3's virtual disk blocks. This offers higher performance than RAID-
3 but at much higher cost.
RAID-S (also known as Parity RAID): This is an alternate, proprietary
method for striped parity RAID from EMC Symmetrix that is no longer
in use on current equipment. It appears to be similar to RAID-5 with
some performance enhancements as well as the enhancements that
come from having a high-speed disk cache on the disk array.
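The XOR-based recovery mentioned under RAID-3 (and relied on by the parity in
RAID-5) can be sketched in a few lines of Python. This is a toy illustration
only: the three 4-byte "disks" and the helper name xor_blocks are assumptions
made for the example, and real controllers work on far larger blocks.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data disks, each holding one 4-byte block.
data_disks = [b"ABCD", b"EFGH", b"IJKL"]

# The parity block is the XOR of all the data blocks.
parity = xor_blocks(data_disks)

# Pretend disk 1 has failed: rebuild its block from the survivors plus parity.
survivors = [data_disks[0], data_disks[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data_disks[1])
```

The rebuild works because XOR-ing a value in twice cancels it out, so combining
the parity with every surviving block leaves only the missing block.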
Tiered Storage
Tiered storage is the assignment of different categories of data to different
types of storage media in order to reduce total storage cost. Categories may
be based on levels of protection needed, performance requirements,
frequency of use, and other considerations. Since assigning data to particular
media may be an ongoing and complex activity, some vendors provide
software for automatically managing the process based on a company-defined
policy.
As an example of tiered storage, tier 1 data (such as mission-critical, recently
accessed, or top secret files) might be stored on expensive and high-quality
media such as double-parity RAIDs (redundant arrays of independent disks).
Tier 2 data (such as financial, seldom-used, or classified files) might be stored
on less expensive media in conventional storage area networks (SANs). As
the tier number increases, cheaper media can be used. Thus, tier 3 in a
3-tier system might contain event-driven, rarely used, or unclassified files on
recordable compact discs (CD-Rs) or tapes.
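A company-defined tiering rule of this kind could look something like the
Python sketch below. Everything in it is an assumption for illustration: the
FileInfo fields, the 30-day and 365-day thresholds, and the assign_tier name
are made up, and a real product would apply whatever rules the customer sets.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    name: str
    mission_critical: bool
    days_since_access: int


def assign_tier(info: FileInfo, recent_days: int = 30, stale_days: int = 365) -> int:
    """Return 1, 2 or 3 using hypothetical thresholds.

    Tier 1: mission-critical or recently accessed files.
    Tier 2: files accessed within the last year.
    Tier 3: everything else (rarely used data).
    """
    if info.mission_critical or info.days_since_access <= recent_days:
        return 1
    if info.days_since_access <= stale_days:
        return 2
    return 3


files = [
    FileInfo("payroll.db", mission_critical=True, days_since_access=400),
    FileInfo("q2-report.doc", mission_critical=False, days_since_access=90),
    FileInfo("old-events.log", mission_critical=False, days_since_access=900),
]
for f in files:
    print(f.name, "-> tier", assign_tier(f))
```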
Backup Storage
Backup storage is storage that is intended as a copy of the storage that is
actively in use so that, if the storage medium such as a hard disk fails and
data is lost on that medium, it can be recovered from the copy. In an
enterprise, because the loss of business data can be catastrophic, it is
important that backup storage be provided.
On a personal computer, backup storage is commonly achieved with Zip
drives and DVDs. In an enterprise, backup storage can sometimes be
achieved through replication of data in multidisk storage systems, such as
RAID; as part of network-attached storage (NAS); as part of a storage area
network (SAN); or as part of a tiered storage system. Enterprise backup
storage often makes use of both disk and tape as storage media. Special
software is used to manage backup as part of a storage system.
The same means and media used for backup storage are often used for
Basic Storage Terms
Asynchronous Replication
After data has been written to the primary storage site, new writes to that site
can be accepted without having to wait for the secondary (remote) storage
site to also finish its writes. Asynchronous replication does not have the
latency impact that synchronous replication does, but has the disadvantage of
risking data loss should the primary site fail before the data has been
written to the secondary site. See also replication.
Backup and Restore
A two-step process. Information is first copied to non-volatile disk or tape
media. In the event of computer problems (such as disk drive failures, power
outages, or virus infection) resulting in data loss or damage to the original
data, the copy is subsequently retrieved and restored to a functional system.
Basic Disk
A basic disk is a physical disk that can be accessed by MS–DOS and all
Windows-based operating systems. Basic disks can contain up to four primary
partitions, or three primary partitions and an extended partition with multiple
logical drives. Compare to dynamic disks.
Block Data
Raw data which does not have a file structure imposed on it. Database
applications such as Microsoft SQL Server and Microsoft Exchange Server
transfer data in blocks. Block transfer is the most efficient way to write to disk.
Business Continuity
The ability of an organization to continue to function even after a disastrous
event, accomplished through the deployment of redundant hardware and
software, the use of fault tolerant systems, as well as a solid backup and
recovery strategy.
Cluster
A group of servers that together act as a single system, enabling load
balancing and high availability. Clusters can be housed in the same physical
location (basic cluster) or can be distributed across multiple sites
(geo-dispersed clusters) for disaster recovery.
DAS (Direct Attached Storage)
DAS is storage that is directly connected to a server by connectivity media
such as parallel SCSI cables. This direct connection provides fast access to
the data; however, storage is only accessible from that server. DAS includes
the internally attached local disk drives or externally attached RAID
(redundant array of independent disks) or JBOD (just a bunch of disks).
Although Fibre Channel can be used for direct attachment, it is more commonly
used in storage area networks.
DFS (Distributed File System)
DFS allows administrators to group shared folders located on different servers
by transparently connecting them to one or more DFS namespaces. A DFS
namespace is a virtual view of shared folders in an organization.
Disaster Recovery
The ability to recover from the loss of a complete site, whether due to natural
disaster or malicious intent. Disaster recovery strategies include replication
and backup/restore.
Dynamic Disk
A dynamic disk is a physical disk that provides features that basic disks do not,
such as support for volumes spanning multiple disks. Dynamic disks use a
hidden database to track information about dynamic volumes on the disk and
other dynamic disks in the computer.
Fabric
A Fibre Channel (or iSCSI) topology with at least one switch present on the
network.
Failover
In the event of a physical disruption to a network component, data is
immediately rerouted to an alternate path so that services remain
uninterrupted. Failover applies both to clustering and to multiple paths to
storage. In the case of clustering, one or more services (such as Exchange) are
moved over to a standby server in the event of a failure. In the case of
multiple paths to storage, a path failure results in data being rerouted to a
different physical connection to the storage.
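The multiple-paths case can be pictured with a short Python sketch. It is only
a rough model: the path names, the health flags and the choose_path function
are hypothetical, and real multipath software also handles retries, load
sharing and failback.

```python
def choose_path(path_health: dict[str, bool], preference: list[str]) -> str:
    """Return the first healthy path, trying them in order of preference."""
    for name in preference:
        if path_health.get(name, False):
            return name
    raise RuntimeError("no healthy path to storage")


# Two hypothetical routes from one server to the same storage array.
paths = {"hba0-switchA": True, "hba1-switchB": True}
order = ["hba0-switchA", "hba1-switchB"]

before_failure = choose_path(paths, order)

# Simulate a cable or switch fault on the preferred path.
paths["hba0-switchA"] = False
after_failure = choose_path(paths, order)

print(before_failure, "->", after_failure)
```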
Fault Tolerance
Fault tolerance is the ability of computer hardware or software to ensure data
integrity when hardware failures occur. Fault-tolerant features appear in many
server operating systems and include mirrored volumes, RAID–5 volumes, and
server clusters.
File Data
Data which has an associated file system.
Fibre Channel
A high–speed interconnect used in storage area networks (SANs) to connect
servers to shared storage. Fibre Channel components include HBAs, hubs,
switches, and cabling. The term Fibre Channel also refers to the storage
protocol.
FRS (File Replication Service)
File Replication Service (FRS) is a technology that replicates files and folders
stored in the SYSVOL shared folder on domain controllers and Distributed File
System (DFS) shared folders. When FRS detects that a change has been
made to a file or folder within a replicated shared folder, FRS replicates the
updated file or folder to other servers.
Geo-Dispersed Cluster
A geo–dispersed, or multi-site, cluster is a cluster configuration used to help
ensure high system and application availability in the event of site disaster. In
this configuration, servers are separated geographically and the physical
storage (quorum disk) is synchronously replicated between sites.
Global File System
In some configurations, as with clusters or multiple NAS boxes, it is useful to
have a means to make the file systems on multiple servers or devices look
like a single file system. A global or dispersed file system would enable
storage administrators to globally build or make changes to file systems. To
date this remains an emerging technology.
High Availability
A continuously available computer system is characterized as having
essentially no downtime in any given year. A system with 99.999% availability
experiences only about five minutes of downtime per year. In contrast, a high
availability system is defined as having 99.9% uptime, which translates into a
few hours of planned or unplanned downtime per year.
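The downtime figures above follow directly from the availability percentage.
A minimal Python sketch of the arithmetic, assuming a 365-day year (the
function and variable names are illustrative only):

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Minutes of downtime per year implied by an availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


five_nines = downtime_minutes_per_year(99.999)   # roughly 5.3 minutes
three_nines = downtime_minutes_per_year(99.9)    # roughly 525.6 minutes

print(f"99.999% -> {five_nines:.1f} minutes per year")
print(f"99.9%   -> {three_nines / 60:.1f} hours per year")
```

So "five nines" works out at a little over five minutes a year, while 99.9%
allows close to nine hours.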
HBA (Host Bus Adapter)
The HBA is the intelligent hardware residing on the host server which controls
the transfer of data between the host and the target storage device.
ILM (Information Lifecycle Management)
The process of managing information growth, storage, and retrieval over time,
based on its value to the organization. Sometimes referred to as data lifecycle
management.
Initiator
An initiator is the device (usually contained within a server) that makes the
application requests, which are then sent to the target device.
iSCSI (Internet SCSI)
A protocol that enables transport of block data over IP networks, without the
need for a specialized network infrastructure, such as Fibre Channel.
JBOD (Just a Bunch of Disks)
As the name suggests, a group of disks housed in its own box; JBOD differs
from RAID in not having any storage controller intelligence or data
redundancy capabilities.
Load Balancing
Referring to the ability to redistribute load (read/write requests) to an alternate
path between server and storage device, load balancing helps to maintain
high performance networking.
LUN (Logical Unit Number)
A logical unit is a conceptual division (a subunit) of a storage disk or a set of
disks. Logical units can directly correspond to a volume drive (for example, C:
can be a logical unit). Each logical unit has an address, known as the logical
unit number (LUN), which allows it to be uniquely identified.
LUN Masking
A method to restrict server access to storage not specifically allocated to that
server. LUN masking is similar to zoning, but is implemented in the storage
array, not the switch.
Mount Point
A mount point is a directory on a volume that an application can use to
"mount" (set up for use) a different volume. Mount points overcome the
limitation on drive letters and allow more logical organization of files and
folders.
Multipathing
Multipathing is the use of redundant storage network components responsible
for transfer of data between the server and storage. These components
include cabling, adapters and switches and the software that enables this.
NAS (Network Attached Storage)
A NAS device is a server that runs an operating system specifically designed
for handling files (rather than block data). Network-attached storage is
accessible directly on the local area network (LAN) through LAN protocols
such as TCP/IP. Compare to DAS and SAN.
NTFS File System
A file system that provides performance, security, reliability, and advanced
features that are not found in any version of the file allocation table (FAT)
file system. For example, NTFS guarantees volume consistency by using
standard transaction logging and recovery techniques. If a system fails, NTFS
uses its log file and checkpoint information to restore the consistency of the
file system. NTFS also provides advanced features, such as file and folder
permissions, encryption, disk quotas, and compression.
Partition
A partition is the portion of a physical disk or LUN that functions as though it
were a physically separate disk. Once the partition is created, it must be
formatted and assigned a drive letter before data can be stored on it. On basic
disks, partitions can contain basic volumes, which include primary partitions
and logical drives. On dynamic disks, partitions are known as dynamic
volumes, which include simple, striped, spanned, mirrored, and RAID–5
(striped with parity) volumes.
Port
The physical connection point on computers, switches, storage arrays, etc.,
which is used to connect to other devices on a network. Ports on a Fibre
Channel network are identified by their Worldwide Port Name (WWPN) IDs;
on iSCSI networks, ports are commonly given an iSCSI name. Not to be
confused with TCP/IP ports, which are used as virtual addresses assigned to
each IP address.
RAID (Redundant Array of Independent Disks)
A way of storing the same data over multiple physical disks to ensure that if a
hard disk fails a redundant copy of the data can be accessed instead.
Example schemes include mirroring and RAID–5.
Redundancy
The duplication of information or hardware equipment components to ensure
that should a primary resource fail, a secondary resource can take over its
function.
Replication
Replication is the process of duplicating mission critical data from one highly
available site to another. The replication process can be synchronous or
asynchronous; duplicates are known as clones, point-in-time copies, or
snapshots, depending on the type of copy being made.
SAN (Storage Area Network)
A storage area network (SAN) is a specialized network that provides access
to high performance and highly available storage subsystems using block
storage protocols. The SAN is made up of specific devices, such as host bus
adapters (HBAs) in the host servers, switches that help route storage traffic,
and disk storage subsystems. The main characteristic of a SAN is that the
storage subsystems are generally available to multiple hosts at the same time,
which makes them scalable and flexible. Compare with NAS and DAS.
SCSI (Small Computer System Interface)
A set of standards allowing computers to communicate with attached devices,
such as storage devices (disk drives, tape libraries, etc.) and printers. SCSI
also refers to a parallel interconnect technology which implements the SCSI
protocol.
Shadow Copy
A shadow copy is a high fidelity point–in–time copy of the original data. In the
Windows environment, shadow copies are created using the Volume Shadow
Copy Service (VSS); third party applications can create shadow copies also.
Storage Array
A subsystem which houses a group of disks (or tapes), together controlled by
software usually housed within the subsystem.
Storage Controller
Providing such functionality as disk aggregation (RAID), I/O routing, and error
detection and recovery, the controller provides the intelligence for the storage
subsystem. Each storage subsystem contains one or more storage controllers.
Switch
An intelligent device residing on the network responsible for directing data
from the source (such as a server) or sources directly to a specific target
device (such as a specific storage device) with minimum delay. Switches differ
in their capabilities; a director class switch, for example, is a high end switch
that provides advanced management and availability features.
Synchronous Replication
In synchronous replication, each write to the primary disk and the secondary
(remote) disk must be complete before the next write can begin. The
advantage of this approach is that the two sets of data are always
synchronized. The disadvantage is that if the distance between the two
storage disks is substantial, the replication process can take a long time and
slow down the application writing the data. See also asynchronous
replication.
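The trade-off between the two replication modes can be shown with a toy Python
model. It is a sketch under stated assumptions: the ReplicatedVolume class, its
method names and the in-memory lists standing in for the two sites are all
hypothetical, and network delay is not simulated.

```python
from collections import deque


class ReplicatedVolume:
    """Toy model of a primary site replicating to a secondary site."""

    def __init__(self, synchronous: bool):
        self.synchronous = synchronous
        self.primary = []
        self.secondary = []
        self._pending = deque()  # writes not yet shipped to the secondary

    def write(self, record):
        self.primary.append(record)
        if self.synchronous:
            # The write is not finished until the secondary has it too.
            self.secondary.append(record)
        else:
            # Accept the write now and replicate it later.
            self._pending.append(record)

    def ship(self, count: int):
        """Send up to `count` queued writes to the secondary site."""
        for _ in range(min(count, len(self._pending))):
            self.secondary.append(self._pending.popleft())

    def lost_if_primary_fails(self):
        """Writes that exist only on the primary at this moment."""
        return list(self._pending)


sync_vol = ReplicatedVolume(synchronous=True)
async_vol = ReplicatedVolume(synchronous=False)
for record in ["a", "b", "c"]:
    sync_vol.write(record)
    async_vol.write(record)
async_vol.ship(1)  # only the first write has reached the remote site so far

print(sync_vol.lost_if_primary_fails())
print(async_vol.lost_if_primary_fails())
```

In this model the synchronous volume never has anything at risk, while the
asynchronous one would lose whatever is still queued if the primary site failed
at that moment.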
Target
A target is the device to which the initiator sends data. Most commonly the
target is the storage array, but the term also applies to bridges, tape libraries,
tape drives or other devices.
Tiered Storage
Data is stored according to its intended use. For instance, data intended for
restoration in the event of data loss or corruption is stored locally, for fast
recovery. Data required to be kept for regulatory purposes is archived to
lower-cost storage.
VDS (Virtual Disk Service)
VDS is a set of application programming interfaces (APIs) that provides a
single interface for managing disks in Windows Server 2003 operating
systems. VDS provides a means of managing storage hardware and disks,
and for creating volumes on those disks.
Virtualization
In storage, virtualization is a means by which multiple physical storage
devices are viewed as a single logical unit. Virtualization can be accomplished
in–band (in the data path) or out-of-band. Out–of–band virtualization does not
compete for host resources, and can virtualize storage resources irrespective
of whether they are DAS, NAS or SAN.
Volume
A volume is an area of storage on a hard disk. A volume is formatted by using
a file system, such as file allocation table (FAT) or NTFS, and typically has a
drive letter assigned to it. A single hard disk can have multiple volumes, and
volumes can also span multiple disks.
VSS (Volume Shadow Copy Service)
The Volume Shadow Copy Service provides the backup infrastructure for the
Microsoft Windows XP and Microsoft Windows Server 2003 operating
systems, as well as a mechanism for creating consistent point-in-time copies
of data known as shadow copies.
Zoning
A method used to restrict server access to storage resources that are not
allocated to that server. Zoning is similar to LUN masking, but is implemented
in the switch and operates on the basis of port identification (either port
numbers on the switch or by WWPN of the attached initiators and targets).
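How zoning (in the switch) and LUN masking (in the storage array) work together
can be sketched as two simple lookups in Python. The zone names, host names and
LUN numbers below are invented for the example, and real fabrics identify
members by port number or WWPN rather than by friendly names.

```python
# Hypothetical zoning table held by the switch: zone name -> members.
ZONES = {
    "mail_zone": {"host_mail", "array_1"},
    "db_zone": {"host_db", "array_1"},
}

# Hypothetical masking table held by the array: initiator -> visible LUNs.
LUN_MASKS = {
    "array_1": {"host_mail": {0, 1}, "host_db": {2}},
}


def zoned_together(initiator: str, target: str) -> bool:
    """True if the switch has both devices in at least one common zone."""
    return any(initiator in members and target in members
               for members in ZONES.values())


def can_access(initiator: str, target: str, lun: int) -> bool:
    """Apply zoning first (switch), then LUN masking (storage array)."""
    if not zoned_together(initiator, target):
        return False
    allowed = LUN_MASKS.get(target, {}).get(initiator, set())
    return lun in allowed


print(can_access("host_mail", "array_1", 0))  # zoned and unmasked
print(can_access("host_mail", "array_1", 2))  # zoned, but LUN 2 is masked
print(can_access("host_web", "array_1", 0))   # not in any zone with the array
```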
References for further information:
In the first instance I would like to stress that Jess would be the best option for
any further questions or concerns surrounding the wonderful world of Data
Storage.
However, if you dare to get technical check out the following sites:
Further Terminology links (for any technical jargon you don’t understand):