Storage Training July 10


An effective guide to understanding data and network storage architecture to enable successful prospect discussions...

Published in: Technology, Business

  1. 1. A Telemarketer's Guide By Fiaz Khan
  2. 2. Contents: Introduction – What does Data Storage mean? (Page 3); Storage Mediums (Page 4); Network Storage – The Basics? (Page 5); Direct Attached Storage (DAS) (Page 6); Network Attached Storage (NAS) (Page 7); Storage Area Network (SAN) (Page 9); What is RAID? (Page 12); Tiered Storage (Page 15); Backup Storage (Page 16); Storage Glossary (Page 17); References (Page 25)
  3. 3. Introduction: What does Data Storage mean? Storage is the place where data is held in an electromagnetic or optical form for access by a computer processor. There are two general usages. 1) Storage is the devices and data connected to the computer through input/output operations - that is, hard disk and tape systems and other forms of storage that don't include computer memory and other in-computer storage. For the enterprise, the options for this kind of storage are of much greater variety and expense than those related to memory. This meaning is more common across vertical sectors than meaning 2. 2) In a more formal usage, storage is divided into: Primary storage, which holds data in memory (sometimes called random access memory or RAM) and other "built-in" devices such as the processor's L1 cache, and Secondary storage, which holds data on hard disks, tapes, and other devices requiring input/output operations. Primary storage: 1) Also known as main storage or memory, this is the main area in a computer in which data is stored for quick access by the computer's processor. On today's smaller computers, especially personal computers and workstations, the term random access memory (RAM) - or just memory - is used instead of primary or main storage, and the hard disk, diskette, CD, and DVD collectively describe secondary storage or auxiliary storage. (Simple!) 2) It also means storage for data that is in active use, in contrast to storage that is used for backup purposes. Secondary storage: Secondary storage, sometimes called auxiliary storage, is all data storage that is not currently in a computer's primary storage or memory. An additional synonym (see Storage Mediums below) is external storage. In a personal computer, secondary storage typically consists of storage on the hard disk and on any removable media, if present, such as a CD or DVD.
  4. 4. Storage Mediums: A storage medium is any technology (including devices and materials) used to place, keep, and retrieve data. A medium is an element used in communicating a message; on a storage medium, the "messages" - in the form of data - are suspended for use when needed. The plural form of this term is storage media. Although the term storage includes both primary storage (memory) and secondary storage, a storage medium usually means a place to hold secondary storage, such as a hard disk or tape. Storage media can be arranged for access in many ways. Some well-known arrangements include: a redundant array of independent disks (RAID), network-attached storage (NAS), and a storage area network (SAN).
  5. 5. Network Storage – The Basics? 'What is network storage?' and 'Why do we use it?' In basic terms, network storage is simply about storing data using a method by which it can be made available to clients on the network. Over the years, the storage of data has evolved through various phases. This evolution has been driven partly by the changing ways in which we use technology, and partly by the exponential increase in the volume of data we need to store. It has also been driven by new technologies, which allow us to store and manage data more effectively. In the days of mainframes, data was stored physically separately from the actual processing unit, but was still only accessible through the processing units. As PC-based servers became more commonplace, storage devices went 'inside the box' or into external boxes that were connected directly to the system. Each of these approaches was valid in its time, but as our need to store increasing volumes of data and to make that data more accessible grew, other alternatives were needed. This is where network storage was born! The next pages will give you a rundown of some of the basic terminology that you need to discuss storage with prospects and approach them professionally (In other words…sound like you know what the hell you are on about!).
  6. 6. Direct Attached Storage (DAS) Direct attached storage is the term used to describe a storage device that is directly attached to a host system.
  7. 7. Network Attached Storage (NAS) Network Attached Storage, or NAS, is a data storage mechanism that uses special devices connected directly to the network media. These devices are assigned an IP address and can then be accessed by clients via a server that acts as a gateway to the data or, in some cases, directly by the clients without an intermediary. The beauty of the NAS structure is that in an environment with many servers running different operating systems, storage of data can be centralised, as can the security, management, and backup of the data. An increasing number of companies already make use of NAS technology. One of the big advantages of NAS is expandability: if you need more storage space, add another NAS device and expand the available storage. NAS also brings an extra level of fault tolerance to the network. In a DAS environment, a server going down means that the data held by that server is no longer available. With NAS, the data is still available on the network and accessible by clients. Fault-tolerant measures such as RAID can be used to make sure that the NAS device does not become a point of failure.
  8. 8. Network-attached storage (NAS) is hard disk storage that is set up with its own network address rather than being attached to the department computer that is serving applications to a network's workstation users. By removing storage access and its management from the department server, both applications and files can be served faster because they are not competing for the same processor resources. The network-attached storage device is attached to a local area network (typically, an Ethernet network) and assigned an IP address. File requests are mapped by the main server to the NAS file server. NAS software can usually handle a number of network protocols, including Microsoft's NetBEUI, Novell's NetWare Internetwork Packet Exchange (IPX), and Sun Microsystems' Network File System (NFS). Configuration, including the setting of user access priorities, is usually possible using a Web browser. Network-attached storage consists of hard disk storage, including multi-disk RAID systems, and software for configuring and mapping file locations to the network-attached device. Network-attached storage can be a step toward, and included as part of, a more sophisticated storage system known as a storage area network (SAN).
  9. 9. Storage Area Network (SAN) A SAN is a network of storage devices that are connected to each other and to a server, or cluster of servers, which acts as an access point to the SAN. In some configurations a SAN is also connected to the network. SANs use special switches as a mechanism to connect the devices. These switches, which look a lot like a normal Ethernet networking switch, act as the connectivity point for SANs, making it possible for devices to communicate with each other on a separate network. A storage area network can use existing communication technology such as IBM's optical fiber ESCON or it may use the newer Fibre Channel technology. Some SAN system integrators liken it to the common storage bus (flow of data) in a personal computer that is shared by different kinds of storage devices such as a hard disk or a CD-ROM player. SANs support disk mirroring, backup and restore, archival and retrieval of archived data, data migration from one storage device to another, and the sharing of data among different servers in a network. SANs can incorporate subnetworks with network-attached storage (NAS) systems.
  10. 10. Many IT organizations today are scratching their heads debating whether the advantages of implementing a SAN solution justify the associated costs. Others are trying to get a handle on today's storage options and whether SAN is simply Network Attached Storage spelled backwards. To understand the basic purpose and function of a SAN we need to examine its role in modern network environments. We will also look at how SANs meet the network storage needs of today's organizations. In very basic terms, a SAN can be anything from two servers on a network accessing a central pool of storage devices to several thousand servers accessing many millions of megabytes of storage. Conceptually, a SAN can be thought of as a separate network of storage devices physically removed from, but still connected to, the network. SANs evolved from the concept of taking storage devices, and therefore storage traffic, off the LAN and creating a separate back-end network designed specifically for data. SANs represent the evolution of data storage technology to this point. Traditionally, on client-server systems, data was stored on devices either inside or directly attached to the server. Next in the evolutionary scale came Network Attached Storage (NAS), which took the storage devices away from the server and connected them directly to the network. SANs take the principle one step further by allowing storage devices to exist on their own separate network and communicate directly with each other over very fast media. Users can gain access to these storage devices through server systems which are connected to both the LAN and the SAN. This is in contrast to using a traditional LAN to connect servers to storage, a strategy that limits overall network bandwidth. SANs address the bandwidth bottlenecks associated with LAN-based server storage and the scalability limitations found with SCSI bus-based implementations.
SANs provide modular scalability, high availability, increased fault tolerance and centralized storage management. These advantages have led to an increase in the popularity of SANs, as they are quite simply better suited to the data storage needs of today's data-intensive network environments. Other developments are coming through that will change the way that we use and access network storage. One of these advances, expected to make a large contribution to the growing success of network storage in general, is iSCSI.
  11. 11. iSCSI is a technology that allows data to be transported to and from storage devices over an IP network. What it actually does is carry SCSI commands and data inside TCP/IP packets. Using iSCSI, the concept of network storage can be taken anywhere that IP can go, which, as the Internet proves, is basically anywhere. Technologies like Fibre Channel and iSCSI are a big factor in how quickly people are able to afford and implement network storage solutions.
  12. 12. What is RAID? RAID (redundant array of independent disks; originally redundant array of inexpensive disks) is a way of storing the same data in different places (thus, redundantly) on multiple hard disks. By placing data on multiple disks, I/O (input/output) operations can overlap in a balanced way, improving performance. Because an array of multiple disks is more likely than a single disk to suffer a drive failure (the mean time between failures, MTBF, of the array as a whole is lower), storing data redundantly is what gives the array its fault tolerance. A RAID appears to the operating system to be a single logical hard disk. RAID employs the technique of disk striping, which involves partitioning each drive's storage space into units ranging from a sector (512 bytes) up to several megabytes. The stripes of all the disks are interleaved and addressed in order. In a single-user system where large records, such as medical or other scientific images, are stored, the stripes are typically set up to be small (perhaps 512 bytes) so that a single record spans all disks and can be accessed quickly by reading all disks at the same time. In a multi-user system, better performance requires establishing a stripe wide enough to hold the typical or maximum size record. This allows overlapped disk I/O across drives. There are at least nine types of RAID plus a non-redundant array (RAID-0): RAID-0: This technique has striping but no redundancy of data. It offers the best performance but no fault tolerance. RAID-1: This type is also known as disk mirroring and consists of at least two drives that duplicate the storage of data. There is no striping. Read performance is improved since both disks can be read at the same time. Write performance is the same as for single disk storage. RAID-1 provides the best performance and the best fault tolerance in a multi-user system. RAID-2: This type uses striping across disks with some disks storing error checking and correcting (ECC) information. It has no advantage over RAID-3.
RAID-3: This type uses striping and dedicates one drive to storing parity information. The embedded error checking (ECC) information is used to detect errors. Data recovery is accomplished by calculating the exclusive OR (XOR) of the information recorded on the other drives. Since an I/O operation addresses all drives at the same time, RAID-3 cannot overlap I/O. For this reason, RAID-3 is best for single-user systems with long record applications.
  13. 13. RAID-4: This type uses large stripes, which means you can read records from any single drive. This allows you to take advantage of overlapped I/O for read operations. Since all write operations have to update the parity drive, no I/O overlapping is possible for writes. RAID-4 offers no advantage over RAID-5. RAID-5: This type includes a rotating parity array, thus addressing the write limitation in RAID-4. Thus, all read and write operations can be overlapped. RAID-5 stores parity information but not redundant data (but parity information can be used to reconstruct data). RAID-5 requires at least three and usually five disks for the array. It's best for multi-user systems in which performance is not critical or which do few write operations. RAID-6: This type is similar to RAID-5 but includes a second parity scheme that is distributed across different drives and thus offers extremely high fault and drive-failure tolerance. RAID-7: This type includes a real-time embedded operating system as a controller, caching via a high-speed bus, and other characteristics of a stand-alone computer. One vendor offers this system. RAID-10: Combining RAID-0 and RAID-1 is often referred to as RAID-10, which offers higher performance than RAID-1 but at much higher cost. There are two subtypes: In RAID-0+1, data is organized as stripes across multiple disks, and then the striped disk sets are mirrored. In RAID-1+0, the data is mirrored and the mirrors are striped. RAID-50 (or RAID-5+0): This type consists of a series of RAID-5 groups striped in RAID-0 fashion to improve RAID-5 performance without reducing data protection. RAID-53 (or RAID-5+3): This type uses striping (in RAID-0 style) for RAID-3's virtual disk blocks. This offers higher performance than RAID-3 but at much higher cost. RAID-S (also known as Parity RAID): This is an alternate, proprietary method for striped parity RAID from EMC Symmetrix that is no longer in use on current equipment.
It appears to be similar to RAID-5 with some performance enhancements as well as the enhancements that come from having a high-speed disk cache on the disk array.
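The striping and XOR parity ideas described for RAID-3 and RAID-4 can be sketched in a few lines of Python. This is only an illustration, not how any real RAID controller is implemented: the function names, the three-disk layout and the 4-byte stripe unit are all assumptions chosen to keep the example small.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of a list of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def stripe_with_parity(data, data_disks, stripe_unit):
    """Deal the data out across the data disks in stripe units and
    compute one parity unit per stripe row (dedicated parity drive)."""
    row_size = data_disks * stripe_unit
    # Pad with zero bytes so the data fills a whole number of rows.
    padded_len = -(-len(data) // row_size) * row_size
    padded = data.ljust(padded_len, b"\0")
    rows = []
    for start in range(0, len(padded), row_size):
        units = [
            padded[start + i * stripe_unit:start + (i + 1) * stripe_unit]
            for i in range(data_disks)
        ]
        rows.append((units, xor_blocks(units)))
    return rows


def rebuild_unit(units, parity, failed_disk):
    """Recover the unit held by a failed disk by XOR-ing the
    surviving units with the parity unit."""
    survivors = [u for i, u in enumerate(units) if i != failed_disk]
    return xor_blocks(survivors + [parity])


# Hypothetical example: 3 data disks, 4-byte stripe units.
units, parity = stripe_with_parity(b"ABCDEFGHIJKL", 3, 4)[0]
recovered = rebuild_unit(units, parity, failed_disk=1)
```

Here the second disk holds the unit `EFGH`, and `rebuild_unit` recovers exactly those bytes from the other two disks plus the parity drive. RAID-5 uses the same XOR arithmetic but rotates which disk holds the parity for each row.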
  15. 15. Tiered Storage: Tiered storage is the assignment of different categories of data to different types of storage media in order to reduce total storage cost. Categories may be based on levels of protection needed, performance requirements, frequency of use, and other considerations. Since assigning data to particular media may be an ongoing and complex activity, some vendors provide software for automatically managing the process based on a company-defined policy. As an example of tiered storage, tier 1 data (such as mission-critical, recently accessed, or top secret files) might be stored on expensive and high-quality media such as double-parity RAIDs (redundant arrays of independent disks). Tier 2 data (such as financial, seldom-used, or classified files) might be stored on less expensive media in conventional storage area networks (SANs). As the tier number increases, cheaper media can be used. Thus, tier 3 in a 3-tier system might contain event-driven, rarely used, or unclassified files on recordable compact discs (CD-Rs) or tapes.
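A company-defined tiering policy of the kind mentioned above might look roughly like the following Python sketch. The `FileInfo` fields and the 30-day and 365-day thresholds are hypothetical values picked for illustration; real tiering software will have its own rules and settings.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    name: str
    days_since_access: int
    mission_critical: bool = False


def assign_tier(f, warm_days=30, cold_days=365):
    """Return 1, 2 or 3 according to a simple, assumed policy:
    tier 1 for mission-critical or recently used data,
    tier 2 for seldom-used data, tier 3 for everything else."""
    if f.mission_critical or f.days_since_access <= warm_days:
        return 1
    if f.days_since_access <= cold_days:
        return 2
    return 3


files = [
    FileInfo("orders.db", days_since_access=400, mission_critical=True),
    FileInfo("q1-report.xls", days_since_access=120),
    FileInfo("old-logs.zip", days_since_access=900),
]
tiers = {f.name: assign_tier(f) for f in files}
```

With these assumed thresholds the order database stays on tier 1 because it is flagged mission-critical, the quarterly report lands on tier 2, and the old logs go to tier 3.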
  16. 16. Backup storage: Backup storage is storage that is intended as a copy of the storage that is actively in use so that, if the storage medium (such as a hard disk) fails and data is lost on that medium, it can be recovered from the copy. In an enterprise, because the loss of business data can be catastrophic, it is important that backup storage be provided. On a personal computer, backup storage is commonly achieved with Zip drives and DVDs. In an enterprise, backup storage can sometimes be achieved through replication of data in multidisk storage systems, such as RAID; as part of network-attached storage (NAS); as part of a storage area network (SAN); or as part of a tiered storage system. Enterprise backup storage often makes use of both disk and tape as storage media. Special software is used to manage backup as part of a storage system. The same means and media used for backup storage are often used for archival storage.
  17. 17. Storage Glossary: Basic Storage Terms A Asynchronous Replication: After data has been written to the primary storage site, new writes to that site can be accepted without having to wait for the secondary (remote) storage site to also finish its writes. Asynchronous replication does not have the latency impact that synchronous replication does, but has the disadvantage of risking data loss should the primary site fail before the data has been written to the secondary site. See also replication. B Backup/Restore: A two-step process. Information is first copied to non-volatile disk or tape media. In the event of computer problems (such as disk drive failures, power outages, or virus infection) resulting in data loss or damage to the original data, the copy is subsequently retrieved and restored to a functional system. Basic Disk: A basic disk is a physical disk that can be accessed by MS-DOS and all Windows-based operating systems. Basic disks can contain up to four primary partitions, or three primary partitions and an extended partition with multiple logical drives. Compare to dynamic disks. Block Data: Raw data which does not have a file structure imposed on it. Database applications such as Microsoft SQL Server and Microsoft Exchange Server transfer data in blocks. Block transfer is the most efficient way to write to disk. Business Continuity: The ability of an organization to continue to function even after a disastrous event, accomplished through the deployment of redundant hardware and software, the use of fault-tolerant systems, as well as a solid backup and recovery strategy. C Cluster: A group of servers that together act as a single system, enabling load balancing and high availability. A cluster can be housed in a single physical location (basic cluster) or can be distributed across multiple sites (geo-dispersed cluster) for disaster recovery.
  18. 18. D DAS (Direct Attached Storage): DAS is storage that is directly connected to a server by connectivity media such as parallel SCSI cables. This direct connection provides fast access to the data; however, storage is only accessible from that server. DAS includes internally attached local disk drives and externally attached RAID (redundant array of independent disks) or JBOD (just a bunch of disks). Although Fibre Channel can be used for direct attached storage, it is more commonly used in storage area networks. DFS (Distributed File System): DFS allows administrators to group shared folders located on different servers by transparently connecting them to one or more DFS namespaces. A DFS namespace is a virtual view of shared folders in an organization. Disaster Recovery: The ability to recover from the loss of a complete site, whether due to natural disaster or malicious intent. Disaster recovery strategies include replication and backup/restore. Dynamic Disk: A dynamic disk is a physical disk that provides features that basic disks do not, such as support for volumes spanning multiple disks. Dynamic disks use a hidden database to track information about dynamic volumes on the disk and other dynamic disks in the computer. F Fabric: A Fibre Channel (or iSCSI) topology with at least one switch present on the network. Failover: In the event of a physical disruption to a network component, data is immediately rerouted to an alternate path so that services remain uninterrupted. Failover applies both to clustering and to multiple paths to storage. In the case of clustering, one or more services (such as Exchange) are moved over to a standby server in the event of a failure. In the case of multiple paths to storage, a path failure results in data being rerouted to a different physical connection to the storage. Fault-Tolerance: Fault-tolerance is the ability of computer hardware or software to ensure data integrity when hardware failures occur.
Fault-tolerant features appear in many
  19. 19. server operating systems and include mirrored volumes, RAID-5 volumes, and server clusters. File Data: Data which has an associated file system. Fibre Channel: A high-speed interconnect used in storage area networks (SANs) to connect servers to shared storage. Fibre Channel components include HBAs, hubs, switches, and cabling. The term Fibre Channel also refers to the storage protocol. FRS: File Replication service (FRS) is a technology that replicates files and folders stored in the SYSVOL shared folder on domain controllers and Distributed File System (DFS) shared folders. When FRS detects that a change has been made to a file or folder within a replicated shared folder, FRS replicates the updated file or folder to other servers. G Geo-Dispersed Cluster: A geo-dispersed, or multi-site, cluster is a cluster configuration used to help ensure high system and application availability in the event of site disaster. In this configuration, servers are separated geographically and the physical storage (quorum disk) is synchronously replicated between sites. Global File System: In some configurations, as with clusters or multiple NAS boxes, it is useful to have a means to make the file systems on multiple servers or devices look like a single file system. A global or dispersed file system would enable storage administrators to globally build or make changes to file systems. To date this remains an emerging technology. H High Availability: A continuously available computer system is characterized as having essentially no downtime in any given year. A system with 99.999% availability experiences only about five minutes of downtime per year. In contrast, a high availability system is defined as having 99.9% uptime, which translates into just under nine hours of planned or unplanned downtime per year. HBA (Host Bus Adapter): The HBA is the intelligent hardware residing on the host server which controls the transfer of data between the host and the target storage device.
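The availability percentages in the High Availability entry translate into yearly downtime with simple arithmetic. The short Python sketch below shows the calculation; the function name is made up for this example and a 365-day year is assumed.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a 365-day year


def downtime_minutes_per_year(availability_percent):
    """Minutes per year a system is allowed to be down
    at the given availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


five_nines = downtime_minutes_per_year(99.999)       # about 5.26 minutes
three_nines = downtime_minutes_per_year(99.9) / 60   # about 8.76 hours
```

This is a handy way to put "five nines" into terms a prospect can picture: roughly five minutes a year, against almost nine hours a year at 99.9%.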
  20. 20. I ILM (Information Lifecycle Management): The process of managing information growth, storage, and retrieval over time, based on its value to the organization. Sometimes referred to as data lifecycle management. Initiator: An initiator is the device (usually contained within a server) that makes the application requests, which are then sent to the target device. iSCSI (Internet SCSI): A protocol that enables transport of block data over IP networks, without the need for a specialized network infrastructure, such as Fibre Channel. J JBOD (Just a Bunch of Disks): As the name suggests, a group of disks housed in its own box; JBOD differs from RAID in not having any storage controller intelligence or data redundancy capabilities. L Load Balancing: The ability to redistribute load (read/write requests) to an alternate path between server and storage device; load balancing helps to maintain high-performance networking. LUN (Logical Unit Number): A logical unit is a conceptual division (a subunit) of a storage disk or a set of disks. Logical units can directly correspond to a volume drive (for example, C: can be a logical unit). Each logical unit has an address, known as the logical unit number (LUN), which allows it to be uniquely identified. LUN Masking: A method to restrict server access to storage not specifically allocated to that server. LUN masking is similar to zoning, but is implemented in the storage array, not the switch. M Mount Point: A mount point is a directory on a volume that an application can use to "mount" (set up for use) a different volume. Mount points overcome the limitation on drive letters and allow more logical organization of files and folders.
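LUN masking, as defined above, boils down to a lookup table held by the storage array that says which server may see which logical units. The Python sketch below is a toy model of that idea only; the host names, LUN numbers and function names are invented for illustration and do not correspond to any vendor's interface.

```python
# Hypothetical masking table: server (initiator) name -> LUNs it may see.
MASKING_TABLE = {
    "host-a": {0, 1},
    "host-b": {2},
}


def visible_luns(initiator, table):
    """LUNs the array will present to this server, in numeric order."""
    return sorted(table.get(initiator, set()))


def can_access(initiator, lun, table):
    """True only if the LUN has been allocated to this server."""
    return lun in table.get(initiator, set())


host_a_view = visible_luns("host-a", MASKING_TABLE)
host_b_blocked = not can_access("host-b", 0, MASKING_TABLE)
```

A server that is not in the table at all sees no LUNs, which is the point of masking: storage that has not been allocated to a server stays hidden from it.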
  21. 21. Multipathing: Multipathing is the use of redundant storage network components responsible for transfer of data between the server and storage. These components include cabling, adapters and switches and the software that enables this. N NAS (Network Attached Storage): A NAS device is a server that runs an operating system specifically designed for handling files (rather than block data). Network-attached storage is accessible directly on the local area network (LAN) through LAN protocols such as TCP/IP. Compare to DAS and SAN. NTFS File System: A file system that provides performance, security, reliability, and advanced features that are not found in any version of the file allocation table (FAT) file system. For example, NTFS guarantees volume consistency by using standard transaction logging and recovery techniques. If a system fails, NTFS uses its log file and checkpoint information to restore the consistency of the file system. NTFS also provides advanced features, such as file and folder permissions, encryption, disk quotas, and compression. P Partition: A partition is the portion of a physical disk or LUN that functions as though it were a physically separate disk. Once the partition is created, it must be formatted and assigned a drive letter before data can be stored on it. On basic disks, partitions can contain basic volumes, which include primary partitions and logical drives. On dynamic disks, partitions are known as dynamic volumes, which include simple, striped, spanned, mirrored, and RAID-5 (striped with parity) volumes. Port: The physical connection point on computers, switches, storage arrays, etc., which is used to connect to other devices on a network. Ports on a Fibre Channel network are identified by their Worldwide Port Name (WWPN) IDs; on iSCSI networks, ports are commonly given an iSCSI name. Not to be confused with TCP/IP ports, which are used as virtual addresses assigned to each IP address.
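The way multipathing, load balancing and failover fit together can be pictured with a small Python sketch: requests rotate across the healthy paths, and a failed path is simply skipped. The class name, the path labels `hba0` and `hba1`, and the round-robin policy are assumptions for illustration; real multipathing software offers other policies as well.

```python
from itertools import cycle


class MultipathDevice:
    """Toy model of one storage device reachable over several paths."""

    def __init__(self, paths):
        self.healthy = {p: True for p in paths}
        self._order = cycle(paths)
        self._count = len(paths)

    def mark_failed(self, path):
        """Record that a cable, adapter or switch on this path is down."""
        self.healthy[path] = False

    def next_path(self):
        """Round-robin over the paths, skipping any that have failed."""
        for _ in range(self._count):
            path = next(self._order)
            if self.healthy[path]:
                return path
        raise RuntimeError("no healthy path to storage")


dev = MultipathDevice(["hba0", "hba1"])
before = [dev.next_path() for _ in range(3)]  # load spread over both paths
dev.mark_failed("hba1")
after = [dev.next_path() for _ in range(2)]   # traffic fails over to hba0
```

While both paths are up, requests alternate between them (load balancing); once `hba1` is marked failed, every request goes down `hba0` without the application noticing (failover).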
R RAID (Redundant Array of Independent Disks): A way of storing the same data over multiple physical disks to ensure that if a
  22. 22. hard disk fails, a redundant copy of the data can be accessed instead. Example schemes include mirroring and RAID-5. Redundancy: The duplication of information or hardware equipment components to ensure that should a primary resource fail, a secondary resource can take over its function. Replication: Replication is the process of duplicating mission-critical data from one highly available site to another. The replication process can be synchronous or asynchronous; duplicates are known as clones, point-in-time copies, or snapshots, depending on the type of copy being made. S SAN (Storage Area Network): A storage area network (SAN) is a specialized network that provides access to high performance and highly available storage subsystems using block storage protocols. The SAN is made up of specific devices, such as host bus adapters (HBAs) in the host servers, switches that help route storage traffic, and disk storage subsystems. The main characteristic of a SAN is that the storage subsystems are generally available to multiple hosts at the same time, which makes them scalable and flexible. Compare with NAS and DAS. SCSI (Small Computer System Interface): A set of standards allowing computers to communicate with attached devices, such as storage devices (disk drives, tape libraries, etc.) and printers. SCSI also refers to a parallel interconnect technology which implements the SCSI protocol. Shadow Copy: A shadow copy is a high fidelity point-in-time copy of the original data. In the Windows environment, shadow copies are created using the Volume Shadow Copy Service (VSS); third party applications can create shadow copies also. Storage Array: A subsystem which houses a group of disks (or tapes), together controlled by software usually housed within the subsystem. Storage Controller: Providing such functionality as disk aggregation (RAID), I/O routing, and error detection and recovery, the controller provides the intelligence for the storage subsystem.
Each storage subsystem contains one or more storage controllers.
  23. 23. Switch: An intelligent device residing on the network responsible for directing data from the source (such as a server) or sources directly to a specific target device (such as a specific storage device) with minimum delay. Switches differ in their capabilities; a director class switch, for example, is a high-end switch that provides advanced management and availability features. Synchronous Replication: In synchronous replication, each write to the primary disk and the secondary (remote) disk must be complete before the next write can begin. The advantage of this approach is that the two sets of data are always synchronized. The disadvantage is that if the distance between the two storage disks is substantial, the replication process can take a long time and slow down the application writing the data. See also asynchronous replication. T Target: A target is the device to which the initiator sends data. Most commonly the target is the storage array, but the term also applies to bridges, tape libraries, tape drives or other devices. Tiered Storage: Data is stored according to its intended use. For instance, data intended for restoration in the event of data loss or corruption is stored locally, for fast recovery. Data required to be kept for regulatory purposes is archived to lower cost disks. V VDS (Virtual Disk Service): VDS is a set of application programming interfaces (APIs) that provides a single interface for managing disks in Windows Server 2003 operating systems. VDS provides a means of managing storage hardware and disks, and for creating volumes on those disks. Virtualization: In storage, virtualization is a means by which multiple physical storage devices are viewed as a single logical unit. Virtualization can be accomplished in-band (in the data path) or out-of-band. Out-of-band virtualization does not compete for host resources, and can virtualize storage resources irrespective of whether they are DAS, NAS or SAN.
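The difference between synchronous and asynchronous replication, as defined in the glossary, can be shown with a toy Python model. Everything here is a simplification for illustration: the class name, the in-memory dictionaries standing in for the two sites, and the `drain` step standing in for the network link catching up are all assumptions.

```python
from collections import deque


class ReplicatedVolume:
    """Toy model of a primary site replicating writes to a secondary site."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.primary = {}
        self.secondary = {}
        self.pending = deque()  # writes not yet applied at the secondary

    def write(self, block, data):
        self.primary[block] = data
        if self.synchronous:
            # Synchronous: the write completes only once both copies exist.
            self.secondary[block] = data
        else:
            # Asynchronous: acknowledge now, ship to the secondary later.
            self.pending.append((block, data))
        return "ack"

    def drain(self):
        """Apply queued writes at the secondary (the link catching up)."""
        while self.pending:
            block, data = self.pending.popleft()
            self.secondary[block] = data

    def blocks_at_risk(self):
        """Writes that would be lost if the primary failed right now."""
        return len(self.pending)


sync_vol = ReplicatedVolume(synchronous=True)
sync_vol.write(1, b"invoice")

async_vol = ReplicatedVolume(synchronous=False)
async_vol.write(1, b"invoice")
async_vol.write(2, b"order")
at_risk_before = async_vol.blocks_at_risk()
async_vol.drain()
at_risk_after = async_vol.blocks_at_risk()
```

In the synchronous case the two copies never differ, at the price of waiting on the remote site for every write. In the asynchronous case the two queued writes are the data that would be lost if the primary site failed before the link caught up.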
Volume: A volume is an area of storage on a hard disk. A volume is formatted by using
  24. 24. a file system, such as file allocation table (FAT) or NTFS, and typically has a drive letter assigned to it. A single hard disk can have multiple volumes, and volumes can also span multiple disks. VSS (Volume Shadow Copy Service): The Volume Shadow Copy Service provides the backup infrastructure for the Microsoft Windows XP and Microsoft Windows Server 2003 operating systems, as well as a mechanism for creating consistent point-in-time copies of data known as shadow copies. Z Zoning: A method used to restrict server access to storage resources that are not allocated to that server. Zoning is similar to LUN masking, but is implemented in the switch and operates on the basis of port identification (either port numbers on the switch or the WWPNs of the attached initiators and targets).
  25. 25. References for further information: In the first instance I would like to stress that Jess would be the best option for any further questions or concerns surrounding the wonderful world of Data Storage. However, if you dare to get technical, check out the following sites: solutions.html Further Terminology links (for any technical jargon you don’t understand):