Online Storage Virtualization: The Key to Managing the Data ...



Proceedings of the 35th Hawaii International Conference on System Sciences - 2002
0-7695-1435-9/02 $17.00 (c) 2002 IEEE

Online Storage Virtualization: The key to managing the data explosion

Charles Milligan, StorageTek, One StorageTek Drive, Louisville, CO 80028-2201
Sid Selkirk, StorageTek, One StorageTek Drive, Louisville, CO 80028-2129

Abstract

High value software functionality (such as virtual volume mapping, floating data positioning, and SnapShot) is key to the success of online data storage systems today and in the future. New storage technologies that bring advances in metadata generation, interfaces, capacities, and bandwidth must be introduced without installation upheaval. Virtualization is the mechanism that allows the next generation of online storage to be seamlessly integrated. It also allows cost-sensitive installations to make effective use of alternative disk types (such as commodity PC (ATA) drives instead of SCSI or Fibre Channel drives) without requiring complex installation management. The principles of virtualization are described and some vendor options that incorporate these principles are discussed.

1. Introduction

As the amount of data available to individuals trying to manage a business explodes, the value of data management software and firmware (such as virtual volume mapping, floating data positioning, and SnapShot) becomes critical in the race to keep up. Analysts for the computer storage industry estimate that the growth of storage used to sustain business processes is 75% to 100% per year. At the same time, the amount of budget money available to retain the needed skills for managing the installations and for improving the data processing techniques is increasing at less than 10% per year. When the budget increases are discounted for annual inflation, the growth in availability of skilled people for solving problems is in the 5% range. This means that the people who build, maintain, and process the data in large warehousing environments must individually learn how to manage 50 to 100 times more data over the next seven years than they currently manage.

The primary business concern is to understand how the information content of the data has changed over time (for trends) and what it is at any moment of decision making. However, there are other concerns that are just as overwhelming to the operations staff, such as simply managing the placement of data and the configuration of equipment. Another important concern is recovery from failures. Failures come about for a number of reasons and some are hard to detect. The most catastrophic failures, such as an act of nature or an act of war that renders a particular site inoperable, are straightforward to detect. The process of business continuance in these cases is also well understood. One must have an alternative source of the data. That alternative source must be in a form that can be quickly brought into use. Other failures occur in operations and result in a device or piece of media containing some data becoming inoperable. A third failure mode is in the processes that derive the data or that analyze the data. When these processes fail, they often corrupt the database itself and render the information invalid or unavailable even when the data itself is still extant.

The possibility of failures of any type is exacerbated by the change in the business operating model. The 7 AM to 9 PM, 6-day week has given way to 7 X 24 X forever operations requirements, which has two effects. The first is that there is now no window for taking care of those housekeeping chores that allowed the 'failure' processes to get ready for recovery. The second effect is that there is no time to recover even if housekeeping has collected the necessary data.

The key to the success of online data storage systems today, and in the future, is to have in place systems and functions that significantly improve the capabilities of operations staff to cope with the explosive growth of the data to be managed, while these same systems and functions anticipate the various failure modes and provide automated means to detect and recover from those failures. Virtualization of the devices and the subsystem structures is the foundation for providing these systems and functions. The basic functions and characteristics of
storage virtualization need to be understood before the high value functionality can be explained. Once that has been explained, the value to the users and system administrators can be described. The intent of this paper is to discuss the basics of virtualization in layman's terms so that the use of virtualization can be made much less of a mystery. We then discuss how virtualization can support decision-making based on the availability of more and more data (requiring more and more processing to get at information). The hope is that this will inspire further study of the subject, and suggestions for how to measure and test the assertions made here.

2. Definitions

The Robert Frances Group defines virtual as "…those architectures and products designed to emulate a physical device where the characteristics of the emulated device are mapped over another physical device." Another way to express this is to say that virtualization separates the presentation of storage to the using system from the actual physical devices.

StorageTek Corporation white papers have explained that virtualization is accomplished by using a combination of code and hardware to overcome the limits of physical components such as disk or tape devices, that virtual means that a given storage block should not be expected to correspond to a media address, and further that the correspondence between virtual storage blocks and physical media addresses can change over time.

"Physical means that every storage block corresponds to a physical media address; the correspondence between a storage block and a media address is immutable." This comes from the Virtual Storage Architecture Guide (VSAG), from IEEE Mass Storage Conference, 1995, R. Baird, Hewlett-Packard Corporation, General Systems Solutions Laboratory, Cupertino, California, (Abstract).

Storage virtualization means dividing the available storage space into "virtual volumes" without regard to the physical layout or topology of the actual storage elements such as disk drives, RAID (redundant array of independent disks) subsystems and so on. From: "Users need virtualization to manage storage," 3/15/01, David Legard, IDG News Service.

These definitions from leaders in the industry focus on what virtualization does to the mapping of the data in devices, but do not address the customer perspective of the problem. The goal of storage virtualization should be to make using the system simpler, easier, and faster. It should remove some of the burden of managing the storage from the administrator. It should automate the accessing and the administration tasks. A storage virtualization system gives the computing systems (hosts and servers) and people (users and administrators) accessing the storage an "illusion" of a separate, simple, (single), large "address space" for each of the users, servers, or clusters, to apparently own (i.e., the virtual storage). It also takes on the burden of automating the management of the physical storage that underlies the virtual storage. A virtual system should allow the assignment of logical addresses to storage devices, and allow partitioning and concatenation of devices to form logically smaller or larger devices. It should provide an emulation of one device using a different model of device. However, if the responsibility of managing the address mapping, allocation, partitioning and concatenation still remains with a human administrator, then that system has not reached the goal of storage virtualization. It has not removed the burden from the administrator. It has not automated the management tasks.

3. Basic Functions

3.1. Naming and Addressing

By "naming" is meant the assignment of a name (or address) to a storage device or object from the viewpoint of the using server, application or user. The name answers the question: How do I find the storage device logically from the host presentation point of view? It may also be called a virtual name or virtual address. The address is then carried into the storage subsystem, telling it how to find the storage device or data block from the viewpoint of the storage system.

3.2. Mapping

In a storage virtualization system there are several possible levels of naming and addressing. There is the address of the virtual device and of the block within the device from the viewpoint of the using system. This virtual address and/or name is then mapped to a logical device, with an address from the viewpoint of the storage system. This address is then mapped to one or more addresses in the storage system. These may include additional logical devices and/or actual physical storage locations.

3.3. Mapping functions

A storage virtualization system must have a mapping function. This function f (address, map) transforms the virtual address (or name) generated by the operating system or file system into one or more actual physical location addresses of the block. For example, this might include how the block address issued by the file system or operating system is turned into the logical block address (LBA) issued to a disk drive. In the case of virtual tape storage, the function might need to include some
positioning or sequence state, as the tape access command set does not always address blocks explicitly.

Table 1: Storage naming conventions

Viewpoint | We call it | Name or address of | Example
Using system (e.g. server) | Device name, or virtual device address | Virtual device | Port #, Target ID, Logical Unit Number (LUN)
Using system | Virtual block address | Block in virtual device | SCSI Logical Block Address (LBA)
Storage system | Logical device address | A logical device defined in the storage system | Logical device number
Storage system | Logical block address | Logical block in logical device | LBA
Storage system | Physical device address | Actual physical storage device OR a separate storage system's device | Port #, Target ID, LUN
Storage system | Physical block address | Physical device block | SCSI LBA

There are likely multiple mapping functions in a virtual system to handle the different parts of a virtual address. For example, there may be a table-based lookup to map the virtual device name or address to an internal logical device. Then there may be a separate and different scheme for mapping the virtual block address within the virtual device to a block address in the logical device. Then there may be a mapping function to translate the logical addresses into one or more physical addresses.

A linear mapping function provides for an offset to be added to the address issued by the file system. The physical storage can then be divided into multiple partitions, yet each using server can address its partition as if it were a separate storage area. This may occur at the device level, with logical device addresses. The mapping function may also divide the address space within the physical devices, with an offset applied to the LBA as well. In some systems today this is called partitioning.

The usefulness of partitioning may be extended by allowing several (possibly disjoint) segments of physical storage space to be concatenated into a larger address space, and then the larger space partitioned as needed to provide the virtual address spaces to the using file systems. The simplest form of concatenation joins contiguous segments of the physical address space. A more complex form adds the capability to concatenate non-contiguous segments. In the latter case, the mapping function becomes slightly more complex.

At the opposite extreme from the simple linear offset partitioning is a fully general mapping function. This type of mapping function allows complete independence of the virtual and physical address space. Contiguous blocks in the virtual space may be completely separated in the physical address space, and vice versa.

3.4. Allocation

Allocation of the storage system is done on both a virtual and a physical basis. Allocation includes both capacity allocation (e.g. how much of the virtual storage capacity of the system is allocated to a virtual device) and location allocation (e.g. which specific logical address ranges are assigned to a virtual device, or which physical addresses are used for storing a set of blocks).

3.5. Areas of allocation

1. Allocation of virtual address space (or name space) from the using system / host / server point of view: When a system administrator decides that a LUN is needed for a particular application or server, that it needs 100 GB of capacity, and that it will have LUN address 7, … this is virtual address space allocation. Can this be automated? Yes. It should be automated. But in most systems today it is a manual process.

2. Allocation of the back end physical storage space to actually store data, or to reserve space to later store data: In completely non-virtualized systems, this is performed at the same time as the front-end address space allocation (i.e. when deciding the LUN address and size, the system administrator also specifies somehow the physical disk locations to be used for the LUN). This should be (and often is) automated in a storage virtualization system. The storage system should be able to determine where to store the data for the virtual device without manual intervention. The allocation of back end space may be somewhat static, in that it happens infrequently (it seldom changes), or it may be quite dynamic, changing as the data is changed or as other operations occur in the storage system. A key benefit of storage virtualization is for this physical storage allocation to be automated.
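As an illustration of the mapping functions described in Section 3.3, the following Python sketch shows a linear offset mapping, a concatenation of possibly non-contiguous extents, and a fully general per-block table. It is a minimal sketch under stated assumptions, not a description of any vendor's implementation: the names Extent, linear_map, ConcatMap, and GeneralMap are hypothetical, and addresses are treated as plain block numbers.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """A contiguous run of physical blocks (hypothetical helper type)."""
    device: str
    start: int
    length: int


def linear_map(vba: int, offset: int, size: int) -> int:
    """Linear (partition) mapping: physical LBA = virtual address + offset."""
    if not 0 <= vba < size:
        raise ValueError("virtual block address outside the partition")
    return vba + offset


class ConcatMap:
    """Concatenates extents, contiguous or not, into one virtual address space."""

    def __init__(self, extents):
        self.extents = list(extents)
        self.starts = []  # first virtual block number covered by each extent
        total = 0
        for extent in self.extents:
            self.starts.append(total)
            total += extent.length
        self.size = total

    def lookup(self, vba: int):
        if not 0 <= vba < self.size:
            raise ValueError("virtual block address outside the volume")
        # Find the last extent whose first virtual block is <= vba.
        i = bisect_right(self.starts, vba) - 1
        extent = self.extents[i]
        return extent.device, extent.start + (vba - self.starts[i])


class GeneralMap:
    """Fully general mapping: one table entry per virtual block."""

    def __init__(self):
        self.table = {}

    def bind(self, vba: int, device: str, pba: int) -> None:
        self.table[vba] = (device, pba)

    def lookup(self, vba: int):
        return self.table[vba]
```

With two extents, ConcatMap([Extent("d0", 500, 100), Extent("d1", 0, 50)]), virtual blocks 0 to 99 resolve to device d0 starting at physical block 500, and blocks 100 to 149 resolve to device d1 starting at block 0. The general table trades memory for freedom: any virtual block can be rebound to any physical location at any time.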
A fully virtualized storage system should automate the naming allocation, the address allocation and the physical space allocation.

3.6. Capacity assignments

There is a great deal of variation in the way different vendors have chosen to assign capacity. The most effective schemes will only allocate back end capacity as it is really needed to store data modified by the using system. Other schemes are much less efficient but are simpler to implement. They generally pre-allocate some storage areas, and often allocate larger blocks of storage than requested or allocate an area of the physical space before it is needed.

3.8. Location

In addition to just handing out slices or chunks of storage from a pool of storage, there may also be allocation issues that have to do with other characteristics. These may include redundancy or performance requirements on the virtual devices, which map into requirements on the physical storage location of the data. For example, a logical device definition may include requirements for redundant storage of the data, or a remote copy, or a data bandwidth requirement. Such requirements heavily influence the allocation of logical and/or physical devices to store the data.

Once the set of physical devices or logical devices that a set of data blocks is to be stored upon is decided, an allocation must be made of where in that device or set of devices the blocks will be written. There are many allocation methods. For this physical storage space allocation, a log structured file system is one method. It has a characteristic of "non-update in place," i.e. not over-writing old data on the physical storage at the time of the write to the virtual block. (Old data is re-allocated later in a free space collection algorithm of some sort.) Also, it has a "write index point," having one (or a small number of) physical location where new data is written. The write index indicates where the next write will be stored. This write index normally moves sequentially across the storage blocks. Thus the data is stored on the media in the order it was written or updated. This tends to keep the most recently modified data together. The "non-update in place" characteristic makes it easier to implement a pointer based SnapShot copy mechanism.

Other allocation methods may have different characteristics. For example, one could use a "non-update in place, first fit" allocation method that stores new data in the first free space found when the media is searched for free space starting at some beginning point. This method would tend to cluster the data as close to the beginning point as possible.

Another allocation characteristic may be striping the data across multiple devices.

3.9. Homogeneous and heterogeneous substitution (virtualization)

Another basic function of storage virtualization is substitution. The virtual system allows the substitution of one physical device for another or of one type or class of device for another. The substitution may be homogeneous, where one device is substituted for another of a like kind. For example, a virtual system may substitute one Fibre Channel disk drive for another Fibre Channel disk drive. The substitution may also be heterogeneous, where one device is substituted for another of a different kind. For example, a virtual tape system may substitute an LTO tape drive for a DLT tape drive. A heterogeneous substitution may also involve different classes of devices. For example, a virtual system may substitute a disk drive for a tape drive and cartridge, or vice versa.

The virtual system may also substitute aggregations of partitions of devices for single devices. For example, a single virtual disk device may be logically replaced by an aggregation of partitions on several physical disk drives. A single virtual disk may be logically replaced by a mirrored combination of a RAID disk group and a RAIT tape group. The mirrored, aggregated disk and tape are substituted for the single disk drive. (See also the section on Layered Definitions, which follows.)

The substitution of devices requires the capability in the storage system to emulate one device type while storing the data on another. This may require address mapping, command protocol translation, and possibly maintenance of additional metadata and/or some emulated (virtual) device state.

The substitution of devices enables the integration of new storage technologies, and the presentation of virtual devices for which no physical equivalent exists.

3.10. Layered definitions

There are several possible layers of definitions of virtual storage. From the bottom up, there are physical storage devices, possibly of various types and models. Mapped onto these physical devices are basic storage constructs, such as basic logical storage volumes, logical devices, or simply chunks of storage space. Built on top of these basic constructs may be more complex logical storage constructs, such as RAID groups, mirrored logical devices, remote mirrors, and RAIT sets. There may be even more layers of logical device constructs built on top of these. Finally, some logical device construct is presented as a virtual device, addressable by the using system. An example of such a layered definition is shown in Table 2.
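The log structured allocation of Section 3.8 can be sketched in a few lines of Python. This is a hedged illustration only: the class name LogStructuredStore, the volume names, and the in-memory list standing in for the media are all assumptions, and free space collection is deliberately left out. It shows the two characteristics the text names, "non-update in place" and a sequentially moving write index, and why they make a pointer based instant copy straightforward.

```python
class LogStructuredStore:
    """Toy log-structured allocator (hypothetical; no free space collection)."""

    def __init__(self, capacity: int):
        self.media = [None] * capacity  # stand-in for physical blocks
        self.write_index = 0            # where the next write will be stored
        # One pointer table per named volume: virtual block -> physical block.
        self.maps = {"live": {}}

    def write(self, vba: int, data, volume: str = "live") -> int:
        if self.write_index >= len(self.media):
            raise RuntimeError("log is full; free space collection not modeled")
        pba = self.write_index
        self.media[pba] = data          # always write at the write index
        self.write_index += 1           # index moves sequentially
        self.maps[volume][vba] = pba    # repoint; the old block is not overwritten
        return pba

    def read(self, vba: int, volume: str = "live"):
        return self.media[self.maps[volume][vba]]

    def snapshot(self, source: str, name: str) -> None:
        # Pointer based copy: duplicate the mapping table, not the data blocks.
        self.maps[name] = dict(self.maps[source])
```

Because an update never overwrites the old physical block, a snapshot taken before the update keeps pointing at the old data while the live volume points at the new block. A real system would also need to track which blocks are still referenced by any mapping before reclaiming them.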
Table 2: Layered Definition to Provide Unique Device Characteristics

Virtual disk device as seen by the using system, which is mapped to the:
  Logical disk device allocated to virtual device, which is mapped to the:
    Logical three way mirror, which is made up of:
      (a) Logical high performance disk, maps to:
            Physical solid state memory
      (b) High availability logical disk device, maps to RAID construct built on the physical disks:
            Physical disk, Physical disk, Physical disk, Physical disk
      (c) Logical automatic archive device, on a RAIT construct, allocated to:
            Physical tape, Physical tape, Physical tape

3.11. Encryption

A storage system that is shared by multiple using systems, multiple applications, and/or multiple users needs to provide data security for those applications and users. One building block of good data security is encryption of the data. This encryption might be accomplished at one of several locations in the overall system. The best case, of course, is if the encryption occurs as close to the source of data as possible, and the decryption occurs just as the data is needed. Even if that is not reasonable or feasible given the applications and operating system in use, a storage virtualization system that provides encryption as the data enters (along with good data access controls) can provide some protection against data being revealed to the wrong party. This does require a control interface to the authorized user or application to supply the encryption and/or decryption keys. This enables more secure sharing of the physical storage. It also enables more secure transmission of the data to remote storage locations and archives.

3.12. Compression

Data compression in the storage virtualization system has two possible benefits. One is the ability to store more data with less physical storage space (i.e. increased capacity). The other is the possibility of higher effective data transfer bandwidths. The advantage of a storage virtualization system that implements compression is that the using system(s) do not have to know the data is compressed, and do not need to have or use compression and decompression software when accessing the data.

Data compression in the storage system may be accomplished without virtualization. However, it does not provide significant capacity benefit unless the storage system has at least some form of dynamic allocation of physical space and the ability to handle variable size blocks.

3.13. Access Control

The storage virtualization system functions also need a form of access control. There are multiple aspects of access control.

3.13.1. Administrative access control. How does the system control who is allowed to perform administrative and/or configuration tasks?

3.13.2. Operational or commands access control. How does the system control who is allowed to request or command operations (other than basic read and write operations)?

3.13.3. Data access control. How does the system control who is allowed to access the data stored in the system?

The access controls in many systems have been associated with physical items, such as a physical I/O (input/output) port (the access for specific devices is associated with a set of ports), or a physical device (the ability to read and/or write is controlled for a physical device). In other cases it was all or nothing. In today's larger, shared systems, with storage networks and possibly many servers and applications attached to the same storage system, this is not adequate. In a storage virtualization system the access controls must be tied to non-physical entities, to virtual devices and virtual data paths.

Many of the access control questions are also issues for non-virtual storage systems. The additional functions and layers of a storage virtualization system can actually make the access control even more complicated. Therefore a good virtualization system should provide tools and automation to make the access controls as painless and effortless as possible, yet allow the administrator that needs the control to take more detailed control. The more the administration and control of the storage allocation and other advanced functions are
automated, the fewer explicit access controls there will be to worry about. For example, if the ability to request an instant copy and access the instant copy is automatically restricted to applications or users that also have at least read access to the source data, then a separate access control for who can request an instant copy may not be needed. Another approach would be to have the instant copy inherit the same read / write access controls as the source data. This, especially if combined with encrypted data requiring a key to decode, would reduce the need for separate controls on instant copy requests.

4. Virtualization Operations

A virtualization operation is a process that combines a set of the basic functions to accomplish a new task definition and provides for the application of the basic functions to customer problems.

4.1. Instant copy mechanisms, e.g. SnapShot

One of the primary tasks associated with the efforts to accommodate failures in systems and processes, and even failures in the environment (acts of nature), is to copy data. The idea of instantly copying large amounts of data is attractive but generally impractical. However, one aspect of virtualization allows the appearance of instantly copying a database, thus allowing the database operations to proceed as if the copy operation were accomplished. Because virtualization requires provision for naming of data and for mapping of the data, there are pointers and tables available that describe the data. Making a copy of the mapping and giving that copy a new name gives the appearance of having copied the data. If the data is appropriately marked so that there is an awareness of the existence of both mapping tables (pointer sets), then the separation of the two instances of the data can proceed offline. While this is going on, the enterprise users continue to access and update the original data and/or the virtual copy.

4.2. Abstraction of device definitions

One very attractive aspect of virtualization, with the most potential for simplifying use and reducing the administration tasks, is the ability to define abstract virtual device structures that accomplish the objectives of use or administration inherently. The most useful examples include the ability to define unique devices that meet specific complex business operations requirements with a single device image. The virtual device is given an installation specific name that is communicated to the using community. When the device name is invoked, the required functionality and quality of service (QOS) is set up automatically by the storage system. The administration task is thereby simplified; it is not to oversee the management of the resources to accomplish the functionality, but rather simply to verify the end results. A real world example of a complex business function invoked by using a unique single image device is the best way to illustrate the principles described here.

4.3. A real example

A real example of a unique device definition for a lowest cost approach includes the following requirements: automatic copies of the data at multiple sites including periodic 'iron mountain' archives, high performance (i.e., full fiber channel rate transfers), availability rated at 15-9's (99.9999999999999% probability of successfully reading data), guarantee of 99% that the weekend archive job will run to completion, immediate short term recall (short term defined as 72 hours), and security to ensure that theft of the data does not compromise the information content.

This is a composite set of requirements identified by an actual international banking concern. The database is an extremely large compendium of data (10's of terabytes growing at 70% per year). When the data is used, it is accessed in a serial fashion, so an automated tape solution was employed. However, because of the cost to the business of interrupting the ability to conduct business on a 7 X 24 X forever basis, the data must always be available. The operations are set up at multiple sites with fail over from site to site. The systems administration processes are already set up to initiate alternate site execution whenever a system failure is noted. However, in order to do this they must also make sure that multiple copies of the data are available at the various potential execution sites (multiple copies were to ensure the 15-9's availability). They must also make sure that the data is placed on a fast recall media for a 72-hour period and then removed, with the media being recycled. A complication placed on the systems administration is to reconfigure the system when new devices are added in order to reduce the overall operations costs.

The actual architecture designed to meet this set of customer requirements included a number of different aspects of virtualization in combination. The first two addressed the requirement for the data to be automatically present at multiple sites. This required a combination of device allocation virtualization with mirroring. First, the system is placed behind a virtualization engine so that the physical devices used to satisfy the user requests could be selected from a pool of available resources. The sites were physically located at distances greater than 25 kilometers, so the connection between the sites required networked fiber communication. This required that the virtualization engine accommodate networked traffic. Second, the virtual device is defined as one with multiple
instantiations with one instantiation identified at each site. When the data is to be read, the most convenient copy is made available. When it is to be updated, the media is automatically made available at each site and writes are mirrored to each of the media affected.

The third aspect of virtualization addressed the requirement for high bandwidth operation. The virtual device was defined with a RAID 0 striping that was sufficient to drive the network at full rated speed. The data striping was an aspect of each one of the mirror copies.

The fourth aspect of virtualization addressed the 15-9's availability requirement. The mirroring at multiple sites was only able to guarantee 10-9's of availability. Therefore, each instance of the data at each site was also covered by a RAID 3 redundancy using a new patented multiple parity approach. The RAID 3 redundancy was applied to each of the individual mirror copies.

The occasional archive of the data to a secure site for storage is a task that required that one of the original copies of the data be in turn copied and the media shipped to the secure storage. This job currently takes 20 hours to copy the terabytes of the database. The probability of this job running to completion is about 50% using today's technologies. This is because more than a hundred tapes are written and a device error check during the writing of any one of these will cause the job to abort. The use of RAID 0 and 3 has several devices running in parallel and shortens the job from 20 hours to 5 hours, but does not improve the probability that the job will run to completion. In fact the problem is exacerbated, since the number of devices required to complete the job has increased. The fifth aspect of virtualization employed, however, allows the required 99% guarantee of job completion to be met. The fact that the data is being written to a virtual device means that the physical devices that are employed can be configured at will. The definition of the virtual device is initially mapped with one extra device in the configuration. The metadata that describes the data and how it is mapped on the media accounts for the extra device. The additional media is simply used to add an additional parity to the RAID 3. If a device fails during the operation, that device and its corresponding piece of media are simply mapped out and the metadata updated to reflect the new configuration. The job will run to completion with the required performance and availability still met. If a second device were to fail, the process would be repeated, this time mapping out a device used for customer data transfer. There is no effect on availability and the effect on performance is only about 5%. Now, the probability of the job running to completion is about 7-9's, which is well beyond the 99% required.

The sixth aspect of virtualization used to satisfy these requirements is the ability to do technology substitutions. The mirroring of the data so far has been to sets of tape cartridges that need to be mounted on tape drives to be read or written. The inclusion in the definition of the virtual device of a mirror copy that is actually placed on a RAID 3 disk allows the mount of the media for read to take only a few milliseconds. Disk drives are much more expensive than tape media, however, and long-term storage of many terabytes of data on disk is prohibitive. The metadata of the system that describes the virtual device includes timing information. The creation or update times of a collection of data such as a file or a data set are noted in the metadata. The inclusion of the disk mirror in the virtual volume is accompanied by a scratch and reuse process. When the file has aged to the 72 hour mark from its creation (or update), the mirror copy that is on disk is scratched from the virtual volume definition and the space recycled back to the pool of available space. The rest of the virtual volume definition remains intact.

Finally, the seventh aspect of virtualization that is used satisfies the security concerns of the customer. Since the data must be transferred across a network that traverses tens of kilometers, the probability of the data being snooped on and possibly being copied is quite high. Even with all the precautions taken, there is a way to tap into a network and copy the traffic that streams by without being detected. Banking concerns are also notoriously paranoid about the privacy of their clientele, so any compromise of their data is serious. The virtualization that allows for unique device structures to be defined also allows for processing routines to be inserted into the data path at any point beyond where the customer relinquishes the data to the storage system. These routines are used to create metadata about the information content of the data as it flows through the system and can be used to enhance the ability of the storage system to support searches and queries.

Another use of these processing routines is to encrypt the data as it is flowing through the network and even to encrypt the data as it resides on the media. When the data is encrypted on the media, the enterprise is assured that the theft of physical media will not compromise their customers' information.

4.4 The integration of new storage technologies

The integration of new storage technologies that introduce advances in metadata generation, interfaces, capacities, & bandwidth without requiring installation upheaval is a promise of virtualization. The virtualization mechanisms that allow the next generation of online storage to be seamlessly integrated are the same as those that allowed the unique device descriptions to satisfy the customer requirements in the previous example. The fact that the characteristics of the new technology can be described to the virtualization system allows the system to automatically make use of the technology to satisfy QoS (Quality of Service) requests.
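The idea of describing a technology's characteristics to the virtualization system so that it can satisfy QoS requests can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names `DeviceClass`, `QosRequest`, and `select_device_class`, the choice of latency, bandwidth, and cost as the described characteristics, and all numeric values. It is not the mechanism of any particular product.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DeviceClass:
    """Hypothetical description of a storage technology given to the virtualization layer."""
    name: str
    access_latency_ms: float  # time until the first byte can be read
    bandwidth_mb_s: float     # sustained transfer rate
    cost_per_gb: float        # relative cost in arbitrary units


@dataclass(frozen=True)
class QosRequest:
    """Hypothetical quality-of-service request attached to a virtual device."""
    max_latency_ms: float
    min_bandwidth_mb_s: float


def select_device_class(request: QosRequest,
                        catalog: List[DeviceClass]) -> Optional[DeviceClass]:
    """Return the cheapest described technology that meets the request, or None."""
    candidates = [
        d for d in catalog
        if d.access_latency_ms <= request.max_latency_ms
        and d.bandwidth_mb_s >= request.min_bandwidth_mb_s
    ]
    return min(candidates, key=lambda d: d.cost_per_gb, default=None)


# Illustrative values only; they are not measurements of real devices.
catalog = [
    DeviceClass("tape", access_latency_ms=60_000.0, bandwidth_mb_s=10.0, cost_per_gb=0.1),
    DeviceClass("scsi_disk", access_latency_ms=10.0, bandwidth_mb_s=40.0, cost_per_gb=10.0),
]
request = QosRequest(max_latency_ms=50.0, min_bandwidth_mb_s=20.0)

before = select_device_class(request, catalog)

# A new, cheaper technology is described to the system; no request has to change.
catalog.append(DeviceClass("ata_disk", access_latency_ms=12.0, bandwidth_mb_s=30.0, cost_per_gb=1.0))
after = select_device_class(request, catalog)

print(before.name, "->", after.name)
```

In this sketch the request itself never names a technology, so adding a new entry to the catalog is enough for later placements to use it.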
Virtualization allows cost-sensitive installations to make use of alternative disk types (such as commodity PC (ATA) drives) effectively without requiring complex installation management. In the above example, the RAID disk first employed the SCSI over Fibre Channel products that are so prevalent. However, there are now a number of disk products using ATA drives that are an order of magnitude less expensive than the SCSI equivalents. The fact that the virtualization can map from one technology, like tape, to another, like disk, can be extended to include simple protocol mapping such as SCSI to ATA.

5. Value of virtualization

Customer benefits of the application of the virtualization features described above are broad and can mean the difference between simply competing and actually winning.

5.1. Manpower

The number of people needed to administer the storage system described above is significantly smaller than with traditional methods. The result is not that the customer necessarily reduces staffing, but that the staff that is employed is an order of magnitude more effective in their work. A great deal of the mundane has been moved into the storage subsystem and is not a bother to the customer operations.

5.2. Stress reduction

When critical jobs can be guaranteed to complete on time, the operations environment is significantly enhanced. The worldwide availability of skilled personnel to manage these storage systems is limited. Having an environment that automates many of the complex tasks and that assumes many of the mundane tasks is very inviting to prospective employees.

5.3. Time reductions

The use of striping clearly enhances performance, allowing an operation that took 20 hours to complete in 5. That is only the beginning of the time savings. The fact that the work is done as a monolithic definition rather than scheduled as a set of interrelated tasks to be executed individually also saves a great deal of time.

5.4. Reduced system costs

The guarantee that a large job will run to completion and not abort midstream is a clear reduction in systems operational costs. In addition, the intrinsic value of the data is enhanced when the jobs complete.

5.5. Reduced operational cost

The value of the ability to purchase and configure new technologies that save both time and operational expense is not easily calculated. The reduction in direct costs of having less expensive systems installed in a timely fashion is straightforward to understand. However, indirect savings from data availability improvements often are less obvious.

6. High value software functionality

The many uses of virtualization cannot be explained in detail because they are in fact limited only by the imagination of those using them. A few have been described above in the real-world example. A few more ideas are briefly outlined in conclusion here.

6.1. "Instant" applications

An apparent instant copy of a data warehouse for backup or archive has been described in the real-world example above.

Another example is to make a clone of a database or a subset of a data warehouse to be used for test or research trials.

A third example is to make a clone of a set of data for immediate service in a personalized application. An example of this could be when one wants to make a new instance of a web server for a new customer. An instant copy of a basic web server with minimal personalization can be made in minutes.

6.2. Secure applications

Data anywhere without compromise can be accomplished with virtualization via security drivers inserted into the data path.

6.3. Device abstractions

Niche applications that do not justify development of special devices can be defined and accommodated for individual customers.

The next generation of performance and reliability can be provided using current products. These can be obviously useful devices or fanciful devices designed to answer 'what if' questions.
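The value of a job that is guaranteed to run to completion can be explored with a small 'what if' calculation based on the archive job in the real-world example. The sketch below assumes that each device fails independently with the same probability during the job, and it calibrates that probability so that a 100-device job with no tolerated failures completes half the time. Both the model and the device counts are assumptions for illustration. It is not meant to reproduce the 99% or 7-9's figures, which depend on device counts and error rates that the text does not give.

```python
from math import comb


def completion_probability(devices: int, p_fail: float, tolerated: int) -> float:
    """Probability that at most `tolerated` of `devices` fail during the job.

    Assumes independent failures with the same probability `p_fail` per device.
    """
    return sum(
        comb(devices, k) * p_fail**k * (1.0 - p_fail) ** (devices - k)
        for k in range(tolerated + 1)
    )


# Calibrate the per-device failure probability so that 100 devices with
# no tolerated failure give a 50% chance of completion (assumed baseline).
p_fail = 1.0 - 0.5 ** (1.0 / 100)

baseline = completion_probability(100, p_fail, tolerated=0)

# One extra device is mapped into the virtual device, so one failure
# can be mapped out without aborting the job.
one_failure = completion_probability(101, p_fail, tolerated=1)

# A second failure is absorbed by mapping out a data-transfer device.
two_failures = completion_probability(101, p_fail, tolerated=2)

for label, value in [("no tolerance", baseline),
                     ("one failure tolerated", one_failure),
                     ("two failures tolerated", two_failures)]:
    print(f"{label}: {value:.4f}")
```

Even under this crude model, each tolerated failure raises the chance of completion substantially, which is the qualitative point of remapping devices during the job.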
6.4 Pooling / device sharing

Reduced cost for installations, better operations responsiveness, and lower maintenance costs will all be provided by pooling and sharing which is accomplished via virtualization.

7. Conclusion

In conclusion, the primary and most important benefit that should result from storage virtualization is the reduced effort needed to manage the storage system. If the storage virtualization scheme / method / system being used does not reduce the administration effort, then the system has failed to achieve the primary objective.

7.1. Areas for further study

Methods for measuring the benefits of storage virtualization are needed. How does one measure the complexity of a storage management task? How does one measure the reduction in work required to manage a given set of data? Once such measures are understood, then more objective analysis of such methods is possible.