PASTA-WGF-Final.doc
Published in: Technology, Business
Transcript

  • 1. CERN Technology Tracking for LHC PASTA 2002 Edition Working Group F Storage Management Solutions M. Ernst, M. Gasthuber, N. Sinanis 16th September 2002
  • 2. STORAGE MANAGEMENT SOLUTIONS

Preface

The PASTA WG (F) has investigated the area of Storage Management Solutions, taking the following points into consideration:
    • Basic Technology: Network Storage (NAS, SAN, Mass Storage Systems)
    • Technology Trends (projections, limiting factors)
    • Basic Cost Metrics (cost drivers)
    • Market Trends (marketplace segmentation, demand, economic factors)
    • Systems Design Considerations and Directions (scalability, manageability, scenarios, case studies, best practices)
    • Systems Cost Analysis (TCO considerations, cost drivers, projections)

Given the scope of this particular WG, it is natural that there will be some overlap, primarily with WG E (Data Management Technologies) and to a lesser extent with WG C (Tertiary Mass Storage) and WG B (Secondary Storage). We will, however, try to keep our focus on the subjects that are part of our charge.

The report is structured as follows. We will
    • start by describing current trends in industry, including the driving factors coming from the scientific and the commercial fields
    • browse over solutions and/or approaches proposed by industry, including hot discussion points
    • move on to the most important part of the report, which is devoted to the related technology; this is the key issue because the technology is currently highly transient and rapidly evolving due to the exponentially growing commercial marketplace
    • further develop scenarios by mapping requirements and use cases to architectures and technologies, including recent developments and adoptions in the area of the various Grid initiatives
    • finally draw some conclusions on the developments which we think are likely to take place

The past few years have seen a meteoric rise in network connected storage solutions in IT environments around the globe, and much has been written about the benefits of storage networking: how centralized storage, accessible by hundreds of servers, reduces complexity and management effort.
Software products are now emerging with wonderful claims of efficiency and sophistication. This efficiency, however, does not come automatically when migrating to a certain technology, namely SAN and/or NAS, and let's face it: it is very likely that a migration to a new architecture or technology is going to happen over the course of the LHC program. Before we get to the interesting details of the technology involved, we should realize that many cost-of-management metrics exist today, and some of the more common rules of thumb indicate that the cost to manage enterprise storage is six to eight times the original purchase price. So, approaching storage management with a wide aperture is critical for success. Fundamental issues include the organization required for managing storage networks, operational best practices, and finally, management automation.

Trends in Industry

Several reasons make storage as important as it is today, in industry as well as in scientific fields such as HEP. In industry of any kind we currently see significant financial investment in storage. The size of that investment is driven by the growing requirement for storing data, and it is a very large line item in expenditures. The criticality of storage in normal operation warrants having a "storage strategy". A storage strategy is much more than an acquisition strategy. It must encompass all aspects of storing and retrieving data:
  • 3.
    • The connectivity methodology and protocols that give computing elements access to data
    • The security and integrity required for corporate data
    • The performance demands for data access in processing
    • The capacity scaling requirements
    • A rolling multi-year plan for the adoption of new technologies that have economic (besides technical) benefit

Implicit in all the issues with storage are the economics involved. Multiple factors of concern are usually grouped in a Total Cost of Ownership model that includes acquisition cost, administration, support, environment, etc. To examine the economics in a storage strategy, the dynamics of what is happening in the storage discipline must be understood. First comes a thorough understanding of what is driving the demand for storage and the implications this brings. There are many solutions in the market today, with many more being delivered by high-profile vendors. How these solutions address problems and how they will change over time is a factor in developing a strategy. Finally, the hard economics of executing a storage strategy need to be researched. Strategic storage decisions have long-term repercussions and can have an impact on the success or failure of companies as well as large-scale projects.

Over the past few years, storage has evolved from an ancillary thought into one of the primary IT expenditures. The Internet's emergence and e-commerce have unleashed an information deluge that has flooded corporate enterprises and is requiring a transformation in storage. During the early 90s, large computers enjoyed a renaissance under the name of servers. As successful as large computers have been, it is clear that an even larger market has burst on the scene in the last few years: enterprise storage. According to recent surveys performed by IDC, storage is rising to 75% of enterprise hardware by 2003, making it increasingly a separate purchasing decision from servers.
In the last 5 years, the amount of the world's information being stored electronically has risen from 1% to 10%. This is a dramatic change requiring significantly more storage. The implication of digitizing information is that a quantum leap in storage capacity is needed. Just moving to the next 10% level will drive an unprecedented demand for storage. Figure 1 shows the growth of storage based on disk drive shipments, with a projection for the future.

[Figure 1: Storage growth estimates, worldwide disk systems shipments (source: Evaluator Group). Vertical axis: PB, 500 to 3,000; horizontal axis: years 2000 to 2004.]
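The Total Cost of Ownership model mentioned above can be made concrete with a small sketch. The class and field names below are hypothetical and the purchase price is an invented example figure; only the rule of thumb that management costs six to eight times the purchase price comes from the text.

```python
from dataclasses import dataclass


@dataclass
class StorageTco:
    """Toy Total Cost of Ownership model (all field names are illustrative)."""
    acquisition: float               # purchase price of the storage
    management_multiplier: float     # management cost as a multiple of the purchase price
    support: float = 0.0             # vendor support contracts
    environment: float = 0.0         # floor space, power, cooling

    def management(self) -> float:
        return self.acquisition * self.management_multiplier

    def total(self) -> float:
        return self.acquisition + self.management() + self.support + self.environment


# Rule-of-thumb range quoted in the text: management costs 6x to 8x the purchase price.
# The purchase price of 1,000,000 is an assumed example value.
low = StorageTco(acquisition=1_000_000, management_multiplier=6).total()
high = StorageTco(acquisition=1_000_000, management_multiplier=8).total()
print(f"TCO range: {low:,.0f} to {high:,.0f}")
```

Even with support and environment costs left at zero, the acquisition price is only a small fraction of the total, which is why an acquisition strategy alone falls short of a storage strategy.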
  • 4. The number one issue with storage is the cost of administration. With the growth in storage, the cost of storage administration is also scaling linearly. The reason for high administrative costs is that there are almost no consistent management tools. Indeed, storage is typically administered on an individual device basis, where different tools are used for each type of device. The lack of mature, common sets of tools for the management of a heterogeneous environment is a significant problem for storage professionals today. Part of laying the foundation for the storage problem is understanding how storage is connected to the servers that run the applications.

Storage Architectures

Direct attached storage

Focusing on secondary storage first, the most common method has been the direct attached storage (DAS) model. In this model, the storage is connected through an interface such as SCSI or Fibre Channel. In the open systems world Fibre Channel typically runs the SCSI command set, and the intermix of the two leads to significant confusion. As we trust the reader is familiar with the characteristics of the DAS model, we will not elaborate further on it. However, we will not leave it without mentioning one of the fundamental conclusions: the distributed nature of DAS poses significant management challenges. Managing storage attached to a few servers is certainly possible, but when the number of servers reaches into the hundreds or thousands, management becomes virtually impossible because there is basically no central point. This imposes not only additional direct costs, such as human resources costs, but also potentially significant indirect costs associated with downtime and data loss.

The rise of networked storage

Over the past few years, changes in application needs, capabilities of new architectures, and requirements of data have revealed the limitations of DAS.
This has precipitated the development of more sophisticated storage approaches: storage area networks (SAN) and network-attached storage (NAS). At present, these two approaches are complementary, each with its own strengths. SANs are highly scalable and available, while NAS is less expensive and easier to implement. The debate between these two architectures has taken on the air of a religious war over the last few years: adherents of one point to the weaknesses of the other and declare their own superior. It is now becoming clear, however, that the two architectures are beginning to merge.

Network attached storage (NAS)

NAS devices are a combination of a server and, typically, a large amount of RAID storage. Together, these form a storage appliance designed to be easy to manage, fast, and interoperable across operating systems. The key element of the NAS device is a "stripped down" server optimized for high-throughput storage operations; this server is also known as a "filer". The filer contains the intelligence of the device and is the location where the file system resides. This differs from the SAN approach, where the file system resides on each server. Because the file system resides on the NAS device, servers issue file-level commands to NAS devices. This has several significant advantages:
  • 5.
    • NAS devices are inherently interoperable with a wide range of server operating systems. These use two main types of network file systems: the Common Internet File System (CIFS) in the MS world, and the Network File System (NFS) in UNIX. NAS appliances usually support both.
    • File sharing on NAS devices can be achieved easily. Because the NAS device owns the file system, it knows which data belong together logically. This allows the filer to perform the operations required for effective file sharing, such as notifications of changed data, even if the sharing requests come from several servers.
    • Because the file system is localized in one device, its setup and maintenance are much simpler than with either direct-attached storage or storage area networks.

Despite these benefits, file level access can also result in drawbacks for some applications. This is particularly true for large database applications, in which the overhead involved in file level access can result in significant performance degradation. As we will explain below, the advent of VI [1] may help to overcome these performance challenges.

All NAS devices are attached to the servers they support through the existing local area network (LAN), which is an Ethernet-based TCP/IP network. One could argue that this introduces a bottleneck, but recent developments and future perspectives regarding performance and reliability have shown that the impact of this sharing of bandwidth will become less of an issue. A significant remaining hurdle for NAS devices stems from the very nature of TCP/IP, which is designed to deal with network congestion by dropping transmission packets. If this occurs in a storage environment and retransmissions do not reestablish the original sequence quickly enough, I/O operations can be delayed sufficiently for some databases to report corrupted data.
In addition, TCP/IP places a significant burden on both ends of the connection, which degrades server CPU performance and can lead to performance bottlenecks in the NAS filer. These drawbacks are beginning to disappear with the emergence of intelligent GigE NIC adapters and the Direct Access File System (DAFS) [2], which will be explained below.

Storage Area Networks (SAN)

A SAN is a dedicated, high-performance network specifically designed for multiple servers to communicate with large storage subsystems. In a SAN installation, typically a high-speed Fibre Channel network is built to connect a group of storage subsystems to a group of servers (though we want to emphasize here that SANs are not limited to the FC interconnect). Unlike NAS, however, there is no filer, and the file system is located on the servers that run the application. In order to avoid conflicting operations on a particular storage resource, each storage subsystem is partitioned so that a range of hard disk drives can be logically assigned to a specific server. Thus today's reality is that, while SANs allow storage to be seen by all servers on the network, they often resemble merely large collections of centrally located disks. The ultimate solution for true data sharing at the block level would be global file systems, which turn out to be very complicated, with only a few products having appeared on the market so far (e.g. SANergy [15], GFS [16], Storage Tank [17], CentraVision [18]).

Besides very good performance, the most important benefits of SANs are reliability and scalability. SANs achieve very high reliability because of the redundancies built into the storage subsystems and the Fibre Channel network. Typical concerns include high implementation costs, the immaturity of SAN technology, and the lack of SAN standards.

Shift of the storage market towards networked storage

Because of the clear benefits of networked storage, implementation of these architectures is growing at a rapid rate.
According to recent studies performed by Merrill Lynch, NAS currently has the highest penetration, especially in the domain of dot-coms. They predict, however, that over the foreseeable future most customers expect to implement both architectures almost evenly, with a dramatic shift towards SANs.
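The difference between the two architectures described above, block-level access with the file system on the server (SAN) versus file-level access with the file system inside the filer (NAS), can be illustrated with a toy model. This is a minimal in-memory sketch; all class and method names are hypothetical, the allocator only appends and never reuses blocks, and no real protocol such as SCSI, NFS or CIFS is involved.

```python
class BlockDevice:
    """SAN-style storage target: it only knows numbered, fixed-size blocks."""

    def __init__(self, num_blocks, block_size=512):
        self.block_size = block_size
        self._blocks = [bytes(block_size) for _ in range(num_blocks)]

    def read_block(self, lba):
        return self._blocks[lba]

    def write_block(self, lba, data):
        if len(data) != self.block_size:
            raise ValueError("block-level I/O must use whole blocks")
        self._blocks[lba] = bytes(data)


class SimpleFileSystem:
    """Maps file names to block numbers on one device (append-only allocator)."""

    def __init__(self, device):
        self.device = device
        self._table = {}       # file name -> (list of block numbers, length in bytes)
        self._next_free = 0

    def write_file(self, name, data):
        size = self.device.block_size
        lbas = []
        for offset in range(0, len(data), size):
            # Pad the last chunk so that only whole blocks reach the device.
            chunk = data[offset:offset + size].ljust(size, b"\0")
            self.device.write_block(self._next_free, chunk)
            lbas.append(self._next_free)
            self._next_free += 1
        self._table[name] = (lbas, len(data))

    def read_file(self, name):
        lbas, length = self._table[name]
        raw = b"".join(self.device.read_block(lba) for lba in lbas)
        return raw[:length]


class Filer:
    """NAS-style appliance: the file system lives here; clients see only files."""

    def __init__(self, num_blocks):
        self._fs = SimpleFileSystem(BlockDevice(num_blocks))

    def put(self, name, data):
        self._fs.write_file(name, data)

    def get(self, name):
        return self._fs.read_file(name)


# SAN model: the server runs its own file system and sends block commands.
san_disk = BlockDevice(num_blocks=8)
server_fs = SimpleFileSystem(san_disk)
server_fs.write_file("run1.dat", b"hello")

# NAS model: the server sends file-level requests; block handling is hidden.
nas = Filer(num_blocks=8)
nas.put("run1.dat", b"x" * 1000)
```

In the SAN case two servers that each ran their own `SimpleFileSystem` on the same `BlockDevice` would overwrite each other's blocks, which is the conflict that partitioning (or a global file system) has to prevent. In the NAS case the single file system inside the `Filer` arbitrates all requests.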
  • 6.
Figure 2: Share of storage software and hardware revenues (source: IDC)

                  2000      2003
    Total (100%)  $34 bn    $48 bn
    DAS           75%       44%
    SAN           19%       35%
    NAS            6%       21%

The convergence of SAN and NAS

In recent years, the investment world was divided by the debate about the merits of SAN over NAS and vice versa. Recently, however, the boundaries between them have begun to blur. Before long, this could lead to a convergence of the two architectures that will allow data centers to evolve toward a seamless storage network, with multiple transports and various ways to access data. In large part, this convergence is being fueled by the evolution of the networked storage architectures themselves, in particular NAS, and by the rapid development of various connectivity protocols and I/O technologies. The convergence of SAN and NAS may come to pass in several ways:
    • The rise of DAFS. DAFS has the potential to raise the performance level of NAS because it eliminates the overhead of TCP/IP stack processing. This will further blur the distinction between the two architectures, making them merely ways of accessing data at the block and the file level.
    • Creation of a Fibre Channel SAN at the back of a NAS filer. Conceptually this makes the filer equivalent to a "server" attached to a "storage area network", providing a centralized file system. It is entirely conceivable that such a "back-end SAN" could become connected to other servers, which would use part of the available capacity for block-level storage access.
    • Development of SANs over IP. This would allow SANs and NAS devices to coexist on the same GigE network. Regardless of whether this network is a shared LAN or (most likely) a storage-dedicated network, it would allow servers to turn to NAS devices for file level access and to SAN subsystems for block-level access, depending on the performance needs of the specific application and the desired tradeoff between price and performance.
Object Storage Devices

Besides the three basic storage architectures explained above, there is one evolving that aims to take the strengths of each of the DAS/SAN/NAS architectures and incorporate them into a single framework. Motivation for work in this area arises from the fact that, despite all their advanced electronics, processors and buffer caches, disk drives still perform only basic functions, namely read and write operations. All the block management is left to the file system external to the storage device. Things such as content, structure, relationships, QoS, etc. are all pieces of information that are external to the disk drive itself.
  • 7. The basic premise of OSD is that the storage device could be far more useful if it had more information about the data it manages and was able to act on it. The basic Object Storage Device (OSD) [5] architecture and its scalability are shown in the following diagram.

[Figure 3: The OSD architecture allows unlimited scaling. Diagram labels: Application, File Manager, Object Manager, Meta Operation, LAN/SAN, Data Transfer, Security, OSD Intelligence, Storage Device.]

There are many similarities between OSD and DAS/NAS/SAN. These include the use of FC, Ethernet, TCP/IP and SCSI as transport protocols. The following logical components are, however, new compared to the existing architectures:
    • Object Manager
    • OSD Intelligence
    • File Manager

The Object Manager is used as a global resource to find the location of objects, mediate secure access to these objects, and assist in basic OSD management functions. This can be a single OSD that assumes these functions, or it can be a completely separate, fully redundant cluster of systems. An Object Management Cluster would allow for scalability in the number of objects that can be managed as well as in the access performance of the Object Manager itself. It is important to note that the Object Manager does not contain any user data or object metadata, nor does any of the data move through the Object Manager during a data transfer operation.

The OSD intelligence is the firmware that runs on the storage device. It is responsible for interpreting the OSD methods (create, delete, read, write object, and get/set attributes). The OSD intelligence facilitates the communication of the OSD with the Object Manager, mainly for managing data processing and transfers between itself and the File Manager on the client requesting the data transfer. Since the OSD now has the intelligence to perform basic data management functions (e.g. space allocation, free space management, etc.), those functions can be moved from the file system manager to the OSD.
  • 8. The file system manager simply becomes a File Manager: an abstraction layer between the user application and the OSD. The user application does not know where the data is stored, nor should it care. It does have certain data requirements (storage management, security, reliability, availability, performance, etc.) that must be met, and OSD provides a mechanism to specify and meet these requirements far more effectively than DAS/NAS/SAN.

As mentioned in the beginning, current estimates show that the cost of managing storage resources is more than five times the cost of the actual hardware over the operational life of the storage subsystem. This is largely independent of the type of storage (i.e. DAS/NAS/SAN). Given the tremendous growth in storage systems, storage resource management has been identified as the single most important problem to address in the coming decade. The DAS and SAN architectures rely on external resource management that is not always entirely effective and is in no way any kind of standard. The NAS model has some management built into it, but it too suffers from a lack of management standards, in particular in a heterogeneous multi-box environment. The OSD management model relies on self-managed, policy driven storage devices that can be centrally managed and locally administered. This means that the high-level management functions can come from a central location, while the execution of the management functions (i.e. backup, restore, mirror, etc.) can be carried out locally by each of the OSDs and on an OSD peer-to-peer basis (e.g. a disk OSD backing itself up to a tape library OSD). The point here is that centralized management of storage resources (device, space, performance, etc.) with distributed administrative capabilities (i.e. the ability to carry out management functions locally) is essential to future storage architectures.
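The division of labour between Object Manager, OSD intelligence and File Manager described above can be sketched as a toy in-memory model. This is an illustration under stated assumptions, not the OSD specification: every class, method and attribute name is hypothetical, and the access check is reduced to a set of client names.

```python
import itertools


class Osd:
    """Storage device with 'OSD intelligence': it allocates space for objects itself."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = {}                  # object id -> bytearray
        self._attrs = {}                 # object id -> attribute dict
        self._ids = itertools.count(1)

    def _used(self):
        return sum(len(d) for d in self._data.values())

    def create(self):
        oid = next(self._ids)
        self._data[oid] = bytearray()
        self._attrs[oid] = {}
        return oid

    def delete(self, oid):
        del self._data[oid]
        del self._attrs[oid]

    def write(self, oid, offset, payload):
        obj = self._data[oid]
        new_len = max(len(obj), offset + len(payload))
        # Space management happens on the device, not in a host file system.
        if self._used() - len(obj) + new_len > self.capacity:
            raise RuntimeError("OSD is full")
        obj.extend(bytes(new_len - len(obj)))
        obj[offset:offset + len(payload)] = payload

    def read(self, oid, offset, length):
        return bytes(self._data[oid][offset:offset + length])

    def set_attr(self, oid, key, value):
        self._attrs[oid][key] = value

    def get_attr(self, oid, key):
        return self._attrs[oid][key]


class ObjectManager:
    """Knows where objects live and who may access them; holds no user data."""

    def __init__(self):
        self._where = {}       # object name -> (osd, object id)
        self._allowed = {}     # object name -> set of client names

    def register(self, name, osd, oid, clients):
        self._where[name] = (osd, oid)
        self._allowed[name] = set(clients)

    def locate(self, name, client):
        if client not in self._allowed[name]:
            raise PermissionError(f"{client} may not access {name}")
        return self._where[name]


class FileManager:
    """Client-side abstraction layer between the application and the OSDs."""

    def __init__(self, client, manager):
        self.client = client
        self.manager = manager

    def store(self, name, payload, osd):
        oid = osd.create()
        osd.write(oid, 0, payload)                 # data goes straight to the OSD
        osd.set_attr(oid, "length", len(payload))  # object metadata stays on the OSD
        self.manager.register(name, osd, oid, [self.client])

    def load(self, name):
        osd, oid = self.manager.locate(name, self.client)
        return osd.read(oid, 0, osd.get_attr(oid, "length"))


manager = ObjectManager()
disk = Osd(capacity=1024)
alice = FileManager("alice", manager)
alice.store("run1", b"event data", disk)
print(alice.load("run1"))
```

Note that `ObjectManager` only hands out a location after the access check; the payload and the `length` attribute never pass through it, mirroring the statement that no user data or object metadata is held by the Object Manager.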
Some conclusions on data sharing. In the OSD model, the protocol is system agnostic and therefore heterogeneous by nature. Since the OSD is the storage device and the underlying protocol is supported on either a SAN (SCSI) or a LAN (iSCSI), device sharing and therefore data sharing becomes simple. The objects stored on an OSD are available to any system that has permission to access them. What becomes important for effective data sharing is the interpretation of the object, which is outside the scope of the OSD and needs to be common among the systems.

With respect to management overhead, it is important to note that in the OSD model the management of atomic storage units, e.g. blocks, would be moved to the device itself. The file system manager needs only to manage objects. Hence it does not have to know much about the internal characteristics of the storage device, making it capable of managing a device largely independently of the underlying storage hardware.

Lustre - a scalable "object oriented" cluster file system

Now that there is a storage architecture capable of handling objects, a variety of storage management functions, networking, locking and mass storage targets are required, all aiming to support scalable cluster file systems for small to very large clusters. A development providing a novel modular storage framework is called Lustre (the name embodies "Linux" and "Cluster"). Lustre provides a clustered file system which combines features from scalable distributed file systems such as AFS, Coda, InterMezzo and Locus CFS with ideas derived from traditional shared storage cluster file systems like Zebra and Berkeley XFS, which evolved into various others, with GFS being the most prominent. Lustre clients run the Lustre file system and interact with Object Storage Targets for file data I/O and with MetaData Servers for namespace operations. Fundamental in Lustre's design is that the Object Storage Targets (e.g. Object Storage Devices) perform the block allocation for data objects, leading to distributed and scalable allocation of metadata. Visit [3] for details.
  • 9.
Connectivity Protocols

Many developments in connectivity protocols are taking place at present. There has been a debate about the relative merits of FC and TCP/IP for storage connectivity, as TCP/IP is quickly becoming a viable option. Although these trends have been receiving attention because they affect the product strategies of the vendors, they are unlikely to affect the structure of the industry in a profound way. As with storage architectures, different storage interconnects will likely coexist for the foreseeable future. At present, the three storage architectures are largely aligned with three corresponding connectivity protocols:
    • DAS connects to its host system primarily with SCSI or IDE (low cost server systems). In some high performance systems, FC is used on the physical as well as on the transport layer, utilizing the SCSI protocol.
    • SANs exclusively use FC in mature products, with iSCSI [5] based products starting to appear.
    • NAS uses Ethernet with TCP/IP.

However, we expect these alignments to blur with the adoption of IP for SANs and the emergence of new connectivity protocols such as VI, IB and WARP for RDMA over IP.

Fibre Channel

In the early 1990s, Fibre Channel (FC) was developed to optimize server-to-storage communication. FC mandates reliable delivery of data, including the sequence of data frames. Unlike Ethernet, which deals with network congestion by dropping packets, FC does not have to anticipate failed transmissions. This limits network congestion and reduces the processing overhead required to reassemble the sequence of data and request retransmission of missing packets.
FC has the ability to simultaneously carry multiple existing upper-layer transport protocols, including network protocols such as IP, ATM, and 802.2, as well as channel protocols such as SCSI, IPI, and HIPPI, making FC the "universal interface". FC frames have a maximum size of 2112 bytes and can be assembled in sequences of up to 65536 frames. Such a sequence requires the same processing and network overhead as a single IP packet with a maximum size of 1514 bytes, which makes FC extremely efficient. FC is designed into servers and storage subsystems at the host bus adapter (HBA) level. This makes migration to a Fibre Channel network more difficult, as the base of installed hosts needs to be converted as well. FC was designed as a backward-compatible upgrade to SCSI. It provides increased speed and yet is compatible with existing SCSI hardware, helping it gain acceptance in the storage space. FC equipment now ships at 2 Gbps, and 10 Gbps hardware is under development. The physical transport of FC can range from copper for short distances to single mode fiber for distances up to 10 km.

With more than 190 members and affiliates worldwide, the Fibre Channel Industry Association (FCIA) [4], an international organization of manufacturers, system vendors and integrators, developers and end-users, is committed to delivering a broad base of FC infrastructure to support a wide array of industry applications within the mass storage and IT-based arenas. According to Gartner, the Fibre Channel industry experienced slower growth than in previous years. Nevertheless, the increase of 13% in 2001 is considered a strong performance in a difficult year for storage products and companies. In addition, Fibre Channel products were undergoing a transition from 1 Gbps to 2 Gbps that was not completed in 2001, particularly in the area of core products.
  • 10. Worldwide FC SAN components revenue totaled $1.46 bn in 2001, according to Dataquest Inc. Gartner predicts a compound annual growth rate (CAGR) of 36% for Fibre Channel revenue from now to 2006.

Gigabit Ethernet and TCP/IP for storage traffic

Gigabit Ethernet (GigE) is an extension of the existing IEEE 802.3 Ethernet standard. It originally targeted 1 Gbps; 2 Gbps is beginning to ship, and the latest development efforts are for 10 Gbps. Since the IEEE's 802.3 Ethernet standards group approved the final draft of the 10 GigE standard in mid-June 2002, vendors are expected to deliver non-proprietary products shortly. At $40k - $100k per port, cost will remain high for quite some time, limiting usage mostly to carrier technology.
    • GigE retains the "unreliable network" assumption that Ethernet was built upon. Network congestion can result in packets arriving out of sequence or not at all, so additional software controls are needed to handle it. However, while increasing the reliability of IP networks, this extra software imposes a heavy processing burden on the sending and receiving CPUs. The answer to this is to put the protocol stack in hardware. Some companies are currently shipping products that put the IP stack into NICs, which improves host utilization, boosts throughput and decreases latency. It seems reasonable to believe that the vast demands of the IP marketplace should spur such development, in great contrast to the reduced, possibly diminishing, demand in the marketplace for an equivalent chip for FC offload on host bus adapters. It will not be easy to bring TCP/IP offload chips to market, but with the prize being so big, some of the best minds in the world of storage networking are being applied to the problem of servers being overloaded with TCP/IP stack processing.
The solution will be imperative for aggregated message traffic on 10 GigE, as well as very desirable for either message or storage traffic on GigE. Importantly for storage networking, once the TCP/IP offload chip is designed, adding iSCSI offload should be fairly easy, according to designers' estimates. Among a handful of providers, Intel and Alacritech [14] have presented intelligent storage adapters with TCP Offload Engines (TOE, priced at $1k) to the public. They have shown in a recent test that a server with their accelerator card, hooked to a Nishan IP storage switch and connected to a Hitachi Freedom storage array across a single GigE network, can sustain iSCSI transfer rates in excess of 219 MB/s with less than 8% CPU utilization, which is very competitive with achievable FC rates both in terms of throughput and CPU utilization.
    • However, as serious performance figures derived from real use cases are still missing and other promising network technology (e.g. InfiniBand) is progressing, it remains to be seen how successful this direction will be. To be sure, TOE technology seems to offer a lot for the money, but wrong turns, even if initially inexpensive, can turn out to be costly in the long run. By later this year there should be more early adopters with experience to review and more vendors delivering products, surely a better environment for preparing informed decisions.
    • The maximum size of an Ethernet packet is only 1514 bytes. An FC frame is not much larger at 2112 bytes. However, the FC standard allows sequences of 65536 frames, thus communicating up to 132 MB in a single sequence. Therefore the networking overhead for Ethernet is huge if no intelligent host adapter is used.

The speed of the LAN based on Ethernet has now caught up with the speed of the SAN based on FC, which will only retain a small edge because of its lower frame overhead (Figure 4). Thus, GigE will become a serious contender for the transport layer of storage networks.
TCP/Ethernet based communication solutions gain from a much bigger market compared to FC, which helps reduce component costs significantly. Ethernet switches have evolved into high performance systems with the right characteristics to drive storage access. Another important factor driving the economics is the enormous amount of expertise available in the area of Ethernet and TCP/IP.
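The frame arithmetic behind the FC versus Ethernet comparison can be checked directly. The sketch below uses only the sizes quoted in the text (2112-byte FC frames, 65536 frames per sequence, 1514-byte Ethernet packets). As a simplifying assumption it treats each size as pure payload and ignores header bytes, so the packet count is a lower bound.

```python
import math

FC_FRAME_BYTES = 2112                 # maximum FC frame size quoted in the text
FC_MAX_FRAMES_PER_SEQUENCE = 65536    # maximum frames in one FC sequence
ETHERNET_PACKET_BYTES = 1514          # maximum Ethernet packet size quoted in the text

# Data carried by one maximum-length FC sequence.
sequence_bytes = FC_FRAME_BYTES * FC_MAX_FRAMES_PER_SEQUENCE
sequence_mib = sequence_bytes / 2**20

# Number of maximum-size Ethernet packets needed for the same amount of data
# (header bytes ignored, so the real number is somewhat higher).
ethernet_packets = math.ceil(sequence_bytes / ETHERNET_PACKET_BYTES)

print(f"one FC sequence: {sequence_bytes} bytes = {sequence_mib:.0f} MiB")
print(f"Ethernet packets for the same data: {ethernet_packets}")
```

The 132 MB figure in the text corresponds to 132 MiB (2112 x 65536 bytes). Without an offload adapter, the host has to process each of the more than 91,000 Ethernet packets individually, whereas the FC sequence is handled as one unit.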
  • 11. So, all this is expected to speed up the convergence of SANs and NAS as explained earlier.

[Figure 4: Effective maximum data transfer speed (FC: long sequences, Ethernet: max. header size). Series: Fibre Channel, Ethernet; vertical axis: Mbps, logarithmic, 10 to 1,000; horizontal axis: years 1996 to 2003.]

Similarly, a number of initiatives currently underway are addressing the challenge of directing block-level storage traffic over an IP network. Referred to as "SCSI over IP", "Storage over IP", or "iSCSI" [5] (an open protocol backed by the Internet Engineering Task Force (IETF) [6] and the Storage Networking Industry Association (SNIA) [5]), these initiatives are beginning to result in the first commercial products. These come in a variety of approaches, such as translation from FC to Ethernet and back, and native implementations that map SCSI commands onto TCP/IP.

InfiniBand

InfiniBand (IB) is an architecture and specification for data flow between processors and I/O devices that promises greater bandwidth and almost unlimited expandability in tomorrow's computer systems. InfiniBand creates a single, high-speed link extending out of the server into a switch, which can then communicate with other IB-connected servers and storage peripherals. While IB can exist inside the server, it will not completely replace other local interconnects; in fact, analysts believe IB is complementary to next generation internal buses such as 3GIO (announced by Intel in mid-2001) and HyperTransport (announced by AMD in February 2001). Rather, IB's primary environment is outside of the server, where I/O becomes more flexible and efficient, enhancing server scalability and management. Offering throughput of up to 2.5 Gbps (with trunking up to 30 Gbps), the architecture also promises increased reliability, better sharing of data between clustered processors, and built-in security.
InfiniBand is the result of merging two competing designs: Future I/O, developed by Compaq, IBM, and Hewlett-Packard, and Next Generation I/O, developed by Intel, Microsoft, and Sun Microsystems. For a short time before the group came up with a new name, InfiniBand was called System I/O. The serial bus can carry multiple channels of data at the same time in a multiplexed signal. InfiniBand also supports multiple memory areas, each of which can be addressed by both processors and storage devices. Unlike the present I/O subsystem in a computer, InfiniBand appears like a full-fledged network. The InfiniBand Trade Association [7] describes the new bus as an I/O network and views the bus itself as a switch, since control information will determine the route a given
message follows in getting to its destination address. In fact, InfiniBand uses the 128-bit address format of Internet Protocol Version 6 (IPv6), allowing it to support an almost unlimited number of devices. Data exchanged in an InfiniBand-based network are called messages. A message can be a remote direct memory access (RDMA) read or write operation, a channel send or receive message, a transaction-based operation (that can be reversed), or a multicast transmission. As in the channel model of the mainframe world, information exchange is performed via channel adapters (a host channel adapter for the server and a target channel adapter for the peripheral device). Characteristics include built-in security and multiple QoS levels. InfiniBand is being investigated for use both within large disk arrays and externally for connectivity.

The first InfiniBand systems are on the scene already, mostly in very high-end, four- and eight-way Intel systems. Many will follow, and according to analysts most systems will be InfiniBand-based within four years. There are those who argue that this will become another ill-fated Intel mission, and others wonder why Intel doesn't use 10 GigE, but the analysts believe InfiniBand will flourish, and in a big way.

What will be the impact for legacy Ethernet and FC storage installations? None. Intel and other vendors will build bridges to connect legacy infrastructure to an InfiniBand network. Most major storage vendors are looking at the technology, preparing to jump on the bandwagon. Among future products we will potentially see native InfiniBand disk arrays, which are easy to build because vendors can just put IB front ends on their array controllers. Whether there will be native IB disk drives, on the other hand, is doubtful.

Conclusion on IB: If IB is successful it will drive costs way down, so it could well become the bus of choice, or the storage network of the future.
The Virtual Interface architecture

The Virtual Interface architecture (VI) is an I/O technology that can run over a variety of network protocols, including Ethernet, FC and IB. VI was developed to bypass the operating system and the TCP/IP stack for server-to-server communications. It creates reserved memory buffers for each application, which can be read from and written to directly by the NIC with only minimal involvement from the server OS.

Figure 5: VI bypasses most of the server OS to boost performance (panels: VI inter-node communication, TCP/IP inter-node communication)
VI is important for storage:
• Because VI bypasses the server OS, storage traffic can travel with minimal server involvement, allowing fast, low-latency access to storage and accelerating storage-to-storage traffic. This could contribute to making storage devices the primary performance bottleneck in the system.
• VI could also become a high-performance conduit for communications between servers and a NAS filer. The Direct Access File System (DAFS) initiative is just one example. This could improve the performance characteristics of NAS dramatically and thus contribute to the convergence of NAS and SANs.

Adoption of VI is still in the early stages, with the first VI products beginning to ship in the form of intelligent HBAs. Since VI does not require substantial new infrastructure investments and is compatible with IB, it could enjoy speedy and wide adoption. However, the VI architecture is optimized for communications within a controlled high-speed, low-latency network; it is not suitable for general WAN communications via the Internet.

The Direct Access File System

The Direct Access File System (DAFS) protocol is a new file access method designed to provide application servers with high-performance, low-latency access to shared storage pools over Fibre Channel, Gigabit Ethernet, InfiniBand and other VI-compliant transports in data center environments. Designed from the ground up to take full advantage of these next-generation interconnect technologies, DAFS is a lightweight protocol that enables applications to directly access transport resources. Consequently, a DAFS-enabled application can transfer data from its application buffers to the network transport, bypassing the operating system while still preserving file semantics.
In addition, since DAFS is designed specifically for data center environments, it provides data integrity and availability features such as high-speed locking, graceful recovery and fail-over of clients and servers, fencing and enhanced data recovery. All of this translates into high-performance file I/O, significantly improved CPU utilization, and greatly reduced system overhead due to data copies, user/kernel context switches, thread context switches, interrupts and network protocol processing.

DAFS has a fundamental advantage over other file access methods when reading data. By using the remote memory addressing capability of transports like VI and InfiniBand, an application using the DAFS API can read a file without requiring any copies on the client side. Using the "direct" DAFS operations, a client's read or write request causes the DAFS server to issue remote DMA requests back to the client, so data can be transferred to and from a client's application buffer without any CPU overhead at all on the client side. The DAFS write path is also efficient: to avoid extra data copies on write requests, a traditional local or remote file system must lock down the application's I/O buffers before each request, whereas a DAFS client allows an application to register its buffer with the NIC once, which avoids the per-operation registration overhead.

Applications can take advantage of these capabilities in several ways. The first is through a user library that implements the DAFS protocol and is loaded as a shared library or a DLL. Alternatively, an application may access a DAFS server transparently through a loadable kernel module. Though the user-library approach offers the greatest potential for increased I/O performance, its disadvantage is the lack of compatibility with the usual system call interface to the host OS file systems, which requires applications to be modified to take advantage of these capabilities.
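The effect of registering an I/O buffer once rather than before every request, as described above for the DAFS write path, can be illustrated with a toy cost model. This is only a sketch: the function name and all timing values are hypothetical and are not measurements of any DAFS implementation.

```python
def total_cost_us(n_ops: int, transfer_us: int, register_us: int,
                  register_once: bool) -> int:
    """Total time in microseconds for `n_ops` I/O requests on one buffer.

    register_once=False models locking down the buffer before each request;
    register_once=True models a single up-front registration with the NIC.
    """
    registrations = 1 if register_once else n_ops
    return registrations * register_us + n_ops * transfer_us


# Hypothetical numbers: 1000 requests, 50 us per transfer, 20 us per registration.
per_request = total_cost_us(1000, 50, 20, register_once=False)
one_time = total_cost_us(1000, 50, 20, register_once=True)
print(per_request, one_time)
```

In this model the saving grows linearly with the number of requests issued on the same buffer, which is why the one-time registration matters most for applications that reuse their buffers heavily.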
The user-library approach, therefore, is intended for high-performance applications that are sensitive to either throughput or latency, or for applications that can make use of the extended DAFS services made available through the user API. The loadable kernel module takes the form of a Virtual File System (VFS) in the UNIX world, while in the Windows world it is called an Installable File System (IFS). It is a peer both
to the other remote/redirector file system implementations (e.g. NFS and CIFS) and to the local file system (e.g. ufs, NTFS). Under this approach, when an application uses the standard kernel interface to read from or write to a file, the VFS/IFS layer passes the request on to the individual file system responsible for that particular file. If the file is on a DAFS file system, the kernel passes the I/O request to the DAFS VFS/IFS, which issues DAFS requests to a server. As with other remote file systems, each call to the VFS/IFS layer may map to one or more over-the-wire protocol requests to a remote file server. The advantage of this type of DAFS client implementation is that applications can use it transparently, just like other remote file system implementations (NFS, AFS, DFS for UNIX; CIFS for WNT, W2k). Performance is limited by the kernel transitions involved in any operation on a VFS/IFS, but the kernel still uses remote DMA and other VI capabilities, so it consumes significantly fewer CPU cycles compared to other remote file systems.

Finally, we note that the current version of DAFS borrows heavily from the IETF NFS v4 specification to provide a full set of file management operations.

Implications for the industry (Storage users and technology suppliers)

Storage user perspective

As analysts have indicated, administration/manageability and interoperability, which are not truly independent of each other, are of primary concern to the storage user community. The migration of intelligence to the network or the emergence of a "Storage OS" is the most likely scenario for driving true interoperability. Storage users would like to see a virtualized world in which the physical properties of devices and fabrics are abstracted, allowing for simple data management, migration, and eventually plug&play hardware compatibility.
According to our understanding, the bulk of the intelligence in servers, HBAs, switches, and storage subsystems must migrate to a Storage OS or to the network itself in order to reap the full potential of seamless enterprise networks. This development has already started and will hopefully see good progress over the next years, despite the economic downturn the industry has been facing for a couple of years. However, whether, and how quickly, OS and protocol standards for networked storage develop will define where value is captured in the industry. Hardware attackers are mounting efforts to create open, interoperable platforms, while incumbents are racing to satisfy customers with proprietary solutions. Independent software vendors and service providers are attempting to establish a Storage OS to make their value independent of the outcome.

A word on cost. Total cost of storage is not likely to grow significantly as a share of IT spending in the near term, because innovations in architectures, software, connectivity and hardware technology are largely aimed at allowing storage users to take full advantage of density increases and at reducing system administration costs. However, as we stated at the beginning, investment in storage products already dominates the IT spending profile at or close to 75%.

Storage industry perspective

Storage is responsive to macroeconomic conditions, and its growth has slowed as the overall economy has slowed since the beginning of 2001. This, coupled with the shift to networked storage, will put (margin) pressure on (at least) parts of the storage industry. And yet analysts predict that the shift to networked storage is likely to create significant value, and a growing number of players are competing for it. The principal opportunities for storage vendors to differentiate themselves today are in ease of management, availability, and random I/O throughput. Actually the market is helping HEP in the sense that many customers' storage performance
needs are evolving from a focus on reliability and scalability to a more sophisticated tradeoff between performance and cost for different types of applications.

The LHC Perspective

In LHC computing, all resources are distributed between the individual workstations and other resources of the organizations, their regional centers and participants belonging to a virtual organization (e.g. an experiment). Hence, the computing resources, application programs and data are shared by many sites in the framework of that community, and the data will be replicated and scattered all around the network its members are connected to. The data storage consists of regional storage systems whose architectural details, transfer capabilities and storage capacities vary widely. The vast computational effort required to perform physics event analysis in the context of the LHC experiments requires software and hardware elements to be combined in such a way that users neither have to know nor care where and how data is shared or applications run. This premise, which includes a vast amount of data that can be processed independently, maps well to the proposed Computing Grid architecture [9]. With respect to the Grid architecture, this report is focused on storage management solutions relevant to the Fabric layer, with very limited outreach to the Connectivity layer.

The current discussions about the persistency model and the respective technology within the scope of the LCG project lead to the following conclusion. It is likely that a homegrown persistency mechanism will be used, based on the functionality Root provides today, or equivalent. This direction will be based on a POSIX-style interface, which is suitable for data access at any granularity level of the inventory within the storage systems. Many commercial products today (and in the future) are and will be based on this interface (i.e. Oracle 9i, Objectivity etc.).
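To make "data access at any granularity" through a POSIX-style interface concrete, the following minimal sketch reads an arbitrary byte range from a file using the POSIX open/pread/close calls as exposed by Python's os module (available on UNIX-like systems). The helper name and the file layout in the usage part are invented for illustration and do not correspond to any Root or LCG format.

```python
import os
import tempfile


def read_range(path: str, offset: int, length: int) -> bytes:
    """Read up to `length` bytes starting at byte `offset` via POSIX pread."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # pread does not move the file position, so independent readers
        # can fetch different ranges of the same file without seeking.
        return os.pread(fd, length, offset)
    finally:
        os.close(fd)


# Usage: write a small file, then fetch one record from the middle of it.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"HEADER" + b"event-0001" + b"event-0002")
    demo_path = tmp.name

print(read_range(demo_path, 6, 10))  # b'event-0001'
os.remove(demo_path)
```

Because the interface only deals in offsets and lengths, the same call works whether the application wants a whole file, a single object, or a few bytes of one.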
By fully supporting a fast, reliable POSIX interface, the storage system will match all types of applications in use today and currently under development. A few storage systems and applications also support different interfaces in order to achieve higher performance, reliability and/or manageability, but these interfaces are by no means standardized, nor are there any activities in this direction. In the case of homegrown persistency systems, e.g. Root, it will be easy to adopt new storage access interfaces and profit from related improvements.

For the management interface the situation looks different. Today most vendors support management APIs for their own products only. A few (smaller) vendors are starting to support storage management software that manages heterogeneous storage systems. Related
features are typically included in storage virtualization products. However, there is no standard interface in place today, although a few candidates exist (e.g. Jiro [10], formerly called StoreX/Sun, and others). The importance of the management issue also depends on the storage system itself (i.e. today's cache managers in HEP, e.g. dCache [19], are helping to ease the pain by providing an appropriate abstraction level that decouples the application from physical disks, I/O paths and, importantly, the fault handling).

Tape access and management

For many HEP applications, datasets are so large that storage on tape is an economic necessity. Such applications are typically built on top of a mass storage system, which controls data movement between tape storage and a disk farm that serves both as a staging pool and as a cache. Tape handling has a long history in HEP; hence, an enormous amount of expertise and experience in this area exists. Today the HEP community, especially at the major accelerator labs, knows exactly how to build and operate high-performance and reliable tape systems (e.g. CASTOR [11], Enstore [12], JASMine [13] etc.). As a matter of fact, today's hard- and software components can satisfy the LHC requirements in the area of Central Data Recording (CDR) and Raw-Data reconstruction. Existing mass storage management systems are able to utilize underlying resources (primarily tape drives and LAN elements) at nearly 100%. And there are new components in the area of storage devices (e.g. STK 9940B, 200 GB/cartridge, 30 MB/s) and LAN networks (e.g. 10GigE, InfiniBand) already available or on the horizon. These components are capable of handling streams up to an aggregate bandwidth of several hundred MB/s.

Regarding mass storage management systems, the market for commercial systems has not changed very much since the PASTA II report was published in 1999, and many of the findings in this area are still valid.
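The drive figures quoted above (200 GB per cartridge, 30 MB/s per drive) translate into streaming times and drive counts with simple arithmetic. The sketch below assumes decimal units (1 GB = 1000 MB) and drives running at their full nominal rate; the function names and the 300 MB/s target are illustrative choices, not values taken from the report.

```python
import math

CARTRIDGE_GB = 200       # capacity per cartridge quoted in the text
DRIVE_MB_PER_S = 30      # nominal streaming rate per drive quoted in the text


def hours_to_stream_cartridge() -> float:
    """Time to read or write one full cartridge at the nominal rate."""
    seconds = CARTRIDGE_GB * 1000 / DRIVE_MB_PER_S
    return seconds / 3600


def drives_needed(aggregate_mb_per_s: float) -> int:
    """Smallest number of drives that sustains the given aggregate rate."""
    return math.ceil(aggregate_mb_per_s / DRIVE_MB_PER_S)


print(f"{hours_to_stream_cartridge():.2f} h per cartridge")
print(drives_needed(300), "drives for an assumed 300 MB/s aggregate stream")
```

Real deployments would need additional drives to cover mount and positioning time, which this idealized model ignores.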
Only a few HSM systems in the low- and mid-range market, and just HPSS [20] for the very demanding high-end market, especially the supercomputing sector, remain as candidates. According to our judgment, all the experience gathered by experts in the HEP community over the past years by looking into these systems is still valid. Since PASTA II, no product has appeared at a mature or even just promising level that looks as if it could satisfy the LHC requirements and would also be affordable from the economic point of view.

The mid-range products (AMASS [18], DMF [21], LSC [22] etc.) are targeting a market primarily seeking unlimited capacity with random access capabilities, or simply the virtually unlimited disk, making automated tape storage look like a big disk. These systems typically have only limited scaling capabilities because of a simplified architecture, and they require the data to flow through a single system. HPSS, on the other hand, has been and is still in use at a few laboratories, with similar experience. The basic architecture is scalable, but the implementation strategy chosen by the designers (i.e. DCE) has severe drawbacks and is definitely not state of the art. The HPSS collaboration has announced that a major redesign will take place in order to replace these components.

To conclude the paragraph on tape storage management, we believe that HEP has, or rather must have, sufficient expertise and resources to drive the required development in the area of tertiary storage management (examples given above). Provided there is good cooperation between all parties involved, clear benefits of holding the strings include full influence on 'mission critical' items, e.g.
• Media format
• Usability of new technology (robot, drive)
• Applicability of the software by Tier 1, 2, 3 centers (i.e. licensing issues, required hardware)
• Better risk determination

Based on the experience of major HEP sites running large tape systems, we can estimate the costs for hard- and software bought from vendors and the costs for development and operation (personnel). For a 1 PB automated storage system (robot, 10 drives, 5000 tapes) the investment in hard- and software is on the order of $900,000. Drive vendors expect
an increase in capacity up to 1 TB/cartridge within the next 2 to 4 years, which will bring the basic storage costs down again. The downside of this development is that new-generation drive technologies have a negative impact on random access (high rate of mount/dismount) and are more or less only suitable for backup/archive kinds of applications (sequential access profile). The cost associated with development and operation is usually on the order of 1 to 2 highly skilled FTEs, plus 1 to 2 technicians. Development will (has to) continue over time (sometimes with low intensity) due to a continuous demand to incorporate new technologies and changing requirements.

As for the risk involved, the major one arises from losing the expert. This applies not only to homegrown solutions but to commercial solutions as well. Documentation that is up to date with the current version helps in this case. Provided sufficient and up-to-date documentation exists, no major technical risks are associated with homegrown solutions, whereas commercial ones imply the risk of getting stuck because of discontinuation by the vendor.

Scenarios

The discussion in this section outlines two basic scenarios. The first is based on the current approach selected by most of the HEP laboratories. The second is based on the assumption that we will be able to deploy very large-scale cluster file systems with several hundred servers and several thousand compute nodes (clients) on a dedicated storage network.

Why scenarios?
The idea is that this should help in understanding the following issues:
• Understanding and checking the complete architecture
o Detect missing parts
o Check for scaling issues
• Identifying the 'enabling technologies' and risks
o For the critical components
o Priorities and goals for further investigations

Scenario based on current HEP trends

The current architecture is built from components primarily based on the paradigm 'commodity, commodity, commodity', leading to the following hard- and software building blocks:
• PC based motherboards and CPUs (Intel based)
• Fast and Gigabit Ethernet
• IP based network protocols (TCP/UDP)
• IDE disk subsystems (with RAID 0, 1, 5, 10, 50 configurations)
• Linux as the operating system

The same components are used to build tape and disk servers, servers for bulk data transfers and CPU servers. The main driving force behind this is the cost/performance ratio, with the focus primarily on acquisition costs. A common exception to this rule is the selection of high-end tape subsystems (robotics and drives), primarily driven by the bad experience made in the past with commodity-style tape systems and the acceptance of the 'mission critical' nature of this service. Figure 6 shows a typical composition in use today.

Apart from many low-level technical issues, there are open questions that need to be resolved in order to scale the approach to meet the LHC requirements.
• Operation costs. In order to scale, one needs to make sure that the operation costs per unit will be lower than those we observe today. Key factors include better integration into fabric management frameworks, policy-based configurations and careful selection of commodity components.
• Performance and capacity scaling expected for LHC computing. We expect no major scaling issues for the tertiary storage (notably tape). This area is well understood and already scales today to almost the required magnitude, and we expect to be able to grow linearly with respect to the number of drives, robots and mover nodes. For the disk-based storage environment (cache) the required scale is larger than for tape and requires the following areas to be examined:
o Scaling of the namespace management service. Although there are good indications that today's name service implementations will scale to several thousand clients, the load depends highly on the access profile of the individual user and the way the application frameworks (i.e. Root, Gaudi etc.) are designed. In addition, the load induced on the local storage fabric by GRID components (e.g. the replication catalogue) is not exactly known.
o Network latency and bandwidth, which are important for applications accessing the disk-based data in a random fashion (through the POSIX-style interface) over the network. The two choices here, latency reduction and latency masking, are based on different approaches. The first requires an additional network technology (e.g. InfiniBand) and thus has cost implications and probably implies the use of a non-commodity technology. The second utilizes asynchronous operations. Fortunately, the use of application frameworks (like Root, Gaudi etc.) that hide the low-level I/O operations allows hints to be passed to the disk cache system for effective pre-fetching of data on the client and server side. To tackle this problem efficiently, close cooperation between storage system architects and application framework experts might help significantly.
o Scaling of the disk based cache management software.
This system has to manage all of the stored objects (files) and storage devices and thus becomes a critical component. Current implementations show good scaling behavior for several hundred clients and a few hundred server nodes. Going to several thousand clients and nearly a thousand server nodes is not straightforward and needs further investigation. One common approach to this problem (also used in large-scale cluster file system implementations currently under development) is to partition the storage space, based on directories or other vehicles, into large sets of stored objects.
o Performance of the single storage device, network and server node. For 'hot spot' cases (multiple clients accessing the same object), carefully designed replication mechanisms are needed to balance the resulting load over different server nodes.

Besides the problems mentioned above, the architecture looks promising and has good potential to be in place at LHC startup. A general question (besides the storage-system-specific ones) is the future direction of the industry regarding commodity technology, a question that will be addressed by other PASTA III working groups. Risks are somewhat evenly distributed among the areas listed above. However, past experience lets us assume that the problem of operation and administration costs will be the hardest to solve. It will demand a collaborative effort among computing people with many different skills and a high degree of accepted interfaces and standards.
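The directory-based partitioning of the storage space mentioned above can be sketched as follows. This is an illustration only, not the scheme used by any particular cache manager or cluster file system: the function name, the choice of a hash over the leading directory components, and the example paths are all assumptions, and absolute POSIX-style paths are assumed as input.

```python
import hashlib
from pathlib import PurePosixPath


def partition_for(path: str, n_partitions: int, depth: int = 2) -> int:
    """Map an absolute file path to a partition index.

    Only the first `depth` directory components are hashed, so all files
    below the same directory prefix are handled by the same partition.
    """
    parts = PurePosixPath(path).parts      # ('/', 'cms', 'run42', 'raw', 'a.root')
    directories = parts[1:-1][:depth]      # drop the root and the file name
    key = "/".join(directories).encode()
    digest = hashlib.sha256(key).digest()  # stable across processes and hosts
    return int.from_bytes(digest[:8], "big") % n_partitions


# Hypothetical paths: both share the prefix cms/run42 and land together.
print(partition_for("/cms/run42/raw/a.root", 8))
print(partition_for("/cms/run42/reco/b.root", 8))
```

A cryptographic hash is used here only because its result does not vary between runs or machines; a real system would more likely use an explicit, administrable mapping so that partitions can be rebalanced.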
Figure 6: The classic approach

Scenario based on large-scale cluster file systems

Imagine that cluster file systems like Storage Tank or Lustre show within the next few years that they really scale to the expected levels. Then we can think of a much simpler architecture in which the storage system is built out of:
• Low-latency, high-bandwidth dedicated storage networks (e.g. InfiniBand)
• Remote DMA (RDMA) capable network protocols (e.g. DAFS)
• Cluster file system clients and storage devices directly attached to the network. The tertiary storage systems are also an integral part of the cluster file system. The cache management service will be connected through a DMAPI-style interface with the ability to store arbitrary metadata related to a stored object.
• Probably new types of storage devices, e.g. the Object Storage Device (OSD). A storage device in this sense could also be an IDE/Linux based server machine, in this case not being accessed as a block device.

These cluster file systems directly address the cost-of-management aspect because they are designed for very large-scale systems. The situation looks promising because there is sufficient demand for such systems outside the HEP community as well. Figure 7 gives a simplified overview of the named components. Compared to the previous scenario, this approach gives a native, fully functional (read, write) POSIX-style file system service with increased performance. The cost of the improvement is an additional network infrastructure for the storage traffic and a new network protocol. The current R&D in the area of cluster file systems should be followed closely, and if relevant results arrive in time, these systems can become a serious alternative. Although we touch on only two scenarios, it should be pointed out here that combinations of both (i.e. using
RDMA capable IP protocols) are also suitable. There are key technologies which can't be rated today, but their estimated time to completion makes them candidates, and thus further detailed investigations are required to judge their usability.

Figure 7: Cluster File System based architecture (simplified)

Summary of Findings and Conclusions

• The number one issue with storage is the Cost of Administration. With the growth in storage, the cost of storage administration is also scaling linearly. The reason for high administrative costs is that there are almost no consistent management tools. Indeed, storage is typically administered on an individual device basis, where different tools are used for each type of device.
• Cost-of-management Metrics exist today, and some of the more common rules of thumb indicate that the cost to manage enterprise storage is six to eight times the original purchase price. So, approaching storage management with a wide aperture is critical for success. Fundamental issues include the organization required for managing storage networks, operational best practices and, finally, management automation.
• Storage area networks (SAN) and network-attached storage (NAS) are at present two complementary approaches, each with its own strengths. SANs are highly scalable and available, while NAS is less expensive and easier to implement. The debate between these two architectures has taken on the air of a religious war over the last few years: adherents of one point to the weaknesses of the other and declare their own superior. It is now becoming clear, however, that the two architectures are beginning to merge.
• Besides the three basic storage architectures, the Object Storage Device (OSD) architecture is evolving, which aims at taking the strengths of each of the DAS/SAN/NAS architectures and incorporating them into a single framework. Motivation for work in this area arises from the fact that, despite all the advanced electronics, processors and buffer caches, disk drives still perform only basic functions, namely read and write operations. All the block management is left to the file system external to the storage device. Things such as content, structure, relationships, QoS, etc. are all pieces of information that are external to the disk drive itself. The basic premise of OSD is that the storage device could be far more useful if it had more information about the data it manages and were able to act on it. In conjunction with object storage devices, a new generation of scalable, "object oriented" cluster file systems is in development, e.g. Lustre. Fundamental to Lustre's design is that the Object Storage Targets (i.e. OSDs) perform the block allocation for data objects, leading to distributed and scalable allocation of metadata.
• At present, the three Storage Architectures are largely aligned with three corresponding connectivity protocols:
o DAS connects to its host system primarily with SCSI or IDE (low-cost server systems). In some high-performance systems, FC is used on the physical as well as on the transport layer, utilizing the SCSI protocol.
o SANs exclusively use FC in today's mature products, with iSCSI [5] based products starting to appear.
o NAS uses Ethernet with TCP/IP.
However, we expect these alignments to blur with the adoption of IP for SANs and the emergence of new connectivity protocols such as VI, IB and WARP for RDMA over IP. The speed of the LAN based on Ethernet has now caught up with the speed of the SAN based on FC, which will only retain a small edge because of its lower frame overhead (Figure 4).
Thus, GigE will become a serious contender for the transport layer for storage networks. TCP/Ethernet-based communications solutions gain from a much bigger market than FC, which helps reduce component costs significantly. Ethernet switches have evolved into high-performance systems with the right characteristics to drive storage access. Another important factor driving the economics is the enormous amount of expertise available in the area of Ethernet and TCP/IP. So, all this is expected to speed up the convergence of SANs and NAS as explained earlier.
• Concerning Magnetic Tape based storage, we are facing scaling issues and arguable operations overhead today, and we expect no change here. The next-generation high-capacity tape technology will significantly reduce the pure storage cost; however, it will result in increased access times. Caching systems used today are helping to reduce related bottlenecks, and well-designed interfaces between them and the underlying tertiary storage systems might even result in improved overall throughput figures. Interface design details include "hints" and an advanced scheduler, helping to manage scarce resources in order to keep the efficiency of the tape storage system (drives and robot) high. The optimization needs to be done at the point where the requests get queued (either cache or tape system), which depends considerably on the implementation decisions.
• Current developments in the area of Cluster File Systems (i.e. DAFS, StorageTank, Lustre) are important to our field and should be followed closely. One of the fundamental ingredients, a fast and low-latency network technology (InfiniBand, VI etc.), is on the horizon, coming with promising prospects both in terms of technical characteristics and reduced cost (initial investment and operations cost).
All the file systems mentioned above have reached a stage that allows clear expectations regarding their usability in an LHC-scale environment within the next few years.
Storage virtualization in the network is inevitable because it masks the differences between storage solutions. The fantasy is that vendors will offer fully integrated
  • 22. STORAGE MANAGEMENT SOLUTIONS solutions based on open standards, which can be viewed as the key to getting to “true commodity storage”. Clearly, there is a need to clarify the “virtual” landscape. In our understanding, virtualization implies some fundamental positions: the purpose of virtualization is to enable better management and consolidation of storage resources; virtualization may be implemented at multiple points on the continuum between the application and the data; for simplicity’s sake, those points are the host, the network and the storage device.
Regarding the last point: within the span of control of the “virtualization engine” there are two types of device-level virtualization in which both logical and physical devices exist, namely virtual disk and virtual tape. In tape today, virtualization is being introduced primarily to improve cartridge capacity utilization, now that cartridges can cost $100. The unanticipated, bigger benefits of tape virtualization have been improved application performance and the ability to achieve 100% tape automation by leveraging existing libraries and drives to automate those cartridges that were still handled manually. This latter factor alone justifies the use of virtual tape in UNIX and Windows 2000 (W2k) environments.
From a different perspective, the network is the most logical place to implement storage virtualization. It is neither a server nor a storage device; because it sits between these two environments, it may offer the most open implementation of virtualization. Network-based virtualization is likely to support any server, any operating system, any application, any storage device type and any storage vendor.
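The distinction between logical and physical devices in a virtual disk can be illustrated with a minimal sketch. It assumes a simple extent table that maps ranges of logical blocks onto blocks of physical devices; the class names, device names and the extent layout are hypothetical and do not describe any particular vendor's virtualization engine.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    logical_start: int    # first logical block covered by this extent
    length: int           # number of blocks in the extent
    device: str           # hypothetical physical device name
    physical_start: int   # first block on the physical device


class VirtualVolume:
    """A logical disk whose blocks are spread over several physical devices."""

    def __init__(self, extents):
        self.extents = sorted(extents, key=lambda e: e.logical_start)
        self._starts = [e.logical_start for e in self.extents]

    def resolve(self, lba):
        """Translate a logical block address into (device, physical block)."""
        index = bisect_right(self._starts, lba) - 1
        if index < 0:
            raise ValueError(f"logical block {lba} is not mapped")
        extent = self.extents[index]
        offset = lba - extent.logical_start
        if offset >= extent.length:
            raise ValueError(f"logical block {lba} is not mapped")
        return extent.device, extent.physical_start + offset


if __name__ == "__main__":
    volume = VirtualVolume([
        Extent(0, 100, "fc_array_a", 500),
        Extent(100, 50, "ide_raid_b", 0),
    ])
    print(volume.resolve(42))
    print(volume.resolve(120))
```

The application only ever sees logical block addresses; where the table lives (host, network or storage device) is exactly the placement choice discussed above.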
  • 23. STORAGE MANAGEMENT SOLUTIONS References
[1] http://www.vidf.org
[2] http://www.ietf.org
[3] http://www.lustre.org
[4] http://www.fibrechannel.org
[5] http://www.snia.org
[6] http://www.ietf.org
[7] http://www.infiniband.org
[8] http://grid-data-management.web.cern.ch/grid-data-management/docs/GridFTP-rfio-report.pdf
[9] http://www.globus.org/research/papers/anatomy.pdf
[10] http://www.sun.com/jiro
[11] http://it-div-ds.web.cern.ch/it-div-ds/HSM/CASTOR
[12] http://www-hppc.fnal.gov/enstore/
[13] http://cc.jlab.org/scicomp/JASMine/
[14] http://www.Alacritech.com
[15] http://www.tivoli.com/products/solutions/san/
[16] http://www.sistina.com
[17] http://www.almaden.ibm.com/cs/storagesystems/stortank/
[18] http://www.adic.com
[19] http://www-dcache.desy.de/
[20] http://www.sdsc.edu/hpss/
[21] http://www.sgi.com/products/storage/software/html#dmf
[22] http://www.lsci.com/
