ESG Unstructured Data Efficiency and Cost Savings in Virtualized Server Environments Analyst Report

Virtual server initiatives are straining already overwhelmed file storage environments. Unstructured data is growing faster than ever, thanks largely to the proliferation of endpoint capture devices and advances in hardware and software that allow bigger, richer files to be created. Virtualization is exacerbating the problem because many organizations are underpinning their virtualized environments for both server and desktop virtualization with file servers, since it can be much easier to manage the virtual server storage environment using NAS. This further accelerates the growth rate of unstructured data. This paper looks at the challenges associated with managing unstructured data in virtualized environments at scale, and how to get unstructured data under control through file server consolidation. It provides guidelines to help organizations understand what to look for in their consolidated file storage environments in order to make them as efficient as possible through deduplication, tiering, and migration while efficiently keeping data protected and meeting SLAs.

For more information on NAS solutions please visit: http://www.hds.com/products/file-and-content/network-attached-storage/?WT.ac=us_mg_pro_hnasp

Transcript

White Paper: Unstructured Data Efficiency and Cost Savings in Virtualized Server Environments
By Terri McClure, Senior Analyst, July 2013
This ESG White Paper was commissioned by Hitachi Data Systems (HDS) and is distributed under license from ESG. © 2013 by The Enterprise Strategy Group, Inc. All Rights Reserved.

Contents
  Overview
  Virtualization’s Impact on the Storage Environment
    The Shift toward NAS for Virtualized Environments
    Storage Challenges in Virtualized Environments
  Consolidation: Driving Efficiency in Virtualized Environments
    Automated Storage Tiering and Migration
    Primary Deduplication
    Efficient Data Protection
  The Bigger Truth

All trademark names are property of their respective companies. Information contained in this publication has been obtained by sources The Enterprise Strategy Group (ESG) considers to be reliable but is not warranted by ESG. This publication may contain opinions of ESG, which are subject to change from time to time. This publication is copyrighted by The Enterprise Strategy Group, Inc. Any reproduction or redistribution of this publication, in whole or in part, whether in hard-copy format, electronically, or otherwise to persons not authorized to receive it, without the express consent of The Enterprise Strategy Group, Inc., is in violation of U.S. copyright law and will be subject to an action for civil damages and, if applicable, criminal prosecution. Should you have any questions, please contact ESG Client Relations at 508.482.0188.

Overview
Virtual server initiatives are straining already overwhelmed file storage environments. Unstructured data is growing faster than ever, thanks largely to the proliferation of endpoint capture devices and advances in hardware and software that allow bigger, richer files to be created. Virtualization is exacerbating the problem because many organizations are underpinning their virtualized environments for both server and desktop virtualization with file servers, since it can be much easier to manage the virtual server storage environment using NAS. This further accelerates the growth rate of unstructured data. This paper looks at the challenges associated with managing unstructured data in virtualized environments at scale, and how to get unstructured data under control through file server consolidation. It provides guidelines to help organizations understand what to look for in their consolidated file storage environments in order to make them as efficient as possible through deduplication, tiering, and migration while efficiently keeping data protected and meeting SLAs.

Virtualization’s Impact on the Storage Environment
Server virtualization—in other words, using software to divide a single physical server into multiple isolated virtual environments—is driving significant technology and process change across storage, disaster recovery, and management environments in enterprise organizations and small and medium-sized businesses alike. Server virtualization technology is driving demand for networked storage solutions due to the net increase in storage capacity requirements brought about by server virtualization initiatives. More importantly, the ability to realize many of the key benefits of server virtualization—such as the mobility of virtual machines between physical servers for load balancing, high availability, and maximum utilization of resources—fundamentally requires an underlying networked storage infrastructure.

But supporting a virtual server environment introduces a number of storage challenges. First, with multiple virtual machines hosted on a single physical server, chances are good that the associated applications have differing storage policies. This can lead to some pretty complex storage provisioning exercises as storage is logically mapped and provisioned to each virtual machine. And then there is the performance aspect. The storage infrastructure must provide predictable performance scalability for the wide variety of mixed application workloads the virtual machines will drive, with a variety of I/O patterns—for example, small, large, sequential, or random operations. Virtual server data protection methods—which are often radically different from traditional physical server methods—also need to be designed and tested. And consider the implications of supporting backup and recovery on a single physical machine that hosts multiple virtual machines: kicking off backup for one virtual machine can spike CPU usage and starve the other virtual machines of resources. And when routine maintenance is performed, instead of impacting a single application environment, multiple application environments are affected.
ESG has seen instances in which ten or twenty (and in a few edge cases even more) virtual machines share a single physical server, all of which would need to be taken down or moved just to perform routine maintenance. This is really where the importance of networked storage comes in: keeping applications available during everything from routine maintenance to disaster handling by enabling virtual machines to move from physical server to physical server without losing access to data.

The Shift toward NAS for Virtualized Environments
In fact, many storage challenges associated with server virtualization can be mitigated by leveraging network-attached storage technologies. At their core, virtual machine and desktop images are files. Storing image files on NAS systems simplifies image management significantly: It removes multiple layers of storage management required in a block-based environment. Take the example of provisioning capacity in a Fibre Channel SAN environment. For a Fibre Channel SAN, a storage administrator needs to carve out and assign LUNs to each virtual machine hosted in the physical server; establish and manage switch ports and zones; map HBAs; set up multi-pathing; and cross-mount the appropriate LUNs and
zones to multiple physical servers to allow for virtual machine portability. There is more to the process, but describing everything involved would make this a much longer and more technical paper. The point is: That’s a pretty complex and error-prone manual process. In these types of environments, all the mapping and zoning is typically tracked in spreadsheets. It can become an even more complex, time-consuming, and error-prone task as more virtual servers come online or as storage capacity is added and the environment needs to scale. Each time capacity is added, the whole process needs to be repeated. And when you consider the implications of each virtual machine having different protection requirements and performance characteristics, figuring out which LUNs are supporting which virtual machine to ensure appropriate timeliness of snapshots or to perform load balancing can become nearly impossible, especially at scale.

In an NFS environment, once a file system is exported to a virtual machine and mounted, it travels with the virtual machine across physical servers, maintaining the relationship. And to add capacity, file system sizes can be expanded on the fly, with no downtime. And because users are managing information—a file system for each virtual machine rather than a collection of HBAs, LUNs, and worldwide names—overall management is simplified. So, provisioning capacity is much simpler when you treat VMDK files as files! When it comes to NFS for data protection, the snapshot and remote replication capabilities of file systems are often used for improved recoverability, space efficiency, and speed (more on that later). With networked storage, multiple copies of virtual machines can be quickly created, efficiently stored and accessed for replication and disaster recovery purposes, and used to more efficiently perform bare-metal restores. To alleviate the issue of backing up the virtual machine, the backup load can be shifted from the physical server to the file server, leveraging snapshot copies to meet recovery point and time objectives.
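
To make the provisioning contrast above concrete, here is a minimal Python sketch of the per-VM bookkeeping each model implies. It is illustrative only: the class names, fields, and helper function are hypothetical and do not represent any vendor's provisioning interface.

```python
"""Minimal, illustrative sketch of the per-VM provisioning bookkeeping implied by
block (FC SAN) versus NFS storage for virtual machines. All class and field names
are hypothetical; this is not any vendor's provisioning interface."""

from dataclasses import dataclass, field
from typing import List


@dataclass
class BlockProvisionRecord:
    """What has to be tracked (often in a spreadsheet) for ONE VM on an FC SAN."""
    vm_name: str
    lun_ids: List[int] = field(default_factory=list)              # LUNs carved out for this VM
    switch_zones: List[str] = field(default_factory=list)         # fabric zones to create and manage
    hba_wwns: List[str] = field(default_factory=list)             # HBA worldwide names mapped per host
    multipath_policy: str = "round-robin"                         # multipathing set up on each host
    cross_mounted_hosts: List[str] = field(default_factory=list)  # every host that must see the LUNs


@dataclass
class NfsProvisionRecord:
    """The equivalent record when the VM image is a file on an exported file system."""
    vm_name: str
    nfs_export: str   # one export; it follows the VM wherever it runs


def add_physical_host(block_records: List[BlockProvisionRecord],
                      host: str, host_wwns: List[str]) -> None:
    """Scaling the block model: every existing record must be revisited so the new
    host can see the right LUNs and zones -- the manual, error-prone step described
    above. In the NFS model the same event needs no per-VM changes at all: the new
    host simply mounts the existing exports."""
    for record in block_records:
        record.cross_mounted_hosts.append(host)
        record.hba_wwns.extend(host_wwns)
        # ...plus new zoning and multipath configuration on the fabric, per record.
```

The point of the sketch is simply that the block model's record grows with every LUN, zone, and host, and every record must be revisited as the environment scales, while the NFS model tracks one exported file system per virtual machine.
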
Storage Challenges in Virtualized Environments
When ESG surveyed just over 400 North American IT professionals concerning their organizations’ current data storage environments, including current storage resources, challenges, purchase criteria, and forward-looking data storage plans, participants were asked about their “significant” storage challenges related to their virtual server environments. The most frequently cited challenge, named by 43% of participants, was the capital cost of new storage infrastructure, and more than one in four (28%) cited the operational cost of storage related to server virtualization as a significant challenge (see Figure 1).
Source: ESG Research Report, 2012 Storage Market Survey, November 2012.

Figure 1. Storage Challenges Stemming from Server Virtualization Usage (Source: Enterprise Strategy Group, 2013)
Survey question: From a storage infrastructure perspective, which of the following would you consider to be significant challenges related to your organization’s server virtualization usage? (Percent of respondents, N=418, multiple responses accepted)
  Capital cost of new storage infrastructure: 43%
  Disaster recovery strategy: 42%
  Sizing true capacity (storage) required to support virtual server environment: 36%
  Limited I/O bandwidth, especially when workload spikes occur: 29%
  Operational cost of new storage infrastructure: 28%
  Impact on overall volume of storage capacity: 24%
  Poor application response times: 22%
  Sizing IOPS requirements to support virtual server environments: 19%
  Lack of scalability: 14%
  We have not encountered any challenges: 5%

In that same survey, respondents were asked about their biggest storage challenges in general, and their primary storage challenge in particular. Rapid growth and management of unstructured data was cited by 40% of respondents as a challenge and as the primary challenge by 15% of respondents. Data protection was close behind, with 39% of respondents citing it as a challenge and 11% citing it as their primary challenge. Also in the top five responses (out of 19 possible) were hardware costs, running out of physical space, and supporting a growing virtual server environment (see Figure 2). The influx of unstructured data associated with virtualized environments is certain to continue to strain the unstructured data storage environment as IT organizations struggle to scale and meet these varied and unpredictable workload requirements.
Source: ESG Research Report, 2012 Storage Market Survey, November 2012.

Figure 2. Top Ten Storage Environment Challenges, by 2012 Storage Budget (Source: Enterprise Strategy Group, 2013)
Survey questions: In general, what would you say are your organization’s biggest challenges in terms of its storage environment? Which would you characterize as the primary storage challenge for your organization? (Percent of respondents, N=418; cited as a challenge / cited as the primary challenge)
  Rapid growth and management of unstructured…: 40% / 15%
  Data protection (e.g., backup/recovery, etc.): 39% / 11%
  Supporting a growing virtual server environment: 39% / 10%
  Hardware costs: 25% / 7%
  Running out of physical space: 25% / 5%
  Data migration: 25% / 4%
  Staff costs: 20% / 5%
  Management, optimization & automation of…: 19% / 5%
  Lack of skilled staff resources: 19% / 6%
  Discovery, analysis and reporting of storage…: 17% / 5%

Of course, all of this brings up a question for users: How do I rein in unstructured data growth, cost-effectively protect my data, and reduce my overall footprint while still maintaining service levels for my virtual server environment? Undertaking a comprehensive file server consolidation exercise can be an answer—but only if it is built on the right core principles.

Consolidation: Driving Efficiency in Virtualized Environments
Consolidation is the process of identifying and eliminating legacy storage silos that are the result of the way IT has managed data growth to date, and then putting in place best practices for managing the storage environment in a holistic manner that reduces the overall physical footprint (and costs) of data. Before diving into consolidation, let’s look at how we’ve arrived here. Why is rapid growth and management of unstructured data the top storage challenge for so many of the IT organizations surveyed? This unrelenting increase of data stems from natural application growth and from the new workloads being generated by social media; web 2.0 applications; and the creation of video, audio, photos, and similar content. Endpoint capture devices have proliferated hugely: A smartphone is in almost everyone’s pocket. A tablet computer (business as much as personal) is in many people’s laps. The ability to create and consume content requires nothing more than the press of a button. Websites and barcode readers collect more data each second—data that organizations slice and dice to identify what their customers need, or more accurately, what their customers will buy.

Big data is everywhere, and the rampant copying of data sets for analytics is only one reason for it. Other data-growth culprits include snapshots and remote replication to increase uptime and availability, and programs or initiatives to improve data protection and regulatory compliance. Those are good things, of course, but they certainly accelerate overall data growth rates.

Historically, the most common way to address the growth problem has been to toss even more storage capacity at it:
  • You want copies for testing and development? Here’s a server and some storage.
  • You’d like offsite replication? We’ll build another infrastructure stack.
  • You need backup? We’ll build another.
  • You need an application server? We’ll carve out a VM for you, and somehow we’ll find the storage to provision for your virtual machine image and the data it is going to need.
That strategy results in ever-expanding, unsharable silos of storage that are usually poorly utilized. They cost more to buy; they take up more data center floor space; they use more energy to power and cool; and they require more staff to manage. All these things are pretty much the opposite of efficiency, which is what most IT organizations are after; yet too often, it was easier to continue to pour money into a suboptimal solution than “bite the bullet” and make things right for the longer term. But in this era of changing consumption models, throwing capacity at everything just won’t work. It’s also an ineffective way to spend money.

The first step in a comprehensive file storage consolidation strategy is to identify and eliminate these silos. This is not an easy task—in fact, many organizations attempt this effort and at the end of the day, they just create bigger silos, albeit fewer of them. But without the right underlying technology, this is only a Band-Aid that provides short-term relief—the inefficient silo problem still exists, and IT organizations pay more than they need to for their storage from both a CAPEX and OPEX standpoint. The underlying technology in any comprehensive consolidation strategy must be seamless, scalable, and efficient in order to truly eliminate silos altogether, but it also needs to support sufficient performance to maintain SLAs in unpredictable virtualized environments. It can’t trade off performance for efficiency because too many workloads could be affected. That means seamless tiering, both within systems (as tiering has classically been defined) and between systems (which is required to eliminate silos). It also means efficient deduplication of primary data without a major performance hit, and the ability to maintain performance as the environment scales. But most importantly, to maintain SLAs in virtual server environments, it means tight integration into the tools of those environments. Hitachi Data Systems offers such technology and can help IT organizations accomplish this.

Automated Storage Tiering and Migration
Automated storage tiering has been the topic of much discussion in the industry. Typically, when vendors discuss this capability, they mean tiering within an array, using some combination of flash or solid-state storage for highly active data with (possibly) some serial-attached SCSI (SAS) drives and (likely) the bulk of data on slower rotating, high-capacity, nearline SAS (NL-SAS) drives. This makes sense, as most data is only active within 30 days of creation and afterwards is retained yet rarely accessed (this is often called long tail data). In a traditional single-tier architecture, this would mean buying an array full of SAS drives to support the active data, and storing long tail data on the same expensive drives.
Even worse, it means buying a high-performance, highly available tier-one flash array to support the highly active data, and storing the long tail data on that same tier-1 system—but more on that later. Using a small amount of solid-state storage for active data, with a tier of high-capacity, slower rotating (hence less power-consuming) NL-SAS disks, is a highly effective way to reduce the overall storage footprint as well as cut power and cooling costs. In virtualized environments, where workloads are typically somewhat write-heavy (due to the way virtual servers cache data and stage I/Os) with lots of random I/O, accessing many VMDKs creates a lot of metadata activity. In fact, metadata operations can be very disk intensive and can make up as much as half of all file operations. Automatically moving metadata to a flash or SSD tier, as HDS does, can significantly improve performance in virtual server environments by speeding metadata lookups.
(A portion of the text in the previous three paragraphs of this section is from the ESG White Paper, Hitachi Data Systems Storage: Improving Storage Efficiency Can Be a Catalyst for IT Innovation, June 2013.)

Even with automated storage tiering within storage systems, IT organizations still find themselves with the challenge of having the bulk of their expensive, tier-1 storage arrays taken up by long tail data. The need is for automated tiering, based on user-defined policies, between storage systems. This is rarely discussed by storage vendors because (a) storage vendors like to sell lots of tier-1 storage, and when it fills up, IT organizations need to buy more, and (b) most storage vendors just don’t have a good story when it comes to automatically migrating data off of tier-1 arrays and onto secondary or tertiary tiers. But HDS does. HDS offers intelligent file tiering that allows IT organizations to search across the file environment and set policies that will trigger automated migration between arrays or even to a cloud tier such as Amazon S3. IT can set policies based on parameters like age, activity, or content type. Think of the power such functionality could have in virtual desktop environments, where users are creating many versions of documents that are rarely, if ever, accessed after 30 days. High-performing, highly available tier-1 storage systems need to be deployed to meet the demands of virtual desktop environments. Moving user documents off of tier-1 storage as they age or their activity tails off allows IT organizations to reclaim tier-1 storage capacity to service active use cases. HDS claims users can reclaim up to 60% of primary storage capacity via automated migration in the virtual desktop use case.
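
Conceptually, policy-based tiering of this kind can be pictured with a short sketch. The policy criteria (age of last access, content type) and the idea of a secondary or cloud target come from the discussion above; the data structures, thresholds, and migrate callback are assumptions for illustration, not the HDS implementation.

```python
"""Illustrative sketch of policy-based file tiering between storage tiers (or to a
cloud tier). The policy criteria mirror those described above; the tier names,
thresholds, and migrate() callback are assumptions for illustration."""

import time
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Optional, Tuple


@dataclass
class FileRecord:
    path: str
    size_bytes: int
    last_accessed: float          # epoch seconds
    content_type: str             # e.g., "document", "video", "vmdk"


@dataclass
class TieringPolicy:
    max_idle_days: int = 30                          # "long tail": untouched for 30+ days
    demote_types: Tuple[str, ...] = ("document", "video")
    target_tier: str = "nl-sas"                      # could also be a cloud tier such as S3


def select_for_migration(files: Iterable[FileRecord],
                         policy: TieringPolicy,
                         now: Optional[float] = None) -> Iterator[FileRecord]:
    """Yield files the policy says should leave the tier-1 pool."""
    now = time.time() if now is None else now
    idle_cutoff = policy.max_idle_days * 86400
    for f in files:
        if (now - f.last_accessed) >= idle_cutoff and f.content_type in policy.demote_types:
            yield f


def run_tiering_pass(files: Iterable[FileRecord],
                     policy: TieringPolicy,
                     migrate: Callable[[FileRecord, str], None]) -> int:
    """Apply the policy; `migrate` is whatever actually moves the data
    (array-to-array copy, cloud PUT, etc.). Returns tier-1 bytes reclaimed."""
    reclaimed = 0
    for f in select_for_migration(files, policy):
        migrate(f, policy.target_tier)
        reclaimed += f.size_bytes
    return reclaimed
```
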
Primary Deduplication
Deduplication is the process of identifying duplicate data written to the file system and storing it just once, instead of storing it every time the same data is written. In most cases, a “virtual” file is created that just has pointers to the original copy of the data. Deduplication has largely been deployed in backup environments to reduce the storage capacity associated with keeping backup data, which is by nature highly duplicative. Deduplication can be performed at the source file system (which requires server CPU and can drain performance in volatile virtualized environments), inline as data is written (which often drains performance because the process happens during the write, which cannot be committed until the operation is complete), or as a post-process (often a scheduled, batch-oriented process done off hours) in which the pointers are created. Space needs to be reserved to perform the deduplication process, and the space that the duplicate data resided in needs to be reclaimed after the deduplication process completes. Many IT organizations are hesitant to use deduplication in primary storage environments because of the overhead associated with identifying duplicate data and the negative impact that may have on the system’s file serving performance.

HDS has developed deduplication technology that mitigates much of the associated overhead and makes it viable to use deduplication in a primary storage environment. Hitachi NAS hardware acceleration, inherent in its “Hybrid-core” architecture, helps calculate secure hash algorithm (SHA-256) values to speed dedupe comparisons without interfering with file sharing workflow (whether through NFS or SMB/CIFS). It also has intelligence that knows when new data is added and automatically starts up to four parallel deduplication engines if needed to eliminate redundant data. When file serving load reaches 50% of available IOPS, the deduplication engines throttle back to prevent impacting user performance, then automatically resume when the system is less busy. This unique and patented approach to deduplication enables customers to enjoy the benefits of increased capacity efficiency and reduced total cost of ownership provided by deduplication without compromising performance or scalability. The HDS approach features data-in-place deduplication: Data is stored as it normally would be, and the deduplication process then combs through that data, eliminating redundancy. Data-in-place deduplication eliminates the need to set aside capacity to be used as temporary deduplication “workspace,” minimizes the space needed to track deduplicated data, and delivers greater ROI.

Deduplication can be as effective in virtual server environments as it is in backup environments because virtual machines often have many of the same files, such as operating system images. In virtualized environments (server and desktop), IT organizations can see as much as 90% capacity reduction through the use of deduplication. Deduplication provides a big “bang for the buck” and offers one of the best ways to reduce the overall storage footprint. HDS makes it a viable choice for primary storage.
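
A post-process, hash-based deduplication pass of the kind described above can be sketched as follows. The SHA-256 fingerprinting and the idea of backing off when file-serving load passes 50% of available IOPS follow the behavior described in this section; the block size, the load callback, and the in-memory index are illustrative assumptions, not the Hitachi NAS implementation.

```python
"""Illustrative sketch of post-process, in-place block deduplication: data is already
stored normally, the pass fingerprints each block with SHA-256, and duplicates are
replaced by pointers to the first copy. The 50% load throttle mirrors the behavior
described above; everything else is an assumption for illustration."""

import hashlib
from typing import Callable, Dict, Tuple

BLOCK_SIZE = 4096  # illustrative block size


def dedupe_pass(read_block: Callable[[int], bytes],
                block_count: int,
                current_load: Callable[[], float],
                load_ceiling: float = 0.50) -> Tuple[Dict[int, int], int]:
    """Scan blocks in place and map duplicates to the first block holding the data.

    read_block(i)  -> bytes of block i (data already written and stored normally)
    current_load() -> fraction of available IOPS consumed by file serving
    Returns ({duplicate_block: original_block}, bytes_reclaimed).
    """
    index: Dict[str, int] = {}     # SHA-256 digest -> first block seen with that data
    pointers: Dict[int, int] = {}  # duplicate block -> original block it now points to
    for i in range(block_count):
        if current_load() >= load_ceiling:
            # Throttle back so user-facing performance is unaffected; a real engine
            # would pause and resume later, this sketch simply stops the pass.
            break
        digest = hashlib.sha256(read_block(i)).hexdigest()
        if digest in index:
            pointers[i] = index[digest]   # keep one copy; point the rest at it
        else:
            index[digest] = i
    return pointers, len(pointers) * BLOCK_SIZE
```

In a virtual server farm full of near-identical operating system images, most blocks hash to values already in the index, which is why capacity reductions toward the high end of the range cited above are plausible for that workload.
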
Efficient Data Protection
Data protection can really pile on to storage management challenges, and the challenges are magnified in virtualized environments. To manage copies efficiently, bidirectional block-level replication technologies that can utilize the deduplicated storage pool should be used. That way, only the unique data elements are transmitted to the other appropriate repositories. In this case, efficiency is measured in terms of how little space is consumed (regardless of the number of copies) and how little network bandwidth is consumed (due to smarter discernment of what should be replicated). But it requires a management tier that understands all of the storage assets across the enterprise, which is what HDS provides.

In virtualized environments, it is important that snapshots are performed at the VM level (as opposed to the LUN, file, or file system level, where IT administrators risk cloning the wrong LUN or file, thinking it was associated with the VM). This level of granularity is not only efficient, but also effective. It allows for rapid virtual machine and application cloning, with no additional scrubbing operations to get up and running. A highly efficient approach, such as that taken by HDS, only stores pointers to the original data, and only unique data is added to the clone. HDS supports a highly scalable model with up to 100 million snapshots per file system and 100 million clones per file system.

An effective data protection strategy in a virtualized environment must be tightly integrated into the virtual environment management tools to ensure the storage administrator and virtualization administrator are working in concert, rather than at odds. Hitachi NAS Virtual Infrastructure Integrator (Virtual V2I) is a VMware vCenter plugin plus associated software that addresses virtual machine backup, recovery, and cloning services. It allows users to create storage-based snapshots at intervals ranging from hours down to minutes between backups, resulting in improved recovery point objectives. Because restores are pointer-based, recovery time can be near instantaneous (a matter of seconds) regardless of size. Virtual V2I allows users to schedule and monitor VM backups to ensure they have an application-consistent, recoverable environment.

Leveraging space-saving snapshot and clone technology can significantly reduce the storage and network overhead associated with data protection and copy management. But it isn’t just about data protection. Having an efficient copy management engine can speed test and development as well as provisioning. In a dynamic virtual server world, where new servers can be spun up easily and quickly, speeding provisioning or the deployment of new applications or patches can provide businesses the high-tech edge they need to stay ahead of the pack in an increasingly competitive world.
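
The space-efficient, pointer-based snapshots and clones described in this section behave roughly like the following sketch. The copy-on-write bookkeeping shown is generic and illustrative; it is not the specific HDS snapshot or clone format.

```python
"""Illustrative sketch of pointer-based (copy-on-write) VM cloning: a clone starts as
pointers to its parent's blocks and stores only the blocks that are later changed.
Generic illustration, not a specific vendor's snapshot or clone format."""

from typing import Dict


class VmImage:
    def __init__(self, blocks: Dict[int, bytes]):
        self.blocks = blocks                 # block number -> data actually stored

    def clone(self) -> "VmClone":
        return VmClone(parent=self)          # near-instant: no data is copied


class VmClone:
    def __init__(self, parent: VmImage):
        self.parent = parent
        self.delta: Dict[int, bytes] = {}    # only unique (changed) blocks live here

    def read(self, block_no: int) -> bytes:
        # Unchanged blocks are read through the pointer to the parent image.
        return self.delta.get(block_no, self.parent.blocks.get(block_no, b""))

    def write(self, block_no: int, data: bytes) -> None:
        # Copy-on-write: only now does the clone consume its own capacity.
        self.delta[block_no] = data

    def unique_bytes(self) -> int:
        return sum(len(b) for b in self.delta.values())


# Usage: a "golden" image cloned for ten virtual desktops consumes almost no extra
# capacity until each desktop begins writing its own data.
golden = VmImage({i: b"\x00" * 4096 for i in range(4)})
desktops = [golden.clone() for _ in range(10)]
desktops[0].write(2, b"user data")
assert desktops[1].read(2) == b"\x00" * 4096   # other clones still point at the parent
```
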
The Bigger Truth
Over the last decade, almost all areas of IT have been forced to adapt to transformations. Server virtualization is now ubiquitous. Leading-edge IT organizations are now beginning to realize a much broader spectrum of benefits from server virtualization initiatives, such as expanding virtualization to the next tier of applications, automating manual tasks, and streamlining access to IT resources. All of these advantages, in turn, drive hard savings, such as reduced OPEX and CAPEX (from deferred procurement as well as waste reduction), and soft savings from simplified management, reduced downtime, and performance gains.

Server virtualization has spawned a need for change in other areas of IT infrastructure, perhaps most significantly in storage. As noted in Figure 1, the biggest storage challenge associated with server virtualization among respondent organizations is the capital cost of the storage infrastructure to support it. Storage costs can quickly eat away at any CAPEX and OPEX savings achieved from virtualization initiatives. As we’ve observed for the past decade, server virtualization accelerates storage growth. But we are only just beginning to see the impact of desktop virtualization on storage, and the emerging picture does not bode well for storage administrators. When ESG surveyed storage administrators who said desktop virtualization presented a storage challenge, 77% of them said that desktop virtualization significantly increased storage capacity requirements, and 51% said it had a negative impact on performance.

Taking a holistic view and consolidating the storage environment can help mitigate the storage costs associated with supporting virtualized environments. But consolidation alone is not enough. For many storage vendors, consolidation just means putting everything on a tier-1 storage system that tiers internally. A truly efficient consolidation strategy ensures data is stored on the right tier (within a system to meet performance needs, or on a separate long-term archive tier for long tail data) at the right cost, at the right time. And it means storing only one copy of data, while creating space-efficient copies to use as a basis for backup and restore operations. Combined, this can significantly reduce the overall storage footprint and not only help organizations maintain the cost savings associated with virtualization initiatives, but also attain significant cost savings on the storage front. Not all users will see a 90% reduction in capacity associated with deduplication, but a 20, 30, or 40% reduction would pay off handsomely in the primary storage environment. Add that to the reclamation of tier-1 storage from migrating data between tiers, and the savings multiply quickly.
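
As a purely hypothetical illustration of how those two effects compound, the arithmetic below applies a 60% tier-1 reclamation from migration and a conservative 30% deduplication rate (both within the ranges discussed earlier) to an arbitrary 100 TB starting point; these are not measured results.

```python
# Hypothetical arithmetic only: a 60% migration of long-tail data off tier 1 and a
# conservative 30% primary dedupe rate are taken from the ranges discussed in the
# paper; the 100 TB starting point is arbitrary.
tier1_capacity_tb = 100.0

after_migration = tier1_capacity_tb * (1 - 0.60)   # long-tail data tiered off tier 1
after_dedupe = after_migration * (1 - 0.30)        # primary dedupe on what remains

print(f"Tier-1 footprint: {tier1_capacity_tb:.0f} TB -> {after_dedupe:.0f} TB")
# Tier-1 footprint: 100 TB -> 28 TB
```
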
20 Asylum Street | Milford, MA 01757 | Tel: 508.482.0188 Fax: 508.482.0218 | www.esg-global.com