A Huginn Consulting Technical White Paper.
The End of Appliances
Creating Compute Intensive Storage Solutions for
Integrated IT Infrastructures
September, 2011
Executive Summary
Improving IT efficiency, reducing costs and simplifying management are common goals in enterprise
data centers, and IT service providers play a key role in helping businesses achieve them. In IT
Infrastructure as a Service (IaaS) environments, reliability and uptime are critical: the impact of a single
failure can be devastating to the multitude of businesses using the hosting service.
As such, our goal was to create a fully integrated IT infrastructure and, through testing, isolate and
quantify the performance gains of adding memory and compute power to the server and storage
systems. A team of engineers and architects from Huginn Consulting and Genesis Hosting, an IaaS
provider, built and tested a prototype of a truly integrated IT infrastructure environment. This Proof of
Concept (POC) demonstrated the steps for building and deploying an infrastructure for a genuinely
responsive, service-oriented IT organization in 90 days or less. Complete with servers, networking, and
storage, the model also integrated three diverse and essential IT control elements: self-provisioning,
self-service and automated management.
Our POC shows how to build efficient, high-performance compute- and storage-intensive
solutions, including data de-duplication and compression, from generic storage, server and network
resources within a virtualized and integrated IT infrastructure. These solutions fit comfortably into the
service provider’s business model, and can easily be integrated into the existing management
framework.
“IT infrastructure is moving quickly toward becoming delivered through a service
model. Machines are becoming virtual, running in secure data centers on large,
partitionable machines. Self-provisioned virtual IT resources are the key to success for
a service model, which requires all aspects of the physical hardware to be abstracted
or partitioned with permissions and given only to a tenant of the service.”
Eric Miller, CEO, Genesis Hosting
To begin, we defined the drivers behind the prototype POC to include three critical aspects for IT service
provisioning:
1. The business model for IaaS providers must be centered on a self-service, self-provisioning
model.
2. To effectively manage and share these solutions across multiple customers and applications in a
self-service hosting environment, the hosting infrastructure needs to be fully virtualized at all
levels.
3. Self-service in a shared service hosting environment requires a new management stack.
We tested two aspects of the integrated infrastructure model. First, we tested the requirements for the
management stack in an IaaS service provider infrastructure through the design of a prototype
self-service management facility. The results of this study can be found in the Huginn Consulting report
titled, “Enabling Self-Service and Self-Provisioning in an IT Infrastructure.”
The second test analyzed the creation of compute and storage intensive solutions in this infrastructure.
These solutions must be available for self-provisioning within the tenant’s VDC (virtualized data center),
which we will explore further in this study. The POC was tested in a service provider environment at
Genesis Hosting using an infrastructure model identical to Genesis’ deployed infrastructure.
Findings from this POC validated the proposition that computing resources can be added to storage to
create much more cost-effective storage systems and solutions in IaaS environments:
1. Storage consumption can be reduced to roughly two-thirds of its original footprint by
intensifying the compute resources dedicated to storage management and operation.
2. Adding compute resources to storage arrays can improve the resilience of the storage
infrastructure by supporting full performance with RAID-6 protection.
3. Array recovery and maintenance operation times, including RAID rebuild time and LUN
expansion, can be reduced by approximately 80%.
The Service Hosting Business
Profitable and efficient service hosting relies on large-scale infrastructures, a high degree of utilization of
the resources, and a completely integrated infrastructure based on virtualization of servers and storage.
The service provider builds and manages the shared infrastructure. A self-service UI and a virtualization
client enable tenants to provision and manage their leased virtual data centers, each a collection of
VMs (virtual machines) with storage and resources running licensed applications or solutions. The
resources are leased on a time basis.
The scale, utilization, efficiency and simplified management of this infrastructure enable the service
provider to deliver IT services at a lower cost than running those services in house.1
New Technologies
Computational power, in the form of CPU cores and RAM, is becoming abundant at lower price
points. In a fully virtualized IT infrastructure that supports the self-service model, these
resources can be applied easily, even in storage intensive solutions. Data de-duplication and
compression have long been used in secondary storage applications such as backup and archiving. These
technologies are now available in software products, including open source software. The availability of
compute power, and of faster and cheaper RAM, makes these technologies applicable to primary storage
with surprising affordability and performance.
Compute Intensive Storage Solutions for Self-Service
The service provider’s business model relies on being able to support customers with a single, virtualized
and integrated infrastructure. The customers’ needs are met by creating and running the customer’s
VDC in the infrastructure. All applications, solutions and services are implemented as VMs with
provisioned servers, storage and other resources from the infrastructure’s resource pool.
Compute intensive storage solutions must follow the same model in order to meet the service providers’
requirements. Solutions created by first provisioning a VM with generic server, network and storage
resources, then installing the software into this VM, are ideally suited for the IaaS business model. These
new solutions can be added simply by integrating the new software into the licensing and billing system,
making these available for self-service by the tenant, and adding generic storage and server capacity to
the existing resource pools.
Solutions that rely on adding purpose-built hardware appliances to the infrastructure are
much more difficult to integrate. Purpose-built appliances also scale less than optimally because
they often have interfaces not designed for this type of deployment. New products must be added as
separate resource pools, which makes the management framework progressively more
complex. Resource efficiency is also compromised, as it becomes more difficult to limit idle resource
capacity.
1 For a more detailed overview of the service provider business model and the required infrastructure, please refer to the companion report: “Enabling Self-Service and Self-Provisioning in an IT Infrastructure.”
Protection, Protection, Protection
Service hosting and virtualization imply an increased concentration of tenants and users supported by
a single physical infrastructure. As a result, the effects of failures and data losses are amplified: a single
failure can be catastrophic for thousands or even tens of thousands of users across numerous organizations.
Service hosting companies are therefore exposed to far greater consequences from failure and must
provide much stronger protection of customers’ data. As a consequence, all data must be carefully
protected against double disk failures, as well as other storage-related failure modes, and this requires
increasing the storage system’s computational power. Genesis Hosting exclusively deploys storage arrays
configured with RAID-6 to protect its customers against double disk failures.
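As a back-of-the-envelope illustration (ours, not a POC measurement), RAID-6 dedicates two disks' worth of capacity in each disk group to parity, which is part of why array compute power matters: the controller must compute double parity on every write. A minimal sketch:

```python
def raid6_usable_fraction(disks: int) -> float:
    """RAID-6 stores two parity blocks per stripe, so a group of n disks
    yields (n - 2)/n of its raw capacity as usable space."""
    if disks < 4:
        raise ValueError("RAID-6 requires at least 4 disks")
    return (disks - 2) / disks

# The two disk-group sizes used later in this POC:
print(raid6_usable_fraction(6))   # 6-disk group: about two-thirds usable
print(raid6_usable_fraction(12))  # 12-disk group: larger groups waste less
```

Larger groups amortize the two parity disks better, but also lengthen rebuilds, which is exactly where the extra array compute power pays off.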
Infrastructure for Self-Service Solutions
Effective service hosting with a self-service model implies virtualization at all levels of the infrastructure:
storage, networking and servers. This is the only way that shared physical compute and storage
resources can be integrated into one infrastructure; this approach enables the most scalable, flexible
platform for IT solutions. This infrastructure can then be partitioned into the logical application entities
or solutions that customers can provision and manage for themselves. Self-service enables service
hosting organizations to build a single scalable and integrated infrastructure of servers, storage arrays,
network equipment and more—all of which can be managed from a single console. Customers are able,
through service portals and management clients, to provision, implement and manage their own VDCs.
The entire premise of this POC revolves around the integrated multi-layered infrastructure enabled
through virtualization. The focus of this report is to document two aspects of the POC:
1. The requirements for storage to be deployed in a self-service infrastructure.
2. The creation of more efficient storage solutions within the framework of the self-service
infrastructure.
Designing a Proof of Concept
The hypothesis, that intensifying the compute resources in storage solutions (both in VM-based
solutions and inside the underlying storage) yields more cost-effective and better-performing storage,
was shaped by the following questions:
1. How can compute resources be used to improve the performance, efficiency and user
experience of storage solutions? What types of compute intensive storage solutions can be
created?
2. Which architecture is required by an IT infrastructure that supports these solutions and the self-service hosting model?
Figure 1: POC Physical Configuration
Structure and Organization
The prototype infrastructure for the POC was constructed at Genesis Hosting’s facilities. The
architecture was chosen to mirror Genesis’ production infrastructure where Genesis’ customers are
provisioning and building their VDCs and running their applications. In fact, the POC team operated as a
typical Genesis customer. The prototype was configured as a VDC where the team members provisioned
resources and built the VMs used for the compute and storage intensive services tests in this POC.
The Hardware Components
• NEC Express5800/A1080a (GX) server: The server is configured with four compute modules,
each with two Intel “Westmere” processors and 128 GB of RAM.
• The new NEC M100 and the previous generation NEC D4 storage arrays. Both were configured
with 7.2K RPM SATA disks. Performance of the two arrays was used as a measure for the benefit
of increasing the compute and storage intensity factors.
o All tests were run on RAID-6 configured LUNs.
o Two disk configurations were used: 6 disks, 12 disks.
• The QLogic 8/4 Gbit FC switch connected servers and storage.
• NEC 1Gbit Ethernet ProgrammableFlow (PF) switch provided connectivity for system
management.
The Software Stack
• All software was run on VMs in VMware vSphere 4.1 environments on the NEC GX server.
• The Blackball Search-In Software indexing engine and the Microsoft Exchange JetStress load
generator were run on VMs with Microsoft Windows Server 2008 R2.
• The NexentaStor (version 3.1.1) software was installed as a virtual appliance in vSphere.
The POC Prototype—Storage for Service Hosting
The compute and storage intensive prototype includes NEC’s M100 storage array, data compaction
solutions, a file system indexing solution, and vCenter for cloning VMs and VM templates.
Each of the two controllers for the M100 includes the new high performance Jasper Forest processor
from Intel and 8GB of RAM. This is considerably more compute capacity than typical arrays; this
configuration of resources is required for deployments in a shared service hosting infrastructure as it
provides maximum protection while maintaining full service levels to the users.
Data Compaction
The data compaction solutions (Figure 2)
were used to test the effectiveness and the
performance of in-line data de-duplication
and compression in a solution stack. The
solution is based on the NexentaStor virtual
appliance. Data writes and reads to and
from the LUN exported by NexentaStor are
compacted or expanded in real-time.
NexentaStor uses the array for storing the
compacted data.
The compaction and expansion
performance and efficiency, as well as the
overall solution performance, are tested by
changing the configuration of the VMs and
the underlying storage configuration. Two
storage configurations were used: 6 and 12 disks.
Testing the Solutions
The prototype workloads were designed to test the performance and efficiency of two compute
intensive storage solutions. The first solution tested was the new NEC M100 storage array with
considerably more compute performance compared to previous generation products. The second was a
set of compute intensive storage solutions created in the virtualized POC infrastructure by using the
NexentaStor software. The tests were run using storage allocated directly from the storage array, and
with the storage allocated from the de-duplicated or compressed LUNs exported by NexentaStor.
VM Cloning
VM cloning was used to test bulk read and write performance of storage solutions. vSphere was used to
clone the VM with the file system to and from the compacting LUN exported by NexentaStor. The
respective source or destination was a 12 disk SAS data store on the NEC D4 storage array, which
delivered much higher I/O rates than either the NexentaStor LUN or the SATA arrays.
Figure 2: Logical Data Compacting Configuration
Two tests were performed. First, the 62.2 GB VM with the file system used in the indexing test was
cloned. Second, a 9.2 GB VM template with only the Microsoft Windows Server 2008 R2 OS was cloned to
measure the differential efficiency and performance after the first OS instance had been written.
File System Indexing
The Blackball indexing engine generates a very small index, only ~2% of the indexed data. The file
indexing solution was therefore used to test the file read performance of the compacted LUNs.
The indexing engine performed indexing of the file system in its VM. The file system consisted of a total
of 53 GB of file data, including text, email, music, images, a document archive and more.
Email Performance and Resiliency
The Exchange 2010 JetStress load testing tool was used to measure the performance of a storage
subsystem for a synthetic Exchange email workload. Since JetStress generates its own data, it was not
used to test the performance of compacted storage.
JetStress determines sustained performance in Microsoft Exchange IOPS (input/output operations per
second), i.e., the total number of Exchange reads and writes to the storage subsystem per second. The
number includes message and log file I/O, and uses a fixed ratio between reads and writes.
Outcomes
The following outcomes related to compute and storage intensive solutions were generated in this POC.
Data Compaction
We investigated the bulk write and read performance and the efficiency of the data compaction
solutions.2 The cloned file indexing VM, including its file system data, and a barebones VM template
were cloned to the compacting LUN exported by NexentaStor.
Three configurations of the NexentaStor VM were used in testing:
• 4 cores, 8 GB RAM, 6 disk LUN
• 8 cores, 32 GB RAM, 6 disk LUN
• 8 cores, 32 GB RAM, 12 disk LUN
Table 1: Write Performance and Compacting Ratios for Compacting LUN

CPU Cores | RAM (GB) | Disk Set Size | Compaction Type | Clone Time (s) | Compaction Ratio | Clone Time (Normalized) | Clone Write Rate (MB/s)
4 | 8 | 6 disks | None | 613 | 1.00 | 1.00 | 101
4 | 8 | 6 disks | Dedupe | 5362 | 0.68 | 8.75 | 12
4 | 8 | 6 disks | Compress | 1398 | 0.86 | 2.28 | 44
4 | 8 | 6 disks | D+C | 3635 | 0.60 | 5.93 | 17
8 | 32 | 6 disks | None | 420 | 1.00 | 1.00 | 148
8 | 32 | 6 disks | Dedupe | 1720 | 0.68 | 4.09 | 36
8 | 32 | 6 disks | Compress | 464 | 0.86 | 1.10 | 134
8 | 32 | 6 disks | D+C | 1196 | 0.60 | 2.85 | 52
8 | 32 | 12 disks | None | 421 | 1.00 | 1.00 | 147
8 | 32 | 12 disks | Dedupe | 1274 | 0.68 | 3.03 | 49
8 | 32 | 12 disks | Compress | 469 | 0.86 | 1.11 | 133
8 | 32 | 12 disks | D+C | 1144 | 0.60 | 2.72 | 54
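The derived columns in Table 1 follow from the measured clone times and the 62.2 GB data set described under VM Cloning. A quick sketch of the arithmetic (ours; it assumes 1 GB = 1000 MB for the rate calculation):

```python
DATA_SET_GB = 62.2  # size of the cloned VM, from the VM Cloning section

def clone_metrics(clone_time_s: float, baseline_time_s: float):
    """Derive the clone write rate (MB/s) and the clone time
    normalized to the uncompacted baseline run."""
    write_rate = DATA_SET_GB * 1000 / clone_time_s
    normalized = clone_time_s / baseline_time_s
    return round(write_rate), round(normalized, 2)

# The 4-core / 8 GB RAM / 6-disk rows from Table 1:
print(clone_metrics(613, 613))    # None:     (101, 1.0)
print(clone_metrics(5362, 613))   # Dedupe:   (12, 8.75)
print(clone_metrics(1398, 613))   # Compress: (44, 2.28)
```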
We also tested the read performance of the compacting LUNs. The test was performed by cloning the
VM from the compacting LUNs to a SAS LUN on the NEC D4 storage array.
2 See the File Indexing section for the relative file system read performance.
Table 2: Read Performance from a Compacting LUN

CPU Cores | RAM (GB) | Disk Set Size | Compaction Type | Read Rate (MB/s)
8 | 32 | 12 disks | None | 90
8 | 32 | 12 disks | Dedupe | 74
8 | 32 | 12 disks | Compress | 86
8 | 32 | 12 disks | Compress+Dedupe | 63
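The read-performance reductions quoted in the Results summary can be derived from these rates; a quick check (our own arithmetic, using the Table 2 values):

```python
def read_degradation(compacted_mb_s: float, baseline_mb_s: float = 90) -> int:
    """Percent read-rate reduction relative to the uncompacted LUN
    (90 MB/s for the 12-disk configuration in Table 2)."""
    return round((1 - compacted_mb_s / baseline_mb_s) * 100)

print(read_degradation(74))  # dedupe: 18% slower
print(read_degradation(86))  # compress: 4% slower (~5% in the text)
print(read_degradation(63))  # compress+dedupe: 30% slower
```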
Table 3 shows the results for cloning a second VM to the same compacting LUN. The first row is for a
second copy of the same VM used in the first test; the second row is for a VM template with only the
Windows Server 2008 R2 guest OS installed. The size of the VM template is 9.2 GB.
Table 3: Incremental VM Cloning Performance

CPU Cores | RAM (GB) | Disk Set Size | Data Set (GB) | Compaction Type | Clone Time (s) | Compaction Ratio | Clone Time (Normalized) | Clone Write Rate (MB/s)
8 | 32 | 12 disks | 62.2 | Dedupe | 2010 | 0.03 | 4.78 | 21
8 | 32 | 12 disks | 9.2 | Dedupe | 250 | 0.03 | 4.00 | 37
Results can be summarized as follows:
• For the initial write of the VM with data to the compacting LUN, the resulting compacted data
set was 86% of the original size with compression, 68% with de-duplication and 60% with
compression plus de-duplication. These compaction ratios were independent of the VM
resources and the storage subsystem.
• Both compression and de-duplication benefit significantly from the increased number of cores
and the amount of RAM. The relative performance degradation is reduced by a factor of 2 when
increasing VM resources from 4 to 8 cores and from 8 to 32 GB of RAM.
• Write speed to a compressed LUN is 90% of that of a non-compacted LUN.
• Write speed to a de-duplicated LUN is 33% of that of a non-compacted LUN.
• Write speed to a compressed and de-duplicated LUN is 36% of that of a non-compacted LUN.
• Read performance for compacted LUNs is much closer to non-compacted read performance:
the reduction is only 18%, 5% and 30% for de-duplicated, compressed and compressed plus
de-duplicated LUNs, respectively.
• The incremental clones of both the full VM and the VM template were compacted to only 3.2%
of their original sizes, from 62.2 GB to 2 GB and from 9.2 GB to 0.3 GB, respectively.
File Indexing
Table 4 presents the performance for indexing a file system stored on un-compacted RAID-6 storage,
de-duplicated storage, and compressed storage. In all tests involving compacting storage, the
NexentaStor compacting VM was configured with 8 cores and 32 GB RAM.
Table 4: Indexing Performance

Cores | RAM (GB) | Disk Configuration | Compaction Type | Elapsed Time (h:m) | Improvement vs. Initial Configuration | Improvement vs. “No-compact”
2 | 4 | 6 disks | None | 4:39 | 0% | 0%
2 | 4 | 6 disks | De-duped | 4:30 | 3.3% | 3%
4 | 16 | 6 disks | None | 4:39 | 0% | 0%
4 | 16 | 6 disks | De-duped | 4:30 | 3.3% | 3%
4 | 16 | 12 disks | None | 4:00 | 14% | 0%
4 | 16 | 12 disks | De-duped | 4:34 | 1.8% | (14.2%)
4 | 16 | 12 disks | Compressed | 5:04 | (8.9%) | (26.7%)
4 | 16 | 12 disks | Compress+Dedupe (1) | 3:52 | 16% | 3.3%
8 | 64 | 12 disks | None (1) | 4:15 | 8.6% | 0%
8 | 64 | 12 disks | De-duped | 3:50 | 17.6% | 9.8%
8 | 64 | 12 disks | Compressed | 4:14 | 8.9% | 0.4%
8 | 64 | 12 disks | Compress+Dedupe | 4:28 | 3.9% | (5.1%)
Results can be summarized as follows:
• Indexing on de-duplicated storage is marginally faster than on non-compacting storage. This is
most likely due to caching in the NexentaStor engine.3
• Indexing on compressed storage is roughly 20% slower than on non-compacting storage.4
• Increasing the number of cores and the amount of RAM improves indexing performance by
5-20%. We believe the increased RAM is the most significant factor.
• Doubling the number of disks from 6 to 12 improves non-compacted performance by 17%.
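The improvement columns in Table 4 follow from the elapsed times. A small sketch (our own arithmetic on the table's h:m values; the `minutes` helper is ours):

```python
def minutes(hm: str) -> int:
    """Convert an 'h:m' elapsed time to whole minutes."""
    h, m = hm.split(":")
    return int(h) * 60 + int(m)

def improvement_pct(elapsed: str, reference: str) -> float:
    """Percent reduction in elapsed time relative to a reference run."""
    return round((1 - minutes(elapsed) / minutes(reference)) * 100, 1)

# Relative to the initial configuration (2 cores / 4 GB / 6 disks, 4:39):
print(improvement_pct("4:30", "4:39"))  # de-duped, small VM: about 3%
print(improvement_pct("3:50", "4:39"))  # de-duped, 8 cores / 64 GB / 12 disks: 17.6%
```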
3 There are some inconsistencies in the measured results. Further or repeated tests are required to understand their source and significance.
4 The performance measurements give a clear indication of the general read performance of compacting storage.
Data Compaction Summary
When the VM that performs the compaction or expansion has sufficient compute and memory resources,
we observed interesting results. A de-duplicated LUN shows significant degradation in write
performance, but equal or near-equal read performance compared to a non-compacted LUN.
De-duplication of an initial large data set typically reduces data to 2/3 of the original size; a LUN
containing a large number of largely identical data sets, such as the VM system disks in a virtualized
infrastructure, will see much higher compaction ratios. A compressed LUN shows only a small read and
write performance degradation, with data compacted to 86% of original size. Combining de-duplication
and compression yields the highest compaction, reducing data to 60% of its original size, but both read
and write performance are significantly reduced. This reduction can be mitigated by adding more CPU
power or RAM.
Data de-duplication is better suited for read intensive workloads, e.g., document archives or storage for
the system disk when the system (OS and application code) and the application data are separated onto
separate (virtual) disks. Compression, whose compaction ratio is smaller than de-duplication’s, is better
suited for write intensive workloads.
These findings show that for the right applications, computing resources can be applied with great
benefit and can have a dramatic impact on the data footprint and the system performance, as well as
the overall economies of IT operations. While this solution benefits significantly from increasing the
number of cores and amount of RAM, compression performed well with 4 cores and 8 GB of RAM.
Email
JetStress was used to test the sustained performance of the array. In the prototype, the JetStress load
ran for 1 hour. Table 5 presents the sustained performance observed when using JetStress to generate
the simulated Exchange email workload.
Table 5: Simulated Exchange Back-end Throughput

Cores | RAM (GB) | Disk Configuration | Compaction | Exchange Storage IOPS | Relative Increase
4 | 8 | 6 disks (RAID-6) | None | 160 | 0%
8 | 32 | 6 disks (RAID-6) | None | 166 | 4%
8 | 32 | 12 disks (RAID-6) | None | 279 | 68%
Results can be summarized as follows: the Exchange workload is I/O bound. There were only minor
improvements between the small (4 cores, 8 GB RAM) and the large (8 cores, 32 GB RAM) VM
configurations, while performance increased by a factor of 1.68 when going from 6 to 12 disks in the
array for un-compacted storage.
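The scaling factors follow directly from the IOPS figures in Table 5 (our own arithmetic; the dictionary keys are simply (cores, RAM GB, disks) labels for the three configurations):

```python
# Exchange storage IOPS from Table 5 (RAID-6, no compaction)
iops = {
    (4, 8, 6): 160,    # (cores, RAM GB, disks) -> measured IOPS
    (8, 32, 6): 166,
    (8, 32, 12): 279,
}

# Doubling the VM resources barely helps: the workload is I/O bound.
print(round(iops[(8, 32, 6)] / iops[(4, 8, 6)], 2))    # 1.04
# Doubling the disk count nearly doubles throughput.
print(round(iops[(8, 32, 12)] / iops[(8, 32, 6)], 2))  # 1.68
```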
The Storage Array
We ran all tests on the NEC M100 array and on the previous generation D4 storage arrays. The following
results indicate that the M100 is a good candidate for service hosting environments.
• The M100 delivered high performance when all disk groups were configured for RAID-6.
• The prototype ran two high load performance tests (JetStress + data de-duplication)
concurrently on the M100. There was no observed degradation on either test when compared
to running these tests independently. The M100, even when configured for RAID-6, has ample
compute power to support the JetStress and the data compaction tests in parallel without any
measurable performance effect on either test.
• The extra compute power in the M100 improves the performance of recovery operations, such
as LUN rebuild, and of maintenance operations, including disk group expansion. The staging and
configuration of the POC system indicate that the M100 is up to 5X faster than the D4, and that
these operations have less impact on running applications.
Compute Intensive Solutions for Self-Provisioning
Our POC shows how to build efficient, high-performance compute- and storage-intensive
solutions, including data de-duplication and compression, from generic storage, server and network
resources within a virtualized and integrated IT infrastructure. These solutions fit into the service
provider’s business model. They can be integrated easily into the existing management framework and
included in the list of solutions, software, etc. that is available for self-provisioning by the tenant.
The above tests show how a service provider can create a set of new storage solutions with different
benefits from simply using available software products with the server, storage and networking
resources already in use. These solutions are made available to the IaaS customers as new types of
storage pools in the existing self-service framework. Each solution is built simply by creating a new
VM and installing software (in this case, a pre-created “virtual appliance”), provisioning the
required compute and RAM resources, and then making the compacting LUNs available as new data
stores that tenants can allocate as storage for their VMs.
For an IaaS service provider with a self-service based service offering, it is critical that the solutions can
be built from generic resources of existing resource pools. This assures simple integration into the
infrastructure and the self-service model, and assures a high level of resource utilization. All of this is
essential when running a service provider operation.
Conclusion
IT service providers rely on the self-service model in order to remain efficient, competitive and
profitable. The most cost-effective solutions are virtualized solutions that can be supported directly by
the service provider’s virtualized and integrated infrastructure. No purpose-built hardware appliances
are required. Created with VMs and generic resources like CPU cores, RAM, and storage, these solutions
can be made available easily to tenants in a self-service service hosting environment.
The falling price of computing resources, including CPU cores and RAM, makes it cost-effective to
use compute power to create more effective storage solutions. These resources are integrated into the
arrays to provide better performance and resiliency. In addition, as environments take on more users
and customers, the increased number of users and applications supported by a single array requires a
higher degree of protection against failures, and shorter, less disruptive recovery and maintenance
operations. All of these data protection and disaster recovery imperatives require more compute power.
One can also apply compute resources to create compute-intensive storage solutions in the
infrastructure. Software for solutions, including data de-duplication and compression, are becoming
readily available. The resulting solutions provide the same efficiencies as purpose-built appliances, yet
these solutions fit comfortably in the service provider business model.
This POC was created to test virtualized solutions for data compaction. The measured data compaction
ratios for a large write to a LUN are 0.86, 0.68 and 0.60 for compression, de-duplication and
de-duplication plus compression, respectively. The data compaction ratios were independent of VM
resources. Both de-duplication and compression require significant compute resources and a good
amount of RAM to deliver acceptable read and write speeds.
The POC demonstrates and documents how adding compute resources to storage arrays increases their
value in a service provider environment, and how compute resources can be used in the virtualized
infrastructure to create efficient data compacting storage solutions. More specifically, we have shown
that these solutions can be created, provisioned and utilized in a virtualized infrastructure designed for
self-service and self-provisioning. In the companion white paper, “Enabling Self-Service and Self-Provisioning
in an IT Infrastructure,” we outline the new management stack required in a service
provider infrastructure to support the self-service and self-provisioning model, including
creating and provisioning the solutions described here.
We believe that implementation of this prototype will enable forward-thinking IT architects and
managers to reap the full benefit of virtualization and to operate far more efficiently and cost
effectively. It will also enable IT executives to re-organize and reshape their operations as corporate
service hosting providers. Under this model, the IT organization’s primary mission evolves to building
and managing IT infrastructures on behalf of business units, which are then charged back via a self-
service model. This is by far the most effective way to organize corporate IT.
Disclaimers
The mention of any vendor’s name or specific product in this white paper does not imply any
endorsement of the vendor or product.
The products used in the proof of concept were selected based on consultation with the customer,
Genesis Hosting.
Other products can be incorporated in future efforts based on circumstances or goals.
Huginn Consulting was commissioned by NEC Corporation to build and evaluate the proof of concept
outcomes and to write this technical white paper.
Huginn Consulting
The Huginn team has a combined total of more than 50 years of experience in product development,
engineering and business management in the field of IT. Huginn Consulting provides IT consulting
services including building and testing proofs of concept, technical concept evaluation, specification
development, requirements analysis, and prototype creation in areas that include storage and data management.

The End of Appliances

  • 1. A Huginn Consulting Technical White Paper. Page: 1

    The End of Appliances
    Creating Compute Intensive Storage Solutions for Integrated IT Infrastructures
    September 2011
  • 2. Executive Summary

    Improving IT efficiencies, reducing costs and simplifying management are common goals in enterprise data centers. IT service providers play a key role in helping businesses achieve these goals. In IT Infrastructure as a Service (IaaS) environments, reliability and uptime are critical, because the impact of a single failure can be devastating to the multitude of businesses using the hosting service.

    Our goal was therefore to create a fully integrated IT infrastructure and, through testing, to isolate and quantify the performance gains of adding memory and compute power to the server and storage systems. A team of engineers and architects from Huginn Consulting and Genesis Hosting, an IaaS provider, built and tested a prototype of a truly integrated IT infrastructure environment. This Proof of Concept (POC) demonstrated the steps for building and deploying an infrastructure for a genuinely responsive, service-oriented IT organization in 90 days or less. Complete with servers, networking, and storage, the model also integrated three diverse and essential IT control elements: self-provisioning, self-service and automated management.

    Our POC shows how to create efficient, high performance compute and storage intensive solutions—including data de-duplication and compression built from generic storage, server and network resources—in a virtualized and integrated IT infrastructure. These solutions fit comfortably into the service provider’s business model and can easily be integrated into the existing management framework.

    “IT infrastructure is moving quickly toward being delivered through a service model. Machines are becoming virtual, running in secure data centers on large, partitionable machines. Self-provisioned virtual IT resources are the key to success for a service model, which requires all aspects of the physical hardware to be abstracted or partitioned with permissions and given only to a tenant of the service.”
    Eric Miller, CEO, Genesis Hosting
  • 3. To begin, we defined the drivers behind the prototype POC as three critical aspects of IT service provisioning:

    1. The business model for IaaS providers must be centered on a self-service, self-provisioning model.
    2. To effectively manage and share these solutions across multiple customers and applications in a self-service hosting environment, the hosting infrastructure needs to be fully virtualized at all levels.
    3. Self-service in a shared service hosting environment requires a new management stack.

    We tested two aspects of the integrated infrastructure model. First, we tested the requirements for the management stack in an IaaS service provider infrastructure through the design of a prototype self-service management facility. The results of this study can be found in the Huginn Consulting report titled “Enabling Self-Service and Self-Provisioning in an IT Infrastructure.” The second test analyzed the creation of compute and storage intensive solutions in this infrastructure. These solutions must be available for self-provisioning within the tenant’s VDC (virtualized data center), which we explore further in this study.

    The POC was tested in a service provider environment at Genesis Hosting, using an infrastructure model identical to Genesis’ deployed infrastructure. Findings from this POC validated the proposition that computing resources can be added to storage to create much more cost-effective storage systems and solutions in IaaS environments:

    1. A two-thirds reduction in storage consumption can be realized by intensifying the compute resources dedicated to storage management and operation.
    2. Adding compute resources to storage arrays can improve the resilience of the storage infrastructure by supporting full performance with RAID-6 protection.
    3. Array recovery and maintenance operations time, which includes RAID rebuild time and LUN expansion, can be reduced by approximately 80%.

    The Service Hosting Business

    Profitable and efficient service hosting relies on large-scale infrastructures, a high degree of resource utilization, and a completely integrated infrastructure based on virtualization of servers and storage. The service provider builds and manages the shared infrastructure. A self-service UI and a virtualization client enable the tenant to provision and manage a leased virtual data center: a collection of VMs (virtual machines) with storage and resources running licensed applications or solutions. The resources are leased on a time basis.
  • 4. The scale, utilization, efficiency and simplified management of this infrastructure enable the service provider to deliver IT services at a lower cost than running these services in house.1

    New Technologies

    Computational power in the form of CPU cores and RAM is becoming abundant at lower price points. In a fully virtualized IT infrastructure that supports the self-service model, these resources can be applied easily, even in storage intensive solutions.

    Data de-duplication and compression have long been used in secondary storage applications: backup, archiving and more. These technologies are now available in software products, including open source software. The availability of compute power, together with faster and cheaper RAM, makes these technologies applicable to primary storage applications with surprising affordability and performance.

    Compute Intensive Storage Solutions for Self-Service

    The service provider’s business model relies on being able to support customers with a single, virtualized and integrated infrastructure. The customers’ needs are met by creating and running each customer’s VDC in the infrastructure. All applications, solutions and services are implemented as VMs with servers, storage and other resources provisioned from the infrastructure’s resource pool. Compute intensive storage solutions must follow the same model in order to meet the service providers’ requirements.

    Solutions created by first provisioning a VM with generic server, network and storage resources, then installing the software into this VM, are ideally suited for the IaaS business model. These new solutions can be added simply by integrating the new software into the licensing and billing system, making it available for self-service by the tenant, and adding generic storage and server capacity to the existing resource pools.

    Solutions that rely on adding hardware appliances built for a specific purpose are much more difficult to integrate into the infrastructure. Purpose-built appliances also lead to less than optimal scaling, because they often have interfaces not optimized for this type of deployment. New products must be added as separate resource pools, which results in a management framework that grows progressively more complex. Resource efficiency is also compromised, as it becomes more difficult to limit idle resource capacity.

    1 For a more detailed overview of the service provider business model and the required infrastructure, please refer to the companion report, “Enabling Self-Service and Self-Provisioning in an IT Infrastructure.”
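The de-duplication and compression techniques discussed above can be sketched at the block level. This is a minimal illustration in Python of the two compaction techniques, not the implementation of any product named in this paper; the class name, block size and bookkeeping are invented for the example.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # assumed fixed block size for the sketch


class CompactingStore:
    """Toy block store illustrating in-line de-duplication and compression."""

    def __init__(self, dedupe=True, compress=True):
        self.dedupe = dedupe
        self.compress = compress
        self.blocks = {}       # key -> stored (possibly compressed) block
        self.raw_bytes = 0     # logical bytes written by the client
        self.stored_bytes = 0  # physical bytes actually stored

    def write(self, data: bytes):
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            self.raw_bytes += len(block)
            # De-duplication keys each block on a content hash, so an
            # identical block is stored only once.
            if self.dedupe:
                key = hashlib.sha256(block).hexdigest()
            else:
                key = str(len(self.blocks))  # unique key: no dedupe
            if key in self.blocks:
                continue  # duplicate block: only a reference is kept
            stored = zlib.compress(block) if self.compress else block
            self.blocks[key] = stored
            self.stored_bytes += len(stored)

    def compaction_ratio(self) -> float:
        """Stored size as a fraction of logical size (lower is better)."""
        return self.stored_bytes / self.raw_bytes
```

Writing two identical 4 KB blocks with de-duplication enabled stores the data only once, which is why LUNs holding many near-identical VM system disks compact so well:

```python
store = CompactingStore(dedupe=True, compress=False)
store.write(b"A" * 8192)          # two identical blocks
print(store.compaction_ratio())   # 0.5
```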
  • 5. Protection, Protection, Protection

    Service hosting and virtualization imply an increased concentration of tenants and users supported by a single physical infrastructure. As a result, the effects of failures and data losses are amplified: a single failure can be catastrophic for thousands or even tens of thousands of users, or for numerous organizations. Service hosting companies are even more exposed to the consequences of failure; therefore they must provide much stronger protection of customers’ data. As a consequence, all data must be carefully protected against double disk failures, as well as other storage-related failure modes. This requires increased storage system computational power. Genesis Hosting exclusively deploys storage arrays configured with RAID-6 to protect its customers against double disk failures.

    Infrastructure for Self-Service Solutions

    Effective service hosting with a self-service model implies virtualization at all levels of the infrastructure: storage, networking and servers. This is the only way that shared physical compute and storage resources can be integrated into one infrastructure, and this approach enables the most scalable, flexible platform for IT solutions. The infrastructure can then be partitioned into the logical application entities or solutions that customers provision and manage for themselves.

    Self-service enables service hosting organizations to build a single scalable and integrated infrastructure of servers, storage arrays, network equipment and more, all of which can be managed from a single console. Customers are able, through service portals and management clients, to provision, implement and manage their own VDCs. The entire premise of this POC revolves around the integrated multi-layered infrastructure enabled through virtualization. The focus of this report is to document two aspects of the POC:

    1. The requirements for storage to be deployed in a self-service infrastructure.
    2. The creation of more efficient storage solutions within the framework of the self-service infrastructure.

    Designing a Proof of Concept

    The hypothesis that intensifying the compute resources in storage solutions, both in VM-based solutions and inside the underlying storage, creates better and more efficient solutions was shaped by the following questions:

    1. How can compute resources be used to improve the performance, efficiency and user experience of storage solutions? What types of compute intensive storage solutions can be created?

    Figure 1: POC Physical Configuration
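The double disk failure protection discussed above rests on parity computation, which is exactly where the extra storage-controller compute power goes. The sketch below shows only the simpler half of RAID-6: the XOR-based P parity that rebuilds a single lost strip. A real RAID-6 array adds a second, Galois-field-based Q parity so that any two simultaneous disk failures remain recoverable; that arithmetic is omitted here.

```python
from functools import reduce

def xor_parity(strips):
    """P parity: byte-wise XOR across all data strips."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*strips))

def rebuild_lost_strip(surviving_strips, parity):
    """XOR of the parity with the surviving strips recovers the lost strip."""
    return xor_parity(surviving_strips + [parity])

# Three data strips on three disks, plus a parity strip on a fourth:
strips = [b"\x01\x02", b"\x10\x20", b"\xaa\x55"]
p = xor_parity(strips)

# Simulate losing disk 1 and rebuilding its strip from the survivors:
recovered = rebuild_lost_strip([strips[0], strips[2]], p)
assert recovered == strips[1]
```

Rebuild time scales with how fast the controller can stream this arithmetic across every surviving disk, which is one reason adding compute to the array shortens RAID rebuilds.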
  • 6. 2. Which architecture is required by an IT infrastructure that supports these solutions and the self-service hosting model?

    Structure and Organization

    The prototype infrastructure for the POC was constructed at Genesis Hosting’s facilities. The architecture was chosen to mirror Genesis’ production infrastructure, where Genesis’ customers provision and build their VDCs and run their applications. In fact, the POC team operated as a typical Genesis customer. The prototype was configured as a VDC in which the team members provisioned resources and built the VMs used for the compute and storage intensive services tests in this POC.

    The Hardware Components

    • NEC Express5800/A1080a (GX) server: the server is configured with four compute modules, each with two Intel “Westmere” processors and 128 GB of RAM.
    • The new NEC M100 and the previous generation NEC D4 storage arrays. Both were configured with 7.2K RPM SATA disks. The performance of the two arrays was used as a measure of the benefit of increasing the compute and storage intensity factors.
      o All tests were run on RAID-6 configured LUNs.
      o Two disk configurations were used: 6 disks and 12 disks.
    • A QLogic 8/4 Gbit FC switch connected servers and storage.
    • An NEC 1 Gbit Ethernet ProgrammableFlow (PF) switch provided connectivity for system management.

    The Software Stack

    • All software was run on VMs in VMware vSphere 4.1 environments on the NEC GX server.
    • The Blackball Search-In Software indexing engine and the Microsoft Exchange JetStress load generator were run on VMs with Microsoft Server 2008 R2.
    • The NexentaStor (version 3.1.1) software was installed as a virtual appliance in vSphere.
  • 7. The POC Prototype—Storage for Service Hosting

    The compute and storage intensive prototype includes NEC’s M100 storage array, data compaction solutions, a file system indexing solution, and vCenter for cloning VMs and VM templates. Each of the two controllers for the M100 includes the new high performance Jasper Forest processor from Intel and 8 GB of RAM. This is considerably more compute capacity than typical arrays; this configuration of resources is required for deployments in a shared service hosting infrastructure, as it provides maximum protection while maintaining full service levels to the users.

    Data Compaction

    The data compaction solutions (Figure 2) were used to test the effectiveness and the performance of in-line data de-duplication and compression in a solution stack. The solution is based on the NexentaStor virtual appliance. Data writes and reads to and from the LUN exported by NexentaStor are compacted or expanded in real time. NexentaStor uses the array for storing the compacted data. The compaction and expansion performance and efficiency, as well as the overall solution performance, were tested by changing the configuration of the VMs and the underlying storage configuration. Two storage configurations were used: 6 and 12 disks.

    Figure 2: Logical Data Compacting Configuration

    Testing the Solutions

    The prototype workloads were designed to test the performance and efficiency of two compute intensive storage solutions. The first solution tested was the new NEC M100 storage array, which has considerably more compute performance than previous generation products. The second was a set of compute intensive storage solutions created in the virtualized POC infrastructure using the NexentaStor software. The tests were run using storage allocated directly from the storage array, and with storage allocated from the de-duplicated or compressed LUNs exported by NexentaStor.

  • 8. VM Cloning

    VM cloning was used to test the bulk read and write performance of the storage solutions. vSphere was used to clone the VM with the file system to and from the compacting LUN exported by NexentaStor. The respective source or destination was a 12 disk SAS data store on the NEC D4 storage array, which delivered much higher I/O rates than Nexenta or SATA.

    Two tests were performed. First, the 62.2 GB VM with the file system used in the indexing test was cloned. Second, another 9.2 GB VM template with only the Microsoft Server 2008 R2 OS was cloned, to see the differential efficiency and performance after the first OS instance had been written.

    File System Indexing

    The Blackball indexing engine generates a very small index, only about 2% of the indexed data. The file indexing solution was therefore used to test the file read performance of the compacted LUNs. The indexing engine indexed the file system in its VM. The file system consisted of a total of 53 GB of file data, including text, email, music, images, a document archive and more.

    Email Performance and Resiliency

    The Exchange 2010 JetStress load testing tool was used to measure the performance of a storage subsystem under a synthetic Exchange email workload. Since JetStress generates its own data, it was not used to test the performance of compacted storage. JetStress determines sustained performance in Microsoft Exchange IOPS (I/O operations per second), i.e., the total number of Exchange reads and writes to the storage subsystem per second. The number includes message and log file I/O, and uses a fixed ratio between reads and writes.
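The sustained-IOPS figure described above is simply the total read and write operations divided by the run time. A toy illustration of that arithmetic; the counter values below are invented for illustration (JetStress itself reports the real counters):

```python
def sustained_iops(reads: int, writes: int, duration_s: float) -> float:
    """Sustained IOPS: total reads and writes to the storage
    subsystem divided by the test duration."""
    return (reads + writes) / duration_s

# A hypothetical 1-hour run, the duration used in the prototype:
iops = sustained_iops(reads=2_160_000, writes=1_080_000, duration_s=3600)
print(iops)  # 900.0
```

Because the read/write ratio is fixed by the tool, a single sustained-IOPS number is enough to compare storage configurations.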
  • 9. Outcomes

    The following outcomes related to compute and storage intensive solutions were generated in this POC.

    Data Compaction

    We investigated the bulk write and read performance and the efficiency of the data compaction solutions.2 The file indexing VM, including the file system data, and a barebones VM template were cloned to the compacting LUN exported by NexentaStor. Three configurations of the NexentaStor VM were used in testing:

    • 4 cores, 8 GB RAM, 6 disk LUN
    • 8 cores, 32 GB RAM, 6 disk LUN
    • 8 cores, 32 GB RAM, 12 disk LUN

    Table 1: Write Performance and Compacting Ratios for Compacting LUN

    | CPU Cores | RAM (GB) | Disk Set Size | Compaction Type | Clone Time (s) | Compaction Ratio | Clone Time (Normalized) | Clone Write Rate (MB/s) |
    |---|---|---|---|---|---|---|---|
    | 4 | 8 | 6 disks | None | 613 | 1.00 | 1.00 | 101 |
    | 4 | 8 | 6 disks | Dedupe | 5362 | 0.68 | 8.75 | 12 |
    | 4 | 8 | 6 disks | Compress | 1398 | 0.86 | 2.28 | 44 |
    | 4 | 8 | 6 disks | D+C | 3635 | 0.60 | 5.93 | 17 |
    | 8 | 32 | 6 disks | None | 420 | 1.00 | 1.00 | 148 |
    | 8 | 32 | 6 disks | Dedupe | 1720 | 0.68 | 4.09 | 36 |
    | 8 | 32 | 6 disks | Compress | 464 | 0.86 | 1.10 | 134 |
    | 8 | 32 | 6 disks | D+C | 1196 | 0.60 | 2.85 | 52 |
    | 8 | 32 | 12 disks | None | 421 | 1.00 | 1.00 | 147 |
    | 8 | 32 | 12 disks | Dedupe | 1274 | 0.68 | 3.03 | 49 |
    | 8 | 32 | 12 disks | Compress | 469 | 0.86 | 1.11 | 133 |
    | 8 | 32 | 12 disks | D+C | 1144 | 0.60 | 2.72 | 54 |

    We also tested the read performance of the compacting LUNs. The test was performed by cloning the VM from the compacting LUNs to a SAS LUN on the NEC D4 storage array.

    2 See the File Indexing section for the relative file system read performance.
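The derived columns in Table 1 follow directly from the raw clone times. A quick check in Python, assuming decimal units (1 GB = 1000 MB), reproduces the 4 core / 8 GB / 6 disk rows:

```python
# Reproduce Table 1's derived columns from the raw measurements.
VM_SIZE_GB = 62.2   # size of the cloned file-indexing VM
BASELINE_S = 613    # "None" clone time for the 4-core / 8 GB / 6-disk rows

def derived_columns(clone_time_s: int):
    normalized = clone_time_s / BASELINE_S         # slowdown vs. no compaction
    write_rate = VM_SIZE_GB * 1000 / clone_time_s  # MB/s, decimal units
    return round(normalized, 2), round(write_rate)

print(derived_columns(613))   # (1.0, 101)  -> None row
print(derived_columns(5362))  # (8.75, 12)  -> Dedupe row
print(derived_columns(1398))  # (2.28, 44)  -> Compress row
```

These match the table's normalized clone times of 1.00, 8.75 and 2.28 and write rates of 101, 12 and 44 MB/s for those rows.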
  • 10. Table 2: Read Performance from a Compacting LUN

    | CPU Cores | RAM (GB) | Disk Set Size | Compaction Type | Read Rate (MB/s) |
    |---|---|---|---|---|
    | 8 | 32 | 12 disks | None | 90 |
    | 8 | 32 | 12 disks | Dedupe | 74 |
    | 8 | 32 | 12 disks | Compress | 86 |
    | 8 | 32 | 12 disks | Compress+Dedupe | 63 |

    Table 3 shows the results for cloning a second VM to the same compacting LUN. The first row is for the same VM as before; the second row is for a VM template with only the Windows 2008 Server R2 guest OS installed. The size of the VM template is 9.2 GB.

    Table 3: Incremental VM Cloning Performance

    | CPU Cores | RAM (GB) | Disk Set Size | Data Set (GB) | Compaction Type | Clone Time (s) | Compaction Ratio | Clone Time (Normalized) | Clone Write Rate (MB/s) |
    |---|---|---|---|---|---|---|---|---|
    | 8 | 32 | 12 disks | 62.2 | Dedupe | 2010 | 0.03 | 4.78 | 21 |
    | 8 | 32 | 12 disks | 9.2 | Dedupe | 250 | 0.03 | 4.00 | 37 |

    The results can be summarized as follows:

    • For the initial writing of the VM with data to the compacting LUN, the resulting compacted data set varied from 86% (compression) to 68% (dedupe) and 60% (compress + dedupe) of the original size. These compaction ratios were independent of VM resources and the storage subsystem.
    • Both compression and de-duplication benefit significantly from an increased number of cores and amount of RAM. The relative performance degradation is reduced by a factor of 2 when increasing VM resources from 4 to 8 cores and from 8 to 32 GB of RAM.
    • Write speed to a compressed LUN is 90% of that of a non-compacted LUN.
    • Write speed to a de-duplicated LUN is 33% of that of a non-compacted LUN.
    • Write speed to a compressed and de-duplicated LUN is 36% of that of a non-compacted LUN.
    • Read performance for compacted LUNs is much closer to non-compacted read performance. The reduction is only 18%, 5% and 30% for dedupe, compressed and compress + dedupe, respectively.
    • The incremental cloning of both the full VM and the VM template was compacted to only 3.2% of the original size: from 62.2 GB to 2 GB and from 9.2 GB to 0.3 GB, respectively.
File Indexing

Table 4 presents the performance for file indexing of a file system stored on un-compacted RAID-6 storage, de-duplicated storage, and compressed storage. For the tests that involved compacting storage, the NexentaStor compacting VM was configured with 8 cores and 32 GB of RAM.

Table 4: Indexing Performance

Cores | RAM (GB) | Disk Configuration | Compaction Type     | Elapsed Time (h:m) | Improvement Relative to Initial Configuration | Improvement Relative to "No-compact"
------|----------|--------------------|---------------------|--------------------|-----------------------------------------------|-------------------------------------
2     | 4        | 6 disks            | None                | 4:39               | 0%                                            | 0%
2     | 4        | 6 disks            | De-duped            | 4:30               | 3.3%                                          | 3%
4     | 16       | 6 disks            | None                | 4:39               | 0%                                            | 0%
4     | 16       | 6 disks            | De-duped            | 4:30               | 3.3%                                          | 3%
4     | 16       | 12 disks           | None                | 4:00               | 14%                                           | 0%
4     | 16       | 12 disks           | De-duped            | 4:34               | 1.8%                                          | (14.2%)
4     | 16       | 12 disks           | Compressed          | 5:04               | (8.9%)                                        | (26.7%)
4     | 16       | 12 disks           | Compress+Dedupe (1) | 3:52               | 16%                                           | 3.3%
8     | 64       | 12 disks           | None (1)            | 4:15               | 8.6%                                          | 0%
8     | 64       | 12 disks           | De-duped            | 3:50               | 17.6%                                         | 9.8%
8     | 64       | 12 disks           | Compressed          | 4:14               | 8.9%                                          | 0.4%
8     | 64       | 12 disks           | Compress+Dedupe     | 4:28               | 3.9%                                          | (5.1%)

Results can be summarized as follows:

• Indexing on de-duplicated storage is marginally faster than on non-compacted storage. This is most likely due to caching in the NexentaStor engine.3
• Indexing on compressed storage is about 20% slower than on non-compacted storage.4
• Increasing the number of cores and amount of RAM improves indexing performance by 5-20%. We believe the increased RAM is the most significant factor.
• Doubling the number of disks from 6 to 12 improves non-compacted indexing performance by 14%.

3 There are some inconsistencies in the measured results. Further or repeated tests are required to understand their source and significance.
4 The performance measurements give a clear indication of the general read performance of compacting storage.
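The improvement columns in Table 4 can be reproduced from the elapsed h:m times. As a small sketch (improvement expressed as a fraction of the baseline time, with negative values indicating a slowdown):

```python
def to_minutes(hm: str) -> int:
    """Convert an 'h:m' elapsed time such as '4:39' to total minutes."""
    h, m = hm.split(":")
    return int(h) * 60 + int(m)

def improvement(baseline: str, measured: str) -> float:
    """Relative speedup versus the baseline; negative means a slowdown."""
    b, t = to_minutes(baseline), to_minutes(measured)
    return (b - t) / b

# 12-disk uncompacted run versus the initial 6-disk configuration:
print(round(improvement("4:39", "4:00") * 100))  # 14 (%)
# Compressed storage on the same 12 disks versus uncompacted:
print(round(improvement("4:00", "5:04") * 100))  # -27 (%)
```

Checking a few cells this way confirms the table's convention: percentages are relative to the baseline run's elapsed time, with parenthesized values denoting slowdowns.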
Data Compaction Summary

When the VM that performs the compaction or expansion has sufficient compute and memory resources, we observed the following.

A de-duplicated LUN shows significant degradation in write performance, but equal or better read performance when compared to a non-compacted LUN. De-duplication of an initial large data set typically reduces it to about two thirds of its original size. A LUN containing a large number of largely identical data sets, such as the VM system disks in a virtualized infrastructure, will see much higher compaction ratios.

A compressed LUN shows only a small read and write performance degradation. Data is compacted to 86% of its original size.

Combining de-duplication and compression yields the highest compaction, reducing data to 60% of its original size, but both read and write performance are significantly reduced. This reduction can be mitigated by adding more CPU power or RAM.

De-duplication is better suited for read intensive workloads, e.g., document archives, or for system-disk storage when the system (OS and application code) and the application data are separated onto separate (virtual) disks. Compression is better suited for write intensive workloads, although its compaction ratio is smaller than that of de-duplication.

These findings show that for the right applications, computing resources can be applied with great benefit and can have a dramatic impact on the data footprint and system performance, as well as the overall economics of IT operations. While this solution benefits significantly from increasing the number of cores and amount of RAM, compression performed well even with 4 cores and 8 GB of RAM.

Email

JetStress was used to test the sustained performance of the array. In the prototype, the JetStress load ran for 1 hour. Table 5 presents the sustained performance observed when using JetStress to generate a simulated Exchange email workload.
Table 5: Simulated Exchange Back-end Throughput

Cores | RAM (GB) | Disk Configuration | Compaction | Exchange Storage IOPS | Relative Increase
------|----------|--------------------|------------|-----------------------|------------------
4     | 8        | 6 disks (RAID-6)   | None       | 160                   | 0%
8     | 32       | 6 disks (RAID-6)   | None       | 166                   | 4%
8     | 32       | 12 disks (RAID-6)  | None       | 279                   | 68%

Results can be summarized as follows:

• The Exchange workload is I/O bound. There were only minor improvements between the small (4 cores, 8 GB RAM) and the large (8 cores, 32 GB RAM) VM configurations.
• Performance increased by a factor of 1.68 when going from 6 to 12 disks in the array for un-compacted storage.
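The relative-increase column in Table 5 follows from the IOPS figures, with each row compared against the preceding configuration. A quick check:

```python
def relative_increase(baseline_iops: int, measured_iops: int) -> float:
    """Fractional IOPS gain over the baseline configuration."""
    return measured_iops / baseline_iops - 1

# Larger VM, same 6 disks: compute barely matters for this I/O-bound load.
print(round(relative_increase(160, 166) * 100))  # 4 (%)
# Same VM, 6 -> 12 disks: spindle count dominates.
print(round(relative_increase(166, 279) * 100))  # 68 (%)
```

The near-flat first comparison and the large second one together support the conclusion that the Exchange workload is I/O bound rather than compute bound.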
The Storage Array

We ran all tests on the NEC M100 array and on the previous-generation D4 storage array. The following results indicate that the M100 is a good candidate for service hosting environments.

• The M100 delivered high performance with all disk groups configured for RAID-6.
• The prototype ran two high-load performance tests (JetStress and data de-duplication) concurrently on the M100, with no observed degradation in either test compared to running them independently. Even when configured for RAID-6, the M100 has ample compute power to support the JetStress and data compaction tests in parallel without any measurable performance effect on either.
• The extra compute power in the M100 improves the performance of recovery operations, such as LUN rebuilds, and maintenance operations, such as disk group expansion. The staging and configuration of the POC system indicate that the M100 is up to 5X faster than the D4, though these operations have less impact on running applications.

Compute Intensive Solutions for Self-Provisioning

Our POC shows how to create efficient, high-performance compute and storage intensive solutions (including data de-duplication and compression) from generic storage, server and network resources within a virtualized and integrated IT infrastructure. These solutions fit into the service provider's business model. They can be integrated easily into the existing management framework and included in the list of solutions, software, etc. available for self-provisioning by the tenant.

The above tests show how a service provider can create a set of new storage solutions with distinct benefits simply by combining available software products with the server, storage and networking resources already in use. These solutions are made available to IaaS customers as new types of storage pools in the existing self-service framework.
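The catalog-extension idea above can be pictured as a short workflow: deploy a compacting appliance VM from generic resources, then register its LUN as a new datastore type in the self-service catalog. All names below (the `Provisioner` class and its methods) are hypothetical illustrations of the workflow, not an actual vendor API:

```python
from dataclasses import dataclass, field

@dataclass
class CompactingDatastore:
    """A compacting LUN exposed to tenants as a new datastore type."""
    name: str
    compaction: str  # "compress", "dedupe", or "compress+dedupe"
    cores: int
    ram_gb: int

@dataclass
class Provisioner:
    """Hypothetical sketch of the self-provisioning steps in the POC."""
    datastores: list = field(default_factory=list)

    def deploy_appliance(self, cores: int, ram_gb: int) -> dict:
        # Step 1: clone the pre-created virtual appliance and size its
        # compute and memory from the generic resource pool.
        return {"cores": cores, "ram_gb": ram_gb}

    def publish_datastore(self, vm: dict, name: str, compaction: str) -> CompactingDatastore:
        # Step 2: export the compacting LUN and register it as a datastore
        # that tenants can select in the self-service catalog.
        ds = CompactingDatastore(name, compaction, vm["cores"], vm["ram_gb"])
        self.datastores.append(ds)
        return ds

p = Provisioner()
vm = p.deploy_appliance(cores=8, ram_gb=32)
ds = p.publish_datastore(vm, "tenant-archive", "dedupe")
print(ds.compaction, len(p.datastores))  # dedupe 1
```

The point of the sketch is that no purpose-built hardware appears anywhere: every step consumes only generic VM, CPU, RAM and LUN resources already managed by the existing framework.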
The solutions are created simply by deploying a new VM and installing software (or, in this case, installing a pre-created "virtual appliance"), provisioning the required compute and RAM resources, and then making the compacting LUNs available as new data stores that tenants can allocate as storage for their VMs.

For an IaaS service provider with a self-service-based offering, it is critical that such solutions can be built from the generic resources of existing resource pools. This ensures simple integration into the infrastructure and the self-service model, and a high level of resource utilization, all of which is essential when running a service provider operation.

Conclusion

IT service providers rely on the self-service model in order to remain efficient, competitive and profitable. The most cost-effective solutions are virtualized solutions that can be supported directly by
the service provider's virtualized and integrated infrastructure. No purpose-built hardware appliances are required. Created from VMs and generic resources such as CPU cores, RAM and storage, these solutions can easily be made available to tenants in a self-service hosting environment.

The falling price of computing resources, including CPU cores and RAM, makes it cost-effective to use compute power to create more effective storage solutions. These resources are integrated into the arrays to provide better performance and resiliency. In addition, as environments take on more users and customers, the increased number of users and applications supported by a single array requires a higher degree of protection against failures, and shorter recovery times with less client impact when carrying out recovery or maintenance operations. All of these data protection and disaster recovery imperatives require more compute power.

One can also apply compute resources to create compute-intensive storage solutions in the infrastructure. Software for such solutions, including data de-duplication and compression, is becoming readily available. The resulting solutions provide the same efficiencies as purpose-built appliances, yet fit comfortably into the service provider business model.

This POC was created to test virtualized solutions for data compacting. The measured data compaction ratios for a large write to a LUN are 0.86, 0.68 and 0.60 for compression, de-duplication, and de-duplication plus compression, respectively. These compaction ratios were independent of VM resources. Both de-duplication and compression require significant compute resources and a good amount of RAM to deliver acceptable read and write speeds.
The POC demonstrates and documents how adding compute resources to storage arrays increases their value in a service provider environment, and how compute resources can be used in the virtualized infrastructure to create efficient data compacting storage solutions. More specifically, we have shown that these solutions can be created, provisioned and utilized in a virtualized infrastructure designed for self-service and self-provisioning. In the companion white paper, "Enabling Self-Service and Self-Provisioning in an IT Infrastructure," we have also outlined the new management stack required in a service provider infrastructure to support the self-service and self-provisioning model, including creating and provisioning the solutions described here.

We believe that implementing this prototype will enable forward-thinking IT architects and managers to reap the full benefit of virtualization and to operate far more efficiently and cost-effectively. It will also enable IT executives to reorganize and reshape their operations as corporate service hosting providers. Under this model, the IT organization's primary mission evolves to building and managing IT infrastructures on behalf of business units, which are then charged back via a self-service model. This is by far the most effective way to organize corporate IT.
Disclaimers

The mention of any vendor's name or specific product in this white paper does not imply any endorsement of the vendor or product. The products used in the proof of concept were selected based on consultation with the customer, Genesis Hosting. Other products can be incorporated in future efforts based on circumstances or goals.

Huginn Consulting was commissioned by NEC Corporation to build the proof of concept, evaluate its outcomes, and write this technical white paper.

Huginn Consulting

The Huginn team has more than 50 years of combined experience in IT product development, engineering, and business management. Huginn Consulting provides IT consulting services including building and testing proofs of concept, technical concept evaluation, specification development, requirements analysis, and prototype creation in areas that include storage and data management.