Using software for tasks that range from RAID level processing to
iSCSI traffic aggregation, Rasilient’s RASTOR 7500 iSCSI storage
array provides I/O streaming and database applications with
enterprise quality at value prices.
RASILIENT Systems designs and manufactures the RASTOR line of open-architecture
storage solutions that integrate custom-tuned Linux software.
RASTOR storage arrays rely on software to provide RAID redundancy, provision
virtual disk volumes, and manage iSCSI connectivity. For enterprise-class
networking, the RASILIENT software balances iSCSI traffic load and failover
using multiple NICs, while simplifying client management by presenting a single
RASILIENT’s load-balancing software works at the iSCSI network layer using an
iSCSI protocol redirect feature, which was created to handle target devices that
were busy or undergoing maintenance. This technique is compatible with
Microsoft’s MPIO implementation on client initiators, for which RASILIENT also
provides support so that clients can establish multiple connections for failover
Reliance on software allows RASILIENT to easily mix and match hardware
modules in order to tune system options. In particular, one of the RASTOR 7500
systems tested by openBench Labs utilized high rotational speed (15,000-rpm)
SAS disks to provide an optimal environment for applications oriented towards
transaction-processing such as MS Exchange or Oracle. On the other hand, our
second RASTOR 7500 came equipped with 24 700GB SATA drives for an
optimal platform for digital content and Web 2.0 applications for which large files
and streaming I/O are defining characteristics. As a result, RASILIENT can
provide OEMs with a low-cost, enterprise-class storage solution.
SAN technology has long been the premier means of consolidating storage
resources and streamlining management in large data centers. For many mid-tier
companies, the rapid adoption of 1-gigabit-per-second Ethernet has revived the
notion of implementing a low-cost iSCSI SAN in their IT production environments.
Meanwhile, many large enterprise sites are starting to employ iSCSI to extend
provisioning and management savings to remote servers. In fact, this revival of
iSCSI has driven its growth rate to near triple-digit levels.
Also fueling growing iSCSI utilization is the adoption of server virtualization. In a
VMware® Virtual Infrastructure (VI) environment, storage virtualization is a much
simpler proposition. The VMware ESX file system (VMFS) eliminates the issue of
exclusive volume ownership. More importantly, an advanced VMware
environment is dependent on shared storage. In particular, VMware can use
shared iSCSI SAN storage to liberate IT operations from the limitations of backup
windows via VMware Consolidated Backup and to balance active virtual machine
(VM) workloads among multiple ESX servers via VMware VMotion. This
fundamental reliance of VI features on a shared storage infrastructure opens the
door for iSCSI as the backbone of a cost-effective lightweight SAN.
To assess the RASTOR 7500, openBench Labs set up two test scenarios that
represent frequent storage issues at mid-tier business sites. In the first, we
focused on the I/O functionality and performance needed to support digital
content creation, editing, and distribution. In the second, we looked at a storage
consolidation scenario underpinned by transaction-oriented database I/O.
In these test scenarios, openBench Labs utilized two RASTOR 7500 Storage
Systems. Both storage servers featured a 2U single controller system built on a
single dual-core AMD Athlon CPU. For higher availability and throughput, the
RASTOR 7500 also supports a dual active-active controller configuration. In
addition, each RASTOR controller system had a secondary storage shelf with 12
more drives. Finally, for high-throughput SAN connectivity, each controller
featured six 1Gb-per-second Ethernet NICs.
The two arrays differed in the disk drives that populated each
array. The first storage array sported 24 Seagate SATA drives. The rotational
speed of each SATA drive was 7,200 rpm and its unformatted capacity was just
over 700GB. The other system housed 24 Seagate SAS
drives. Each SAS drive sported a rotational speed of 15,000 rpm and a data
capacity of just over 146GB.
With software-based RAID and iSCSI load balancing core elements of the
RASTOR value proposition, we also configured a 2Gb-per-second Fibre Channel
array that featured hardware-based RAID and 15,000 rpm FC drives as a
baseline comparison. With 2Gb-per-second devices the most prevalent
infrastructure at sites with a Fibre Channel SAN in place, the nStor 4540 Storage
Array served as an excellent base of comparison for an iSCSI array targeting
mid-tier business sites with enterprise throughput capabilities.
A system administrator configures a RASTOR 7500 via a highly intuitive and
easy-to-use Web-based interface. The first step in this process is to assign
physical addresses to the RASTOR system, as well as to each individual NIC
that will be utilized in load-balanced iSCSI traffic.
More importantly, the RASILIENT software load balances storage traffic using
the iSCSI protocol rather than by teaming NICs in a Layer 2 networking scheme,
which can suffer measurable overhead when reassembling out-of-order TCP
segments transmitted over various team members. As a result, many NIC
teaming schemes put packets from the same TCP session on the same physical
NIC and that prevents “n” cards working as one logical card from providing “n”
times the performance gain.
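The session-pinning effect described above can be illustrated with a minimal sketch. It assumes a hypothetical teaming driver that selects a NIC by hashing the TCP 4-tuple; the function name, the CRC32 hash, and the addresses are illustrative assumptions and are not taken from any particular vendor's teaming implementation.

```python
import zlib

def pick_nic(src_ip, src_port, dst_ip, dst_port, nic_count):
    """Hypothetical flow-hash policy: map a TCP 4-tuple to one team member."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % nic_count

# One iSCSI session over one TCP connection always presents the same 4-tuple
# (addresses here are made up; 3260 is the standard iSCSI port).
session = ("10.0.0.5", 51000, "10.0.0.9", 3260)
nics_used = {pick_nic(*session, nic_count=6) for _ in range(1000)}

print(f"NICs carrying this session: {len(nics_used)} of 6")
```

Because the hash input never changes for a given session, every packet of that session lands on the same team member, no matter how many NICs are in the team.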
On the other hand, RASILIENT’s explicit storage-oriented approach to iSCSI load
balancing is highly focused on performance. RASILIENT starts with support of
jumbo TCP packets for optimal network throughput. Nonetheless, it is the
utilization of a storage protocol rather than TCP segments that sets the
RASILIENT load-balancing scheme apart from many competitors and makes it
completely compatible with Microsoft’s MPIO and MCS (Multiple Connections per Session)
In effect, RASILIENT stripes iSCSI packets across all NICs for full and effective
load balancing. In that way, multiple clients can utilize full gigabit throughput
when connected to logical disks exported by the RASTOR 7500. What's more, by
supporting active-active MPIO connections, RASILIENT ensures high-availability
iSCSI sessions for clients. As a result, enterprise-class clients can leverage all of
their own advanced throughput and redundancy capabilities to maximize the
benefits of an iSCSI SAN.
For this assessment, openBench Labs employed quad-core Dell PowerEdge
1900 servers running the 64-bit version of Windows Server® 2003 R2 SP2 on
the client side. In each test server, we also installed a dual-port QLogic 4052C
iSCSI HBA to minimize overhead and maximize I/O throughput. With both a TCP
and an iSCSI offload engine, the QLogic iSCSI HBA eliminated TCP packet
processing and iSCSI protocol processing, which can be prodigious if enhanced
data security is invoked on the iSCSI packets via header and data digest CRC
To enhance the resilience of our iSCSI SAN, we leveraged the RASTOR array's
support of MPIO to invoke port failover on our iSCSI HBA. In particular, we used
version 2.06 of the Microsoft® iSCSI initiator in conjunction with the QLogic
iSCSI HBA, which the Microsoft software initiator immediately recognized. With
active-active connections and a round robin failover policy—the default is an
active-passive fail-over configuration—failover was instantaneous and we were
not able to measure any degradation in throughput when a connection was
From portable medical records to security surveillance and even high-definition
video, a torrent of new data sources continues to feed the burgeoning
volume of data stored on disk. Video postproduction for standard-definition
content has moved from tape (linear access) to disk (non-linear access) and in
the process became a popular example of a Web 2.0 application. Moving video
postproduction from tape to disk has fostered a growing market for non-linear
editing (NLE) systems that need to support a number of key functions. Base NLE
features include capturing digital content, editing—including special effects and
graphics enhancement—and finished video rendering. As a result, any
underlying storage server must be capable of supporting concurrent recording of
broadcast material, modifying prerecorded data, and broadcasting presentations.
What's more, NLE systems have a natural affinity for a SAN. By handling media
content as a digital file, an NLE system allows users to manage that content over
its entire lifecycle. Moreover, any data lifecycle management process is
enhanced by the presence of a data networking infrastructure. In particular, video
operations stress traditional storage systems in terms of capacity and throughput:
Two hours of uncompressed 1080i video will consume over a terabyte of disk
storage and an NLE system will need to access data at a rate around 165 MB
per second—greater than a single gigabit connection can deliver—to create it.
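The capacity and bandwidth figures above follow from simple arithmetic, sketched below. The sketch assumes decimal units (1MB = 1,000,000 bytes) and ignores Ethernet, TCP, and iSCSI protocol overhead; the function and constant names are illustrative.

```python
import math

# A 1Gb-per-second link carries at most 125MB per second before protocol overhead.
GIGABIT_LINK_MB_PER_S = 1_000_000_000 / 8 / 1_000_000

def stream_size_tb(rate_mb_per_s, hours):
    """Storage consumed by a constant-rate stream, in decimal terabytes."""
    return rate_mb_per_s * hours * 3600 / 1_000_000

size_tb = stream_size_tb(165, 2)
links_needed = math.ceil(165 / GIGABIT_LINK_MB_PER_S)

print(f"Two hours at 165MB per second: {size_tb:.3f}TB")
print(f"Gigabit links needed for 165MB per second: {links_needed}")
```

At 165MB per second, two hours of video comes to roughly 1.19TB, and the stream exceeds what one gigabit connection can carry, so at least two links are required.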
For digital content that relies heavily on streaming large files, the RASTOR 7500
equipped with SATA drives is an excellent fit. Provisioning these or other types of
drives begins with selecting unused drives and placing them in a new RAID array. Once
the disk group is created, an administrator can partition the disk group in order to
create logical drives that will be presented to client hosts.
What's more, using the RASTOR 7500, administrators can control key storage
characteristics, such as write and read-ahead caching policies, locally at the level
of the logical disk rather than just globally for an entire array. As a result, two logical
disks partitioned from the same RAID array can take on very different I/O
throughput performance characteristics. With the RASTOR 7500, an
administrator can take a much more fine-grained approach to storage
optimization that can be truly application specific.
Once a virtual disk is created, the final step is to virtualize its ownership or, in the
argot of the RASTOR GUI (as well as a growing number of others, including HP),
"present disks to hosts." By default, a newly created logical disk is presented to
every host, which is defined by an iSCSI initiator IQN, on the iSCSI SAN. As a
result, we needed to create two distinct host IDs for our Dell PowerEdge 1900
server: A separate identity was required for each of the two ports on the host’s
QLogic 4052C iSCSI HBA.
For an environment with physical clients running Windows, logical disk
virtualization is vital as Windows desktop and server operating systems do not
have a distributed file locking scheme for sharing storage at the block level. In a
VMware Virtual Infrastructure environment, however, host sharing of SAN
volumes is vital to advanced functions such as VMotion and VMware
In our initial benchmarking tests, openBench Labs concentrated on assessing the
performance of the RASTOR 7500 in a digital content scenario, which now
extends to video surveillance. In these tests, the primary issue for accelerating
application throughput is the streaming of sequential reads and writes. To a
lesser but growing degree, however, streaming media applications associated
with Web 2.0 initiatives are also dependent on random data access to support
such functions as non-linear editing (NLE).
We began by examining streaming read and write I/O performance to a single
25GB logical drive backed by a physical RAID-0 array. In this set of tests, the
physical arrays were resident on RASTOR 7500 arrays with SAS and SATA
drives, as well as an nStor 2Gb-per-second Fibre Channel array, which utilizes a
hardware-based RAID storage scheme.
With software-based RAID on the RASTOR 7500, I/O on logical disks exported
by the RASTOR 7500 was much more characteristic of Linux, which the
RASTOR was running, than Windows Server 2003, which our host was running.
The key difference when streaming sequential I/O is that Linux attempts to
bundle all small I/O requests into 128KB blocks. When streaming I/O for large
files, that results in a rapid convergence of throughput to the maximum level
sustainable by the I/O subsystem. This is particularly important for applications
on Windows, which often make 8KB requests. In contrast, storage arrays that
rely on hardware-based RAID, such as the Fibre Channel-based nStor 4540,
pass the Windows I/O requests directly to the hardware for fast response without
changing the characteristics of the I/O requests.
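The bundling behavior described above can be pictured with a simplified model. This is a sketch of request merging in general, not the Linux block layer's actual code; the function name and the (offset, length) representation are assumptions made for illustration.

```python
def coalesce(requests, max_bytes=128 * 1024):
    """Merge contiguous (offset, length) requests into blocks of at most max_bytes."""
    merged = []
    for offset, length in requests:
        if merged:
            last_offset, last_length = merged[-1]
            contiguous = last_offset + last_length == offset
            if contiguous and last_length + length <= max_bytes:
                # Extend the previous block instead of issuing a new request.
                merged[-1] = (last_offset, last_length + length)
                continue
        merged.append((offset, length))
    return merged

# 64 sequential 8KB requests (512KB in total), as a Windows application might issue.
small_io = [(i * 8192, 8192) for i in range(64)]
bundled = coalesce(small_io)

print(f"{len(small_io)} requests of 8KB became {len(bundled)} requests")
```

In this model a sequential stream of small requests reaches the disks as a handful of large transfers, which is why throughput converges quickly on the ceiling of the I/O subsystem. Random requests are not contiguous and pass through unmerged.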
In our tests with the RASTOR 7500, openBench Labs measured both 8KB reads
and writes in excess of 100MB per second using both the SATA- and SAS-
provisioned RASTOR arrays. That performance level pegged streaming small
block I/O on the iSCSI array as having a 50 percent edge over the 2Gb-per-
second Fibre Channel array. Even when forcing a conservative write-through
caching policy, we also measured little difference in the performance of writes
with differing RAID levels. More importantly, using a normal default for safe write-
back caching obliterated any measurable differences.
We configured our Windows Server 2003 host for high throughput of iSCSI data
and high availability of iSCSI sessions. For high throughput we enabled jumbo
packet support on the RASTOR 7500: In addition to the storage array and client
NIC, all switches between the two devices must also support jumbo TCP
packets. For high availability via the automatic failover of iSCSI sessions, we
enabled MPIO support on the RASTOR array. MPIO allows clients to establish
multiple active connections to logical disks exported by the storage array without
the client OS interpreting each connection as an independent logical disk.
To leverage MPIO on the RASTOR 7500, we needed to provide MPIO support
on our Dell PowerEdge 1900 server. To implement MPIO on our server, we
utilized version 2.0.6 of the Microsoft iSCSI initiator in conjunction with the two
ports on our QLogic QLE4052 iSCSI HBA. This configuration allowed us to
set up dual active-active connections to each logical drive exported by the
It’s important to note that backend MPIO support on the RASTOR array provides
network load balancing of connections for iSCSI sessions from client systems.
With MPIO, a specific iSCSI session handles all of the I/O to a logical disk at any
particular instant. As a result, the throughput for any particular benchmark
instance was limited to the throughput of a single 1Gb-per-second connection.
Configuring active-active connections for iSCSI sessions ensures automatic
failover will prevent iSCSI sessions from being interrupted when connections are
To test the effect that the RASTOR’s backend NIC striping scheme has on
throughput scalability for hosts, we needed to set up multiple iSCSI sessions to
multiple logical disks. With each iSCSI session tied to a port on the server's
iSCSI HBA, read or write throughput on each iSCSI volume was limited to
125MB per second. As a result, scalability would only be evidenced in total
throughput to multiple drives. True to form, I/O continued to exhibit Linux
characteristics with multiple drives, as writes provided the best I/O scaling.
Cumulative write throughput to two logical disks reached 230MB per second with
SAS drives. Furthermore, small 8KB I/O was again providing nearly the same
throughput as 32KB and 64KB accesses.
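As a rough check on these numbers, the sketch below derives the per-session ceiling from the link speed and expresses the measured two-disk write result as a fraction of the theoretical aggregate. It assumes decimal units and no protocol overhead; the function names are illustrative.

```python
def link_limit_mb_per_s(gigabits_per_s):
    """Raw payload ceiling of a link in MB per second (decimal units, no overhead)."""
    return gigabits_per_s * 1000 / 8

def scaling_efficiency(measured_total, sessions, per_session_limit):
    """Measured aggregate throughput as a fraction of the ideal aggregate."""
    return measured_total / (sessions * per_session_limit)

limit = link_limit_mb_per_s(1)
efficiency = scaling_efficiency(230, sessions=2, per_session_limit=limit)

print(f"Per-session ceiling: {limit:.0f}MB per second")
print(f"Two-session write scaling: {efficiency:.0%} of ideal")
```

By this measure, 230MB per second across two sessions is 92 percent of the 250MB-per-second ideal for two gigabit connections.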
For streaming digital content applications, storage capacity and throughput go
hand-in-hand as the primary concerns. On the other hand, applications built on
Oracle or SQL Server typically generate large numbers of I/O operations that
transfer data using small block sizes from a multitude of locations dispersed
randomly across a logical disk. In such a scenario, the spotlight is on fast disk
access to maximize processing large numbers of I/O operations (IOPs).
Applications that rely at least in part on transaction processing, such as SAP and
even Microsoft Exchange, put a premium on the minimization of I/O latency
through data caching and high-speed disk rotation.
In many SMB transaction-processing applications, the number of processes
involved in making transactions is often limited to a few proxies. Microsoft
Exchange provides an excellent example of such a transaction-processing
scheme. Exchange utilizes a JET b-tree database structure as the main mailbox
repository. An Exchange store and retrieve process, dubbed the Extensible
Storage Engine (ESE), takes transactions passed to it, creates indexes, and
accesses records within the database.
To assess potential RASTOR 7500 performance in such SMB transaction-
processing scenarios, we ran the Intel® open source IOmeter benchmark. With
IOmeter, we were able to control the number of worker processes making I/O
read or write transaction requests and tune those processes to limit the number
of outstanding I/O requests (the I/O queue length). In particular, we utilized one
process and varied the I/O queue length from 1 to 30 outstanding requests. We
then tested these conditions on various database sizes.
During each benchmark test we recorded the number of IOPs processed and the
average response time for each IOP. Using small I/O request sizes (we utilized
8KB reads and writes in all of our tests), IOmeter stresses data access far more
than it stresses data throughput. For comparison, we ran the IOmeter tests using
volumes exported from the nStor 4540 FC array and the RASTOR 7500 SAS
To analyze I/O performance, we plotted the average number of IOPs per second
as a function of the outstanding I/O queue depth. In that context, archetypal IOP
performance follows a distinct pattern: As the number of outstanding I/O requests
begins to increase, the IOP completion rate increases by an order of magnitude.
Continuing to increase the number of outstanding I/O requests, however, leads to
an inflection point in IOP-completions. At that point, the scalability of the I/O
subsystem breaks and additional outstanding I/O requests begin to overwhelm
the I/O subsystem with overhead; the rate at which IOPs complete flattens; and
the average response time for an IOP begins to grow dramatically.
In particular, when openBench Labs tested a 1GB file on the nStor Fibre Channel
array, we needed to allow the IOmeter worker process to have 15 outstanding
I/O requests in order to reach a transaction completion rate of approximately 2,000
IOPs per second for read requests. On the other hand, the RASTOR was able to
process random reads from a 1GB file almost entirely from cache. With an
I/O queue length of just 5 outstanding I/Os, the RASTOR 7500
delivered an IOP completion rate of 20,000 IOPs per second. With 10
outstanding I/O requests, the RASTOR 7500 was completing an average of
28,000 IOPs per second.
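The relationship between queue depth, completion rate, and response time in these results can be estimated with Little's Law (outstanding requests = completion rate × mean response time). The sketch below applies it to the figures quoted above. It assumes the worker keeps the queue continuously full, and the response times it produces are derived estimates rather than measurements from our tests.

```python
def mean_response_ms(outstanding_ios, iops):
    """Little's Law rearranged: mean response time = queue depth / completion rate."""
    return outstanding_ios / iops * 1000

# (array, queue depth, IOPs per second) as quoted for the 1GB read test.
results = [
    ("nStor 4540", 15, 2_000),
    ("RASTOR 7500", 5, 20_000),
    ("RASTOR 7500", 10, 28_000),
]

for array, depth, iops in results:
    print(f"{array}: depth {depth}, {iops} IOPs/s, "
          f"about {mean_response_ms(depth, iops):.2f} ms per I/O")
```

Under that assumption, the cached reads on the RASTOR complete in well under a millisecond, while the disk-bound reads on the Fibre Channel array take several milliseconds each.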
That extraordinary cache advantage on reads seemingly disappeared when we
employed a mix of I/O read and write requests: 80% read and 20% write I/O
transactions. In that test, I/O completion rates using a logical drive exported by
the iSCSI RASTOR 7500 and a logical drive exported from the 2Gbps Fibre
Channel nStor 4540 were statistically identical. Using both the iSCSI RASTOR
and Fibre Channel nStor arrays, we approached 2,000 IOPs per second with an
I/O queue depth of 15 outstanding requests. Nonetheless, caching on the
RASTOR was still playing a role as the average read response time was 20%
lower on the RASTOR with an I/O queue depth of 15.
With the reliability of a Patek Philippe Grand Complications Chronograph, the
RASTOR 7500 was able to continue to minimize I/O latency by maximizing
cache hits even as we expanded the size of the target file well beyond that of the
system's total cache size. With an I/O transaction mix of 80% read and 20% write
requests, reads on a logical drive exported by the RASTOR 7500 remained
As a result, the rate at which all I/O requests were completed by the RASTOR
array continued to increase well after I/O processing on the nStor array was
saturated. In all of our IOmeter tests, the RASTOR 7500 was able to utilize its
aggressive caching scheme to boost the processing of I/O read transactions,
even in a mixed I/O environment. This performance profile makes the RASTOR
7500 especially valuable in the context of IT operations at an SMB site.
OPENBENCH LABS SCENARIO
UNDER EXAMINATION: iSCSI Storage Server
WHAT WE TESTED
Rasilient RASTOR 7500 Storage Server
Logical volume management services
Web-based GUI for storage provisioning
Load balancing based on the iSCSI redirect function
Backend support for clients implementing Microsoft
HOW WE TESTED
Windows 2003 Server SP2
QLogic QLE4052 iSCSI HBA
nStor 4540 Array
3,500 IOPS Benchmark Throughput (8KB Requests)
130MB per Second Benchmark Throughput per iSCSI Session (1Gbps
MPIO Session Management for Clients Running MS Initiator v2.0.6
Targeting system administrators at mid-tier business sites, the RASTOR Storage
Manager GUI provides a highly automated configuration and management
interface. To set up iSCSI load balancing, an administrator simply assigns a
network address to each NIC using the RASTOR GUI. All other details are
handled internally. Clients are then able to connect to the storage array using a
single IP address.
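The single-address scheme can be modeled in a few lines. The sketch below is a hypothetical illustration of login redirection: the class, the least-sessions policy, and the addresses are assumptions, since the article does not describe how the RASILIENT software chooses a NIC for each login.

```python
class Portal:
    """Hypothetical target portal that redirects each login to one NIC address."""

    def __init__(self, nic_addresses):
        self.nic_addresses = list(nic_addresses)
        self.sessions = {addr: 0 for addr in self.nic_addresses}

    def login(self, initiator_iqn):
        # Assumed policy: send the initiator to the NIC with the fewest sessions.
        target = min(self.nic_addresses, key=lambda addr: self.sessions[addr])
        self.sessions[target] += 1
        # 3260 is the standard iSCSI port; the response format here is illustrative.
        return {"status": "redirect", "target_address": f"{target}:3260"}

portal = Portal(["10.0.0.11", "10.0.0.12", "10.0.0.13"])
responses = [portal.login(f"iqn.2000-01.com.example:host{i}") for i in range(6)]

for addr, count in portal.sessions.items():
    print(f"{addr}: {count} sessions")
```

The client only ever needs to know the portal address; the redirect in the login response tells it which NIC to use for the session.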
We were easily able to leverage the support of MPIO for client systems on the
RASTOR 7500. To do this we used the latest version of the Microsoft iSCSI
initiator in conjunction with the dual-ported QLogic QLA4052C iSCSI HBA. With
MPIO activated for each iSCSI connection, we tested its effect by unplugging the
active Ethernet connection from the QLogic iSCSI HBA while running our
benchmarks. With active-active connections, failover was immediate and totally
transparent as no degradation was measured in throughput.
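The failover behavior we observed can be pictured with a minimal round-robin path selector. This is a sketch of the general technique and not Microsoft's MPIO code; the class, method, and path names are hypothetical.

```python
class MultipathDisk:
    """Hypothetical round-robin path selector for one logical disk."""

    def __init__(self, paths):
        self.paths = list(paths)
        self.healthy = set(self.paths)
        self._next = 0

    def fail(self, path):
        """Mark a path as down, for example after a cable is unplugged."""
        self.healthy.discard(path)

    def submit(self):
        """Return the path for the next I/O, skipping any failed paths."""
        for _ in range(len(self.paths)):
            path = self.paths[self._next % len(self.paths)]
            self._next += 1
            if path in self.healthy:
                return path
        raise IOError("no healthy path to logical disk")

disk = MultipathDisk(["hba-port-0", "hba-port-1"])
before = [disk.submit() for _ in range(4)]
disk.fail("hba-port-0")
after = [disk.submit() for _ in range(4)]

print("before failure:", before)
print("after failure: ", after)
```

With both paths active, I/O alternates between the two HBA ports. Once one port is marked as failed, every request simply continues on the surviving path, and the client still sees a single logical disk.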
Using five 750GB SATA drives, we created a 2.7TB RAID5 disk group. System
administrators are able to configure up to four disk groups for each controller. For
our scenario, we assigned various groups RAID levels of 0, 1, 10, 5 or 6 over the
test period. We found that we were able to maximize both the capacity and
recoverability of an array without adding any significant overhead by using RAID
level 6. By adding a second parity block per stripe over RAID 5, two drives can
fail in a RAID 6 array while the array remains active and recoverable.
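The capacity trade-off between RAID levels can be worked out directly. The sketch below assumes two-way mirrors for RAID 1 and 10, and it assumes the 2.7TB figure is reported in binary units, which the GUI does not state explicitly; the function names are illustrative.

```python
# Number of drives' worth of capacity consumed by parity at each level.
PARITY_DRIVES = {0: 0, 5: 1, 6: 2}

def usable_capacity_gb(level, drives, drive_gb):
    """Usable capacity in decimal GB for a disk group of identical drives."""
    if level in (1, 10):
        # Assumes two-way mirroring: half of the drives hold copies.
        return drives // 2 * drive_gb
    return (drives - PARITY_DRIVES[level]) * drive_gb

def gb_to_tib(gb):
    """Convert decimal gigabytes to binary terabytes."""
    return gb * 1e9 / 2**40

raid5_gb = usable_capacity_gb(5, drives=5, drive_gb=750)
raid6_gb = usable_capacity_gb(6, drives=5, drive_gb=750)

print(f"RAID 5: {raid5_gb}GB ({gb_to_tib(raid5_gb):.1f}TB in binary units)")
print(f"RAID 6: {raid6_gb}GB ({gb_to_tib(raid6_gb):.1f}TB in binary units)")
```

For five 750GB drives, RAID 5 leaves four drives' worth of capacity and RAID 6 leaves three, so the second parity block costs one additional drive of usable space in exchange for tolerating a second drive failure.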
After configuring a set of physical disks as a 2.7TB RAID array, we then
partitioned that array into 25GB virtual disks. Since RASILIENT utilizes software
for these functions, we were able to fine tune each logical drive by assigning
policies for read-ahead and write I/O caching operations. That kind of tuning is
typically reserved for physical arrays. By enabling write-back caching, we
eliminated all write overhead differences. Even with a conservative write-through
policy, we measured a negligible difference between RAID 5 and RAID 6.
The RASTOR GUI provides a means to virtualize logical disks for one or more
hosts. By default, a logical disk is available to all systems with an iSCSI initiator.
Alternatively, an administrator can present a logical disk exclusively to a list of
one or more hosts via each host's initiator IQN. With each iSCSI HBA port on our Dell
1900 having a unique IQN, we could either restrict disks to a particular port or
make them available to both ports for MPIO failover.
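The presentation rules described above amount to a simple access list keyed by initiator IQN. The sketch below is a hypothetical model of that behavior; the class, method names, and IQN strings are made up for illustration and do not reflect the RASTOR software's internals.

```python
class LogicalDisk:
    """Hypothetical model of presenting a logical disk to hosts by IQN."""

    def __init__(self, name):
        self.name = name
        self.allowed_iqns = None  # None models the default: visible to every host

    def present_to(self, *iqns):
        """Restrict the disk to an explicit list of initiator IQNs."""
        self.allowed_iqns = set(iqns)

    def visible_to(self, iqn):
        return self.allowed_iqns is None or iqn in self.allowed_iqns

# Made-up IQNs for the two ports of one dual-port HBA and for an unrelated host.
PORT0 = "iqn.2000-01.com.example:dell1900-port0"
PORT1 = "iqn.2000-01.com.example:dell1900-port1"
OTHER = "iqn.2000-01.com.example:other-host"

single_port = LogicalDisk("vol-a")
single_port.present_to(PORT0)

mpio_disk = LogicalDisk("vol-b")
mpio_disk.present_to(PORT0, PORT1)

print("vol-a visible on port 1:", single_port.visible_to(PORT1))
print("vol-b visible on port 1:", mpio_disk.visible_to(PORT1))
```

Presenting a disk to both port IQNs is what makes MPIO failover possible, while presenting it to one IQN pins the disk to a single port and hides it from every other initiator.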
//oblDisk.ai (performance chart)//
With software-based RAID provided via the Linux OS running on the RASTOR
7500, I/O to any logical disk presented by the RASTOR 7500 took on the
characteristics of Linux reads and writes. Thanks to the bundling of I/O requests
into 128KB blocks, sequential 8KB I/O, which is the default used by most
Windows applications, streamed at over 100MB per second for both reads and
writes. Essentially, all sequential I/O was at wire speed for 1Gb-per-second
//oblDisk2.ai (performance chart)//
With one iSCSI session connected to each of the 1Gb-per-second ports on our
host's iSCSI HBA, I/O scaled close to linearly as total I/O from the RASTOR 7500
approached 2Gb per second. Moreover, throughput increased in a Linux-like
manner as write I/O scaled better than read I/O. What's more, SAS drives
showed a slight but measurable advantage with multiple sessions.
//IOmeter-1GB.ai (performance chart)//
I/O performance using IOmeter on a logical volume exported from the nStor 4540
FC array went totally according to Hoyle: As we increased the I/O queue length,
the rate of IOP completion increased in tandem. The same pattern occurred with
a logical volume exported from the RASTOR 7500 with one notable exception.
When we targeted a small (1GB) file with read requests exclusively, the RASTOR
7500 maximized cache utilization and sustained IOP completion rates that were
an order of magnitude higher than those of the nStor.
//IOmeter-10GB.ai (performance chart)//
As we increased the size of the target file for the IOmeter benchmark, we
expected to see a dramatic decline in caching effects on the RASTOR 7500: This
decline did not occur. With a 10GB file, mixed read and write requests
maintained a distinct advantage with faster read response times. With 30
outstanding I/O requests, read response time was 60% slower on a logical disk
exported by the nStor array. As a result, the logical disk exported by the
RASTOR 7500 sustained a 23% higher IOP rate on reads.