- Enterprise IT Agility Challenges
Federal IT infrastructures face increasing complexity as they are virtualized and migrated to private clouds at a rapid pace. This makes it difficult to ensure optimal performance and alignment with changing agency needs.
- Impediments to Aligning IT Organizations with Mission Goals
Siloed teams, lack of processes, and legacy monitoring tools that don't account for virtualization prevent IT from delivering on service level agreements.
- Infrastructure Performance Management (IPM) – a New Approach
IPM provides comprehensive, real-time visibility into overall infrastructure performance to help optimize budgets, mitigate risks, and ensure performance meets agency requirements.
Federal IT Agility Challenges: Cost, Complexity, Convergence
Technology and application trends such as big data, storage and server growth,
mobile applications, bring-your-own-device (BYOD) and social media are the
biggest disrupters of mission-critical IT today. They pressure IT organizations to deliver 24x7 “dial-tone”
service levels that meet agency demands for availability and performance while remaining agile enough to keep
pace with continually evolving application service delivery and mission requirements.
Compounding the complexity and risk of this “new normal” state on mission-
critical infrastructures are complex interdependencies and technology
advances—leading to blindingly fast servers, converging infrastructure
components, mass adoption of virtualization, and rapid migration to the private
cloud. The key drivers behind all of these trends are demand for
anytime/anywhere application access and flawless performance—from any
device. All this exponentially increases storage requirements, mobile data, IP
traffic and servers, and fuels advances in new technologies. The result is a never-
ending pace of rapid transformation and explosive increase in complexity and
risk.
The New “Normal”
The Federal Data Center Consolidation Initiative increasingly requires the entire system, from server to storage
fabric to storage array, to be virtualized and abstracted. Because of this, clear visibility into performance and
issue resolution are ongoing challenges. As federal data centers with multi-vendor, multi-layer architectures
perpetually change, the views seen by users, administrators and applications become increasingly decoupled from
the actual physical architectures. As a result, a clear and unbiased understanding of the physical
infrastructures, the applications dependent on them, and how they are
performing, is difficult to achieve. Moreover, this “new normal” state makes it
nearly impossible to manage with confidence and operate more efficiently. These
circumstances are creating inherent gaps in visibility that must be addressed in order to achieve the real-time
situational awareness needed to better guide IT decisions, stay within budget, and remain ever-agile.
Managing the New “Normal”
It’s now critical for Federal IT organizations to have a purpose-built infrastructure performance management
platform: one specifically designed for granular, real-time monitoring of performance throughout the
infrastructure supporting mission-critical applications (e.g., Microsoft Exchange, Electronic Medical Records,
Imagery Analysis), Virtual Desktop Infrastructures (VDI) and agency processes. Organizations also need a
platform that assures system-wide performance and availability. This platform should enable infrastructure teams
(whether server, storage, virtualization or performance focused) to de-risk mission-critical workloads
across physical, virtual and cloud-based computing environments. At the same time, it should also provide deep
insights into the real-time performance of the IT infrastructure, so that the risk of virtualizing and delivering new
applications, and deploying new technologies can be minimized. Leveraging such a platform would allow Federal
IT organizations to align with application requirements and agency demands, and become a trusted partner in
driving better organizational decisions to accomplish the mission.
Impediments to Aligning IT Organizations with Government Agency Goals: Minding and
Managing the Gaps
Three main factors (people, processes and technology) can prevent Federal IT organizations from achieving a state
of maturity and alignment, where they are perceived by the broader agency community as having the situational
awareness and ability to deliver exacting, cost-optimized service levels.
People and Organizational Structure: The IT organization itself can impact mission alignment, particularly
if resources and management are functionally siloed. For example, organizations are often divided into
Application, Server, Network, and Storage administrators. If there is a performance problem, the various
administrators tend to make sure that it is not occurring in their domain. This structure pits teams against
each other and results in finger pointing. This predicament is further aggravated by device-specific system
administration tools that provide a biased, myopic focus on only one facet of an IT infrastructure. When it
comes to interrelated and interdependent systems, without a holistic and unbiased view of the environment,
it’s nearly impossible to manage performance—let alone quickly find root causes of problems or set accurate
SLAs.
Process and IT Maturity: IT organizations are forever focused on delivering service levels at or under budget,
but many lack the processes or metrics necessary to fully realize their commitments. The reasons range
from a lack of executive sponsorship, to over-allocated or misallocated resources, to little or no
institutionalized collaboration processes. In all cases, teams must focus on finding and resolving specific
problems. However in many cases, they lack the proper instrumentation and processes for fast and conclusive
resolution. Many IT teams try to focus on aligning their operations with agency goals. Often, however, this
ends up being more of a hope than a reality. Without continuing executive commitment to people and
processes (backed by accurate data), it’s tough to achieve alignment. Keeping uniform processes in place often
proves impossible, especially when there are multiple stakeholders with different reporting structures,
incentives and goals.
Technology and Tools for Yesterday’s Environments: Systems Management tools have traditionally
focused on specific components of an infrastructure, such as provisioning and monitoring server resources,
managing storage capacity, or utilization of the storage and network fabric. In most cases these legacy device-
specific tools have been marketed as performance monitoring, even though they only monitor utilization.
Neither singly nor in combination do these tools provide the necessary performance information that
Application, Server, Storage and Cloud administrators need to understand how their mission-critical systems
infrastructures are actually performing. They lack real-time performance metrics (like infrastructure response
times) for executing actions, and efficiency in moving data/information. They also have limited scalability, and
are often agent-based and vendor-specific. These cobbled together toolsets may try to show that the overall IT
environment is working fine, with the right capacities and resources provisioned. However, if application
response is poor, related IT service levels are still deemed unsatisfactory.
A Moment of Clarity: Performance versus Utilization. Infrastructure monitoring tools use utilization
metrics to infer the potential impact on performance. Because they are not measuring true performance, they lead
teams to over-provision resources to ensure that utilization never becomes a bottleneck. The days of over-
provisioning to ensure performance are gone, however: it is no longer viable amid hyper data growth and
accompanying cost pressures. This is especially true since the fundamental promise of virtualization and the
cloud is to drive down costs, operate more efficiently and achieve greater utilization of existing assets.
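To make the distinction concrete, the contrast can be sketched in a few lines of Python. The numbers and function names below are purely illustrative (not drawn from any monitoring product): a lightly utilized link can still deliver a terrible latency tail, which utilization metrics alone would never reveal.

```python
def utilization(busy_seconds, interval_seconds):
    """Fraction of the measurement interval the link was busy."""
    return busy_seconds / interval_seconds

def p99_latency(latencies_ms):
    """99th-percentile latency of a set of I/O samples."""
    ordered = sorted(latencies_ms)
    return ordered[min(len(ordered) - 1, int(0.99 * len(ordered)))]

# Link A: only 30% utilized, yet one queued burst ruins the tail.
link_a = [1.0] * 99 + [250.0]
# Link B: 70% utilized, uniformly fast.
link_b = [1.5] * 100
```

By the utilization metric alone, Link B looks like the problem; by actual response times, Link A is the one hurting applications.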
Requirements for IT to be Ever-Agile: Another important factor placing everything at risk is the ever-changing
IT infrastructure landscape. Systems Management tools were originally designed for monitoring physical elements,
but with the advent of virtualization and cloud architectures those tools are neither appropriate nor useful for
managing the related complexities. The continued virtualization and abstraction of the infrastructure and
underlying physical elements makes performance management, problem avoidance, root cause problem analysis,
and remediation of the physical elements all the more difficult.
IPM: Defined
Infrastructure Performance Management (IPM) is the ability to continuously capture, correlate and analyze in real-
time, the system-wide performance, utilization and health of heterogeneous physical, virtual and cloud computing
environments. Ultimately, IPM gives Federal IT real-time situational awareness and the “Do Once,
Use Many Times” capability to establish and maintain the service levels that agencies require, while driving the
systems-level optimization, cost savings and agility that are the promise of virtualization and the cloud.
IPM: Managing Through Compounding Complexity and Risk to Assure IT Value and Agency
Mission Alignment
Federal Infrastructure Operations and Management teams are forever under pressure to deliver IT service levels
that are aligned with government agency goals and application performance requirements. In fact, agency
processes and application teams often set IT’s priorities in terms of availability and performance. Moreover, new
IT infrastructure compute models include hybrid environments where portfolios of applications are migrated
between and reside within physical, virtual and cloud environments simultaneously. The result is compounded
complexities in monitoring, managing and reporting—necessitating an IPM platform for assuring performance
optimization, operational efficiency, risk mitigation and sustainable SLAs.
Trying to manage through this complexity forces the device-specific management tools of yesterday to change. In
the past, management tools focused on capacity, utilization and management of individual physical components.
With the move to virtualization, cloud and new deployment models, Application Performance Management (APM) and
Network Performance Management (NPM) have been trying to extend and fill in for what’s missing. The challenge is
that neither delivers a holistic view of the underlying IT infrastructure, and neither provides situational
context. So, between all of these tool sets that address device, application and network performance, there is
still a huge gap in understanding system-wide performance.
Real-Time IPM Platform Requirements
An IPM platform must be able to continuously capture, correlate and analyze system-wide infrastructure
performance, utilization and health metrics from heterogeneous environments in real-time. It must provide an
unbiased view of system-wide infrastructure performance, from virtual machine, to server, to switch fabric, to
storage array to logical unit of storage.
Continuous, Granular Monitoring and Measurement: The most important metric for IT infrastructure
performance is infrastructure response time: the time it takes for any application, running on a physical server
or virtual machine, to place an I/O request and get a response back. In large, highly virtualized and/or cloud
infrastructures, too much happens within a five- or twenty-minute polling interval that shouldn’t be glossed
over. For monitoring to be effective, organizations need to capture the minimum, maximum and average at
intervals of no more than one second to gain accurate insight into overall infrastructure performance.
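The per-second aggregation described above can be sketched as follows; this is a minimal illustration with hypothetical field names and sample values, not any product’s actual schema. Note how a 200 ms spike survives one-second buckets but would vanish into a five-minute average.

```python
from collections import defaultdict

def aggregate_per_second(samples):
    """Roll raw I/O response-time samples (timestamp_s, latency_ms)
    up into per-second min/max/avg buckets."""
    buckets = defaultdict(list)
    for ts, latency_ms in samples:
        buckets[int(ts)].append(latency_ms)
    return {
        second: {
            "min": min(vals),
            "max": max(vals),
            "avg": sum(vals) / len(vals),
        }
        for second, vals in buckets.items()
    }

# One 200 ms outlier among millisecond-class I/Os: visible per second,
# invisible in a coarse polling average.
samples = [(10.1, 2.0), (10.4, 200.0), (10.9, 3.0), (11.2, 2.5)]
stats = aggregate_per_second(samples)
```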
Unbiased and Heterogeneous: Government IT infrastructures by design are heterogeneous, with multiple
vendors delivering their part of a complete physical, virtual and cloud solution. An IPM platform must collect
data from various devices without vendor or product-specific bias or dependencies. It must deliver an
unbiased, vendor independent view of the whole system—with precise metrics that enable situational
understanding and context for what’s happening throughout the system. The IPM platform should deliver the
unbiased “truth” related to system-level performance, so that government IT leaders can make smart decisions
regarding infrastructure architecture, engineering, acquisition and operations.
System-Wide Data Collection, Visibility and Analytics: The IPM platform must capture, organize and
correlate metrics about all of the infrastructure elements, from the virtual machine and server to the fabric,
storage arrays and logical unit number (LUN). Further, the data must be presented in a way that clearly describes
interdependencies in the context of system-wide performance. The data must also be persisted, to make it easy to
go back in time and pinpoint exactly where and when a performance-impacting event happened, and how it
affected the system overall. The IPM platform must track all granular I/O activity across highly virtualized
infrastructures, from server transmission through the switch to storage and back, and leverage an analytics
framework for contextual understanding, correlation and discovery.
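The persist-and-look-back idea can be sketched with a toy in-memory store; all record fields, element names and thresholds here are illustrative assumptions, not a real platform’s data model.

```python
class MetricStore:
    """Toy time-series store: persists (timestamp, element, metric, value)
    records so a past window can be re-examined to pinpoint an event."""
    def __init__(self):
        self.records = []

    def persist(self, ts, element, metric, value):
        self.records.append((ts, element, metric, value))

    def window(self, start, end):
        """All records whose timestamp falls inside [start, end]."""
        return [r for r in self.records if start <= r[0] <= end]

def first_breach(store, metric, threshold, start, end):
    """Earliest record in the window where the metric exceeded its threshold."""
    for rec in sorted(store.window(start, end)):
        if rec[2] == metric and rec[3] > threshold:
            return rec
    return None

# Replay a window after the fact to find when latency first spiked.
store = MetricStore()
store.persist(100.0, "vm-17", "exchange_ms", 2.1)
store.persist(101.0, "vm-17", "exchange_ms", 2.3)
store.persist(102.0, "vm-17", "exchange_ms", 48.0)
store.persist(103.0, "vm-17", "exchange_ms", 2.2)
breach = first_breach(store, "exchange_ms", 10.0, 90.0, 110.0)
```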
Scalability: A typical virtualized federal IT infrastructure comprises thousands of servers, hundreds of
thousands of switch and storage I/O ports, and petabytes of storage. An IPM platform must be able to handle
the large number of installed physical devices and the associated metrics without a hiccup and without risk of
hitting a limit. Most legacy device or system management tools struggle with scale by design.
Benefits of an IPM Platform
An IPM platform delivers significant cost, time and resource savings to both IT and government agencies:
Operational Efficiency and Effectiveness: Cross-domain system visibility enables comprehensive
measurement of the infrastructure performance from the virtual machine all the way to the storage LUN. So
performance bottlenecks can be identified and corrected proactively, before impacting service levels—
resulting in significantly higher performance—and enabling higher utilization of existing infrastructure assets.
Additionally, IPM helps IT organizations improve infrastructure response times, proactively avoid outages,
quickly identify root cause and remediate performance issues, and drive continuous improvement in
reliability.
Risk Mitigation: IPM enables highly accurate baselining to proactively identify the impact of any and all
changes to the infrastructure. This improves forecast accuracy and results are seen in real-time. For example,
migration of agency enterprise applications from physical to virtual and cloud environments can be
accomplished with much lower risk. Application performance has an interdependent relationship with the
infrastructure it runs on. Different applications require different resource quality and compete for resources
with other applications, so it is important to manage the infrastructure accordingly. By leveraging IPM data,
it’s possible to project application performance and ensure that agency processes are executing as they should.
This is equally applicable whether the change process spans months of engineering design, test and rollout, or
happens in near real-time in highly automated cloud environments.
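The baselining idea can be illustrated with a minimal sketch, assuming pre-change response-time samples in milliseconds; the sample data and the three-sigma band are hypothetical choices, not prescribed values.

```python
from statistics import mean, stdev

def baseline(samples):
    """Summarize pre-change behavior as (mean, standard deviation)."""
    return mean(samples), stdev(samples)

def deviates(value, mu, sigma, n_sigmas=3.0):
    """Flag a post-change reading that falls outside the baseline band."""
    return abs(value - mu) > n_sigmas * sigma

# Hypothetical pre-migration response times (ms) for one application.
pre_migration = [2.0, 2.1, 1.9, 2.0, 2.2, 1.8, 2.0, 2.1]
mu, sigma = baseline(pre_migration)
```

After a physical-to-virtual migration, each new reading is checked against the band: a 9 ms response is flagged as a regression, while 2.05 ms is business as usual.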
Agency Mission Alignment: Federal IT organization priorities include delivering the service levels required
by their users, and having an agile response as program requirements change and evolve—all while optimizing
related operational efficiencies and right-sizing infrastructure budgets. For these IT organizations, IPM
enables the ability to stay in lock-step with agency mission requirements. IPM helps IT quickly identify root
causes and proactively (and significantly) reduce mean time to resolution. By increasing system-wide
utilization, reducing over-provisioning and making the correct decisions on infrastructure spend versus
performance required, budgets are significantly reduced. An IPM platform helps cut through virtualization
complexity, accelerates transformation and minimizes risk.
The IPM Platform: Enabling Agile IT that Aligns with Federal Agency Missions and Programs
Though it’s still important to understand utilization and how it affects performance, it’s now essential to
understand it within the context of true infrastructure response times. It’s critical to understand how the
infrastructure is delivering the resources required by the applications relying on it, and if the resources are
delivering the right service levels. Virtual Instruments’ VirtualWisdom IPM Platform was purpose-built from the
ground up to address all of the challenges and gaps resulting from the circumstances and evolutionary scenarios
described above. It delivers a continuous real-time understanding of application data behavior at the protocol level
throughout physical, virtual and cloud infrastructures. Whereas other management tools focus on utilization and
health at the device layer, the VirtualWisdom IPM Platform looks at the entire compute environment from a
systems level—focusing on correlation and interdependencies of performance, utilization and health overall. This
system-wide visibility enables Federal IT organizations to help agencies consolidate redundant applications, avoid
waste and duplication, and strengthen IT governance.
The benefits for Federal IT organizations are multi-fold, as they are now enabled with better situational awareness
for better informed IT decisions around architecture, engineering, design and operations. They can also:
- Right-size and align infrastructure capacity and resource quality, avoiding wasted resources
- Drive greater utilization of existing assets and resources
- Accurately measure I/O traffic, correlate system-wide data and monitor trends
- Improve infrastructure response times
- Proactively, efficiently, accurately and quickly identify and remediate problems
- Deliver the right level of performance at the appropriate cost
- Establish SLAs aligned with agency mission requirements
- Effectively partner with and contribute to the success of the agency’s mission
Virtual Instruments’ VirtualWisdom®: The Award Winning IPM Platform
VI’s VirtualWisdom is the industry-leading IPM platform for Federal IT organizations’ physical, virtual and
private cloud computing environments. VirtualWisdom has been architected from the ground up to measure and
ensure the performance, utilization and health of mission-critical systems infrastructures. It comprises the
VirtualWisdom application software and three distinct probes that capture metrics from multiple data sources
throughout the open system. The platform server persists and correlates the granular metrics collected to
deliver authoritative visibility, trending, alerting and reporting. VirtualWisdom delivers the industry’s only
real-time, unbiased and system-wide Infrastructure Performance Management solution.
VirtualWisdom Server and Dashboard
The VirtualWisdom Server is a robust, database-driven application that controls and collects metrics from both
hardware and software probes. The platform persists all metrics collected over time and organizes these metrics
in both physical and logical groupings. This enables advanced correlation and analysis across all layers of the
open systems stack. This capability also delivers on the requirements for real-time alerting, historic trending and
reporting and advanced ‘what-if’ modeling. The VirtualWisdom server also provides a fully customizable
Dashboard UI, which displays a time-correlated, cross-domain view of the performance, utilization and health of
the system as a whole, based on the deep, granular metrics available from the probes. The dashboard is entirely
configurable, so the display and reports can be presented in a user-defined context that fits with the enterprise’s
operational process. It also provides for integration with third-party products and management frameworks.
VirtualWisdom Virtual Server Probe
The Virtual Server Probe is a software-based probe that is integrated with VMware vCenter and collects more than
100 metrics on the associated VMware estate. Some of the metrics captured by this probe include CPU utilization
and status, memory utilization, disk I/O requests and capacity, as well as network requests and utilization by
virtual machine, ESX Host and cluster. The metrics are then persisted and correlated with the metrics from the
other VirtualWisdom probes, enabling end-to-end correlation from virtual machine to the logical unit of storage.
VirtualWisdom SAN Availability Probe
The SAN Availability Probe is a software-based probe that collects key data from Fibre Channel and FCoE
directors, switches and embedded devices via SNMP. The software probe collects comprehensive metrics for
every active port in the SAN and organizes them according to the attached device type. Some representative
metrics include utilization, frames, bytes, reads and writes, as well as key faults such as loss of synchronization,
link resets, link failures, packet discards and CRC errors. Organizing these metrics according to the attached
device type enables customers to manage storage, servers, ISLs and other devices through global rules as opposed
to a port-by-port basis. The metrics are captured and reported to the VirtualWisdom Server on a user-
configurable time-based interval.
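The global-rules idea can be sketched as follows; the port records, field names and rule thresholds are invented for illustration and do not reflect the product’s actual schema. One rule per attached-device type replaces setting a threshold on every individual port.

```python
# Hypothetical per-port records as an SNMP-based probe might report them.
ports = [
    {"port": "1/1", "attached": "storage", "crc_errors": 0, "link_resets": 0},
    {"port": "1/2", "attached": "server", "crc_errors": 4, "link_resets": 1},
    {"port": "2/1", "attached": "isl", "crc_errors": 0, "link_resets": 7},
]

# One global rule per attached-device type, instead of port-by-port thresholds.
rules = {
    "server": lambda p: p["crc_errors"] > 0,
    "isl": lambda p: p["link_resets"] > 5,
    "storage": lambda p: p["crc_errors"] > 0 or p["link_resets"] > 3,
}

def violations(ports, rules):
    """Ports whose metrics violate the global rule for their device type."""
    return [p["port"] for p in ports
            if rules.get(p["attached"], lambda _p: False)(p)]
```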
SAN Performance Probes
The hardware-based SAN Performance Probe is the most advanced, high performance line-rate data capture and
analysis device available in the industry. This device sits at the protocol layer to capture and analyze every FC
frame header and SCSI Command on the SAN in real-time, at 2, 4 and 8 Gbits per second. The probe captures the
true, unaltered, I/O profile of the actual application traffic, detecting application performance slowdowns and
transmission errors by measuring every SCSI I/O transaction from start to finish. The SAN Performance Probe is
currently available in both 8 link and high-density 16 link versions.
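The start-to-finish measurement described above amounts to pairing each command with its completion. A minimal sketch of that pairing, assuming captured frames carry a timestamp, an exchange ID and a frame kind (all values here are invented for illustration):

```python
def exchange_latencies(frames):
    """Pair each SCSI command frame with its status (completion) frame by
    exchange ID and report start-to-finish latency in milliseconds."""
    starts, latencies = {}, {}
    for ts, exch_id, kind in frames:  # kind is "CMD" or "STATUS"
        if kind == "CMD":
            starts[exch_id] = ts
        elif kind == "STATUS" and exch_id in starts:
            latencies[exch_id] = (ts - starts.pop(exch_id)) * 1000.0
    return latencies

# Hypothetical captured frame timestamps (seconds) for two exchanges.
frames = [
    (0.0000, 0x1A, "CMD"),
    (0.0010, 0x1B, "CMD"),
    (0.0042, 0x1A, "STATUS"),
    (0.0260, 0x1B, "STATUS"),
]
lat = exchange_latencies(frames)
```

Measured this way, exchange 0x1B’s 25 ms completion stands out against 0x1A’s 4.2 ms, even though both would look identical to a utilization counter.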
SANInsight® TAP Patch Panel System
Traffic Access Points (TAPs) provide a passive, fail-safe access point to Fibre Channel network traffic on the TAP’d
link. This makes the fiber ‘light’ available for real-time performance monitoring, deep problem diagnosis and
protocol layer failure analysis. TAPs are non-powered, non-mechanical devices that reflect a small portion of the
signal through the TAP to another port, which provides a copy of the light to upstream, out of band probes. The
passive TAP does not introduce any latency or overhead, has no impact on application or SAN performance and is
integrated with several industry-leading Fibre Channel Patch Panel Systems for simple deployment.
Delivering on the Promise of Program Agility and Mission Alignment
The move to virtualized and cloud environments can help reduce CAPEX and server growth, but this new paradigm
does not by itself control storage CAPEX or the OPEX growth in maintenance costs and staff. In addition,
downtime or unacceptable performance often carries a budget impact, which can also affect customer
satisfaction and whether or not the agency mission is achieved. If one is not vigilant, the net effect of cloud-driven
consolidation, migration, and new technology roll-outs is increased risk and cycle time, often negating the very
reason to move to a cloud infrastructure in the first place.