University of California
Larry L. Sautter Award Submission
Innovation in Information Technology
at the University of California, San Francisco
Submitted By:
Mr. Michael Williams, MA
University of California
Executive Director, Information Technology
UC San Francisco Diabetes Center, Immune Tolerance Network
and
Chief Information Officer
UC San Francisco Neurology, Epilepsy Phenome/Genome Project
Telephone: (415) 860-3581
Email: mwilliams@immunetolerance.org
Date Submitted: Friday, May 18, 2007
Table of Contents
1. Project Team
   1.1. Team Leaders
   1.2. Team Members
2. Project Summary and Significance
3. Project Description
   3.1. Background Information
   3.2. Situation Prior to ARCAMIS
   3.3. After ARCAMIS Deployment
   3.4. Business Impact
4. Technologies Utilized
   4.1. The ARCAMIS Suite
   4.2. ITIL Team Based Operating Model
   4.3. Security Model and Architecture
   4.4. Data Center Facilities
   4.5. Internet Connectivity
   4.6. Virtual CPU, RAM, Network, and Disk Resources
   4.7. Operating Systems Supported
   4.8. Backup, Archival, and Disaster Recovery
   4.9. Monitoring, Alerting, and Reporting
   4.10. IT Service Management Systems
5. Implementation Timeframe
   5.1. Project Timeline
6. Customer Testimonials
Appendices
   Appendix A – Capabilities Summary of the ARCAMIS Suite
   Appendix B – Excerpt from the ARCAMIS Systems Functional Specification
1. Project Team
1.1. Team Leaders
Michael Williams, M.A.
Executive Director, Information Technology
UC San Francisco Diabetes Center, Immune Tolerance Network
and
Chief Information Officer
UC San Francisco Neurology, Epilepsy Phenome/Genome Project
Gary Kuyat
Senior Systems Architect, Information Technology
UC San Francisco Diabetes Center, Immune Tolerance Network
and
UC San Francisco Neurology, Epilepsy Phenome/Genome Project
1.2. Team Members
Immune Tolerance Network Information Technology:
Jeff Angst
Project Manager
Lijo Neelankavil
Systems Engineer
Diabetes Center Information Technology:
Aaron Gannon
Systems Engineer
Project Sponsors:
Michael Williams, M.A.
Executive Director, Information Technology, Immune Tolerance Network
Jeff Bluestone, Ph.D.
Director, Diabetes Center and Immune Tolerance Network
Daniel Lowenstein, M.D.
Department of Neurology at UCSF; Director of the UCSF Epilepsy Center
Mark A. Musen, M.D., Ph.D.
Professor and Head, Stanford Medical Informatics
Hugh Auchincloss, M.D.
Chief Operating Officer, Immune Tolerance Network (at time of project); currently Principal Deputy Director of NIAID at NIH
2. Project Summary and Significance
By deploying the Advanced Research Computing and Analysis Managed Infrastructure Services (ARCAMIS) suite, the Immune Tolerance Network (ITN) and Epilepsy Phenome/Genome Project (EPGP) at the University of California, San Francisco (UCSF) have implemented multiple Tier 1 networks, physically secured enterprise class datacenters, storage area network (SAN) data consolidation, and server virtualization to achieve a centralized, scalable network and system architecture that is responsive, reliable, and secure. This is combined with a nationally consistent, team centric operating model based on Information Technology Infrastructure Library (ITIL) best practices. Our deployed solution is compliant with applicable confidentiality regulations and assures 24 hour business continuance with minimal loss of data in the event of a major disaster. ARCAMIS has also provided significant savings on IT costs.
Over the last three years we have efficiently met constantly expanding demands for IT resources by virtualizing disk, CPU, RAM, network, and ultimately servers. ARCAMIS has allowed us to provision and support hundreds of production, staging, testing, and development servers at a ratio of 25 guests to one physical host. By using IP remote management technologies that do not require physical presence, together with server consolidation, virtualization, and SAN based thin provisioning of storage, we have effectively untied infrastructure upgrades from service delivery cycles. Furthermore, centralizing storage on a Storage Area Network (SAN) has given us the ability to provide real-time server backups (no backup window) and hourly disaster recovery snapshots to a Washington, DC disaster recovery (DR) site, supporting business continuance within hours of a disaster.
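The submission contains no code; as a minimal illustration of the 25:1 consolidation ratio and the hourly DR snapshot window described above, the following Python sketch shows the underlying arithmetic. The VM count used is a hypothetical input, not a figure from the project.

    # Rough consolidation and snapshot-window arithmetic (illustrative only).
    import math

    GUESTS_PER_HOST = 25        # 25 guests per physical host, as stated above
    SNAPSHOT_INTERVAL_MIN = 60  # hourly DR snapshots, as stated above

    def hosts_needed(virtual_machines: int, guests_per_host: int = GUESTS_PER_HOST) -> int:
        """Physical hosts required to run the given number of VMs."""
        return math.ceil(virtual_machines / guests_per_host)

    if __name__ == "__main__":
        vm_count = 300  # hypothetical fleet size
        print(f"{vm_count} VMs need {hosts_needed(vm_count)} physical hosts at 25:1")
        print(f"Worst-case data loss after a site failure: {SNAPSHOT_INTERVAL_MIN} minutes")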
ARCAMIS provides the University of California with a proven case study of how to implement enterprise class IT infrastructures and operating models for the benefit of NIH funded clinical research at UCSF. We have accelerated the time from the bedside to the bench in clinical research by taking the IT infrastructure out of the clinical trials' critical path, thereby providing a positive impact on our core business: preventing and curing human disease. ARCAMIS is more agile and responsive, having reduced server acquisition time from weeks to a matter of hours. ARCAMIS is significantly more secure and reliable, providing on the order of 99.998% technically architected uptime, and we have greatly improved the performance and utilization of our IT assets. We have created hundreds of thousands of dollars in measurable cost savings. ARCAMIS is environmentally friendly, significantly reducing our consumption of resources such as power and cooling. ARCAMIS can serve as a blueprint for enterprise class clinical research IT infrastructure services throughout the University of California, at partner research institutions and universities, and at the National Institutes of Health.
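As a quick sanity check on the 99.998% uptime figure quoted above, the following Python sketch converts an availability percentage into allowed downtime per year. This is standard availability arithmetic, not a calculation taken from the submission.

    # Convert an availability target into permitted downtime per year.
    MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

    def downtime_minutes_per_year(availability_percent: float) -> float:
        """Minutes of downtime per year allowed by the given availability target."""
        return MINUTES_PER_YEAR * (1 - availability_percent / 100.0)

    if __name__ == "__main__":
        for target in (99.9, 99.998, 99.999):
            print(f"{target}% uptime allows about "
                  f"{downtime_minutes_per_year(target):.1f} minutes/year of downtime")

At 99.998%, this works out to roughly 10.5 minutes of downtime per year.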
Technologies used include Hewlett Packard ProLiant servers and 7000c series blades, VMware Virtual Infrastructure Enterprise 3.01, Network Appliance FAS3020 storage area networks, Cisco and Brocade networking equipment, Red Hat Enterprise LINUX, and Microsoft Windows Server 2003, among others.
3. Project Description
3.1. Background Information
The mission of the Immune Tolerance Network (ITN) is to prevent and cure
human disease. Based at the University of California, San Francisco (UCSF),
the ITN is a collaborative research project that seeks out, develops and
performs clinical trials and biological assays of immune tolerance. ITN
supported researchers are developing new approaches to induce, maintain,
and monitor tolerance with the goal of designing new immune therapies for
kidney and islet transplantation, autoimmune diseases and allergy and
asthma. Key to our success is the ability to collect, store, and analyze the huge amount of data collected in the ITN's 30+ global clinical trials at 90+ medical centers in a secure and effective manner, so a reliable, scalable, and adaptable IT infrastructure is paramount to this endeavor. The ITN is in the seventh year of 14-year contracts from the NIH National Institute of Allergy and Infectious Diseases (NIAID), the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), and the Juvenile Diabetes Research Foundation.
The Epilepsy Phenome/Genome Project (EPGP) studies the complex genetic factors that underlie some of the most common forms of epilepsy, bringing together 50 researchers and clinicians from 15 medical centers throughout the US. The overall strategy of EPGP is to collect detailed, high quality phenotypic information on 3,750 epilepsy patients and 3,000 controls, and to use state-of-the-art genomic and computational methods to identify the contribution of genetic variation to the epilepsy phenotype, developmental anomalies of the brain, and the varied therapeutic response of patients treated with antiepileptic drugs (AEDs). This initial five-year grant is funded by the NIH National Institute of Neurological Disorders and Stroke (NINDS).
To address these challenges, the ITN and EPGP turned to centralizing computing infrastructure in Tier 1 networked, enterprise class datacenters, virtualization, and data consolidation onto a Storage Area Network (SAN) with off-site disaster recovery replication. Combined with an ITIL based, team oriented, nationally consistent operating model leveraging specificity of labor, we are in a position to respond efficiently and scalably to the increasing demands of the organization and rapidly adapt the IT infrastructure to dynamic management goals. This is accomplished while minimizing costs and maintaining requisite quality: we have a true high availability architecture that minimizes the risk of data loss.
3.2. Situation Prior to ARCAMIS
Like most of today's geographically dispersed IT organizations, we were faced with the challenge of providing IT services in a timely, consistent, and cost effective manner with high customer satisfaction. Unlike many organizations, the ITN and EPGP have many M.D. and Ph.D. clinical research knowledge workers with higher than normal, computationally intensive IT requirements. Escalating site-specific IT infrastructure costs, unpredicted downtime, geographically inconsistent processes and procedures, and the lack of a team based operating model were among the challenges of supporting such a multi-site infrastructure. There was a general sense that IT could do better. The risk of data loss was real. Dynamically growing demands were making it more difficult to consistently provide high IT service quality, and site IT staff were largely reactive and isolated. Prior to the ARCAMIS deployment, the IT infrastructure faced many challenges:
1. High costs of running and managing numerous physical servers in inconsistent, multi-site server rooms with unreliable power, sub-standard cooling, and poorly laid out physical space. Intermittent and unexpected local facility downtime was common. Global website services were served from office servers connected via single T1 lines.
2. Lead time for delivering new services was typically six weeks, which directly impacted clinical trial costs. Procuring and deploying new infrastructure for new services or upgrades were major projects requiring significant downtime and the direct physical presence of IT staff.
3. Existing computing capacity was underutilized but still required technical support such as backups and patches, with individualized, site based processes and procedures. Little automation meant significant administrative effort, and a huge amount of IT administrative work went into managing site specific physical server support, asset tracking, and equipment leases at multiple sites.
4. The lack of an IT staff team operating model, a consistently automated architecture, and remote management technologies resulted in process and procedure inconsistency at any one site and led to severe variance in service quality and reliability by geography.
5. Limited IT maturity prevented discussion of higher level functions such as auditable policies and procedures, disaster recovery, redundant network architectures, and security audits, all required for NIH clinical trial safety compliance.
3.3. After ARCAMIS Deployment
ARCAMIS represents a paradigm shift in our IT philosophy, both operationally and technically. The goal was to move out of a geographically specific, reactive mode to a prospective operating model and technical architecture designed from the ground up to align with the organization's growing, dynamic demands for IT services.
Most importantly, we worked with management prospectively to understand service quality expectations and the requirement to scale up to 30 clinical trials in seven years. Given management objectives and our limited resources, we realized a need for a more team centric operating model providing specificity of labor. As a result, we logically grouped our human resources into a Support team and an Architecture team. This gave more senior technical talent the time they needed to re-engineer, build, and migrate to the ARCAMIS solution while more junior talent continued to focus on day-to-day reactive issues.
From a technical perspective, we engineered an architecture that would eliminate or automate time consuming tasks and improve reliability. By centralizing all ARCAMIS managed infrastructure into bi-coastal, carrier diverse, redundant, Tier 1 networked, enterprise class datacenters, and by using fully "lights-out" Hewlett Packard ProLiant servers with 4 hour on-site physical support and a remote, IP based server administration model, we have dramatically improved service reliability and supportability without adding administrative staff. The same senior staff now supports twice the number of physical servers and 20 times the virtual servers. For example, it is now common for engineers to administer infrastructure at all seven sites simultaneously, including handling hard reboots and physical failures.
With the integration of VMware Virtual Infrastructure 3.01 Enterprise server virtualization technology, ARCAMIS reduces the number of physical servers at our data centers while continuing to meet exponentially expanding business server requirements. Less hardware yields
a reduction in initial server hardware costs and saves ongoing data center
lease, power and cooling costs associated with ARCAMIS infrastructure. The
initial capital expenditure was about the same as purchasing physical servers
due to our investment in virtualization and SAN technologies.
By consolidating all server data onto the Network Appliance Storage Area Network, the ARCAMIS project deployed a 99.998% uptime, 25 TB production and disaster recovery cluster in San Francisco and a 25 TB, 99.998% uptime production and disaster recovery cluster in the Washington, DC metro area. The SAN allows us to reduce cost and complexity via automation, resulting in dramatic improvements in operational efficiency. We can more efficiently use what we already own, oversubscribe disk, and eliminate silos of underutilized storage. Current storage utilization at the primary site is 65%, up from a 25% average per server using Direct Attached Storage (DAS). We can seamlessly scale to 100 terabytes of storage by simply adding disk shelves, which is not possible with a server based approach. Another key benefit of using SAN technology is risk mitigation via completely automated backup, archival, and offsite replication. File restores are near instantaneous, eliminating the need for human resource intensive and less reliable tape backup approaches.
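To make the oversubscription and utilization figures above concrete, here is a minimal Python sketch of thin-provisioning arithmetic. Only the 25 TB cluster size and the 65%/25% utilization figures come from the text; the provisioned and used capacities are hypothetical.

    # Illustrative thin-provisioning arithmetic (hypothetical volume sizes).
    def oversubscription_ratio(provisioned_tb: float, physical_tb: float) -> float:
        """How much logical capacity has been promised per TB of physical disk."""
        return provisioned_tb / physical_tb

    def utilization(used_tb: float, physical_tb: float) -> float:
        """Fraction of physical capacity actually consumed."""
        return used_tb / physical_tb

    if __name__ == "__main__":
        physical = 25.0      # TB in the cluster, as stated in the text
        provisioned = 40.0   # hypothetical total of all thin-provisioned volumes
        used = 16.25         # hypothetical data actually written (65% of physical)
        print(f"Oversubscription: {oversubscription_ratio(provisioned, physical):.1f}x")
        print(f"Physical utilization: {utilization(used, physical):.0%}")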
Combining the SAN with VMware Infrastructure 3.01 Enterprise server virtualization technologies provides a reliable, extensible, manageable, high availability architecture. Adjusting to changing server requirements is simple because of the SAN's storage expansion and reduction capability for live volumes and VMware's ability to scale from one to four 64-bit CPUs with up to 16 GB RAM and 16 network ports per virtual server. Also, oversubscription allows the ITN to more efficiently use the disk, RAM, and CPU we already own. We can seamlessly control server, firewall, network, and data adds, removes, and changes without business service interruption. The SAN and VMware ESX combination provides excellent performance and reliability, using both Fibre Channel and iSCSI multipathing for a redundant disk-to-server access architecture. For certain applications we can create highly available clustered systems truly architected to meet rigorous 99.998% uptime requirements. VMs boot from the SAN and are replicated locally and off-site while running. This improves business scalability and agility via accelerated service deployment and expanded utilization of existing hardware assets. Physical server maintenance requiring the server to be shut down or rebooted is done during regular working hours without downtime, thanks to support for VMotion, the ability to move a running VM from one physical machine to another. This has greatly reduced off-hours engineer work. Increasing data security and compliance requirements are also met through the centralized control provided by the SAN. In our experience, storage availability determines service availability; automation guarantees the service quality of storage.
[Figure: Bi-coastal SAN and virtualization architecture. Each site (San Francisco, CA and Herndon, VA) hosts an active/active, high availability 25 TB Fibre Channel cluster built from NetApp FAS controllers, DS14 disk shelves, and 16 port FC switches, serving two VMware ESX Server hosts that each run multiple VMware virtual servers. Passive synchronization replicates data between the two sites.]
The ARCAMIS project has proven and demonstrated the many benefits
promised by these new enterprise class technologies. We have significantly
increased the value of IT to our core business, slashed IT operating costs, and
radically improved the quality of our IT service. The ARCAMIS architecture
and operating model is a core competency which other UC organizations can
leverage to achieve similar benefits.
Just some of the benefits resulting from the ARCAMIS project include the
following:
1. Saved hundreds of thousands of dollars and improved security,
reliability, scalability, and deployment time.
2. Helped the environment by reducing power consumption by a factor of
20 for a comparable service infrastructure.
3. Improved conformance with federal and state regulations such as
HIPAA and 21 CFR Part 11.
4. Centralized critical infrastructure into Tier 1, redundantly multi-homed, enterprise class datacenters. Space previously spread across our 8 sites has been consolidated into three data centers using 5 server racks.
5. Eliminated the inconsistent complexity of our IT infrastructure and processes and procedures, and ensured uptime for our business critical applications, even in the event of hardware failures. All new solution deployments are based on nationally consistent operating models and technical architectures.
6. Consolidated data to SAN and VMWare servers. The infrastructure is
architected to be a true 99.998% uptime solution. Our biggest
downtime risk is human error.
7. Any staff member with security privileges can manage any device at
any site from any Internet connected PC; including hardware failures
and power cycles. We efficiently provision, monitor and manage the
infrastructure with a single top console.
8. Standardized virtual server builds, procurement and deployment time
reduced from as much as 6 weeks to 2 hours without investing in new
server hardware.
9. Automated backup, archival, and disaster recovery.
10. Cloned production servers for testing and troubleshooting. Systems and networks can be cloned while running, with zero downtime, and rebooted in virtual lab environments. Servers can be rotated back into production with only a few seconds of downtime.
11. Average CPU utilization has risen from 5% to 30% while retaining peak capacity.
12. Disk utilization has risen from 25% to 65%.
13. Savings are in the region of $200,000 in the past 12 months and will continue to grow as the architecture scales (a rough power and cost illustration appears after this list).
14. Multiple operating systems are supported, including Red Hat LINUX, MS Windows 2000, and MS Windows 2003, with both 32 and 64 bit versions of all supported operating systems. These can all be deployed on the same physical server, reducing our dependence on vendors' proprietary solutions.
15. Reduced support overhead and power consumption of legacy
applications by migrating these into the virtual environment.
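As a rough, hedged illustration of the power and cost benefits in this list, the Python sketch below estimates annual power savings from consolidation. Only the 25:1 ratio and the roughly 20x power reduction claim come from the text; the fleet size, per-server wattage, and electricity price are assumptions. With equal per-server wattage the ratio alone gives about 25x; in practice the virtualization hosts draw more power, which brings the figure closer to the stated factor of 20.

    # Hypothetical power-savings estimate (wattage, price, and fleet size are assumptions).
    PHYSICAL_SERVERS_BEFORE = 100   # hypothetical pre-consolidation fleet
    CONSOLIDATION_RATIO = 25        # guests per physical host, per the text
    WATTS_PER_SERVER = 400          # assumed average draw per physical server
    USD_PER_KWH = 0.12              # assumed electricity price
    HOURS_PER_YEAR = 24 * 365

    hosts_after = -(-PHYSICAL_SERVERS_BEFORE // CONSOLIDATION_RATIO)  # ceiling division
    kwh_before = PHYSICAL_SERVERS_BEFORE * WATTS_PER_SERVER * HOURS_PER_YEAR / 1000
    kwh_after = hosts_after * WATTS_PER_SERVER * HOURS_PER_YEAR / 1000

    print(f"Servers: {PHYSICAL_SERVERS_BEFORE} -> {hosts_after}")
    print(f"Power reduction factor: {kwh_before / kwh_after:.0f}x")
    print(f"Approximate annual power savings: ${(kwh_before - kwh_after) * USD_PER_KWH:,.0f}")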
3.4. Business Impact
ARCAMIS provides the University of California with a proven case study of how to implement enterprise class IT infrastructures and operating models for the benefit of NIH funded clinical research at UCSF. We have accelerated the time from the bedside to the bench in clinical research by taking the IT infrastructure out of the clinical trials' critical path, thereby providing a positive impact on our core business: preventing and curing human disease. ARCAMIS is more agile and responsive, having reduced server acquisition time from weeks to a matter of hours. ARCAMIS is significantly more secure and reliable, providing on the order of 99.998% technically architected uptime, and we have greatly improved the performance and utilization of our IT assets. We have created hundreds of thousands of dollars in measurable cost savings. ARCAMIS is environmentally friendly, significantly reducing our consumption of resources such as power and cooling. ARCAMIS can serve as a blueprint for enterprise class clinical research IT infrastructure services throughout the University of California, at partner research institutions and universities, and at the National Institutes of Health.
4. Technologies Utilized
4.1. The ARCAMIS Suite
This suite of Academic Research Computing and Analysis Managed
Infrastructure Services (ARCAMIS) includes the following technology
components:
1. ITIL based, nationally consistent, labor specific, team IT operating
model
2. Security model and architecture (including firewalls, intrusion
detection, VPN, automated updates)
3. Enterprise class data center facilities
4. Tier 1, multi-homed, redundant, carrier diverse, networks
5. Virtual CPU, RAM, Network, and Disk resources based on Hewlett
Packard Proliant servers, VMware Infrastructure Enterprise 3.01 and
Network Appliance Storage Area Network (SAN)
6. Various 32 and 64 bit LINUX and Windows Operating Systems
7. Backup, archival and disaster recovery
8. Monitoring, alerting, and reporting
9. IT service management systems
4.2. ITIL Team Based Operating Model
The ARCAMIS operating model is based on ITIL best practices, is nationally consistent and team based, and uses specificity of labor. Via our formal, documented Infrastructure Lifecycle Process (ILCP), standard operating procedures (SOPs), and support documentation such as operating guides and systems functional specifications, the ARCAMIS infrastructure evolves through its lifecycle of continuous improvement. Below are samples of the IT policies and procedures used.
IT Policies
Standard Operating Procedures
Our goal moving forward is to be a fully ITIL based shop within the next 12 months. As the organizational chart below shows, the ARCAMIS team is logically grouped into a prospective engineering team and an administration and support team.
Organizational Chart (summarized from the original diagram)
Executive Director, Information Technology
- Manager, Customer Engineering (Level 1 and 2 Support Team): Customer Engineers at Laurel Heights/CB, BEA/ITI, Parnassus (two), and Pittsburgh
- IT Office and Operations Manager
- Systems and Network Architect (Server and Network Engineering Team, Level 3 and 4 Support): two Systems and Network Engineers
4.3. Security Model and Architecture
ARCAMIS is required to meet at minimum the Security Category and Level of MODERATE for Confidentiality, Integrity, and Availability as defined by the National Institutes of Health. Compliance with this Security Category spans
the entire organization from the initial Concept Proposal phase, through
clinical trial design and approval, into trial operations where patient
information is gathered, including data collection and specimen storage.
Significant amounts of confidential, proprietary and unique patient data are
collected, transferred, and stored in the ARCAMIS infrastructure for analysis
and dissemination by approved parties. Certain parts of the infrastructure are
able to satisfy HIPAA and 21 CFR Part 11 compliance. This becomes
especially important as the ITN and EPGP organizations continue to innovate
and develop new intellectual property which may have significant market
value.
Information Security Category Requirements
Exceeding the minimum compliance requirements with this Information
Security Category is achieved by a holistic approach addressing all aspects of
the ARCAMIS personnel, operations, physical locations, networks and
systems. This includes tested, consistently executed, and audited plans,
policies and procedures, and automated, monitored, and logged security
technologies used on a day to day basis. The overall security posture of ARCAMIS has many aspects, including legal agreements with partners and
employees, personnel background checks and training, organization wide
disaster recovery plans, backup, systems and network security architectures
(firewalls, intrusion detection systems, multiple levels of encryption, etc.),
and detailed documentation requirements.
Consistent with the NIH Application/System Security Plan (SSP) Template for
Applications and General Support Systems and the US Department of Health
and Human Services Official Information Security Program Policy (HHS IRM
Policy 2004-002.001), ARCAMIS maintains a formal information systems
security program to protect the organization’s information resources. This is
called the Information Security and Information Technology Program (ISITP).
The ISITP delineates security controls into four primary categories: management, operational, technical, and standard operating procedures. These categories structure the organization of the ISITP.
- Management Policies focus on the management of information security systems and the management of risk for a system. They are techniques and concerns addressed by management; examples include Capital Planning and Investment, and Risk Management.
- Operational Policies address security methods focusing on mechanisms primarily implemented and executed by people (as opposed to systems). These controls are put in place to improve the security of a particular system or group of systems; examples include Acceptable Use, Personnel Separation, and Visitor Policies.
- Technical Policies focus on security controls that the computer system executes. These controls can provide automated protection against unauthorized access or misuse, facilitate detection of security violations, and support security requirements for applications and data; examples include password requirements, automatic account lockout, and firewall policies.
- Standard Operating Procedures (SOPs) focus on logistical procedures that staff perform routinely to ensure ongoing compliance; examples include IT Asset Assessment, Server and Network Support, and Systems Administration.
Specifically, the ARCAMIS ISITP includes detailed definitions of the following
Operational and Technical Security Policies.
PERSONNEL SECURITY
Background Investigations
Rules of Behavior
Disciplinary Action
Acceptable Use
Separation of Duties
Least Privilege
Security Education and Awareness
Personnel Separation
RESOURCE MANAGEMENT
Provision of Resources
Human Resources
Infrastructure
PHYSICAL SECURITY
Physical Access
Physical Security
Visitor Policy
MEDIA CONTROL
Media Protection
Media Marking
Sanitization and Disposal of Information
Input/Output Controls
COMMUNICATIONS SECURITY
Voice Communications
Data Communications
Video Teleconferencing
Audio Teleconferencing
Webcast
Voice-Over Internet Protocol
Facsimile
WIRELESS COMMUNICATIONS
SECURITY
Wireless Local Area Network (LAN)
Multifunctional Wireless Devices
EQUIPMENT SECURITY
Workstations
Laptops and Other Portable Computing
Devices
Personally Owned Equipment and Software
Hardware Security
ENVIRONMENTAL SECURITY
Fire Prevention
Supporting Utilities
DATA INTEGRITY
Documentation
NETWORK SECURITY POLICIES
Remote Access and Dial-In
Network Security
Monitoring
Firewall
System-to-System Interconnection
Internet Security
SYSTEMS SECURITY POLICIES
Identification
Password
Access Control
Automatic Account Lockout
Automatic Session Timeout
Warning Banner
Audit Trails
Peer-to-Peer Communications
Patch Management
Cryptography
Malicious Code Protection
Product Assurance
E-Mail Security
Personal E-Mail Accounts
These policies serve as the foundation of the ARCAMIS Standard Operating Procedures and technical infrastructure architectures, which combined create a secure environment based on security best practices.
Security Infrastructure Architecture
To ensure a hardened information security and information technology environment, ARCAMIS has centralized its critical information technology infrastructure into two Tier 1 data centers. Facilities include uninterruptible power supplies and backup diesel generators that can keep servers running indefinitely without direct electric grid power. The data centers are equipped with optimal environmental controls, including sophisticated air conditioning and humidifier equipment, as well as stringent physical security systems. They provide 24x7 Network Operations Center network monitoring and physical security. Each data center also includes water-free fire suppression systems so as not to damage the servers.
For secure data transport, ARCAMIS provides a carrier diverse, redundant, secure, reliable, Internet connected, high speed Local Area Network (LAN) and Wide Area Network (WAN). The ARCAMIS network and Virtual Private Network (VPN) form the foundation for all ARCAMIS IT services and are used by every ITN and EPGP stakeholder every day. The high speed WAN is protected at all locations by firewalls with intrusion detection, monitoring, and logging. Firewall and VPN services are provided by industry leading Microsoft and Cisco products. All network traffic between ITN sites, desktops, and partner organizations that travels over public networks is encrypted with at least 128-bit encryption using security protocols including IPSec, SFTP, RDC, Kerberos, and others. We have also implemented a wildcard certificate architecture for all port 443 communications, allowing rapid deployment of new secured services.
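As a small illustration of verifying that an HTTPS service presents a valid certificate for its hostname (the kind of port 443 deployment described above), the following Python sketch uses only the standard library ssl module. The hostname is a placeholder, not an ARCAMIS endpoint, and this is not the project's actual tooling.

    # Check that a TLS endpoint presents a certificate valid for its hostname.
    import socket
    import ssl

    def check_certificate(hostname: str, port: int = 443) -> str:
        """Open a TLS connection and return the certificate's subject common name."""
        context = ssl.create_default_context()  # verifies the chain and hostname
        with socket.create_connection((hostname, port), timeout=10) as sock:
            with context.wrap_socket(sock, server_hostname=hostname) as tls:
                cert = tls.getpeercert()
        subject = dict(item for pair in cert["subject"] for item in pair)
        return subject.get("commonName", "<no CN>")

    if __name__ == "__main__":
        host = "example.org"  # placeholder hostname
        print(f"{host} presents a certificate for: {check_certificate(host)}")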
To keep these systems monitored and patched, ARCAMIS provides IP ping, SNMP MIB monitoring, specific service monitoring with automated restarts, hardware monitoring, intrusion detection monitoring, and website monitoring of the ARCAMIS production server environment. Server and end-user security patches are applied monthly via Software Update Services, and application and LINUX/Macintosh patches are pushed out on a monthly basis. We have standardized on McAfee Anti-Virus for virus protection and use Postini for e-mail spam and virus filtering.
The ITN's authoritative directory uses Microsoft Active Directory and is exposed via SOAP, RADIUS, and LDAP for cross platform authentication. The ITN currently uses an Enterprise Certificate Authority (ITNCA) for certificate based security authentication.
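The directory integration above is described only at a high level. As an illustrative sketch of cross platform authentication against Active Directory over LDAP, the snippet below uses the third-party ldap3 package, which is an assumption of this sketch and not a tool named in the document; the server, domain, and account are placeholders.

    # Illustrative LDAP bind against Active Directory (ldap3 assumed installed;
    # server, domain, and credentials are placeholders).
    from ldap3 import Server, Connection, NTLM, ALL

    def authenticate(username: str, password: str) -> bool:
        """Return True if the domain accepts the credentials via an NTLM bind."""
        server = Server("ldap.example.org", get_info=ALL)       # placeholder directory host
        conn = Connection(server, user=f"EXAMPLE\\{username}",  # placeholder NetBIOS domain
                          password=password, authentication=NTLM)
        ok = conn.bind()
        conn.unbind()
        return ok

    if __name__ == "__main__":
        print("Authenticated:", authenticate("jdoe", "not-a-real-password"))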
Comprehensive Information Security
The ITN has established mandatory policies, processes, controls, and procedures to ensure confidentiality, integrity, availability, reliability, and non-repudiation within the organization's infrastructure and its operations. It is the policy of ARCAMIS that the organization abide by or exceed the requirements outlined in the ITN Information Security and Information Technology Program, thereby exceeding the required Security Category and Level of MODERATE for Confidentiality, Integrity, and Availability outlined above. In addition, ARCAMIS implements additional security policies beyond the minimum requirements as appropriate for our specific operational and risk environment.
4.4. Data Center Facilities
The ITN has centralized its server architecture into two Tier 1 data centers.
The first is located in Herndon, VA with Cogent Communications, and the
second in San Francisco, CA with Level 3 Communications. An additional
research data center is located at the UCSF QB3 facility. Physical access
requires a badge and biometric hand security scanning, and the facilities have
24x7 security staff on-site. Each data center includes redundant
uninterruptible power supplies and backup diesel generators that can keep
each server running indefinitely without direct electric grid power. The centers
provide active server and application monitoring, helping hands and backup
media rotation capabilities. They are equipped with optimal environment
controls, including sophisticated air conditioning and humidifier equipment as
well as stringent physical security systems. There are also waterless fire
suppression systems. Power to our racks is provided by four redundant, monitored PDUs that report real-time power usage and alert us to power surges.
Herndon, VA Rack Diagram
[Figure: rack elevation for the Herndon, VA data center showing HP ProLiant ML570 and DL320 servers, NetApp FAS 3020 controllers, DS14 MK2 FC disk shelves, and tape drives.]
4.5. Internet Connectivity
Servicing the ARCAMIS customer base is a carrier diverse, redundant, firewalled, reliable, Internet connected high speed network. This network, combined with the Virtual Private Network (VPN), creates the foundation for all of the ARCAMIS services provided.
Internet connectivity is location dependent:
• San Francisco, China Basin (Level 3) – A Tier 1, 1000 Mbps Ethernet connection to the Internet is provided by Cogent Networks. UCSF provides a 100 Mbps Ethernet connection to redundant 45 Mbps OC198 connections and 100 Mbps Ethernet between UCSF campuses.
• San Francisco, Quantitative Biology III Data Center – The UCSF network provides a 1000 Mbps Ethernet connection to redundant 45 Mbps OC198 connections and 100 Mbps Ethernet between UCSF campuses.
• Herndon, VA – A Tier 1, 100 Mbps Ethernet connection to the Internet is provided by Cogent Networks. AT&T provides a 1.5 Mbps DSL backup connection.
4.6. Virtual CPU, RAM, Network, and Disk Resources
ARCAMIS uses a Network Appliance Storage Area Network with a 25 TB high availability cluster in Herndon and a 25 TB disaster recovery site in San Francisco. This allows us to reduce cost and complexity via automation and operational efficiency. We can seamlessly control adds, removes, and updates without business interruption for our critical storage needs. We can more efficiently use what we already own and eliminate silos of underutilized memory, CPU, network, and storage. This improves business scalability and agility via accelerated service deployment and expansion on existing hardware assets. We can scale to tens of terabytes of storage, which is not possible with a server based approach. Another key result of using this technology is risk mitigation: the architecture is designed to eliminate the possibility of critical data loss. Backup, archival, and restore are fully automated, so productivity loss in the event of user error or hardware failure drops from days to minutes or less, and business continuance in the event of a disaster is technologically automated. The increasing ARCAMIS data security and compliance requirements, including HIPAA, are able to be met with a SAN. In our experience, storage availability determines service availability; automation guarantees service quality.
VMware Virtual Infrastructure Enterprise 3.01 (VI3) is virtual infrastructure software for partitioning, consolidating, and managing servers in mission-critical environments. Ideally suited for enterprise data centers, VI3 minimizes the total cost of ownership of computing infrastructure by increasing resource utilization, and its hardware-independent virtual machines, encapsulated in easy-to-manage files, maximize administrative flexibility. VMware ESX Server allows enterprises to boost x86 server utilization to 60-80%, provision new systems faster with less hardware, decouple application workloads from underlying physical hardware for increased flexibility, and dramatically lower the cost of business continuity. ESX Server supports 64-bit VMs with 16 GB of RAM, meeting ARCAMIS's expanding server computing requirements.
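To make the per-VM sizing limits quoted in this document concrete, here is a minimal Python sketch that checks a requested VM specification against those limits (four virtual CPUs, 16 GB RAM, 16 network ports). The example request values are hypothetical, and this is only an illustration of the arithmetic, not an ARCAMIS provisioning tool.

    # Validate a requested VM against the per-VM limits quoted in this document.
    from dataclasses import dataclass

    MAX_VCPUS = 4     # up to four 64-bit CPUs per virtual server
    MAX_RAM_GB = 16   # up to 16 GB RAM per virtual server
    MAX_NICS = 16     # up to 16 network ports per virtual server

    @dataclass
    class VmRequest:
        name: str
        vcpus: int
        ram_gb: int
        nics: int

    def validate(req: VmRequest) -> list[str]:
        """Return a list of limit violations (empty means the request fits)."""
        problems = []
        if req.vcpus > MAX_VCPUS:
            problems.append(f"{req.vcpus} vCPUs exceeds the {MAX_VCPUS}-vCPU limit")
        if req.ram_gb > MAX_RAM_GB:
            problems.append(f"{req.ram_gb} GB RAM exceeds the {MAX_RAM_GB} GB limit")
        if req.nics > MAX_NICS:
            problems.append(f"{req.nics} NICs exceeds the {MAX_NICS}-NIC limit")
        return problems

    if __name__ == "__main__":
        request = VmRequest("staging-db", vcpus=2, ram_gb=8, nics=2)  # hypothetical request
        print(validate(request) or "request fits within the stated limits")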
Combining the SAN with server virtualization provides an extremely reliable, extensible, manageable, high availability architecture for ARCAMIS. The SAN provides near instantaneous VM backups, restores, and provisioning, plus off-site disaster recovery and archival. File restores are near instantaneous, eliminating the need for human resource intensive and less reliable client side disk management applications. Adjusting to changing server requirements is fast because of the SAN's storage expansion and reduction capability for live volumes. Also, oversubscription allows the ITN to use the disk we already own significantly more efficiently. The SAN and VMware ESX combination provides excellent performance and reliability using both Fibre Channel and iSCSI multipathing. VMs boot from the SAN and are replicated locally and off-site while running. For certain applications we can create highly available clustered systems with even greater than 99.998% uptime. Finally, server maintenance can be done during regular working hours without downtime, thanks to support for VMotion, the ability to move a running VM from one physical machine to another.
4.7. Operating Systems Supported
ARCAMIS supports several operating systems, including multiple LINUX distributions (notably Red Hat Enterprise LINUX), i386 Solaris, and 32 and 64 bit versions of Microsoft Windows.
4.8. Backup, Archival, and Disaster Recovery
ARCAMIS data availability, backup, and archival are provided by a Storage Area Network (SAN) with a 25 TB high availability cluster in Herndon and a 25 TB disaster recovery site in San Francisco. This SAN houses ARCAMIS critical clinical data and IT server data. The SAN automates backup, archival, and restore via the NetApp SnapMirror, SnapBackup, and SnapRestore applications. All critical data at the San Francisco and Herndon sites is replicated to the other site within one hour. In the event of a major disaster at either ARCAMIS datacenter site, at most 60 minutes of data loss can occur, and the critical server infrastructure can be failed over to the other coast's facility for business continuance. In addition to the SAN, ARCAMIS uses a 7 day incremental backup to an offline disk rotation, with monthly off-site stored archives for all production data, based on Symantec Veritas software.
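The hourly replication schedule above implies a worst-case recovery point of 60 minutes. The Python sketch below shows how such a recovery point objective might be checked against the timestamp of the last successful replication; the timestamp handling is generic and is not tied to the NetApp tooling named above.

    # Check the age of the last successful replication against a 60-minute RPO.
    from datetime import datetime, timedelta, timezone

    RPO = timedelta(minutes=60)  # hourly cross-site replication, per the text

    def rpo_satisfied(last_replication_utc: datetime) -> bool:
        """True if the most recent replication is within the recovery point objective."""
        now = datetime.now(timezone.utc)
        return (now - last_replication_utc) <= RPO

    if __name__ == "__main__":
        last = datetime.now(timezone.utc) - timedelta(minutes=42)  # hypothetical timestamp
        print("Within RPO:", rpo_satisfied(last))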
4.9. Monitoring, Alerting, and Reporting
We use various monitoring and reporting technologies, and two IT staff perform full infrastructure monitoring audits twice daily, five days per week, once at 8:00am EST and again at 3:00pm PST. We use a 1-800 Priority 1 issue resolution line that pages and calls five senior engineers simultaneously in the event of a major system failure or issue, and an on-call rotation schedule that changes weekly. We use the following technologies: Microsoft Operations Manager (MOM), WebWatchBot, Brocade Fabric Manager, NetApp Operations Manager, VMware Operations Manager, Cacti, and Oracle, among others.
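The monitoring stack above is commercial. As a minimal, hedged illustration of the kinds of checks listed in the table below (ping/port reachability and HTTP/HTTPS URL monitoring), here is a small Python sketch using only the standard library; the hosts and URL are placeholders, not ARCAMIS systems.

    # Minimal reachability and URL checks (standard library only; placeholder targets).
    import socket
    import urllib.request

    def port_open(host: str, port: int, timeout: float = 5.0) -> bool:
        """TCP-level check that a service port accepts connections."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    def url_ok(url: str, timeout: float = 10.0) -> bool:
        """HTTP/HTTPS check that a URL responds with a 2xx status."""
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return 200 <= resp.status < 300
        except OSError:
            return False

    if __name__ == "__main__":
        print("SMTP port reachable:", port_open("mail.example.org", 25))  # placeholder host
        print("Website healthy:", url_ok("https://www.example.org/"))     # placeholder URL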
Cacti Disk Utilization Graph
Below is a sample disk utilization graph.
Monitoring Table
Below is a partial list of monitoring we do.
Server and Network Monitoring
Customer Defined Transaction Monitoring
ODBC Database Query Verification
Ping Monitoring
SMTP Server and Account Monitoring
POP3 Server and Account Monitoring
FTP Upload/Download Verification
File Existence and Content Monitoring
Disk/Share Usage Monitoring
Microsoft Performance Counters
Microsoft Process Monitoring
Microsoft Services Performance Monitoring
Microsoft Services Availability Monitoring
Event Log Monitoring
HTTP/HTTPS URL Monitoring
Customer Specified Port Monitoring
Active Directory
Exchange Intelligent Message Filter
HP ProLiant Servers
Microsoft .NET Framework
Microsoft Baseline Security Analyzer
Microsoft Exchange Server Best Practices
Analyzer
Microsoft Exchange Server
Microsoft ISA Server
Microsoft Network Load Balancing
Microsoft Office Live Communications Server 2003
Microsoft Office Live Communications Server 2005
Microsoft Office Project Server
Microsoft Office SharePoint Portal Server 2003
Microsoft Operations Manager MPNotifier
Microsoft Operations Manager
Microsoft Password Change Notification Service
Microsoft SQL Server
Microsoft Web Sites and Services MP
Microsoft Windows Base OS
Microsoft Windows DFS Replication
Microsoft Windows Distributed File Systems
Microsoft Windows DHCP
Microsoft Windows Group Policy
Microsoft Windows Internet Information Services
Microsoft Windows RRAS
Microsoft Windows System Resource Manager
Microsoft Windows Terminal Services
Microsoft Windows Ultrasound
NetApp
Volume Utilization
Global Status Indicator
Hardware Event Log
Visual Inspection
Ambient Temperature
Temperature Trending
Location WAN Connectivity
4.10. IT Service Management Systems
We use Remedy and Track-IT Enterprise for Ticketing, Asset Tracking, and
Purchasing.
5. Implementation Timeframe
5.1. Project Timeline
6. Customer Testimonials
“ARCAMIS provides services that allow the ITN knowledge workers to focus
on answering the difficult scientific questions in immune tolerance; we don’t
waste time on basic IT infrastructure functions. ARCAMIS allows me to be
confident our research patient data is stored in a secure, reliable and
responsive IT infrastructure. For example, last week we did a demonstration
to the Network Executive Committee of our Informatics data management
and collaboration portal in real-time. This included the National Institute of
Health senior management responsible for our funding… it all worked
perfectly. This entire application was built on ARCAMIS.”
Jeffrey A. Bluestone, Ph.D.
Director, UCSF Diabetes Center
Director, Immune Tolerance Network
A.W. and Mary Clausen Distinguished Professor of Medicine, Pathology,
Microbiology and Immunology
“With ARCAMIS we are well positioned to meet the rigorous IT requirements
of an NIH funded study. Within weeks of project funding from the NIH, our
entire secure research computing network and server infrastructure of more
than 10 servers was built, our developers finished the public website, and we
began work on the Patient Recruitment portal. That would have taken at least
6 months if I had to hire a team to procure and build it ourselves.
Accelerating scientific progress in neurology is core to everything we do;
ARCAMIS has been an important part of what we are currently doing.”
Daniel H. Lowenstein, M.D.
Professor of Neurology, UCSF and
Director, Physician-Scientist Education and Training Programs
Director, Epilepsy Phenome Genome Project
“With the investment in ARCAMIS, UCSF and the ITN can confidently partner
with other leading medical research universities across the country. At the
ITN we depend on the on-demand, services based, scalable computing
capacity of ARCAMIS every day to enable our collaborative data analysis and
Informatics data visualization applications.”
Mark Musen, Ph.D.
Director, Medical Informatics Department
Stanford University
Deputy Director, Immune Tolerance Network
Appendices
Appendix A – Capabilities Summary of the ARCAMIS Suite
Fundamentals
• 99.998% production solution uptime guaranteed via Service Level
Agreement.
• Managed multi-homed, Tier 1 network (Zero Downtime SLA)
• High speed 1000mbs connectivity to UCSF network space.
• Bi-coastal, world-class data centers hosted with Level 3 and Cogent Communications, with redundant power and HVAC systems
• Managed DNS or use UCSF DNS
• Managed Active Directory for “Production Servers” and integration with
UCSF CAMPUS AD via trust.
• Phone, e-mail and web based ticketing system to track all issues
• Mature purchasing services with purchases charged to correct account
Monitoring & Issue Response
• 8am EST to 5pm PST business day access to live support personnel
• 24/7/365 coverage with one primary on-call engineer and off-hours paging, plus a 1-800 P1 issue number that rings 5 infrastructure engineers simultaneously.
• Microsoft Operations Manager monitoring (CPU, RAM, disk, event log,
ping, ports and services)
• Application script response monitoring for web applications, including
SSL via WebWatchBot 5
• HP Remote Insight Manager hardware monitoring with 4 hour vendor
response on all servers
• NetApp corporate monitoring and 4 hour time to resolution with a fully stocked parts depot for the Storage Area Network.
• 24x7 staffed datacenters with secure physical access to all servers
• 24x7 staffed Network Operating Center for WAN
• Notification preferences and standard response specifications can be
customized
Backup, Restore and Disaster Recovery/Business Continuance
• Symantec Backup Exec server agents for Oracle, SQL, MySQL, and
Exchange servers with 7 nightly incremental backups.
• 14 local daily snapshots of full “crash consistent” server state
• Hourly off-site snapshots of full “crash consistent” server state with 40
hourly restore points for DR
• Monthly archive of the entire infrastructure that rolls to quarterly after 3 months (a retention sketch follows this list).
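The retention scheme in this list (14 daily snapshots, 40 hourly DR restore points, 7 nightly incrementals, monthly archives rolling to quarterly) can be expressed as a simple policy check. The Python sketch below is an illustration of that arithmetic only, not the actual NetApp or Backup Exec configuration; the current snapshot counts are hypothetical.

    # Express the retention scheme above as simple limits (illustrative only).
    RETENTION = {
        "local_daily_snapshots": 14,    # 14 local daily "crash consistent" snapshots
        "offsite_hourly_snapshots": 40, # 40 hourly DR restore points
        "nightly_incrementals": 7,      # 7 nightly Backup Exec incrementals
    }

    def snapshots_to_prune(existing: int, kind: str) -> int:
        """How many of the oldest snapshots of this kind should be deleted."""
        return max(0, existing - RETENTION[kind])

    if __name__ == "__main__":
        # Hypothetical current counts on the filer.
        print("Prune local daily:", snapshots_to_prune(17, "local_daily_snapshots"))
        print("Prune off-site hourly:", snapshots_to_prune(40, "offsite_hourly_snapshots"))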
Reporting
• Online Ticketing
• Detailed Backup Utilization
• Bandwidth Utilization
• Infrastructure uptime reports
• CPU, RAM, Network, and Disk utilization reports
Server & Device Administration
• Customized specifications using VMware Infrastructure 3.01 technology: up to four 64-bit, 3.0 GHz Intel Xeon processors, 16 GB RAM, 1 Gbps network, and 2 TB maximum disk volumes.
• Based on HP ProLiant enterprise servers: ML570 (up to 8 processors per server), the DL380 series, and 7000c series blade servers.
• IP everywhere, full remote management of every device, including full
KVM via separate backLAN network.
• Microsoft MCCA licensing on key server components
• Full license and asset tracking
• Senior System Administrator troubleshooting
• Optional high availability (99.999% uptime) server capabilities via
Veritas and Microsoft Clustering
Managed Security
• Automated OS and major application patching
• Managed Network-based Intrusion Detection
• Managed policy based enterprise firewall using Cisco and Microsoft
technologies
• Managed VPN access
Appendix B – Excerpt from the ARCAMIS Systems Functional
Specification
Centralized Virtual Infrastructure Administration
ARCAMIS can move virtual machines between hosts, create new machines from pre-built templates, and control existing virtual machine configurations. We can also gather event log information for all VMware hosts from a central location; identify asset utilization and troubleshoot warnings before problems occur; manage physical system BIOS and firmware upgrades more easily; and centrally manage all virtual machines within the network.
The Virtual Center management interface allows us to centrally manage and monitor
our entire physical and virtual infrastructure from one place:
Hosts, Clusters, and Resource Pools:
By organizing physical hosts into clusters of two or more, we are able to distribute their aggregate resources as if they were one physical host. For example, a single server might be configured with four dual core 2.7 GHz processors (8 cores x 2.7 GHz, roughly 21.6 GHz) and 24 GB of RAM. By clustering two such servers together, the resources are presented as roughly 43 GHz and 48 GB of RAM, which can be provisioned as needed to multiple guests.
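The aggregation arithmetic above is easy to make explicit. The Python sketch below sums per-host CPU and memory into a cluster pool using the host configuration from the example; the host count is the only free parameter, and this is an illustration rather than VirtualCenter behavior.

    # Aggregate per-host resources into a cluster pool (mirrors the example above).
    def cluster_pool(hosts: int, sockets: int = 4, cores_per_socket: int = 2,
                     ghz_per_core: float = 2.7, ram_gb: int = 24) -> tuple[float, int]:
        """Return (total GHz, total GB RAM) presented by a cluster of identical hosts."""
        ghz_per_host = sockets * cores_per_socket * ghz_per_core
        return hosts * ghz_per_host, hosts * ram_gb

    if __name__ == "__main__":
        ghz, ram = cluster_pool(hosts=2)
        print(f"Two-host cluster pool: {ghz:.1f} GHz, {ram} GB RAM")  # ~43.2 GHz, 48 GB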
DRS and VMotion:
VMotion enables us to migrate live servers from one physical host to another which
allows for physical host maintenance to be performed with no impact to production
service uptime. Dynamic Resource Scheduling (DRS) is used to set different resource
allocation policies for different classes of services which are automatically monitored
and enforced using the aggregate resources of the cluster.
UCSF12 - SOP_Educational_Technology_Award_2012Michael Williams
 
Sunu111
Sunu111Sunu111
Sunu111GRAT3X
 
UCSF08 - ITN Concept Proposal_Award_2008
UCSF08 - ITN Concept Proposal_Award_2008UCSF08 - ITN Concept Proposal_Award_2008
UCSF08 - ITN Concept Proposal_Award_2008Michael Williams
 
UCSF08 - HD Video Conferencing Deployment_Award_2008
UCSF08 - HD Video Conferencing Deployment_Award_2008UCSF08 - HD Video Conferencing Deployment_Award_2008
UCSF08 - HD Video Conferencing Deployment_Award_2008Michael Williams
 
UCSF08 - EPGP_Pharmacogenomics_Award_2008_Submission
UCSF08 - EPGP_Pharmacogenomics_Award_2008_SubmissionUCSF08 - EPGP_Pharmacogenomics_Award_2008_Submission
UCSF08 - EPGP_Pharmacogenomics_Award_2008_SubmissionMichael Williams
 
Côté Agglo n°13
Côté Agglo n°13Côté Agglo n°13
Côté Agglo n°13Agglo
 
Otrm presentation ad linked in
Otrm presentation ad linked inOtrm presentation ad linked in
Otrm presentation ad linked ingreenjaguar
 
Mobotrax linkedin ppt d 2
Mobotrax linkedin ppt d 2Mobotrax linkedin ppt d 2
Mobotrax linkedin ppt d 2greenjaguar
 
Moboquip linkedin ppt c 2
Moboquip linkedin ppt c 2Moboquip linkedin ppt c 2
Moboquip linkedin ppt c 2greenjaguar
 

Viewers also liked (14)

UCSF12 - SOP_Educational_Technology_Award_2012
UCSF12 - SOP_Educational_Technology_Award_2012UCSF12 - SOP_Educational_Technology_Award_2012
UCSF12 - SOP_Educational_Technology_Award_2012
 
Sunu111
Sunu111Sunu111
Sunu111
 
UCSF08 - ITN Concept Proposal_Award_2008
UCSF08 - ITN Concept Proposal_Award_2008UCSF08 - ITN Concept Proposal_Award_2008
UCSF08 - ITN Concept Proposal_Award_2008
 
UCSF08 - HD Video Conferencing Deployment_Award_2008
UCSF08 - HD Video Conferencing Deployment_Award_2008UCSF08 - HD Video Conferencing Deployment_Award_2008
UCSF08 - HD Video Conferencing Deployment_Award_2008
 
Resume-Justin new
Resume-Justin newResume-Justin new
Resume-Justin new
 
UCSF08 - EPGP_Pharmacogenomics_Award_2008_Submission
UCSF08 - EPGP_Pharmacogenomics_Award_2008_SubmissionUCSF08 - EPGP_Pharmacogenomics_Award_2008_Submission
UCSF08 - EPGP_Pharmacogenomics_Award_2008_Submission
 
604 egypt
604 egypt604 egypt
604 egypt
 
Côté Agglo n°13
Côté Agglo n°13Côté Agglo n°13
Côté Agglo n°13
 
Article 1
Article 1Article 1
Article 1
 
Article 6
Article 6Article 6
Article 6
 
Exercice intégrales
Exercice intégralesExercice intégrales
Exercice intégrales
 
Otrm presentation ad linked in
Otrm presentation ad linked inOtrm presentation ad linked in
Otrm presentation ad linked in
 
Mobotrax linkedin ppt d 2
Mobotrax linkedin ppt d 2Mobotrax linkedin ppt d 2
Mobotrax linkedin ppt d 2
 
Moboquip linkedin ppt c 2
Moboquip linkedin ppt c 2Moboquip linkedin ppt c 2
Moboquip linkedin ppt c 2
 

Similar to UCSF07 - Research and HPC Infrastructure_Award_2007

White Paper: Life Sciences at RENCI, Big Data IT to Manage, Decipher and Info...
White Paper: Life Sciences at RENCI, Big Data IT to Manage, Decipher and Info...White Paper: Life Sciences at RENCI, Big Data IT to Manage, Decipher and Info...
White Paper: Life Sciences at RENCI, Big Data IT to Manage, Decipher and Info...EMC
 
Utilization of virtual microscopy in a cooperative group setting
Utilization of virtual microscopy in a cooperative group settingUtilization of virtual microscopy in a cooperative group setting
Utilization of virtual microscopy in a cooperative group settingBIT002
 
Data driven systems medicine article
Data driven systems medicine articleData driven systems medicine article
Data driven systems medicine articlemntbs1
 
University of California
University of CaliforniaUniversity of California
University of CaliforniaVideoguy
 
ScienceDirectAvailable online at www.sciencedirect.com
ScienceDirectAvailable online at www.sciencedirect.comScienceDirectAvailable online at www.sciencedirect.com
ScienceDirectAvailable online at www.sciencedirect.comdaniatrappit
 
Dr Dennis Kehoe- Connected Health Cities: Using Learning Health Systems
Dr Dennis Kehoe- Connected Health Cities: Using Learning Health SystemsDr Dennis Kehoe- Connected Health Cities: Using Learning Health Systems
Dr Dennis Kehoe- Connected Health Cities: Using Learning Health SystemsInnovation Agency
 
Resume Shane Milam Sep 2015 Sans References
Resume Shane Milam Sep 2015 Sans ReferencesResume Shane Milam Sep 2015 Sans References
Resume Shane Milam Sep 2015 Sans ReferencesShane Milam
 
Pistoia Alliance US Conference 2015 - 1.3.2 New member introductions - DNAnexus
Pistoia Alliance US Conference 2015 - 1.3.2 New member introductions - DNAnexusPistoia Alliance US Conference 2015 - 1.3.2 New member introductions - DNAnexus
Pistoia Alliance US Conference 2015 - 1.3.2 New member introductions - DNAnexusPistoia Alliance
 
Data Virtualization Modernizes Biobanking
Data Virtualization Modernizes BiobankingData Virtualization Modernizes Biobanking
Data Virtualization Modernizes BiobankingDenodo
 
Table of Content - International Journal of Managing Information Technology (...
Table of Content - International Journal of Managing Information Technology (...Table of Content - International Journal of Managing Information Technology (...
Table of Content - International Journal of Managing Information Technology (...IJMIT JOURNAL
 
Cloud Computing and Innovations for Optimizing Life Sciences Research
Cloud Computing and Innovations for Optimizing Life Sciences ResearchCloud Computing and Innovations for Optimizing Life Sciences Research
Cloud Computing and Innovations for Optimizing Life Sciences ResearchInterpretOmics
 
Expert Panel on Data Challenges in Translational Research
Expert Panel on Data Challenges in Translational ResearchExpert Panel on Data Challenges in Translational Research
Expert Panel on Data Challenges in Translational ResearchEagle Genomics
 
Intel next-generation-medical-imaging-data-and-analytics
Intel next-generation-medical-imaging-data-and-analyticsIntel next-generation-medical-imaging-data-and-analytics
Intel next-generation-medical-imaging-data-and-analyticsCarestream
 
Production Bioinformatics, emphasis on Production
Production Bioinformatics, emphasis on ProductionProduction Bioinformatics, emphasis on Production
Production Bioinformatics, emphasis on ProductionChris Dwan
 
UCSF Informatics Day 2014 - Sorena Nadaf, "Translational Informatics OnCore C...
UCSF Informatics Day 2014 - Sorena Nadaf, "Translational Informatics OnCore C...UCSF Informatics Day 2014 - Sorena Nadaf, "Translational Informatics OnCore C...
UCSF Informatics Day 2014 - Sorena Nadaf, "Translational Informatics OnCore C...CTSI at UCSF
 
A Modern Data Strategy for Precision Medicine
A Modern Data Strategy for Precision MedicineA Modern Data Strategy for Precision Medicine
A Modern Data Strategy for Precision MedicineCloudera, Inc.
 

Similar to UCSF07 - Research and HPC Infrastructure_Award_2007 (20)

White Paper: Life Sciences at RENCI, Big Data IT to Manage, Decipher and Info...
White Paper: Life Sciences at RENCI, Big Data IT to Manage, Decipher and Info...White Paper: Life Sciences at RENCI, Big Data IT to Manage, Decipher and Info...
White Paper: Life Sciences at RENCI, Big Data IT to Manage, Decipher and Info...
 
Utilization of virtual microscopy in a cooperative group setting
Utilization of virtual microscopy in a cooperative group settingUtilization of virtual microscopy in a cooperative group setting
Utilization of virtual microscopy in a cooperative group setting
 
Data driven systems medicine article
Data driven systems medicine articleData driven systems medicine article
Data driven systems medicine article
 
University of California
University of CaliforniaUniversity of California
University of California
 
ScienceDirectAvailable online at www.sciencedirect.com
ScienceDirectAvailable online at www.sciencedirect.comScienceDirectAvailable online at www.sciencedirect.com
ScienceDirectAvailable online at www.sciencedirect.com
 
Dr Dennis Kehoe- Connected Health Cities: Using Learning Health Systems
Dr Dennis Kehoe- Connected Health Cities: Using Learning Health SystemsDr Dennis Kehoe- Connected Health Cities: Using Learning Health Systems
Dr Dennis Kehoe- Connected Health Cities: Using Learning Health Systems
 
Wincere Inc.
Wincere Inc.Wincere Inc.
Wincere Inc.
 
Resume Shane Milam Sep 2015 Sans References
Resume Shane Milam Sep 2015 Sans ReferencesResume Shane Milam Sep 2015 Sans References
Resume Shane Milam Sep 2015 Sans References
 
Informatics
Informatics Informatics
Informatics
 
Pistoia Alliance US Conference 2015 - 1.3.2 New member introductions - DNAnexus
Pistoia Alliance US Conference 2015 - 1.3.2 New member introductions - DNAnexusPistoia Alliance US Conference 2015 - 1.3.2 New member introductions - DNAnexus
Pistoia Alliance US Conference 2015 - 1.3.2 New member introductions - DNAnexus
 
Data Virtualization Modernizes Biobanking
Data Virtualization Modernizes BiobankingData Virtualization Modernizes Biobanking
Data Virtualization Modernizes Biobanking
 
Research Poster
Research PosterResearch Poster
Research Poster
 
Table of Content - International Journal of Managing Information Technology (...
Table of Content - International Journal of Managing Information Technology (...Table of Content - International Journal of Managing Information Technology (...
Table of Content - International Journal of Managing Information Technology (...
 
Evidence-based Healthcare IT
Evidence-based Healthcare ITEvidence-based Healthcare IT
Evidence-based Healthcare IT
 
Cloud Computing and Innovations for Optimizing Life Sciences Research
Cloud Computing and Innovations for Optimizing Life Sciences ResearchCloud Computing and Innovations for Optimizing Life Sciences Research
Cloud Computing and Innovations for Optimizing Life Sciences Research
 
Expert Panel on Data Challenges in Translational Research
Expert Panel on Data Challenges in Translational ResearchExpert Panel on Data Challenges in Translational Research
Expert Panel on Data Challenges in Translational Research
 
Intel next-generation-medical-imaging-data-and-analytics
Intel next-generation-medical-imaging-data-and-analyticsIntel next-generation-medical-imaging-data-and-analytics
Intel next-generation-medical-imaging-data-and-analytics
 
Production Bioinformatics, emphasis on Production
Production Bioinformatics, emphasis on ProductionProduction Bioinformatics, emphasis on Production
Production Bioinformatics, emphasis on Production
 
UCSF Informatics Day 2014 - Sorena Nadaf, "Translational Informatics OnCore C...
UCSF Informatics Day 2014 - Sorena Nadaf, "Translational Informatics OnCore C...UCSF Informatics Day 2014 - Sorena Nadaf, "Translational Informatics OnCore C...
UCSF Informatics Day 2014 - Sorena Nadaf, "Translational Informatics OnCore C...
 
A Modern Data Strategy for Precision Medicine
A Modern Data Strategy for Precision MedicineA Modern Data Strategy for Precision Medicine
A Modern Data Strategy for Precision Medicine
 

UCSF07 - Research and HPC Infrastructure_Award_2007

  • 1. University of California Larry L. Sautter Award Submission Innovation in Information Technology at the University of California, San Francisco Submitted By: Mr. Michael Williams, MA University of California Executive Director, Information Technology UC San Francisco Diabetes Center, Immune Tolerance Network and Chief Information Officer UC San Francisco Neurology, Epilepsy Phenome/Genome Project Telephone: (415) 860-3581 Email: mwilliams@immunetolerance.org Date Submitted: Friday, May 18, 2007
  • 2. Page 1 Table of Contents 1. PROJECT TEAM...........................................................................................2 1.1. TEAM LEADERS ........................................................................................2 1.2. TEAM MEMBERS .......................................................................................2 2. PROJECT SUMMARY AND SIGNIFICANCE...............................................4 3. PROJECT DESCRIPTION ............................................................................6 3.1. BACKGROUND INFORMATION .....................................................................6 3.2. SITUATION PRIOR TO ARCAMIS ...............................................................7 3.3. AFTER ARCAMIS DEPLOYMENT ...............................................................8 3.4. BUSINESS IMPACT...................................................................................13 4. TECHNOLOGIES UTILIZED.......................................................................14 4.1. THE ARCAMIS SUITE ............................................................................14 4.2. ITIL TEAM BASED OPERATING MODEL.....................................................15 4.3. SECURITY MODEL AND ARCHITECTURE.....................................................17 4.4. DATA CENTER FACILITIES .......................................................................21 4.5. INTERNET CONNECTIVITY.........................................................................23 4.6. VIRTUAL CPU, RAM, NETWORK, AND DISK RESOURCES ..........................23 4.7. OPERATING SYSTEMS SUPPORTED ..........................................................25 4.8. BACKUP, ARCHIVAL, AND DISASTER RECOVERY .......................................25 4.9. MONITORING, ALERTING, AND REPORTING................................................25 4.10. IT SERVICE MANAGEMENT SYSTEMS ....................................................27 5. IMPLEMENTATION TIMEFRAME ..............................................................28 5.1. PROJECT TIMELINE .................................................................................28 6. CUSTOMER TESTIMONIALS.....................................................................29 APPENDICES.....................................................................................................30 APPENDIX A – CAPABILITIES SUMMARY OF THE ARCAMIS SUITE ........................30 APPENDIX B – EXCERPT FROM THE ARCAMIS SYSTEMS FUNCTIONAL SPECIFICATION..................................................................................................33
  • 3. Page 2 1. Project Team 1.1. Team Leaders Michael Williams, M.A. Executive Director, Information Technology UC San Francisco Diabetes Center, Immune Tolerance Network and Chief Information Officer UC San Francisco Neurology, Epilepsy Phenome/Genome Project Gary Kuyat Senior Systems Architect, Information Technology UC San Francisco Diabetes Center, Immune Tolerance Network and UC San Francisco Neurology, Epilepsy Phenome/Genome Project 1.2. Team Members Immune Tolerance Network Information Technology: Jeff Angst Project Manager Lijo Neelankavil Systems Engineer Diabetes Center Information Technology: Aaron Gannon Systems Engineer Project Sponsors: Michael Williams, M.A. Executive Director, Information Technology, Immune Tolerance Network
  • 4. Page 3 Jeff Bluestone, Ph.D. Director Diabetes Center and Immune Tolerance Network Dr. Daniel Lowenstein, M.D. Department of Neurology at UCSF, Director of the UCSF Epilepsy Center Dr. Mark A. Musen, M.D., Ph.D. Professor; Head, Stanford Medical Informatics Dr. Hugh Auchincloss, MD. Chief Operating Officer, Immune Tolerance Network (at time of project) Currently - Principal Deputy Director of NIAID at NIH
  • 5. Page 4 2. Project Summary and Significance By deploying the Advanced Research Computing and Analysis Managed Infrastructure Services (ARCAMIS) suite, the Immune Tolerance Network (ITN) and Epilepsy Phenome Genome Project (EPGP) at the University of California, San Francisco (UCSF) has implemented multiple Tier 1 networks and physically secured enterprise class datacenters, storage area network (SAN) data consolidation, and server virtualization to achieve to achieve a centralized, scalable network and system architecture that is responsive, reliable, and secure. This is combined with a nationally consistent, team centric operating model based on Information Technology Infrastructure Language (ITIL) best practices. Our deployed solution is compliant with applicable confidentiality regulations and assures 24 hour business continuance with no loss of data in the event of a major disaster. ARCAMIS has also provided significant savings on IT costs. Over the last 3 years we have efficiently met the constantly expanding demands for IT resources by using virtualization of disk, CPU, RAM, network, and ultimately servers. ARCAMIS has allowed us to provision and support hundreds of production, staging, testing, and development servers at a ratio of 25 guests to one physical host. By using IP remote management technologies that do not require physical presence, server consolidation and virtualization, combined with SAN based thin-provisioning of storage; we have effectively untied infrastructure upgrades from service delivery cycles. Furthermore, centralizing storage to a Storage Area Network (SAN) has given us the ability to provide real-time server backups (no backup window) and hourly disaster recovery snapshots to a Washington, DC, disaster recovery (DR) site for business continuance within hours of a disaster. ARCAMIS provides the University of California with a proven case study of how to implement enterprise class IT infrastructures and operating models for the benefit of NIH funded clinical research at UCSF. We have accelerated the
  • 6. Page 5 time from the bedside to the bench in clinical research by taking the IT infrastructure out of the clinical trials’ critical path, thereby providing a positive impact on our core business: preventing and curing human disease. ARCAMIS is more agile and responsive, having reduced server acquisition time to a matter of hours rather than weeks. ARCAMIS is significantly more secure and reliable, providing in the order of 99.998% technically architected uptime, and we’ve greatly improved the performance and utilization of our IT assets. We have created hundreds of thousands of dollars in measurable costs savings. ARCAMIS is environmentally friendly, significantly reducing our impact on environmental resources such as power and cooling. ARCAMIS is able to be used as a blue-print for enterprise class Clinical Research IT infrastructure services throughout the University of California, at partner research institutions and universities, and the National Institute of Health. The technologies used: Hewlett Packard Proliant Servers and 7000c Series Blades, VMWare Virtual Infrastructure Enterprise 3.01, Network Appliance FAS3020 Storage Area Network, Cisco, Brocade, Red Hat LINUX Enterprise, and Microsoft Windows Server 2003, among others.
  • 7. Page 6 3. Project Description 3.1. Background Information The mission of the Immune Tolerance Network (ITN) is to prevent and cure human disease. Based at the University of California, San Francisco (UCSF), the ITN is a collaborative research project that seeks out, develops and performs clinical trials and biological assays of immune tolerance. ITN supported researchers are developing new approaches to induce, maintain, and monitor tolerance with the goal of designing new immune therapies for kidney and islet transplantation, autoimmune diseases and allergy and asthma. Key to our success is the ability to collect, store and analyze the huge amount of data collected on ITN’s 30+ global clinical trials at 90+ medical centers, in a secure and effective manner, so a reliable, scalable and adaptable IT infrastructure is paramount in this endeavor. The ITN is in the 7th year of 14-year contracts from the NIH, National Institute of Allergy and Infectious Diseases (NIAID), the National Institute of Diabetes and Digestive and Kidney Disorders (NIDDK) and the Juvenile Diabetes Research Foundation. The Epilepsy Phenome/Genome Project (EPGP) studies the complex genetic factors that underlie some of the most common forms of epilepsy; bringing together 50 researchers and clinicians from 15 medical centers throughout the US. The overall strategy of EPGP is to collect detailed, high quality phenotypic information on 3,750 epilepsy patients and 3,000 controls, and to use state-of-the-art genomic and computational methods to identify the contribution of genetic variation to: the epilepsy phenotype, developmental anomalies of the brain, and the varied therapeutic response of patients treated with AEDs. This initial 5 year grant is being funded by the NIH, National Institute of Neurological Disorders and Stroke (NINDS). The ITN and EPGP turned to computing infrastructure centralization in Tier 1 networked enterprise class datacenters, virtualization, and data consolidation
  • 8. Page 7 onto a Storage Area Network (SAN) with off-site disaster recovery replication to address these challenges. Combined with an ITIL, team based, nationally consistent operating model leveraging specificity of labor; we are in a position to efficiently and scaleably respond to the increasing demands of the organization and rapidly adapt the IT infrastructure to dynamic management goals. This is accomplished while minimizing costs and maintaining requisite quality: we have a true high availability architecture, assuring zero data loss. 3.2. Situation Prior to ARCAMIS Like most of today’s geographically dispersed IT organizations, we were faced with the challenge of providing IT services in a timely, consistent, and cost effective manner with high customer satisfaction. Unlike other organizations, ITN and EPGP have many M.D. and Ph.D. clinical research knowledge workers with higher then normal, computationally intensive, IT requirements. Escalating site-specific IT infrastructure costs, unpredicted downtime, geographically inconsistent process and procedure, and lack of a team based operating model were among the challenges being faced to support such a multi-site infrastructure. There was a general sense that IT could do better. Risk of data loss was real. Dynamically growing demands were making it more difficult to consistently provide high IT service quality, site IT staff were largely reactionary and isolated. Prior to the ARCAMIS deployment, the IT infrastructure faced many challenges: 1. High costs of running and managing numerous physical servers at inconsistent, multiple-site, server rooms, such as power consumption with poor reliability, sub-standard cooling, and poorly laid out physical space. Intermittent and unexpected local facility downtime was common. Global website services were served out of office servers connected via single T1 lines. 2. Lead time for delivering new services was typically 6 weeks which directly impacted clinical trials’ costs. Procuring and deploying new infrastructure for new services or upgrades were major projects requiring significant downtime and direct physical presence of IT staff. 3. Existing computing capacity was underutilized, but still required technical support such as backups and patches; with individualized site
  • 9. Page 8 based process and procedures. Little automation caused significant effort for administration. There was a huge amount of IT administrative effort to manage site specific physical server support, asset tracking and equipment leases at multiple sites. 4. Lack of IT staff team operating model, consistently automated architecture, and remote management technologies resulted in process and procedure inconsistency at any one site and led to severe variance in service quality and reliability by geography. 5. IT maturity prevented discussion of higher level functions such as auditable policies and procedures, disaster recovery, redundant network architectures, and security audits; all required for NIH Clinical Trail safety compliance. 3.3. After ARCAMIS Deployment ARCAMIS represents a paradigm shift in our IT philosophy both operationally and technically. The goal was to move out of a geographically specific, reactionary mode to a prospective operating model and technical architecture designed from the ground up to be in alignment with the organizations growing, dynamic demands for IT services. Most importantly, we worked with management prospectively to understand service quality expectations and requirements to scale up to 30 clinical trails in 7 years. Given management objectives and our limited resources, we realized a need for a more team centric operating model, providing specificity of labor. As a result, we logically grouped our human resources into the Support team and Architecture team. This gave more senior technical talent the time they needed to re-engineer, build, and migrate to the ARCAMIS solution while more junior talent continued to focus on reactionary issues. From a technical perspective, we engineered an architecture that would eliminate or automate time consuming tasks and improve reliability. By centralizing all ARCAMIS Managed Infrastructure into bi-coastal, carrier diverse, redundant, Tier 1 networked, enterprise class datacenters and using fully “lights-out”, Hewlett Packard Proliant Servers with 4 hour on-site
  • 10. Page 9 physical support and a remote, IP based, server administration model, we have dramatically improved service reliability and supportability without adding administrative staff. The same senior staff now supports twice the number of physical servers and 20 times the virtual servers. For example, it is now common for engineers to administrate infrastructure at all seven sites simultaneously, including hard reboots and physical failures. With the integration of the VMWare Virtual Infrastructure 3.01 Enterprise infrastructure virtualization technologies, ARCAMIS reduces the number of physical servers at our data centers while continuing to meet the exponentially expanding business server requirements. Less hardware yields a reduction in initial server hardware costs and saves ongoing data center lease, power and cooling costs associated with ARCAMIS infrastructure. The initial capital expenditure was about the same as purchasing physical servers due to our investment in virtualization and SAN technologies. Consolidating all server data onto the Network Appliance Storage Area Network; the ARCAMIS project deployed a 99.998% uptime, 25 TB, production and disaster recovery cluster in San Francisco and a 25 TB 99.998% uptime production and disaster recovery cluster site in the Washington, DC metro area. The SAN allows us to reduce cost and complexity via automation, resulting in dramatic improvements in operations efficiency. We can more efficiently use what we already own, oversubscribe disk, and eliminate silos of underutilized storage. Current storage usage at the primary site is 65%, up from 25% average per server using Direct Attached Storage (DAS). We can seamlessly scale to 100 terabytes of storage by simply adding disk shelves, not possible with a server based approach. Another key benefit of using SAN technology is risk mitigation via completely automated backup, archival, and offsite replication. File restores are instantaneous, eliminating the need for human resource intensive and less reliable tape backup approaches. Combining the SAN with VMWare Infrastructure 3.01 Enterprise server virtualization technologies provides reliable, extensible, manageable, high availability architecture. Adjusting to changing server requirements is simple
  • 11. Page 10 because of the SAN’s storage expansion and reduction capability for live volumes and VMWare’s ability to scale from 1 to 4 64-bit CPUs with up to 16GB RAM and 16 network ports per virtual server. Also, oversubscription allows the ITN to more efficiently use the disk, RAM, and CPU we already own. We can seamlessly control server, firewall, network and data adds, removes, and changes without business service interruption. The SAN and VMWare ESX combination provides excellent performance and reliability using both Fiber Channel & iSCSI Multipathing for a redundant disk to server access architecture. For certain applications we can create Highly Available Clustered Systems truly architected to meet rigorous 99.998% uptime requirements. VMs boot from the SAN and are replicated locally and off-site while running. This improves business scalability and agility via accelerated service deployment and expanded utilization of existing hardware assets. Physical server maintenance requiring the server to be shut down or rebooted is done during regular working hours without downtime due to support for VMotion, the ability to move a running VM from one physical machine to another. This has greatly reduced off-hours engineer work. The increasing data security & compliance requirements are also able to be met with the centralized control provided by the SAN. In our experience, storage availability determines service availability; automation guarantees service quality of storage.
  • 12. Page 11 NetworkAppliance 13 12 11 10 09 08 07 06 05 04 03 02 01 00 Power System Shelf ID Loop B Fault Loop A 72F DS14 NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F NetworkAppliance 13 12 11 10 09 08 07 06 05 04 03 02 01 00 Power System Shelf ID Loop B Fault Loop A 72F DS14 NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F FAS 3050 activity status power NetworkAppliance 13 12 11 10 09 08 07 06 05 04 03 02 01 00 Power System Shelf ID Loop B Fault Loop A 72F DS14 NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F NetworkAppliance 13 12 11 10 09 08 07 06 05 04 03 02 01 00 Power System Shelf ID Loop B Fault Loop A 72F DS14 NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F FAS 3050 activity status power 16 port FC switch 16 port FC switch Active/Active High Availablity 25TB Fiber Channel Cluster Passive Synchronization Between Sites VMWare Server VMWare Server VMWare Server VMWare Server VMWare Server San Francisco, CA VMWare Server VMWare Server VMWare Server VMWare Server VMWare Server VMWare ESX Server 1 VMWare ESX Server 2 NetworkAppliance 13 12 11 10 09 08 07 06 05 04 03 02 01 00 Power System Shelf ID Loop B Fault Loop A 72F DS14 NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F NetworkAppliance 13 12 11 10 09 08 07 06 05 04 03 02 01 00 Power System Shelf ID Loop B Fault Loop A 72F DS14 NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F FAS 3050 activity status power NetworkAppliance 13 12 11 10 09 08 07 06 05 04 03 02 01 00 Power System Shelf ID Loop B Fault Loop A 72F DS14 NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F NetworkAppliance 13 12 11 10 09 08 07 06 05 04 03 02 01 00 Power System Shelf ID Loop B Fault Loop A 72F DS14 NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F FAS 3050 activity status power 16 port FC switch 16 
port FC switch Active/Active High Availablity 25TB Fiber Channel Cluster VMWare Server VMWare Server VMWare Server VMWare Server VMWare Server Herndon, VA VMWare Server VMWare Server VMWare Server VMWare Server VMWare Server VMWare ESX Server 1 VMWare ESX Server 2 The ARCAMIS project has proven and demonstrated the many benefits promised by these new enterprise class technologies. We have significantly increased the value of IT to our core business, slashed IT operating costs, and radically improved the quality of our IT service. The ARCAMIS architecture and operating model is a core competency which other UC organizations can leverage to achieve similar benefits. Just some of the benefits resulting from the ARCAMIS project include the following: 1. Saved hundreds of thousands of dollars and improved security, reliability, scalability, and deployment time.
  • 13. Page 12 2. Helped the environment by reducing power consumption by a factor of 20 for a comparable service infrastructure. 3. Improved conformance with federal and state regulations such as HIPAA and 21 CFR Part 11. 4. Centralized critical infrastructures into Tier 1, redundantly multi- homed network, enterprise class datacenters. Space utilization at our 8 sites has been consolidated to three data centers using 5 server racks. 5. Eliminated the inconsistent complexity of our IT infrastructure and processes/procedures, and ensured uptime for our business critical applications; even in the event of hardware failures. All new solution deployments are done based on nationally consistent operating models and technical architectures. 6. Consolidated data to SAN and VMWare servers. The infrastructure is architected to be a true 99.998% uptime solution. Our biggest downtime risk is human error. 7. Any staff member with security privileges can manage any device at any site from any Internet connected PC; including hardware failures and power cycles. We efficiently provision, monitor and manage the infrastructure with a single top console. 8. Standardized virtual server builds, procurement and deployment time reduced from as much as 6 weeks to 2 hours without investing in new server hardware. 9. Automated backup, archival, and disaster recovery. 10. Cloned production servers for testing and troubleshooting. Systems and networks can be cloned while running, with zero downtime and rebooted in virtual lab environments. Servers can be rotated back into production with only a few seconds downtime. 11. Average CPU utilization has risen from 5% to 30% while retaining peak capacity. 12. Disk utilization has risen from 25% to 65% 13. Savings are in the region of $200,000 in the past 12 months. This will grow exponentially as the architecture scales. 14. Multiple operating systems are supported, including: RedHat LINUX, MS Windows 2000, MS Windows 2003, with both 32 and 64 bit
  • 14. Page 13 versions of all operating systems supported. These can all be deployed on the same physical server, providing us with reduced dependence on vendor’s proprietary solutions. 15. Reduced support overhead and power consumption of legacy applications by migrating these into the virtual environment. 3.4. Business Impact ARCAMIS provides the University of California with a proven case study of how to implement enterprise class IT infrastructures and operating models for the benefit of NIH funded clinical research at UCSF. We have accelerated the time from the bedside to the bench in clinical research by taking the IT infrastructure out of the clinical trials’ critical path, thereby providing a positive impact on our core business: preventing and curing human disease. ARCAMIS is more agile and responsive, having reduced server acquisition time to a matter of hours rather than weeks. ARCAMIS is significantly more secure and reliable, providing in the order of 99.998% technically architected uptime, and we’ve greatly improved the performance and utilization of our IT assets. We have created hundreds of thousands of dollars in measurable costs savings. ARCAMIS is environmentally friendly, significantly reducing our impact on environmental resources such as power and cooling. ARCAMIS is able to be used as a blue-print for enterprise class Clinical Research IT infrastructure services throughout the University of California, at partner research institutions and universities, and the National Institute of Health.
  • 15. Page 14 4. Technologies Utilized 4.1. The ARCAMIS Suite This suite of Academic Research Computing and Analysis Managed Infrastructure Services (ARCAMIS) includes the following technology components: 1. ITIL based, nationally consistent, labor specific, team IT operating model 2. Security model and architecture (including firewalls, intrusion detection, VPN, automated updates) 3. Enterprise class data center facilities 4. Tier 1, multi-homed, redundant, carrier diverse, networks 5. Virtual CPU, RAM, Network, and Disk resources based on Hewlett Packard Proliant servers, VMware Infrastructure Enterprise 3.01 and Network Appliance Storage Area Network (SAN) 6. Various 32 and 64 bit LINUX and Windows Operating Systems 7. Backup, archival and disaster recovery 8. Monitoring, alerting, and reporting 9. IT service management systems
  • 16. Page 15 4.2. ITIL Team Based Operating Model The ARCAMIS Operating Model is based on an ITIL best practices, nationally consistent, team based, and uses specificity of labor. Via our formal, documented, Infrastructure Lifecycle Process (ILCP) and support policies, procedures (SOPs), and support documentation such as operating guides and systems functional specifications, the ARCAMIS infrastructure evolves though its lifecycles of continuous improvement. Below are samples of the IT Policies and Procedures used. IT Policies
  • 17. Page 16 Standard Operating Procedures Our goal moving forward is to be a completely ITIL shop in the next 12 months. As you can see from the below organizational chart the ARCAMIS team is logically grouped into a prospective engineering team, and an administration and support team.
  • 18. Page 17 Organizational Chart Customer Engineer Laurel Heights/CB Executive Director, Information Technology Manager Customer Engineering Customer Engineer BEA/ITI Customer Engineer Parnassus Level 1 and 2 Support Team IT Office and Operations Manager Customer Engineer Parnassus Systems and Network Architect Systems and Network Engineer Server and Network Engineering Team Level 3 and 4 Support Systems and Network Engineer Customer Engineer Pittsburgh 4.3. Security Model and Architecture ARCAMIS is required to meet at minimum the Security Category and Level of MODERATE for Confidentiality, Integrity, and Availability as defined by the National Institute of Health. Compliance with this Security Category spans the entire organization from the initial Concept Proposal phase, through clinical trial design and approval, into trial operations where patient information is gathered, including data collection and specimen storage. Significant amounts of confidential, proprietary and unique patient data are collected, transferred, and stored in the ARCAMIS infrastructure for analysis and dissemination by approved parties. Certain parts of the infrastructure are able to satisfy HIPAA and 21 CFR Part 11 compliance. This becomes especially important as the ITN and EPGP organizations continue to innovate and develop new intellectual property which may have significant market value.
  • 19. Page 18 Information Security Category Requirements Exceeding the minimum compliance requirements with this Information Security Category is achieved by a holistic approach addressing all aspects of the ARCAMIS personnel, operations, physical locations, networks and systems. This includes tested, consistently executed, and audited plans, policies and procedures, and automated, monitored, and logged security technologies used on a day to day basis. The overall security posture of the ARCAMIS has many aspects including legal agreements with partners and employees, personnel background checks and training, organization wide disaster recovery plans, backup, systems and network security architectures (firewalls, intrusion detection systems, multiple levels of encryption, etc.), and detailed documentation requirements. Consistent with the NIH Application/System Security Plan (SSP) Template for Applications and General Support Systems and the US Department of Health and Human Services Official Information Security Program Policy (HHS IRM Policy 2004-002.001), ARCAMIS maintains a formal information systems security program to protect the organization’s information resources. This is called the Information Security and Information Technology Program (ISITP). ISITP delineates security controls into the four primary categories of management, operational, technical and standard operating procedures which structure the organization of the ISITP. - Management Policies focus on the management of information security systems and the management of risk for a system. They are techniques and concerns that are addressed by management, examples include: Capital Planning and Investment, and Risk Management. - Operational Policies address security methods focusing on mechanisms primarily implemented and executed by people (as opposed to systems). These controls are put in place to improve the security of a particular system (or group of systems), examples include: Acceptable Use, Personnel Separation, and Visitor Policies.
  • 20. Page 19 - Technical Policies focus on security policies that the computer system executes. The controls can provide automated protection for unauthorized access or misuse, facilitate detection of security violations, and support security requirements for applications and data, examples include: password requirements, automatic account lockout, and firewall policies. - Standard Operating Procedures (SOPs) focus on logistical procedures that staff do routinely to ensure ongoing compliance, examples include: IT Asset Assessment, Server and Network Support, and Systems Administration. Specifically, the ARCAMIS ISITP includes detailed definitions of the following Operational and Technical Security Policies. PERSONNEL SECURITY Background Investigations Rules of Behavior Disciplinary Action Acceptable Use Separation of Duties Least Privilege Security Education and Awareness Personnel Separation RESOURCE MANAGEMENT Provision of Resources Human Resources Infrastructure PHYSICAL SECURITY Physical Access Physical Security Visitor Policy MEDIA CONTROL Media Protection Media Marking Sanitization and Disposal of Information Input/Output Controls COMMUNICATIONS SECURITY Voice Communications Data Communications Video Teleconferencing Audio Teleconferencing Webcast Voice-Over Internet Protocol Facsimile WIRELESS COMMUNICATIONS SECURITY Wireless Local Area Network (LAN) Multifunctional Wireless Devices EQUIPMENT SECURITY Workstations Laptops and Other Portable Computing Devices Personally Owned Equipment and Software Hardware Security ENVIRONMENTAL SECURITY Fire Prevention Supporting Utilities DATA INTEGRITY Documentation NETWORK SECURITY POLICIES Remote Access and Dial-In Network Security Monitoring Firewall System-to-System Interconnection Internet Security SYSTEMS SECURITY POLICIES Identification Password Access Control Automatic Account Lockout Automatic Session Timeout Warning Banner Audit Trails Peer-to-Peer Communications Patch Management Cryptography Malicious Code Protection Product Assurance E-Mail Security Personal E-Mail Accounts
  • 21. Page 20 These policies serve as the foundation of the ARCAMIS Standard Operating Procedures and technical infrastructure architectures which when combined, create a secure environment based security best practices. Security Infrastructure Architecture To ensure a hardened Information Security and Information Technology environment, the ARCAMIS has centralized its critical Information Technology infrastructures into two Tier 1 data centers. Facilities include: Uninterruptible Power Supply via backup diesel generators that can keep servers running indefinitely without direct electric grid power. They are equipped with optimal environment controls, including sophisticated air conditioning and humidifier equipment as well as stringent physical security systems. They provide 24x7 Network Operations Center network monitoring and physical security. Each data center also includes fire suppression systems with water-free fire protection so as not to damage the servers. For secure data transport, ARCAMIS provides a carrier diverse, redundant, secure, reliable, Internet connected, high speed Local Area Network (LAN) and Wide Area Network (WAN). The ARCAMIS network and Virtual Private Network (VPN) is the foundation for all the ARCAMIS IT services and used by every ITN and EPGP stakeholder every day. The high speed WAN is protected by intrusion detection monitored and logged firewalls at all locations. Firewall and VPN services are provided by industry leading Microsoft and Cisco products. All network traffic between ITN sites, desktops, and partner organizations that travels over public networks is encrypted using at least 128-bit encryption using various security protocols including IPSec, SFTP, RDC, Kerberos, and others. We also implemented a wildcard based virtual certificate architecture for all port 443 communications, allowing rapid deployment of new secured services. Keeping these systems monitored and patched, ARCAMIS provides IP ping, SNMP MIB monitoring, specific service monitoring and automated restarts, hardware monitoring, intrusion detection monitoring, and website monitoring
  • 22. Page 21 of the ARCAMIS production server environment. Server and end-user security patches are applied monthly via Software Update Services. Application and LINUX/Macintosh patches are pushed out on a monthly basis. We have standardized on McAfee Anti-Virus for virus protection and use Postini for e-mail SPAM and Virus filtering. The ITN’s Authoritative Directory uses Microsoft Active Directory and is exposed via SOAP, RADIUS, and LDAP for cross platform authentication. The ITN is currently using an Enterprise Certificate Authority (ITNCA) for certificate based security authentication. Comprehensive Information Security The ITN has established mandatory policies, processes, controls, and procedures to ensure confidentiality, integrity, availability, reliability, and non-repudiation within the Organization’s infrastructure and its operations. It is the policy of ARCAMIS that the organization abides by or exceeds the requirements outlined in ITN Information Security and Information Technology Program, thereby exceeding the required Security Category and Level of MODERATE for Confidentiality, Integrity, and Availability outlined above. In addition, to ensure adequate security, ARCAMIS implements additional security policies exceeding the minimum requirement, as appropriate for our specific operational and risk environment as necessary. 4.4. Data Center Facilities The ITN has centralized its server architecture into two Tier 1 data centers. The first is located in Herndon, VA with Cogent Communications, and the second in San Francisco, CA with Level 3 Communications. An additional research data center is located at the UCSF QB3 facility. Physical access requires a badge and biometric hand security scanning, and the facilities have 24x7 security staff on-site. Each data center includes redundant uninterruptible power supplies and backup diesel generators that can keep each server running indefinitely without direct electric grid power. The centers provide active server and application monitoring, helping hands and backup media rotation capabilities. They are equipped with optimal environment
  • 23. Page 22 controls, including sophisticated air conditioning and humidifier equipment as well as stringent physical security systems. There are also waterless fire suppression systems. Power to our racks specifically is provided by four redundant, monitored PDUs which report exact power usage at a point in time and alert us if there is a power surge. Herndon, VA Rack Diagram G3 HP ProLiant ML570 UID 21 Channel 2Channel 2Channel 1 100 1 2 3 4 5 6 7 G3 HP ProLiant ML570 UID 21 Channel 2Channel 2Channel 1 100 1 2 3 4 5 6 7 G3 HP ProLiant ML570 UID 21 Channel 2Channel 2Channel 1 100 1 2 3 4 5 6 7 UID 1 2 SimplexDuplexchch21 0011 3322 4455Tape UID 1 2 SimplexDuplexchch21 0011 3322 4455Tape UID 1 2 SimplexDuplexchch21 0011 3322 4455Tape UID 1 2 SimplexDuplexchch21 0011 3322 4455Tape UID 1 2 SimplexDuplexchch21 0011 3322 4455Tape NetApp FAS 3020 activity status power NetApp FAS 3020 activity status power UID HP ProLiant DL320 G3 1 2 NetworkAppliance Power System Shelf ID Loop B Fault Loop A 72F DS14 MK2 FC NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F NetworkAppliance Power System Shelf ID Loop B Fault Loop A 72F DS14 MK2 FC NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F NetworkAppliance Power System Shelf ID Loop B Fault Loop A 72F DS14 MK2 FC NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F NetworkAppliance Power System Shelf ID Loop B Fault Loop A 72F DS14 MK2 FC NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F UID HP ProLiant DL320 G3 1 2 UID HP ProLiant DL320 G3 1 2 NetworkAppliance Power System Shelf ID Loop B Fault Loop A 72F DS14 MK2 FC NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F NetworkAppliance Power System Shelf ID Loop B Fault Loop A 72F DS14 MK2 FC NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F NetworkAppliance Power System Shelf ID Loop B Fault Loop A 72F DS14 MK2 FC NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F NetworkAppliance Power System Shelf ID Loop B Fault Loop A 72F DS14 MK2 FC NetworkAppliance NetworkAppliance 
NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance NetworkAppliance 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F 72F G3 HP ProLiant ML570 UID 21 Channel 2Channel 2Channel 1 100 1 2 3 4 5 6 7 G3 HP ProLiant ML570 UID 21 Channel 2Channel 2Channel 1 100 1 2 3 4 5 6 7
  • 24. Page 23 4.5. Internet Connectivity Servicing the ARCAMIS customer base is a carrier diverse, redundant, firewalled, reliable, Internet connected high speed network. This network combined with the Virtual Private Network (VPN) creates the foundation for all the ARCAMIS services provided. Internet connectivity is location dependant: • San Francisco, China Basin and Level 3. – Tier 1 1000mbs Ethernet connection to the Internet is provided by Cogent Networks. UCSF provides a 100mbs Ethernet connection to redundant 45mbs OC198 connections, and 100mbs Ethernet to between UCSF campuses. • San Francisco, Quantitative Biology III Data Center – UCSF network provides 1000mbs Ethernet connection to redundant 45mbs OC198 connections and 100mbs Ethernet to between UCSF campuses. • Herndon, VA – Tier 1 100mbs Ethernet connection to the Internet is provided by Cogent Networks. AT&T provides a 1.5mbs DSL backup connection. 4.6. Virtual CPU, RAM, Network, and Disk Resources ARCAMIS uses the Network Appliances Storage Area Network with a 25 TB HA Cluster in Herndon and a 25 TB disaster recovery site in San Francisco. This allows us to reduce cost and complexity via automation and operations efficiency. We can seamlessly control adds, removes, and updates without business interruption for our critical storage needs. We can more efficiently use what we already own and eliminate silos of underutilized memory, CPU, network, and storage. This improves business scalability and agility via accelerated service deployment and expansion on existing hardware assets. We can scale to tens of terabyte storage, not possible with a server based approach. Another key result of using this technology is risk mitigation. We architecturally automate the elimination of the possibility of critical data loss. We fully automate backup, archival, restore - productivity loss goes from days to minutes, to nothing, in event of user error or HW failure. We have technologically automated smooth business continuance in the event of a disaster. The increasing ARCAMIS data security and compliance requirements are able to be met with a SAN. We can handle HIPAA security
  • 25. Page 24 and compliance requirements. In our experience, storage availability determines service availability, and automation guarantees service quality.
VMware Virtual Infrastructure Enterprise 3.01 (VI3) is virtual infrastructure software for partitioning, consolidating, and managing servers in mission-critical environments. Ideally suited to enterprise data centers, VI3 minimizes the total cost of ownership of computing infrastructure by increasing resource utilization, and its hardware-independent virtual machines, encapsulated in easy-to-manage files, maximize administrative flexibility. VMware ESX Server allows enterprises to boost x86 server utilization to 60-80%, provision new systems faster with less hardware, decouple application workloads from the underlying physical hardware for increased flexibility, and dramatically lower the cost of business continuity. ESX Server supports 64-bit VMs with 16 GB of RAM, meeting ARCAMIS's expanding server computing requirements.
Combining the SAN with server virtualization provides an extremely reliable, extensible, manageable, high-availability architecture for ARCAMIS. The SAN provides near-instantaneous VM backups, restores, and provisioning, as well as off-site disaster recovery and archival. File restores are immediate, eliminating the need for labor-intensive and less reliable client-side disk management applications. Adjusting to changing server requirements is equally fast because the SAN can expand and shrink live volumes, and oversubscription allows the ITN to use the disk we already own far more efficiently. The SAN and VMware ESX combination provides excellent performance and reliability using both Fibre Channel and iSCSI multipathing. VMs boot from the SAN and are replicated locally and off-site while running. For certain applications we can build highly available clustered systems with greater than 99.998% uptime. Finally, server maintenance can be done during regular working hours without downtime thanks to VMotion, which moves a running VM from one physical machine to another.
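To put the uptime figures above in concrete terms, the arithmetic below converts an availability percentage into allowable downtime per year. This is a generic illustration of what the quoted percentages mean, not an excerpt from the ARCAMIS SLA documents.

```python
# Convert an availability SLA (e.g., 99.998%) into allowable downtime per year.
# Illustrative only; the figures follow from the percentages quoted in this
# document, not from any separate ARCAMIS SLA calculation.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes

def allowed_downtime_minutes(availability_pct: float) -> float:
    """Minutes of downtime per year permitted at the given availability."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)

for sla in (99.998, 99.999):
    print(f"{sla}% uptime -> {allowed_downtime_minutes(sla):.1f} minutes/year")

# 99.998% uptime -> 10.5 minutes/year
# 99.999% uptime -> 5.3 minutes/year
```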
  • 26. Page 25 4.7. Operating Systems Supported
ARCAMIS supports several operating systems, including all major flavors of Linux, i386 Solaris, and all versions of the Windows operating system.
4.8. Backup, Archival, and Disaster Recovery
ARCAMIS data availability, backup, and archival are provided by a Storage Area Network (SAN) with a 25 TB high-availability cluster in Herndon and a 25 TB disaster recovery site in San Francisco. This SAN houses ARCAMIS's critical clinical data and IT server data. The SAN automates backup, archival, and restore via the NetApp SnapMirror, SnapBackup, and SnapRestore applications. All critical data at the San Francisco and Herndon sites are replicated to the other site within one hour. In the event of a major disaster at either ARCAMIS data center site, at most 60 minutes of data can be lost, and the critical server infrastructure can be failed over to the other coast's facility for business continuance. In addition to the SAN, ARCAMIS uses a 7-day incremental backup rotation to offline disk, with monthly off-site archives of all production data, based on Symantec Veritas software.
4.9. Monitoring, Alerting, and Reporting
We use a variety of monitoring and reporting technologies, and two IT staff perform full infrastructure monitoring audits twice daily, five days per week, at 8:00am EST and 3:00pm PST. A 1-800 Priority 1 issue resolution line pages and calls five senior engineers simultaneously in the event of a major system failure or issue, and an on-call rotation schedule changes weekly. We use the following technologies: Microsoft Operations Manager (MOM), WebWatchBot, Brocade Fabric Manager, NetApp Operations Manager, VMware Operations Manager, Cacti, and Oracle, among others. A minimal sketch of the kind of availability checks these tools automate appears after the sample graph below.
Cacti Disk Utilization Graph
Below is a sample disk utilization graph.
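As a minimal illustration of the sort of checks listed in the monitoring table on the next page (ping, HTTP/HTTPS URL, and disk-usage monitoring), the sketch below polls a few services and flags failures. The hostnames, URL, and threshold are hypothetical placeholders; this is not the actual MOM or WebWatchBot configuration used by ARCAMIS.

```python
# Illustrative availability checks: ping, HTTPS URL, and disk usage.
# Hostnames, URL, and thresholds are hypothetical; ARCAMIS uses MOM,
# WebWatchBot, Cacti, etc. rather than this script.
import shutil
import subprocess
import urllib.request

HOSTS = ["db01.example.org", "web01.example.org"]   # hypothetical hosts
URL = "https://portal.example.org/login"            # hypothetical URL
DISK_ALERT_PCT = 90                                  # alert above 90% full

def ping_ok(host: str) -> bool:
    """Return True if the host answers a single ICMP echo request."""
    result = subprocess.run(["ping", "-c", "1", "-W", "2", host],
                            capture_output=True)
    return result.returncode == 0

def url_ok(url: str) -> bool:
    """Return True if the URL answers with a non-error HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status < 400
    except Exception:
        return False

def disk_pct_used(path: str = "/") -> float:
    """Percentage of the given filesystem currently in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total

if __name__ == "__main__":
    for host in HOSTS:
        print(f"ping {host}: {'OK' if ping_ok(host) else 'FAIL'}")
    print(f"url {URL}: {'OK' if url_ok(URL) else 'FAIL'}")
    pct = disk_pct_used("/")
    status = "ALERT" if pct >= DISK_ALERT_PCT else "OK"
    print(f"disk /: {pct:.1f}% used ({status})")
```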
  • 27. Page 26 Monitoring Table
Below is a partial list of the server and network monitoring we perform:
• Customer Defined Transaction Monitoring
• ODBC Database Query Verification
• Ping Monitoring
• SMTP Server and Account Monitoring
• POP3 Server and Account Monitoring
• FTP Upload/Download Verification
• File Existence and Content Monitoring
• Disk/Share Usage Monitoring
• Microsoft Performance Counters
• Microsoft Process Monitoring
• Microsoft Services Performance Monitoring
• Microsoft Services Availability Monitoring
• Event Log Monitoring
• HTTP/HTTPS URL Monitoring
• Customer Specified Port Monitoring
• Active Directory
• Exchange Intelligent Message Filter
• HP ProLiant Servers
• Microsoft .NET Framework
• Microsoft Baseline Security Analyzer
• Microsoft Exchange Server Best Practices Analyzer
• Microsoft Exchange Server
• Microsoft ISA Server
• Microsoft Network Load Balancing
• Microsoft Office Live Communications Server 2003
• Microsoft Office Live Communications Server 2005
• Microsoft Office Project Server
• Microsoft Office SharePoint Portal Server 2003
• Microsoft Operations Manager MPNotifier
• Microsoft Operations Manager
  • 28. Page 27
• Microsoft Password Change Notification Service
• Microsoft SQL Server
• Microsoft Web Sites and Services MP
• Microsoft Windows Base OS
• Microsoft Windows DFS Replication
• Microsoft Windows Distributed File Systems
• Microsoft Windows DHCP
• Microsoft Windows Group Policy
• Microsoft Windows Internet Information Services
• Microsoft Windows RRAS
• Microsoft Windows System Resource Manager
• Microsoft Windows Terminal Services
• Microsoft Windows Ultrasound
• NetApp Volume Utilization
• Global Status Indicator
• Hardware Event Log
• Visual Inspection
• Ambient Temperature
• Temperature Trending
• Location WAN Connectivity
4.10. IT Service Management Systems
We use Remedy and Track-IT Enterprise for ticketing, asset tracking, and purchasing.
  • 29. Page 28 5. Implementation Timeframe
5.1. Project Timeline
  • 30. Page 29 6. Customer Testimonials
"ARCAMIS provides services that allow the ITN knowledge workers to focus on answering the difficult scientific questions in immune tolerance; we don't waste time on basic IT infrastructure functions. ARCAMIS allows me to be confident our research patient data is stored in a secure, reliable and responsive IT infrastructure. For example, last week we did a demonstration to the Network Executive Committee of our Informatics data management and collaboration portal in real-time. This included the National Institutes of Health senior management responsible for our funding… it all worked perfectly. This entire application was built on ARCAMIS."
Jeffrey A. Bluestone, Ph.D.
Director, UCSF Diabetes Center
Director, Immune Tolerance Network
A.W. and Mary Clausen Distinguished Professor of Medicine, Pathology, Microbiology and Immunology

"With ARCAMIS we are well positioned to meet the rigorous IT requirements of an NIH funded study. Within weeks of project funding from the NIH, our entire secure research computing network and server infrastructure of more than 10 servers was built, our developers finished the public website, and we began work on the Patient Recruitment portal. That would have taken at least 6 months if I had to hire a team to procure and build it ourselves. Accelerating scientific progress in neurology is core to everything we do; ARCAMIS has been an important part of what we are currently doing."
Daniel H. Lowenstein, M.D.
Professor of Neurology, UCSF, and Director, Physician-Scientist Education and Training Programs
Director, Epilepsy Phenome/Genome Project

"With the investment in ARCAMIS, UCSF and the ITN can confidently partner with other leading medical research universities across the country. At the ITN we depend on the on-demand, services based, scalable computing capacity of ARCAMIS every day to enable our collaborative data analysis and Informatics data visualization applications."
Mark Musen, Ph.D.
Director, Medical Informatics Department, Stanford University
Deputy Director, Immune Tolerance Network
  • 31. Page 30 Appendices
Appendix A – Capabilities Summary of the ARCAMIS Suite
Fundamentals
• 99.998% production solution uptime guaranteed via Service Level Agreement.
• Managed multi-homed, Tier 1 network (Zero Downtime SLA)
• High-speed 1000 Mbps connectivity to UCSF network space.
• Bi-coastal, world-class data centers hosted with Level 3 and Cogent Communications, with redundant power and HVAC systems
• Managed DNS, or use of UCSF DNS
• Managed Active Directory for "Production Servers" and integration with the UCSF campus AD via trust.
• Phone, e-mail, and web-based ticketing system to track all issues
• Mature purchasing services, with purchases charged to the correct account
Monitoring & Issue Response
• 8am EST to 5pm PST business-day access to live support personnel
• 24/7/365 coverage with one primary on-call engineer, off-hours paging access, and a 1-800 P1 issue number that rings 5 infrastructure engineers simultaneously.
• Microsoft Operations Manager monitoring (CPU, RAM, disk, event log, ping, ports, and services)
• Application script response monitoring for web applications, including SSL, via WebWatchBot 5
• HP Remote Insight Manager hardware monitoring with 4-hour vendor response on all servers
• NetApp corporate monitoring of the Storage Area Network, with 4-hour time to resolution and a fully stocked parts depot.
• 24x7 staffed data centers with secure physical access to all servers
• 24x7 staffed Network Operations Center for the WAN
  • 32. Page 31
• Notification preferences and standard response specifications can be customized
Backup, Restore and Disaster Recovery/Business Continuance
• Symantec Backup Exec server agents for Oracle, SQL, MySQL, and Exchange servers, with 7 nightly incremental backups.
• 14 local daily snapshots of full "crash consistent" server state
• Hourly off-site snapshots of full "crash consistent" server state, with 40 hourly restore points for DR
• Monthly archive of the entire infrastructure, which rolls to quarterly after 3 months (a sketch of the restore points this retention schedule yields appears at the end of this appendix).
Reporting
• Online ticketing
• Detailed backup utilization
• Bandwidth utilization
• Infrastructure uptime reports
• CPU, RAM, network, and disk utilization reports
Server & Device Administration
• Customized specifications using VMware Infrastructure 3.01 technology: up to four 64-bit 3.0 GHz Intel Xeon processors, 16 GB RAM, and 1 Gbps networking, with disk volumes up to 2 TB.
• Based on HP ProLiant enterprise servers: ML570 (8 processors per server), DL380 series, and 7000c blade servers
• IP everywhere: full remote management of every device, including full KVM, via a separate backLAN network.
• Microsoft MCCA licensing on key server components
• Full license and asset tracking
• Senior System Administrator troubleshooting
• Optional high-availability (99.999% uptime) server capabilities via Veritas and Microsoft Clustering
Managed Security
  • 33. Page 32
• Automated OS and major application patching
• Managed network-based intrusion detection
• Managed policy-based enterprise firewall using Cisco and Microsoft technologies
• Managed VPN access
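As referenced in the backup bullets above, the short sketch below enumerates the restore points implied by the stated retention schedule (7 nightly incrementals, 14 daily snapshots, 40 hourly off-site snapshots, and monthly archives rolling to quarterly). It is a back-of-the-envelope illustration of the policy as described here, not the actual NetApp or Backup Exec configuration.

```python
# Back-of-the-envelope count of restore points implied by the retention
# schedule described in Appendix A. Purely illustrative; the real policy
# is implemented in NetApp Snapshot/SnapMirror and Symantec Backup Exec.
from datetime import datetime, timedelta

NIGHTLY_INCREMENTALS = 7       # Backup Exec nightly incremental rotation
LOCAL_DAILY_SNAPSHOTS = 14     # local "crash consistent" daily snapshots
OFFSITE_HOURLY_SNAPSHOTS = 40  # off-site hourly restore points for DR

def restore_points(now: datetime) -> dict:
    """Return the oldest recovery point available in each retention tier."""
    return {
        "hourly (off-site)": now - timedelta(hours=OFFSITE_HOURLY_SNAPSHOTS),
        "daily (local)": now - timedelta(days=LOCAL_DAILY_SNAPSHOTS),
        "nightly incremental": now - timedelta(days=NIGHTLY_INCREMENTALS),
    }

if __name__ == "__main__":
    now = datetime(2007, 5, 18, 12, 0)  # submission date, used as an example
    for tier, oldest in restore_points(now).items():
        print(f"{tier}: restore points back to {oldest:%Y-%m-%d %H:%M}")
    # Monthly archives (rolling to quarterly after 3 months) extend
    # recovery further back, at coarser granularity.
```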
  • 34. Page 33 Appendix B – Excerpt from the ARCAMIS Systems Functional Specification
Centralized Virtual Infrastructure Administration
ARCAMIS can move virtual machines between hosts, create new machines from pre-built templates, and control existing virtual machine configurations. We can also gather event log information for all VMware hosts from a central location; identify asset utilization and troubleshoot warnings before they become problems; more easily manage physical system BIOS updates and firmware upgrades; and centrally manage all virtual machines within the network. The VirtualCenter management interface allows us to centrally manage and monitor our entire physical and virtual infrastructure from one place.
Hosts, Clusters, and Resource Pools: By organizing physical hosts into clusters of two or more, we are able to distribute their aggregate resources as if they were one physical host. For example, a single server might be configured with 4 dual-core 2.7 GHz processors and 24 GB of RAM. By clustering two such servers together, the resources are presented as approximately 43 GHz of CPU and 48 GB of RAM, which can be provisioned as needed to multiple guests.
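The arithmetic behind the cluster example above can be made explicit with a small sketch. The host specification (4 dual-core 2.7 GHz processors, 24 GB RAM) is taken from the text; the calculation is a simple aggregation, not VMware's exact admission-control accounting.

```python
# Aggregate CPU and RAM presented by a two-host cluster, using the host
# specification given in Appendix B. This ignores virtualization overhead
# and reservations, so it is an upper bound rather than VMware's exact
# admission-control figure.

def cluster_capacity(hosts: int, sockets: int, cores_per_socket: int,
                     ghz_per_core: float, ram_gb: float) -> tuple:
    """Return (total_ghz, total_ram_gb) across all hosts in the cluster."""
    total_ghz = hosts * sockets * cores_per_socket * ghz_per_core
    total_ram = hosts * ram_gb
    return total_ghz, total_ram

ghz, ram = cluster_capacity(hosts=2, sockets=4, cores_per_socket=2,
                            ghz_per_core=2.7, ram_gb=24)
print(f"Cluster presents ~{ghz:.1f} GHz and {ram:.0f} GB RAM")
# Cluster presents ~43.2 GHz and 48 GB RAM
```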
  • 35. Page 34 DRS and VMotion: VMotion enables us to migrate live servers from one physical host to another, which allows physical host maintenance to be performed with no impact on production service uptime. Distributed Resource Scheduler (DRS) is used to set different resource allocation policies for different classes of service, which are automatically monitored and enforced against the aggregate resources of the cluster.
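To make the DRS idea concrete, the sketch below shows the kind of load-balancing decision DRS automates: when one host's CPU demand exceeds a policy threshold, a VM is chosen to migrate (via VMotion) to the least-loaded host. This is a conceptual illustration with made-up host and VM names and a simplified policy, not VMware's actual DRS algorithm.

```python
# Conceptual sketch of a DRS-style rebalancing decision. Host and VM names
# and the 75% threshold are hypothetical; VMware DRS uses its own, more
# sophisticated placement algorithm.
from dataclasses import dataclass

CPU_THRESHOLD_PCT = 75.0  # rebalance when a host exceeds this utilization

@dataclass
class VM:
    name: str
    cpu_demand_ghz: float

@dataclass
class Host:
    name: str
    capacity_ghz: float
    vms: list

    @property
    def utilization_pct(self) -> float:
        return 100.0 * sum(vm.cpu_demand_ghz for vm in self.vms) / self.capacity_ghz

def rebalance(hosts: list) -> None:
    """Move one VM off any host above threshold onto the least-loaded host."""
    for host in hosts:
        if host.utilization_pct <= CPU_THRESHOLD_PCT or not host.vms:
            continue
        target = min(hosts, key=lambda h: h.utilization_pct)
        if target is host:
            continue
        vm = min(host.vms, key=lambda v: v.cpu_demand_ghz)  # cheapest to move
        host.vms.remove(vm)
        target.vms.append(vm)
        print(f"VMotion {vm.name}: {host.name} -> {target.name}")

if __name__ == "__main__":
    hosts = [
        Host("esx01", 21.6, [VM("portal", 9.0), VM("db", 8.0), VM("mail", 2.0)]),
        Host("esx02", 21.6, [VM("web", 3.0)]),
    ]
    rebalance(hosts)
    for h in hosts:
        print(f"{h.name}: {h.utilization_pct:.0f}% utilized")
```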