The experiences of migrating a large scale, high performance healthcare network
Larry Williams, Corporate Manager, Partners HealthCare
In the next half hour…
  • Partners Healthcare System overview
  • Caché platform architecture & metrics
  • The need to migrate
  • Phased migration approach
  • Benchmark testing and results
  • Discoveries and production enhancements
Partners Healthcare System
  • Founded in 1994 by Brigham & Women’s Hospital and Massachusetts General Hospital
  • Now includes:
      • Community physician network (1200 + 3500 MDs), PCHi
      • 3 community hospitals
      • 2 rehab hospitals
      • 3 specialty institutions
  • Enterprise-wide Information Systems
      • 1100 employees
      • Annual budget FY05 approximately $160 million
Anchor Hospitals & Airport
  [Map: BWH, MGH, and Logan Airport, with distance labels of 10 km and 6 km]
Acute Care Hospitals
  [Map: MGH, BWH, Newton-Wellesley, and community physician practices]
Partners Domain Devices
  [Network diagram: Internet, firewall, and device counts; legend: Closely Managed, Assumed Managed]
  • 1,450 servers
  • 32,000 desktops
  • 12,000 printers
  • ~30,000 other devices
Windows Production Architecture 3.5 TB
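Enterprise Integration
  • Over 30% are to and from the Caché database
  • 2007: 1,330,962,017 est. annual transactions; daily average 4,659,035; change from prior year 37%
  • 2006: 1,240,712,044 est. annual transactions; daily average 3,399,211; change from prior year 40%
  • 2005: 887,649,802 est. annual transactions; daily average 2,431,917; change from prior year 45%
  • 2004: 610,833,080 est. annual transactions; daily average 1,673,515
  • # of interfaces (values as listed on the slide; the year each belongs to is not recoverable from the text): 196, 170, 192, 167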
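The "change from prior year" figures can be checked against the daily averages. The sketch below is only an arithmetic check, assuming the percentages were derived from the daily-average column; the function and variable names are illustrative and not from the presentation.

```python
# Daily-average interface transactions per year, as listed on the slide.
daily_average = {
    2004: 1_673_515,
    2005: 2_431_917,
    2006: 3_399_211,
    2007: 4_659_035,
}


def year_over_year_change(series):
    """Return {year: percent change from the prior year}, rounded to a whole percent."""
    years = sorted(series)
    return {
        year: round((series[year] / series[prev] - 1) * 100)
        for prev, year in zip(years, years[1:])
    }


changes = year_over_year_change(daily_average)
for year, pct in changes.items():
    print(f"{year}: +{pct}% vs. {year - 1}")
```

Under that assumption the computed values agree with the slide's 45%, 40%, and 37%.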
Integration Components
Gigabytes in Use
Annual Database Growth Rate
Database Utilization Average Database References per day in Billions
The Need to Migrate - Availability
  [Chart: monthly downtime, current state vs. business need]
Additional Business Requirements
  • Increase availability and reliability
      • Decrease database risk from 5 single points of failure
      • More robust hardware and OS
      • Far fewer servers and OS instances to manage
      • Clustering and automated failover
      • Reduce monthly maintenance needs, updates once or twice per year
  • Improve performance
      • 64-bit OS, more memory for cache
      • Caché 5.0.20 to Caché 2008.1, significantly improved ECP performance
  • Increase scalability
      • 91 terabytes available on EMC SAN DMX3
      • On-demand addition of processor cores
Caché Migration Decision-Making Process
  • Only considered first-tier vendors and support (IBM, HP)
  • HP assumed much more risk with Professional Services
  • Existing HP business yields more leverage & visibility with regional office
  • More headroom in HP configuration
  • Price was not a distinguishing factor
Phased migration approach
  • Proof of Concept (benchmark testing): completed 10/15/07
  • Phase 1 – Database tier: 4 of 5 servers migrated, anticipated completion 4/14/08
  • Phase 2 – Application tier: Big Bang migration 12/14/08
  • Phase 3 – Disaster Recovery: January 2009
UNIX Benchmark Environment
Database Benchmark Load Testing Results
  • Goals
      • Simulate current Production user counts & transaction loads
      • Verify support for load increases up to 300%
  • Benchmark Environment
      • Isolated LAN, new DMX3 SAN
      • 20 new Windows blade servers (10 app servers, 10 script ‘players’)
      • Scripts for 8 apps (represent heaviest use, Web/Telnet/VB apps)
      • 2 batch jobs (screensaver simulation, NullGen LMR functions)
  • Conclusions
      • Able to simulate production load, 1.5x and 3x load
      • 2 HP rx8640 can handle growth projections
  • Results by metric
      • Database Global Refs / sec.: production peak (8/21, 11:20 am) 35,000; benchmark “paced” script load 30,000; benchmark full script load 135,000
      • LMR transactions (5 min. period): production peak 11,806; benchmark “paced” script load 40,000; benchmark full script load 40,000
      • LMR avg Caché app time (in sec.): production peak 0.32; benchmark “paced” script load 0.15; benchmark full script load 0.66
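As a rough check on the "3x load" conclusion, the full script load can be expressed as a multiple of the production peak. This sketch assumes the result columns read full script load, "paced" script load, production peak in the order the slide text lists them; the dictionary keys and function name are illustrative.

```python
# Values as listed on the benchmark results slide (column order is an assumption).
production_peak = {"global_refs_per_sec": 35_000, "lmr_transactions_5min": 11_806}
full_script_load = {"global_refs_per_sec": 135_000, "lmr_transactions_5min": 40_000}


def load_multiple(benchmark, baseline):
    """Ratio of each benchmark metric to the production baseline, to 2 decimals."""
    return {name: round(benchmark[name] / baseline[name], 2) for name in baseline}


multiples = load_multiple(full_script_load, production_peak)
print(multiples)
```

With that column reading, both ratios come out above 3, which is in line with the stated goal of verifying load increases up to 300%.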
Design and Configuration Considerations
  • Database configuration simulation testing
      • 1 to 5 Caché database instances were assessed
      • 1 vs. 5 ECP channels per Caché instance were assessed
      • Number of active cores was assessed (4 active, 2 reserved)
  • Results and unexpected discoveries
      • Identified 5 Caché database instances as the optimal design configuration
          • Journal sync bottleneck was the biggest issue
          • High-transaction journal daemon maintains ECP durability to guarantee transactions (1 per Caché instance)
          • Maintain same data distribution across 5 DB instances
      • Determined 1 ECP channel per instance to be optimal
          • Additional channels did not improve throughput; there is still only 1 journal daemon
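One way to picture why extra ECP channels did not help while extra instances did is a simple bottleneck model. The sketch below is a toy illustration only: it assumes each instance is capped by its single journal daemon and that load spreads evenly across instances. The rates are hypothetical units, not measurements from the benchmark, and the function names are made up for this example.

```python
def instance_throughput(ecp_channels, per_channel_rate, journal_sync_rate):
    """Toy model: an instance cannot commit faster than its one journal daemon syncs."""
    return min(ecp_channels * per_channel_rate, journal_sync_rate)


def total_throughput(instances, ecp_channels, per_channel_rate, journal_sync_rate):
    """Each instance has its own journal daemon, so instances add capacity."""
    return instances * instance_throughput(
        ecp_channels, per_channel_rate, journal_sync_rate
    )


# Hypothetical rates in arbitrary units; NOT figures from the presentation.
PER_CHANNEL_RATE = 1_000
JOURNAL_SYNC_RATE = 800

one_channel = total_throughput(1, 1, PER_CHANNEL_RATE, JOURNAL_SYNC_RATE)
five_channels = total_throughput(1, 5, PER_CHANNEL_RATE, JOURNAL_SYNC_RATE)
five_instances = total_throughput(5, 1, PER_CHANNEL_RATE, JOURNAL_SYNC_RATE)
print(one_channel, five_channels, five_instances)
```

In this model, going from 1 to 5 channels on one instance leaves throughput unchanged, while 5 instances with 1 channel each scale it by five, because each added instance brings its own journal daemon.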
Benchmark Discoveries Led to Production Improvements
  • References to undefined globals using $Data and $Get: these commands require a network round trip
  • Use of $Increment: each call to $I requires a network round trip
  • Excessive use of Caché locks: forces more than 1 round trip
  • Use of large strings: strings that need more than 3900–4000 bytes to represent their value are big strings and are never cached on the ECP client
  • Lesson learned: each trip to the database server incurs overhead from a journal sync. A higher journal sync rate causes bottlenecks in the ECP channel, which increases the risk of long transactions.
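The effect of these patterns on round-trip counts can be sketched with a toy client cache. This is not Caché code and does not reflect how ECP is implemented; `ToyEcpClient` and `BIG_STRING_BYTES` are hypothetical names, and the 3900-byte cutoff is an assumed value picked from the range on the slide. Locks are left out of the model.

```python
BIG_STRING_BYTES = 3900  # assumed cutoff; the slide gives a 3900-4000 byte range


class ToyEcpClient:
    """Toy model of an ECP-style client cache that counts server round trips."""

    def __init__(self, server_data):
        self.server = server_data  # stands in for the database server
        self.cache = {}            # values held on the client
        self.round_trips = 0

    def get(self, key, default=""):
        """Like $Get: cached values are served locally, everything else is a trip."""
        if key in self.cache:
            return self.cache[key]
        self.round_trips += 1
        if key not in self.server:
            return default  # undefined: nothing to cache, so every call is a trip
        value = self.server[key]
        if len(value.encode("utf-8")) < BIG_STRING_BYTES:
            self.cache[key] = value  # big strings are never cached
        return value

    def increment(self, key):
        """Like $Increment: always resolved on the server."""
        self.round_trips += 1
        self.server[key] = str(int(self.server.get(key, "0")) + 1)
        self.cache.pop(key, None)
        return int(self.server[key])


client = ToyEcpClient({"small": "abc", "big": "x" * 5000})
for _ in range(3):
    client.get("small")      # 1 trip in total, then cached
    client.get("big")        # a trip every time
    client.get("missing")    # a trip every time
    client.increment("ctr")  # a trip every time
print(client.round_trips)
```

In this model only the small, defined value benefits from the client cache; the other three access patterns pay a round trip on every call, which is the behavior the slide warns about.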
75% reduction in long-running transactions
Phased Migration Approach
Monthly Average Caché Web Transaction Time
Application Models (Old vs. New)
  [Diagram: components shown include browser client, VB client, .Net client, web server, .Net server, Caché, WebLink, Vism.ocx, Managed Objects, and Caché Web Services; annotation: scalability/connection pooling, robustness/error handling]
