The experiences of migrating a large scale, high performance healthcare network


Lessons learned from migrating a large scale healthcare application from Windows to Unix at Partners HealthCare in Boston, MA.

Published in: Health & Medicine, Business

    1. The experiences of migrating a large scale, high performance healthcare network
       Larry Williams, Corporate Manager, Partners HealthCare
    2. In the next half hour…
       - Partners HealthCare System overview
       - Caché platform architecture & metrics
       - The need to migrate
       - Phased migration approach
       - Benchmark testing and results
       - Discoveries and production enhancements
    3. Partners HealthCare System
       - Founded in 1994
         - Brigham & Women's Hospital
         - Massachusetts General Hospital
       - Now includes:
         - Community physician network (1,200 + 3,500 MDs), PCHi
         - 3 community hospitals
         - 2 rehab hospitals
         - 3 specialty institutions
       - Enterprise-wide Information Systems
         - 1,100 employees
         - FY05 annual budget of approximately $160 million
    4. Anchor Hospitals & Airport
       [Map: BWH and MGH relative to Logan Airport, 10 km and 6 km]
    5. Acute Care Hospitals
       [Map: MGH, BWH, Newton-Wellesley, and community physician practices]
    6. Partners Domain Devices
       [Diagram: Internet and firewall; 1,450 servers, 32,000 desktops, 12,000 printers, ~30,000 other devices; grouped as "closely managed" vs. "assumed managed"]
    7. Windows Production Architecture
       [Architecture diagram; 3.5 TB total database]
    8. Enterprise Integration
       - Over 30% of interfaces are to and from the Caché database

       | Year | # of Interfaces | Est. Annual Transactions | Daily Average | Change from prior year |
       |------|-----------------|--------------------------|---------------|------------------------|
       | 2004 | 167             | 610,833,080              | 1,673,515     | n/a                    |
       | 2005 | 192             | 887,649,802              | 2,431,917     | 45%                    |
       | 2006 | 170             | 1,240,712,044            | 3,399,211     | 40%                    |
       | 2007 | 196             | 1,330,962,017            | 4,659,035     | 37%                    |
    9. Integration Components
       [Diagram]
    10. Gigabytes in Use
        [Chart]
    11. Annual Database Growth Rate
        [Chart]
    12. Database Utilization
        [Chart: average database references per day, in billions]
    13. The Need to Migrate: Availability
        [Chart: monthly downtime, current state vs. business need]
    14. Additional Business Requirements
        - Increase availability and reliability
          - Decrease database risk from 5 single points of failure
          - More robust hardware and OS
          - Far fewer servers and OS instances to manage
          - Clustering and automated failover
          - Reduce monthly maintenance needs to updates once or twice per year
        - Improve performance
          - 64-bit OS, more memory for cache
          - Caché 5.0.20 to Caché 2008.1, with significantly improved ECP performance
        - Increase scalability
          - 91 terabytes available on EMC DMX3 SAN
          - On-demand addition of processor cores
    15. Caché Migration Decision-Making Process
        - Considered only first-tier vendors and support (IBM, HP)
        - HP assumed much more risk through Professional Services
        - Existing HP business yields more leverage & visibility with the regional office
        - More headroom in the HP configuration
        - Price was not a distinguishing factor
    16. Phased Migration Approach
        - Proof of concept (benchmark testing)
          - Completed 10/15/07
        - Phase 1: database tier
          - 4 of 5 servers migrated; anticipated completion 4/14/08
        - Phase 2: application tier
          - "Big bang" migration 12/14/08
        - Phase 3: disaster recovery
          - January 2009
    17. UNIX Benchmark Environment
        [Diagram]
    18. Database Benchmark Load Testing Results
        - Goals
          - Simulate current production user counts & transaction loads
          - Verify support for load increases of up to 300%
        - Benchmark environment
          - Isolated LAN, new DMX3 SAN
          - 20 new Windows blade servers (10 app servers, 10 script "players")
          - Scripts for 8 apps (representing the heaviest use: Web/Telnet/VB apps)
          - 2 batch jobs (screensaver simulation, NullGen LMR functions)
        - Conclusions
          - Able to simulate production load, plus 1.5x and 3x load
          - 2 HP rx8640s can handle growth projections

        | Metric                           | Production peak (8/21, 11:20 am) | Benchmark "paced" script load | Benchmark full script load |
        |----------------------------------|----------------------------------|-------------------------------|----------------------------|
        | Database global refs / sec.      | 35,000                           | 30,000                        | 135,000                    |
        | LMR transactions (5-min. period) | 11,806                           | 40,000                        | 40,000                     |
        | LMR avg. Caché app time (sec.)   | 0.32                             | 0.15                          | 0.66                       |
    19. Design and Configuration Considerations
        - Database configuration simulation testing
          - 1 to 5 Caché database instances were assessed
          - 1 vs. 5 ECP channels per Caché instance were assessed
          - Number of active cores was assessed (4 active, 2 reserved)
        - Results and unexpected discoveries
          - Identified 5 Caché database instances as the optimal design configuration
            - The journal sync bottleneck was the biggest issue: a single journal daemon per Caché instance maintains ECP durability to guarantee transactions
            - Maintained the same data distribution across the 5 DB instances
          - Determined 1 ECP channel per instance to be optimal
            - Additional channels did not improve throughput, since there is still only 1 journal daemon per instance
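The journal-daemon finding can be framed as a simple capacity model (a back-of-envelope sketch of ours, not a calculation from the deck; the per-sync cost is an assumed figure): because every ECP transaction waits on its instance's single journal daemon, sync throughput scales with the number of instances, not with the number of ECP channels.

```python
def max_journal_tps(sync_ms: float, instances: int) -> float:
    """Back-of-envelope ceiling on journal-synced transactions/sec.

    Each Cache instance runs one journal daemon; syncs on one
    instance serialize behind that daemon, while separate instances
    sync in parallel. sync_ms is an assumed, illustrative per-sync
    cost, not a measured value.
    """
    per_daemon = 1000.0 / sync_ms  # syncs/sec one daemon can serialize
    return per_daemon * instances

# Adding ECP channels to a single instance does not raise this
# ceiling, because all channels share that instance's one daemon --
# consistent with the deck's 5-instance, 1-channel-each result.
```

With an assumed 1 ms sync, one instance tops out near 1,000 journal-synced transactions per second while five instances allow roughly 5,000, which is the shape of the scaling the 5-instance design exploits.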
    20. Benchmark Discoveries Led to Production Improvements
        - References to undefined globals using $Data and $Get
          - These commands require a network round trip
        - Use of $Increment
          - Each call to $I requires a network round trip
        - Excessive use of Caché locks
          - Forces more than 1 round trip
        - Use of large strings
          - Strings that require more than 3,900–4,000 bytes to represent the string value are big strings and are never cached on the ECP client
        - Lesson learned: each trip to the database server incurs the overhead of a journal sync. Increasing the journal sync rate causes bottlenecks in the ECP channel, which increases the risk of long transactions.
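One general mitigation for the per-call $Increment round trip (our illustration of the idea, not a technique the deck describes) is to reserve counter values in blocks, so a single server round trip yields many IDs. A minimal Python sketch, where `server_increment` is a hypothetical stand-in for an atomic server-side "increment by n":

```python
class BlockAllocator:
    """Amortize server round trips by reserving IDs in blocks.

    server_increment(n) models an atomic server-side increment-by-n
    (the analogue of $Increment with a delta): it adds n to a shared
    counter and returns the new value, costing one round trip per call.
    """

    def __init__(self, server_increment, block_size=100):
        self.server_increment = server_increment
        self.block_size = block_size
        self.next_id = 0      # next ID to hand out, within the block
        self.limit = 0        # last ID of the currently reserved block
        self.round_trips = 0  # server calls actually made

    def next(self):
        if self.next_id >= self.limit:
            # Block exhausted: one round trip reserves block_size IDs.
            self.limit = self.server_increment(self.block_size)
            self.next_id = self.limit - self.block_size
            self.round_trips += 1
        self.next_id += 1
        return self.next_id
```

Handing out 250 IDs this way costs 3 round trips instead of 250. The trade-off is that IDs left unused in a block when a client goes away are skipped, which is acceptable when the counter only needs uniqueness, not density.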
    21. 75% Reduction in Long-Running Transactions
        [Chart]
    22. Phased Migration Approach
        [Timeline]
    23. Monthly Average Caché Web Transaction Time
        [Chart]
    24. Application Models
        [Diagram: old vs. new application models. Old: browser client -> WebLink web server -> Caché; VB client -> Vism.ocx -> Caché. New: browser client -> Caché Web Services web server -> Caché; .Net client -> .Net server (managed objects) -> Caché. Gains: scalability/connection pooling, robustness/error handling.]
    25. The experiences of migrating a large scale, high performance healthcare network
        Larry Williams, Corporate Manager, Partners HealthCare