
Practice Fusion & MongoDB: Transitioning a 4 TB Audit Log from SQL Server to MongoDB

Presented by Michael Poremba, Director of Data Architecture, Practice Fusion.

Michael will share technical insights and key lessons learned by the Practice Fusion audit team, which can be leveraged for your own MongoDB projects. Key topics include: security, schema design, scalability, data migration, disaster recovery, and organizing the team of technical experts.

Practice Fusion decided to move the primary audit system for their Electronic Health Record (EHR) from SQL Server to MongoDB. The goal was to provide a scale-out data store for a system that was under IO pressure. The project required moving 4 TB of production data from a traditional relational schema into a document data store while the system remained in operation, with peak transaction throughput of up to 1,000 writes per second. The audit system is a tier-1 component mandated by law: any interruption in its availability would cause an outage of the whole system.

BIO:

Michael Poremba is Director of Data Architecture at Practice Fusion, a cloud-based electronic health records (EHR) service used by over 100,000 health care providers. He has over 20 years of experience in software engineering with expertise in architecture, performance, and scalability of transactional database applications.


  1. 4 TB Audit Log from SQL Server to MongoDB
     Michael Poremba, Director, Data Architecture, Practice Fusion
     May 2015
  2. Michael Poremba @ Practice Fusion
     + Over 20 years of software engineering
     + Data architect / application architect
     + High-volume OLTP relational databases
     + Application performance and scalability
     + Domain experience: health care; financial services; IT management; content management and distribution; targeted advertising; telecom billing; manufacturing; insurance
  3. Project Background: Getting started
  4. Practice Fusion
     + Cloud-based electronic health records service (EHR)
     + Over 100,000 health care providers in the US
     + Over 100,000,000 patient medical records
     + SQL Server OLTP database
       - Weekday peak: ~60,000 transactions per second
     + Primary database = 8 TB
     + 50% of the primary database is security audit records and their indexes
  5. HIPAA Security Audit Log
     + HIPAA: Health Insurance Portability and Accountability Act of 1996
     + Who did what to which patient’s medical record, and when?
     + Regulatory requirement: the audit log must be kept and reviewed
     + Law enforcement and evidence in legal discovery
     + Save the audit log forever
     + Primary use cases:
       - Audit report in EHR: security audit log viewer
       - Physician data analytics: clinical quality measures (CQM)
  6. [Image-only slide; no text]
  7. HIPAA Security Auditing on MongoDB: Project anatomy & lessons learned
  8. The Log Jam
     + Latency on the SAN increased
     + Database writes slowed down
     + Database connections were held longer
     + Connection pool expanded
     + User interface locked up, waiting
     + Users tried to log in again
     + Login is the heaviest user operation
     + [Repeat]
  9. Security Auditing – Legacy Architecture
     [Diagram components: public load balancer; application servers App 1 to App n; EHR OLTP database with the ActivityFeed and ActivityFeedParameter tables (multiplicity 2..10); CQM reporting ETL; audit report]
  10. Audit Service – New Architecture
      [Diagram components: public load balancer; application servers App 1 to App n; Audit Service; AMQ queue listener; MongoDB audit log; audit report; CQM reporting ETL]
  11. Technical Benefits of New Architecture
      + Isolate the auditing system from the EHR OLTP database
      + Move audit IO off the EHR SAN to AWS
      + New service interface for audit events using .NET
      + Scale out the audit service interface on an IIS farm
      + Scale out the audit data store using MongoDB
  12. Security Auditing – Application Requirements
      + Transaction volume: sustain 1,000 new documents per second
      + Data volume: scale to tens of billions of audit event records
      + High availability and disaster recovery: a higher SLA than the EHR
      + Quick UI response time for the interactive audit report
      + Tamper prevention and detection
        - No updates or deletes permitted on the audit log
        - Security alerts when the audit log is altered
      + Leverage industry standards for health care security audit logging
        - ~300 distinct auditable user actions
        - Required and varying data elements
  13. Project: Audit 2.0
      Project objectives:
      + New infrastructure for MongoDB and AMQ
      + Modernize the audit service API
      + Convert ~200 audit events to the new audit service interface
      + Data warehouse ETL from MongoDB
      + Modernize the audit report UI
      + Migrate 4 billion existing audit records
      Team:
      + Colette: program management
      + Ernest: services expert
      + Bhavik: test engineering
      + Jay: MongoDB expert
      + Jeff: cluster architecture
      + Michael: data architecture
      + Brett: AMQ expert
      + Bryan: infrastructure coordination
      + Rajani: data warehouse ETL
  14. Health Care Industry Standards for Audit Logging
      [Diagram: UML model relating Audit Event, Participant Object, Audit System, and User, with multiplicities 0..n, 1..1, and 1..2]
      + ISO 27789:2013: Health informatics – Audit trails for electronic health records
      + ASTM E2147-01(2013): Standard Specification for Audit and Disclosure Logs for Use in Health Information Systems
      + FHIR SecurityEvent: resource definition for auditing
  15. JSON Document Schema for Audit Events
      {
        "_id" : <BinaryData(4)>,                     // The audit event GUID
        "docHash" : <String; Required>,              // Tamper detection
        "audOrgGuid" : <BinaryData(4); Required>,    // Shard key
        "crtdDttmUtc" : <Date; Required>,            // Date/time the record was inserted
        "evnt" : {                                   // Required subdocument
          "dttmUtc" : <Date; Required>,              // Date/time the event occurred
          "typ" : <String; Required>,                // Event record type; ~300 types
          "ptDataTyp" : <String; Required>,          // Standard set of patient data types
          "actn" : <String; Required>,               // Standard set of actions
          "sys" : <String; Required>                 // Source system for the audit event
        },
        "usr" : {                                    // Required subdocument
          "usrId" : <String; Required>,              // Human-readable ID
          "usrGuid" : <BinaryData(4); Required>,     // Machine-readable ID
          "dispNm" : <String; Required>,             // Display name for user
          "orgId" : <String; Required>,
          "orgNm" : <String; Required>
        },
        "altUsr" : {                                 // Optional subdocument for a second user
          ...                                        // Same properties as "usr"
        },
        "pt" : {                                     // Optional subdocument
          "ptId" : <String; Required>,               // Human-readable ID for patient
          "ptPracGuid" : <BinaryData(4); Required>,  // Machine-readable ID for patient
          "dispNm" : <String; Required>,             // Display name for patient
          "orgId" : <String; Required>,
          "orgNm" : <String; Required>
        },
        "body" : {                                   // Optional subdocument
          ...                                        // Flattened list of attributes specific to the audit event subtype
        }
      }
      [Diagram: UML model relating Audit Event, Participant Object, Audit System, and User, with multiplicities 0..n, 1..1, and 1..2]
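A later slide stresses that the application, not the database, is responsible for data integrity. As a minimal sketch of what that could mean for this schema, the Node.js function below lists missing required fields before an insert. The function and constant names are hypothetical, the required paths are copied from the schema above, and value types (BinData subtype, Date) are deliberately not checked.

```javascript
// Hypothetical application-side check for the audit event schema above.
// Optional subdocuments (altUsr, pt) are checked only when present.
const REQUIRED_TOP_LEVEL = ["docHash", "audOrgGuid", "crtdDttmUtc", "evnt", "usr"];
const REQUIRED_IN_SUBDOC = {
  evnt: ["dttmUtc", "typ", "ptDataTyp", "actn", "sys"],
  usr: ["usrId", "usrGuid", "dispNm", "orgId", "orgNm"],
  altUsr: ["usrId", "usrGuid", "dispNm", "orgId", "orgNm"],
  pt: ["ptId", "ptPracGuid", "dispNm", "orgId", "orgNm"],
};

function isMissing(value) {
  return value === undefined || value === null;
}

// Returns dotted paths of required fields that are absent (empty array = valid).
function missingAuditFields(doc) {
  const missing = REQUIRED_TOP_LEVEL.filter((key) => isMissing(doc[key]));
  for (const [sub, fields] of Object.entries(REQUIRED_IN_SUBDOC)) {
    const subdoc = doc[sub];
    if (isMissing(subdoc)) continue; // optional and absent, or already reported above
    for (const field of fields) {
      if (isMissing(subdoc[field])) missing.push(`${sub}.${field}`);
    }
  }
  return missing;
}
```

A caller would reject or quarantine any event for which `missingAuditFields(doc).length > 0` instead of writing it to the audit log.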
  16. Schema Design – Lessons Learned
      + Prop nms strd per doc (property names are stored in every document)
        - Long names add up for large collections (ours: 1 TB)
        - Consider using abbreviated property names
        - Up-vote this feature request: https://jira.mongodb.org/browse/SERVER-863
      + Know your application's read/write patterns
      + The application is responsible for data integrity
      + Be aware of data type behaviors
        - Indexed string search is case sensitive. Up-vote: https://jira.mongodb.org/browse/SERVER-90
        - There are several binary data subtypes for UUIDs; use subtype 4 (the default subtype is specific to the database driver)
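To put a rough number on "long names add up": BSON stores every field name as a NUL-terminated string inside every document, so each name costs its byte length plus one, per document. The sketch below counts that overhead for a hypothetical long-named variant of a few schema fields against the abbreviated names actually used, then scales it by the ~4 billion records from the migration objective. It ignores arrays and any storage-engine compression, so treat the result as an order-of-magnitude estimate.

```javascript
// Bytes spent on field names in one BSON document: (byte length + 1) per field.
// Sketch assumption: arrays, Dates, and Buffers are not descended into.
function fieldNameBytes(doc) {
  let total = 0;
  for (const [key, value] of Object.entries(doc)) {
    total += Buffer.byteLength(key, "utf8") + 1; // name + NUL terminator
    const isSubdoc = value !== null && typeof value === "object" &&
      !Array.isArray(value) && !(value instanceof Date) && !Buffer.isBuffer(value);
    if (isSubdoc) total += fieldNameBytes(value);
  }
  return total;
}

// Hypothetical long-name version of a few fields from the audit schema.
const longNames = {
  auditOrganizationGuid: 1, createdDateTimeUtc: 2,
  event: { dateTimeUtc: 3, actionType: "read" },
};
const shortNames = {
  audOrgGuid: 1, crtdDttmUtc: 2,
  evnt: { dttmUtc: 3, actn: "read" },
};

const savedPerDoc = fieldNameBytes(longNames) - fieldNameBytes(shortNames);
const recordCount = 4e9; // ~4 billion audit records
console.log(`${savedPerDoc} bytes/doc, ~${(savedPerDoc * recordCount / 1e9).toFixed(0)} GB total`);
// prints "29 bytes/doc, ~116 GB total"
```

Only five fields are compared here; a full document with a dozen or more abbreviated names would save proportionally more.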
  17. Schema Design – Lessons Learned
      Leverage native data types:
      + Date
      + Boolean
      + Numeric: numbers stored as strings concatenate instead of adding
        - "1" + "1" → "11"
        - "11" + "1" → "111"
      + UUID: the same value can be spelled many ways as a string
        - "8c290139-f4e3-49c1-9ba2-a883defc6a15"
        - "8C290139-F4E3-49C1-9BA2-A883DEFC6A15"
        - "8c29-0139-f4e3-49c1-9ba2-a883-defc-6a15"
        - "8c290139f4e349c19ba2a883defc6a15"
        - "{8c290139-f4e3-49c1-9ba2-a883defc6a15}"
        - "{8C290139-F4E3-49C1-9BA2-A883DEFC6A15}"
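The UUID list above shows why a native binary type beats strings: six spellings, one value. As a sketch (the helper names are hypothetical), the functions below reduce any of those spellings to the 16 raw bytes that BinData subtype 4 holds, in the same order as the hex digits are written; legacy subtype 3 byte order varies by driver, which is why the slide recommends subtype 4. The last line shows the string-concatenation pitfall from the "Numeric" bullet.

```javascript
// Hypothetical helpers: normalize textual UUID variants to raw bytes and
// to one canonical lowercase, hyphenated spelling.
function uuidToBytes(text) {
  const hex = text.replace(/[{}-]/g, "").toLowerCase(); // drop braces and hyphens
  if (!/^[0-9a-f]{32}$/.test(hex)) throw new Error(`not a UUID: ${text}`);
  return Buffer.from(hex, "hex"); // 16 bytes, RFC 4122 order (subtype 4 layout)
}

function canonicalUuid(text) {
  const h = uuidToBytes(text).toString("hex");
  return [h.slice(0, 8), h.slice(8, 12), h.slice(12, 16), h.slice(16, 20), h.slice(20)].join("-");
}

const variants = [
  "8c290139-f4e3-49c1-9ba2-a883defc6a15",
  "8C290139-F4E3-49C1-9BA2-A883DEFC6A15",
  "8c29-0139-f4e3-49c1-9ba2-a883-defc-6a15",
  "8c290139f4e349c19ba2a883defc6a15",
  "{8c290139-f4e3-49c1-9ba2-a883defc6a15}",
  "{8C290139-F4E3-49C1-9BA2-A883DEFC6A15}",
];

// Every spelling collapses to the same canonical value.
console.log(new Set(variants.map(canonicalUuid)).size); // 1

// Numbers stored as strings concatenate instead of adding.
console.log("1" + "1", 1 + 1); // 11 2
```

`uuidToBytes(u).toString("base64")` produces the base64 payload that a shell literal such as `BinData(4, "...")` expects.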
  18. Legacy Auditing System – Relational Schema
      [Diagram: relational tables with approximate row counts: Activity Feed (~4 billion), Activity Feed Parameter (~30 billion), Audit Event Type (~300), Action Type (10), Patient Data Type (18), User (~100,000), Patient (~100 million), Practice (~50,000)]
      Issues around data normalization:
      + New requirements introduced
      + Filter criteria and sort criteria stored in five different tables
      + Audit events must be read into memory for filtering and sorting
        - Join and expand the data set by practice
        - Sort and filter the expanded data set
      + Response time suffers for large practices with many audit events
  19. Schema Design – Lessons Learned
      Denormalize with care:
      [Diagram: the legacy tables Activity Feed, Audit Event Type, Activity Feed Parameter, Action Type, Patient Data Type, User, Patient, and Practice, shown alongside the denormalized audit event document]
      {
        "_id" : <BinaryData(4)>, "docHash" : <String; Required>,
        "audOrgGuid" : <BinaryData(4); Required>, "crtdDttmUtc" : <Date; Required>,
        "evnt" : { "dttmUtc" : <Date; Required>, "typ" : <String; Required>, "ptDataTyp" : <String; Required>,
                   "actn" : <String; Required>, "sys" : <String; Required> },
        "usr" : { "usrId" : <String; Required>, "usrGuid" : <BinaryData(4); Required>, "dispNm" : <String; Required>,
                  "orgId" : <String; Required>, "orgNm" : <String; Required> },
        "pt" : { "ptId" : <String; Required>, "ptPracGuid" : <BinaryData(4); Required>, "dispNm" : <String; Required>,
                 "orgId" : <String; Required>, "orgNm" : <String; Required> },
        "body" : { ... }
      }
  20. Index Design
      + Millions of audit events per medical practice
      + Require fast response time for the interactive audit report UI
      + Audit report UI allows events to be sorted/filtered five different ways
      + UI allows paging through audit events
      + Create a secondary index for each sort method
  21. Index Definitions
      + Organization, event date DESC
          db.auditEvent.ensureIndex( {"audOrgGuid": 1, "evnt.dttmUtc": -1} );
      + Organization, patient, event date DESC
          db.auditEvent.ensureIndex( {"audOrgGuid": 1, "pt.ptId": 1, "evnt.dttmUtc": -1} );
      + Organization, user, event date DESC
          db.auditEvent.ensureIndex( {"audOrgGuid": 1, "usr.usrId": 1, "evnt.dttmUtc": -1} );
      + Organization, patient data type, event date DESC
          db.auditEvent.ensureIndex( {"audOrgGuid": 1, "evnt.ptDataTyp": 1, "evnt.dttmUtc": -1} );
      + Organization, user action type, event date DESC
          db.auditEvent.ensureIndex( {"audOrgGuid": 1, "evnt.actn": 1, "evnt.dttmUtc": -1} );
      + Document created date DESC
          db.auditEvent.ensureIndex( {"crtdDttmUtc": -1} );
      (Since MongoDB 3.0, ensureIndex() is deprecated as an alias for createIndex().)
  22. Query Plan
      + Filter by practice GUID
      + Sort by event date/time, descending order
      + Limit to 20 documents
          db.auditEvent.find( {"audOrgGuid": BinData(4, "ABrlAG57Rx6gY3zyHzFK3Q==")} )
                       .sort( {"evnt.dttmUtc": -1} ).limit(20).explain();
          {
            "clusteredType" : "ParallelSort",
            "shards" : {
              "RepSet02/MNGODDB03-SHRD02:27018, MNGODDB04-SHRD02:27018" : [
                { "cursor" : "BtreeCursor auditEvent_audOrgGuid_dttmUtc", ... }
              ]
            }
            ...
            "numshards" : 1,
            ...
          }
      + The plan shows an index scan ("BtreeCursor") on a single shard ("numshards" : 1), because the filter includes the leading shard key field
  23. Indexing Strategy – Lessons Learned
      + As with relational databases, indexes are essential for efficient queries
      + Learn how to use .explain() to read query plans
      + Avoid collection scans ("cursor" : "BasicCursor")
      + For compound indexes, the query's sort must follow the index key order, with directions that either all match the index or are all reversed
      + Enable the mongod --notablescan option in test / staging environments
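The compound-index sort rule can be made concrete with a small checker. This is a simplified model written for illustration (an assumption, not MongoDB's actual query planner): leading index fields may be skipped only when the filter pins them with equality, the sort fields must then line up with the index, and directions must all match or all be reversed. It is exercised against the first index definition from slide 21.

```javascript
// Simplified model of whether a compound index can return documents already in
// the requested sort order (no in-memory sort).
//   indexSpec / sortSpec: objects such as {"audOrgGuid": 1, "evnt.dttmUtc": -1}
//   equalityFields: fields the query filter pins to a single value
function indexSupportsSort(indexSpec, sortSpec, equalityFields = []) {
  const index = Object.entries(indexSpec);
  const sort = Object.entries(sortSpec);
  if (sort.length === 0) return true;
  const pinned = new Set(equalityFields);

  // Skip leading index fields only if the filter fixes them with equality.
  let start = 0;
  while (start < index.length && index[start][0] !== sort[0][0] && pinned.has(index[start][0])) {
    start++;
  }
  if (start + sort.length > index.length) return false;

  // Sort fields must line up with the index; directions all same or all reversed.
  let orientation = 0; // +1 = forward index scan, -1 = backward index scan
  for (let j = 0; j < sort.length; j++) {
    const [indexField, indexDir] = index[start + j];
    const [sortField, sortDir] = sort[j];
    if (indexField !== sortField) return false;
    const o = Math.sign(indexDir) === Math.sign(sortDir) ? 1 : -1;
    if (orientation === 0) orientation = o;
    else if (o !== orientation) return false;
  }
  return true;
}

const orgByDate = { "audOrgGuid": 1, "evnt.dttmUtc": -1 };
console.log(indexSupportsSort(orgByDate, { "evnt.dttmUtc": -1 }, ["audOrgGuid"])); // true
console.log(indexSupportsSort(orgByDate, { "evnt.dttmUtc": -1 }));                 // false
console.log(indexSupportsSort(orgByDate, { "audOrgGuid": 1, "evnt.dttmUtc": 1 })); // false
```

The first case is the audit report query from slide 22: equality on `audOrgGuid` lets the index deliver events newest-first. The real planner handles more cases (ranges, `$in`, multikey indexes), so `.explain()` remains the authority.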
  24. Tamper Prevention and Detection
      Principle of least privilege
      + MongoDB cluster not accessible from the public Internet
      + Security enabled on the cluster
      + Application users granted the minimum permissions required
      Signed audit events
      + Audit events signed with a hash of the audit event contents
      + Recompute the hash on reads and test the data against the stored hash value
      + Send a security alert when the hash does not match
      Oplog monitoring
      + Use mongo-connector Python scripts to monitor the oplog
      + Watch for update and delete operations on the collection
      + Send a security alert when data changes are detected
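Both detection mechanisms above can be sketched in a few lines of Node.js. The deck does not say which hash algorithm or serialization Practice Fusion used, so these are assumptions: the sketch serializes the event with keys sorted (so field order cannot change the hash), excludes `docHash` itself, and uses HMAC-SHA256. An HMAC is a deliberate substitution for a bare hash, because anyone able to alter a record could also recompute an unkeyed hash. The oplog filter relies on the standard oplog entry fields `op` ("i" insert, "u" update, "d" delete) and `ns`; the deck used mongo-connector Python scripts, and this function only illustrates the filtering logic.

```javascript
const crypto = require("crypto");

// Deterministic form of a document: keys sorted recursively, Dates as ISO
// strings, Buffers (binary GUIDs) as base64.
function canonical(value) {
  if (value instanceof Date) return value.toISOString();
  if (Buffer.isBuffer(value)) return value.toString("base64");
  if (Array.isArray(value)) return value.map(canonical);
  if (value !== null && typeof value === "object") {
    const out = {};
    for (const key of Object.keys(value).sort()) out[key] = canonical(value[key]);
    return out;
  }
  return value;
}

// Keyed hash over everything except the stored hash itself.
function computeDocHash(doc, secret) {
  const { docHash, ...content } = doc; // the hash never covers itself
  return crypto.createHmac("sha256", secret)
    .update(JSON.stringify(canonical(content)))
    .digest("hex");
}

// Recompute on read and compare in constant time.
function verifyDocHash(doc, secret) {
  const expected = Buffer.from(computeDocHash(doc, secret), "hex");
  const actual = Buffer.from(String(doc.docHash || ""), "hex");
  return actual.length === expected.length && crypto.timingSafeEqual(actual, expected);
}

// Any update or delete recorded in the oplog for the audit collection is a tamper signal.
function tamperEvents(oplogEntries, ns = "AuditLog.auditEvent") {
  return oplogEntries.filter((e) => e.ns === ns && (e.op === "u" || e.op === "d"));
}
```

The secret would live outside the database (for example in the audit service's configuration) so that database access alone is not enough to forge a valid `docHash`.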
  25. Security – Lessons Learned
      + Minimize network access to the MongoDB cluster
      + Enable authentication
      + Leverage role-based authorization
      + Use SSL (MongoDB Enterprise)
      + Disable the REST interface and HTTP status interface
  26. Transaction Volume: 1,000 New Documents per Second
      + Shard the database to scale out
      + Begin with a small number of shards (2 or 3)
      + Group all audit events from the same medical practice
        - Every audit event is “owned” by some practice
        - Audit report UI always queries events by medical practice
      + Composite shard key on { practice GUID (audOrgGuid), _id }
          db.runCommand({ shardcollection: "AuditLog.auditEvent", key: {audOrgGuid: 1, _id: 1} });
  27. Sharding the Database – Lessons Learned
      + Determine at the onset of development whether to shard
      + Specify the shard key in queries
        - Allows mongos to route the query
        - Minimizes distributed “scatter/gather” queries
        - Queries spanning chunks likely span shards
      + Choose a key that allows even balancing
        - Balancing is performed in 32 MB chunks
        - Design the shard key to ensure chunks will not exceed 32 MB
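Why the shard key belongs in every query can be shown with a toy router. The chunk map below is hypothetical (three shards, ranges expressed on `audOrgGuid` only, GUIDs compared as plain strings); a real mongos works on full `{audOrgGuid, _id}` ranges, so a very large practice whose events span several chunks could still touch more than one shard, which is the "queries spanning chunks" caveat above.

```javascript
// Hypothetical chunk map for a collection sharded on {audOrgGuid: 1, _id: 1}.
// Simplifying assumptions: ranges on audOrgGuid only, string comparison.
const exampleChunks = [
  { min: "", max: "4", shard: "RepSet01" },
  { min: "4", max: "8", shard: "RepSet02" },
  { min: "8", max: "\uffff", shard: "RepSet03" },
];

// Returns the shards a router would have to contact for `query`.
function targetShards(query, chunks) {
  const shardsOf = (list) => [...new Set(list.map((c) => c.shard))];
  if (!Object.prototype.hasOwnProperty.call(query, "audOrgGuid")) {
    return shardsOf(chunks); // no shard key in the filter: scatter/gather to every shard
  }
  const org = query.audOrgGuid;
  return shardsOf(chunks.filter((c) => org >= c.min && org < c.max)); // [min, max)
}

console.log(targetShards({ audOrgGuid: "5a" }, exampleChunks));    // [ 'RepSet02' ]
console.log(targetShards({ "usr.usrId": "jdoe" }, exampleChunks)); // all three shards
```

This matches the `"numshards" : 1` result in the query plan on slide 22: filtering by practice lets the router contact a single shard.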
  28. High Availability and Disaster Recovery – Replica Sets
      + If the audit log is down, then 100,000 health care providers are idle
      + The audit logging subsystem must be more reliable than the customer EHR
      + Node failover must be automatic
      + Protect against network and data center failure scenarios
  29. Disaster Recovery: Sharded Cluster Replicated Across Multiple Data Centers
      [Diagram locations: Primary DC, DC2 AZ1, DC2 AZ2, DC3 AZ1. Components: config servers, mongos routers, shards 1 to 3 with replica members spread across locations, arbiters, and AMQ brokers]
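Automatic failover in a replica set depends on a strict majority of voting members (data-bearing nodes and arbiters) staying reachable, which is why the diagram spreads members and arbiters across several sites. The sketch below checks, for a hypothetical vote layout per site, which single-site losses still allow a primary to be elected. The exact vote distribution of the deployment above is not recoverable from the diagram, so the layouts are illustrative only.

```javascript
// A replica set can elect a primary only while a strict majority of its
// voting members is reachable: majority = floor(n / 2) + 1.
function majority(totalVotes) {
  return Math.floor(totalVotes / 2) + 1;
}

// For each site, report whether the remaining sites still hold a majority
// of votes if that whole site (data center or availability zone) is lost.
function survivesSiteLoss(votesBySite) {
  const total = Object.values(votesBySite).reduce((sum, v) => sum + v, 0);
  const needed = majority(total);
  const result = {};
  for (const [site, votes] of Object.entries(votesBySite)) {
    result[site] = total - votes >= needed;
  }
  return result;
}

// Hypothetical layouts for one shard's replica set (votes per site).
console.log(survivesSiteLoss({ dcA: 2, dcB: 1 }));         // { dcA: false, dcB: true }
console.log(survivesSiteLoss({ dcA: 1, dcB: 1, dcC: 1 })); // { dcA: true, dcB: true, dcC: true }
```

With only two sites, losing the one holding more votes stops elections. A third site, even one hosting just an arbiter, lets the set survive the loss of any single site.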
  30. Performance and Stress Testing – Lessons Learned
      + Acquire or build load testing tools
      + Test using a realistic, unbiased data set
      + Test the database cluster to ensure write throughput
      + Ensure read & write performance meets load requirements
      + Find the performance ceiling
      + Find and resolve bottlenecks
      + Tune IO and memory
  31. Data Migration – Lessons Learned
      + Parallelize the data migration process
      + Identify and remove bottlenecks
      + Scale out the MongoDB cluster to handle the heavy write load
      + Determine whether it is best to add indexes before or after migration
      + It takes a while to extract, transform, and load billions of documents
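One common way to parallelize a migration of this size is to split the source table's key space into contiguous ranges and give each range to its own extract/transform/load worker. The sketch below assumes a dense integer key on the legacy table (an assumption; the deck does not describe the legacy keys), and `migrateSlice` is a hypothetical per-range job that returns the number of records it moved. If the keys are sparse, equal-width ranges will carry unequal amounts of data.

```javascript
// Split an inclusive integer key range [lo, hi] into at most `workers`
// contiguous, non-overlapping slices whose sizes differ by at most one.
function splitRange(lo, hi, workers) {
  const total = hi - lo + 1;
  const base = Math.floor(total / workers);
  const extra = total % workers;
  const slices = [];
  let start = lo;
  for (let i = 0; i < workers; i++) {
    const size = base + (i < extra ? 1 : 0);
    if (size === 0) break; // more workers than keys
    slices.push({ from: start, to: start + size - 1 });
    start += size;
  }
  return slices;
}

// Run one hypothetical ETL job per slice concurrently and sum the number of
// records each job reports as migrated.
async function migrateInParallel(lo, hi, workers, migrateSlice) {
  const counts = await Promise.all(splitRange(lo, hi, workers).map(migrateSlice));
  return counts.reduce((sum, n) => sum + n, 0);
}
```

For example, `splitRange(1, 4e9, 16)` yields 16 slices of 250,000,000 keys each, one per worker. Because each slice is independent, a failed slice can be retried on its own without restarting the whole migration.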
  32. Data Repair – Lessons Learned
      Bulk updates on collections
      + Use the Bulk() operation builder
        - bulk.find().update()
        - Simple, unordered
        - Parallelized
        - > 200,000 updates/minute
      + Regular update operation
        - ~2,000 updates/minute
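The bulk pattern above can be sketched with the MongoDB Node.js driver's legacy bulk builder, `initializeUnorderedBulkOp()`, whose `find(filter).update(update)` chain mirrors the shell's `bulk.find().update()`. Here `fixes` is a hypothetical list of `{ filter, update }` pairs produced by whatever detected the bad data, and `batchSize` is a hypothetical knob that bounds how many operations are held in memory at once (the driver also splits large bulks into server-sized batches on its own).

```javascript
// Sketch of an unordered bulk repair. `collection` is assumed to be a MongoDB
// Node.js driver Collection (or any object exposing the same bulk builder).
async function bulkRepair(collection, fixes, batchSize = 1000) {
  let submitted = 0;
  for (let i = 0; i < fixes.length; i += batchSize) {
    // Unordered: operations may run in any order (and in parallel across
    // shards), and one failing update does not stop the rest of the batch.
    const bulk = collection.initializeUnorderedBulkOp();
    const batch = fixes.slice(i, i + batchSize);
    for (const { filter, update } of batch) {
      bulk.find(filter).update(update);
    }
    await bulk.execute(); // never reached with an empty batch (that would throw)
    submitted += batch.length;
  }
  return submitted;
}
```

A call such as `bulkRepair(db.collection("auditEvent"), fixes)` sends one round trip per batch instead of one per document, which is where the difference between roughly 2,000 and over 200,000 updates per minute comes from. Updates need operators such as `$set` in current drivers.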
  33. Choosing the Appropriate Data Store: MongoDB over relational?
      + Scale out for transaction volume and data volume
      + Developer productivity: easy mapping between the application and the data store
      + Highly varying document structure
      + Offload read activity in an optimized format different from data writes (a.k.a. the CQRS pattern)
  34. Choosing the Appropriate Data Store: Relational over MongoDB?
      + Complex normalized data model
      + Diverse read patterns requiring joins
      + Ad hoc reporting and analysis
      + Data integrity difficult to manage in the application layer
  35. MongoDB @ Practice Fusion: Upcoming MongoDB projects
      + Observations data store: scale-out data store for patient vital signs, etc.
      + Clinical data repository: read cache for patient medical records (CQRS pattern)
      + Upgrades for Audit 2.0: WiredTiger + compression
  36. Q&A
      Michael Poremba
      mporemba@practicefusion.com
      linkedin.com/in/michaelporemba
      @mporemba
