Scaling up an openEHR CDR
1. Scaling up an openEHR CDR
Christian Chevalley, Khon-Kaen, Thailand
christian@adoc.co.th
– Born in Geneva, Switzerland
– Studied Physics and Computer Science at Geneva
University
– Worked for several blue chip companies (HP, Sun
Microsystems)
– Developed 5 commercial enterprise systems for
Finance and Healthcare
– Founded ADOC Software in 2009: a Thailand based
operation, BOI supported
– Wrote EtherCIS in 2011
– Migrated EtherCIS to EHRbase in 2019
2. Governance
– Hanover Medical School (https://www.mhh.de/en/)
– Vitasystems GmbH (https://www.vitagroup.ag/de_DE/Ueber-uns/vitasystems)
– HiGHmed Medical Informatics (https://highmed.org/), sponsored by:
●
German Ministry of Education and Research (https://www.bmbf.de/en/index.html)
●
Medical Informatics Initiative Germany (https://www.medizininformatik-initiative.de/en/about-initiative)
– Open Source!
3. EHRbase: What Is It?
●
openEHR CDR: Reference Model (RM 1.0.4), ADL 1.4
●
Transactional DB Centric Application (PostgreSQL 11+)
●
openEHR REST API incl. AQL
●
Development:
– Java 11, jOOQ, Archie, SQL
– Test Automation, Continuous Integration: Robot Framework, CircleCI
– Load Testing: JMeter
– Quality Checking: Sonar Analysis (sonarcloud.io)
4. Scalability: Some Numbers
●
Handle tens of millions of EHRs
●
Avg nnn compositions/EHR
●
> nnn TB of data (even PB!)
●
> nnnn concurrent users
5. Many Challenges
●
Multiple Levels of Technical Limitations
– Storage I/Os
– DB (even when advertised as limitless...)
– Network Latency
– Middleware latency (!) (in particular transformations)
●
Overlapping NFRs
– Multi-Tenancy
– Secondary use (analytics)
– Availability
– Security
– Administration: maintenance, disaster management, monitoring
6. My Observations
●
Two areas of concern
– CRUD
– Querying (AQL)
●
Each query/transaction has to be really fast (~1 ms or less)
– Minimize middleware/DB transactions
●
ONE query to the DB
●
Resolve containments and paths before launching the query
– Optimize DB model
●
Deal with limitations (denormalization of ITEM_STRUCTURE)
●
Indexing
●
Monitor query execution (query planner)
●
Keep SQL translations as short as possible
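As an illustration of "resolve containments and paths before launching the query", the following is a minimal sketch, not EHRbase's actual implementation. It assumes a hypothetical storage layout in which each AQL path segment is a key in a JSONB document, so that an AQL path can be turned once, in the middleware, into a PostgreSQL text-array path literal (as used by the `#>>` operator) and embedded in a single SQL statement. The class and method names are invented for this example, and segments containing double quotes are not escaped.

```java
import java.util.ArrayList;
import java.util.List;

public class AqlPathResolver {

    // Splits an AQL path on '/' while ignoring slashes inside [...] predicates.
    public static List<String> segments(String aqlPath) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;
        for (char c : aqlPath.toCharArray()) {
            if (c == '[') depth++;
            if (c == ']') depth--;
            if (c == '/' && depth == 0) {
                if (current.length() > 0) out.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) out.add(current.toString());
        return out;
    }

    // Renders the segments as a PostgreSQL text-array literal, e.g. {"a","b"}.
    // Assumption: the JSONB keys match the AQL segments one to one.
    public static String toJsonbPath(String aqlPath) {
        StringBuilder sb = new StringBuilder("{");
        List<String> segs = segments(aqlPath);
        for (int i = 0; i < segs.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append('"').append(segs.get(i)).append('"');
        }
        return sb.append('}').toString();
    }

    public static void main(String[] args) {
        String path = "data[at0001]/events[at0002]/data[at0003]/items[at0004]/value/magnitude";
        System.out.println(toJsonbPath(path));
    }
}
```

Because the path is resolved before the statement is built, the database receives one query with a fixed path instead of walking the structure through several round trips.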
7. Observation/Optimization
●
DB CRUD should be performed in ONE transaction
●
AQL queries are accelerated by pre-calculating the paths to value points; each query is then executed in ONE transaction
●
The (many) format transformations in the openEHR middleware remain costly!
8. Benchmark
650 000 EHRs - 130 000 000 compositions
PostgreSQL cluster with 5 nodes (12 vCPU, 8 GB RAM, 3 TB disk)
SELECT e/ehr_id/value, a/uid, o/data[at0001]/events[at0002]/data[at0003]/items[at0004]/value
FROM EHR e
CONTAINS COMPOSITION a[openEHR-EHR-COMPOSITION.sample_encounter.v1]
CONTAINS OBSERVATION o[openEHR-EHR-OBSERVATION.sample_blood_pressure.v1]
WHERE o/data[at0001]/events[at0002]/data[at0003]/items[at0004]/value/magnitude > 20
LIMIT 50
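For orientation, the benchmark figures imply an average number of compositions per EHR. The sketch below only does that arithmetic with the two numbers given above; the class name is made up for the example.

```java
public class BenchmarkRatio {

    // Average compositions per EHR, using integer division.
    public static long avgCompositionsPerEhr(long compositions, long ehrs) {
        if (ehrs <= 0) throw new IllegalArgumentException("ehrs must be positive");
        return compositions / ehrs;
    }

    public static void main(String[] args) {
        // 130 000 000 compositions over 650 000 EHRs
        System.out.println(avgCompositionsPerEhr(130_000_000L, 650_000L)); // prints 200
    }
}
```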
9. Distributing Transaction Load
●
Deploy DB as a “dumb” cluster
●
Deploy DB as a hyperscale cluster
●
Deploy the middleware as a distributed cluster with a distributed AQL optimizer
10. DB Dumb Cluster (1)
Pros
- Easy to deploy (at the beginning)
Cons
- DB maintenance: schema, migration, backup/recovery
- Storage (replication!)
- No parallelization
- No failover of node
- Heavy procedure to add nodes
- Expensive in a Cloud environment
- Security
- No easy secondary usage
- Has an impact on code logic!
11. DB Dumb Cluster (2)
Pros
- Somewhat easy to deploy (at the beginning)
Cons
- DB maintenance: schema, migration, backup/recovery
- Storage (replication!)
- No parallelization
- Heavy procedure to add nodes
- Expensive in a Cloud environment
- Security
- Potentially reach DB limits...
12. HyperScale DB
(Citus, YugabyteDB etc.)
Pros
- Transparent DB maintenance (single master for admin)
- Distributed storage
- Parallelism
- Automated failover
- Tools to maintain nodes
- Distributed Security policy
Cons
- Can be tricky to deploy (DB system settings, driver, may require an additional sharding key...)
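The "additional sharding key" can be pictured with a small sketch. It assumes, hypothetically, that the EHR id (a UUID) is chosen as the key so that all compositions of one EHR land on the same node. Distributed databases such as Citus or YugabyteDB apply their own hash functions internally; the modulo routing and the `ShardRouter` name below are illustrative only.

```java
import java.util.UUID;

public class ShardRouter {

    private final int shardCount;

    public ShardRouter(int shardCount) {
        if (shardCount <= 0) throw new IllegalArgumentException("shardCount must be positive");
        this.shardCount = shardCount;
    }

    // Maps an EHR id to a shard index in [0, shardCount).
    // floorMod keeps the result non-negative even for negative hash codes.
    public int shardFor(UUID ehrId) {
        return Math.floorMod(ehrId.hashCode(), shardCount);
    }

    public static void main(String[] args) {
        ShardRouter router = new ShardRouter(5); // 5 nodes, as in the benchmark cluster
        UUID ehrId = UUID.randomUUID();
        // The same EHR id always routes to the same shard.
        System.out.println(router.shardFor(ehrId) == router.shardFor(ehrId));
    }
}
```

Keying on the EHR id keeps single-patient reads and writes on one node, while population-wide AQL queries still have to fan out across all shards.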
14. Conclusion
●
Assuming the right topology (cluster + DB sharding), operations involve capacity planning: monitoring, thresholds, orchestration tools, etc.
●
Other infrastructure aspects must be factored in:
– Network latency between nodes
– Storage technology (SSD, write-ahead logging, caching)
– Significant operating concepts and administration
– Requires skills to be administered properly
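The capacity-planning point (monitoring plus thresholds) can be sketched as a simple utilization check. The 80% threshold and the usage figures below are assumptions chosen for illustration, not values from the benchmark; only the 3 TB disk size comes from the cluster described earlier.

```java
public class CapacityCheck {

    // Fraction of storage in use, e.g. 0.5 for a half-full disk.
    public static double utilization(double usedTb, double totalTb) {
        if (totalTb <= 0) throw new IllegalArgumentException("totalTb must be positive");
        return usedTb / totalTb;
    }

    // True when utilization has reached the threshold, i.e. when an
    // orchestration tool might be asked to add a node or rebalance.
    public static boolean exceeds(double usedTb, double totalTb, double threshold) {
        return utilization(usedTb, totalTb) >= threshold;
    }

    public static void main(String[] args) {
        // Hypothetical reading: 2.5 TB used on a 3 TB disk, threshold 80%.
        System.out.println(exceeds(2.5, 3.0, 0.8)); // prints true
    }
}
```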