Scaling up an openEHR CDR

Christian Chevalley, Khon-Kaen, Thailand
christian@adoc.co.th
– Born in Geneva, Switzerland
– Studied Physics and Computer Science at Geneva University
– Worked for several blue chip companies (HP, Sun Microsystems)
– Developed 5 commercial enterprise systems for Finance and Healthcare
– Founded ADOC Software in 2009: a Thailand-based operation, BOI supported
– Wrote EtherCIS in 2011
– Migrated EtherCIS to EHRbase in 2019

● Governance
  – Hanover Medical School (https://www.mhh.de/en/)
  – Vitasystems GmbH (https://www.vitagroup.ag/de_DE/Ueber-uns/vitasystems)
  – HiGHmed Medical Informatics (https://highmed.org/), sponsored by:
    ● German Ministry of Education and Research (https://www.bmbf.de/en/index.html)
    ● Medical Informatics Initiative Germany (https://www.medizininformatik-initiative.de/en/about-initiative)
  – Open Source!

EHRbase: What Is It?
● openEHR CDR: Reference Model (RM 1.0.4), ADL 1.4
● Transactional, DB-centric application (PostgreSQL 11+)
● openEHR REST API incl. AQL
● Development:
  – Java 11, jOOQ, Archie, SQL
  – Test Automation, Continuous Integration: Robot Framework, CircleCI
  – Load Testing: JMeter
  – Quality Checking: Sonar analysis (sonarcloud.io)

Scalability: Some Numbers
● Handle tens of millions of EHRs
● Avg nnn compositions/EHR
● > nnn TB of data (even PB!)
● > nnnn concurrent users

Many Challenges
● Multiple Levels of Technical Limitations
  – Storage I/O
  – DB (even when advertised as limitless...)
  – Network latency
  – Middleware latency (!) (in particular transformations)
● Overlapping NFRs
  – Multi-Tenancy
  – Secondary use (analytics)
  – Availability
  – Security
  – Administration: maintenance, disaster management, monitoring

My Observations
● Two areas of concern
  – CRUD
  – Querying (AQL)
● Query/transaction has to be really fast (~1 ms or less)
  – Minimize middleware/DB transactions
    ● ONE query to the DB
    ● Resolve containments and paths before launching the query
  – Optimize DB model
    ● Deal with limitations (denormalization of ITEM_STRUCTURE)
    ● Indexing
    ● Monitor query execution (query planner)
    ● Keep SQL translations as short as possible
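
The idea of resolving paths before launching the query can be sketched as below. This is only an illustration, not how EHRbase does it: the function names `resolve_path` and `to_json_keys`, and the flat key layout, are hypothetical. The point is that the AQL path is split into segments in the middleware once, so that the SQL sent to the DB can stay short.

```python
import re
from typing import List, Optional, Tuple

# Matches one AQL path segment such as "data[at0001]" or "value".
_SEGMENT = re.compile(r"^(?P<attr>[A-Za-z_]+)(\[(?P<node>[^\]]+)\])?$")


def resolve_path(aql_path: str) -> List[Tuple[str, Optional[str]]]:
    """Split an AQL path into (attribute, node_id) pairs before any SQL is built."""
    segments = []
    for raw in aql_path.strip("/").split("/"):
        match = _SEGMENT.match(raw)
        if match is None:
            raise ValueError(f"unsupported path segment: {raw!r}")
        segments.append((match.group("attr"), match.group("node")))
    return segments


def to_json_keys(segments: List[Tuple[str, Optional[str]]]) -> List[str]:
    """Render resolved segments as lookup keys (hypothetical storage layout)."""
    return [attr if node is None else f"{attr}[{node}]" for attr, node in segments]


path = "data[at0001]/events[at0002]/data[at0003]/items[at0004]/value/magnitude"
resolved = resolve_path(path)
keys = to_json_keys(resolved)
```

Because the result depends only on the query text, it could be cached per AQL statement and reused across executions.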

Observation/Optimization
● DB CRUD should be performed in ONE transaction
● Querying (AQL) is accelerated by pre-calculating the paths to value points; the query is then executed in ONE transaction
● The (many) format transformations in openEHR middleware remain costly!
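
A minimal sketch of the "ONE transaction" point, using Python's built-in `sqlite3` as a stand-in for PostgreSQL. The two-table schema and the `store_composition` helper are assumptions made for the example, not the EHRbase schema. A composition and all its entries are written together, and a failure in any statement rolls back the whole write.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE composition (id TEXT PRIMARY KEY, ehr_id TEXT NOT NULL);
    CREATE TABLE entry (
        composition_id TEXT NOT NULL REFERENCES composition(id),
        path TEXT NOT NULL,
        value REAL
    );
    """
)


def store_composition(conn, comp_id, ehr_id, entries):
    """Write a composition and all its entries atomically."""
    # The connection context manager commits on success and rolls back on error.
    with conn:
        conn.execute(
            "INSERT INTO composition (id, ehr_id) VALUES (?, ?)", (comp_id, ehr_id)
        )
        conn.executemany(
            "INSERT INTO entry (composition_id, path, value) VALUES (?, ?, ?)",
            [(comp_id, path, value) for path, value in entries],
        )


store_composition(conn, "c1", "ehr-1", [("systolic", 120.0), ("diastolic", 80.0)])

try:
    # The second entry violates NOT NULL, so composition "c2" is rolled back too.
    store_composition(conn, "c2", "ehr-1", [("systolic", 130.0), (None, 1.0)])
except sqlite3.IntegrityError:
    pass

composition_count = conn.execute("SELECT COUNT(*) FROM composition").fetchone()[0]
entry_count = conn.execute("SELECT COUNT(*) FROM entry").fetchone()[0]
```

After the failed second write, only the first composition and its two entries remain in the database.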

Benchmark
650,000 EHRs, 130,000,000 compositions
PostgreSQL cluster with 5 nodes (12 vCPU, 8 GB RAM, 3 TB disk)

select e/ehr_id/value, a/uid, o/data[at0001]/events[at0002]/data[at0003]/items[at0004]/value
from EHR e
contains COMPOSITION a[openEHR-EHR-COMPOSITION.sample_encounter.v1]
contains OBSERVATION o[openEHR-EHR-OBSERVATION.sample_blood_pressure.v1]
where o/data[at0001]/events[at0002]/data[at0003]/items[at0004]/value/magnitude > 20
limit 50
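
For readers who want to run a query like this themselves, the sketch below builds the HTTP request for the `/query/aql` endpoint of the openEHR REST API, which accepts the AQL text in a JSON body under the key `q`. The base URL is an assumption (a local default-style deployment); adjust it to your installation. The request is only constructed, not sent, so the example runs offline.

```python
import json
import urllib.request

# Assumed base URL of a local deployment; not taken from the slides.
BASE_URL = "http://localhost:8080/ehrbase/rest/openehr/v1"

AQL = (
    "select e/ehr_id/value, a/uid, "
    "o/data[at0001]/events[at0002]/data[at0003]/items[at0004]/value "
    "from EHR e "
    "contains COMPOSITION a[openEHR-EHR-COMPOSITION.sample_encounter.v1] "
    "contains OBSERVATION o[openEHR-EHR-OBSERVATION.sample_blood_pressure.v1] "
    "where o/data[at0001]/events[at0002]/data[at0003]/items[at0004]/value/magnitude > 20 "
    "limit 50"
)


def build_aql_request(base_url: str, aql: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the AQL query."""
    body = json.dumps({"q": aql}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query/aql",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


request = build_aql_request(BASE_URL, AQL)
# urllib.request.urlopen(request) would send it against a running server.
```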

Distributing Transaction Load
● Deploy DB as a “dumb” cluster
● Deploy DB as a hyperscale cluster
● Deploy the middleware as a distributed cluster w/ distributed AQL optimizer

DB Dumb Cluster (1)
Pros
- Easy to deploy (at the beginning)
Cons
- DB maintenance: schema, migration, backup/recovery
- Storage (replication!)
- No parallelization
- No failover of node
- Heavy procedure to add nodes
- Expensive in a Cloud environment
- Security
- No easy secondary usage
- Has an impact on code logic!

DB Dumb Cluster (2)
Pros
- Somewhat easy to deploy (at the beginning)
Cons
- DB maintenance: schema, migration, backup/recovery
- Storage (replication!)
- No parallelization
- Heavy procedure to add nodes
- Expensive in a Cloud environment
- Security
- Potentially reaches DB limits...

HyperScale DB
(Citus, YugabyteDB etc.)
Pros
- Transparent DB maintenance (single master for admin)
- Distributed storage
- Parallelism
- Automated failover
- Tools to maintain nodes
- Distributed security policy
Cons
- Can be tricky to deploy (DB system settings, driver, may require an additional sharding key...)
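
To illustrate what a sharding key does, here is a minimal sketch of hash-based routing. Systems such as Citus and YugabyteDB implement their own distribution internally; this code does not reproduce either of them. Using `ehr_id` as the key is an assumption made for the example, and `shard_for` is a hypothetical helper.

```python
import hashlib
from collections import defaultdict


def shard_for(ehr_id: str, shard_count: int) -> int:
    """Map an EHR id to a shard number in a stable, deterministic way."""
    digest = hashlib.sha256(ehr_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Hypothetical (composition_id, ehr_id) rows.
rows = [("c1", "ehr-a"), ("c2", "ehr-b"), ("c3", "ehr-a"), ("c4", "ehr-c")]

placement = defaultdict(list)
for comp_id, ehr_id in rows:
    placement[shard_for(ehr_id, 4)].append(comp_id)
```

With the EHR id as the key, all compositions of one EHR land on the same shard, so a query scoped to a single EHR touches one node only.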

EHRbase Cluster + HyperScale DB
(Citus, YugabyteDB etc.)
Pros
- Distributed middleware processing
Cons
- Can be tricky to deploy

Conclusion
● Assuming the right topology (cluster + DB sharding), operation involves capacity planning: monitoring, thresholds, orchestration tool etc.
● Other infrastructure aspects must be factored in:
  – Network latency between nodes
  – Storage technology (SSD, write-ahead, caching)
  – Significant operating concepts and administration
  – Requires skills to be administered properly
