Developing future-proof open source information systems for
Abstract. Healthcare information is full of context. The current design
approach to healthcare information systems (HIS) doesn't provide a facility to
transfer that context when the data is exchanged with other systems. This
paper aims to review the scientific literature regarding success and failures in
HIS implementation and integration projects and to present the fundamentals
of the development of a future-proof HIS, the Open Source Health Information
It has long been recognized that there is an inherent need for full semantic
interoperability between information systems that are used to manage healthcare
information [Lopez and Blobel 2009], [Hovenga 2008], [Ma et al. 2007], [Engel et al.
2006], [Nykanen and Karimaa 2006], [Rossi Mori and Consorti 1998]. Thus, in the
broadest sense, a Healthcare Information System (HIS) is any information system that
maintains and manages information affecting the health of a person or population
[Wikimedia Foundation, Inc. 2009].
Healthcare information has many complex temporal and spatial components and
then mix in the fact that medical and healthcare knowledge changes rapidly. It is widely
accepted that not only the medical condition of a person is significant to their health but
a host of social, economic and environmental conditions place direct and indirect effects
on personal and population health [Ghosh 2007], [Norgall et al. 2006], [Ruland and
Bakken 2001], [Starr and Starr 1995]. This is the crux of why HIS re so complex and
typically 'purpose-built' without any real interoperability built into them.
While the science of information design encompasses much more than
computerized information systems, the aim of this paper is discussing the
computerization of healthcare information. Expressed in its simplest terms, the current
approach to developing an information system follows these steps: (a) requirements
gathering, using one or more of the many standard methodologies; (b) systems analysis,
in order to determine from the requirements the systems involved to meet the needs of
the requirements; (c) data modeling, which means building a model of the data elements
available from the information sources (systems) to meet the needs of the requirements;
(d) implementation of the data model in some persistent storage facility; (e)
implementation of the software required to add, edit and manage the storage and use of
the data elements defined in the model; and (f) maintenance of the system then entails
modifying the data model and the software as the requirements change. This process has
proven robust and reliable over many years and millions of software applications being
deployed when sound software engineering principles are adhered to in the entire
process [Covitz et al. 2003], [Xu et al. 2001], [Kanoui et al. 2000].
Looking into how typical applications handle the information management
aspect a bit more, we see how an application like this actually manages information. We
see that the data elements are discretely persisted and the software and/or data
management layer manages the relationships between these data elements.
Data elements are simply data elements. It is their relationship to each other and
their relationship to the user, via input and output design in each application, which
gives them their semantic context in order to create information from their existence
[Schuurman and Leszczynski 2008], [Rector et al. 2002], [Degoulet et al. 1998]. As an
example of the data versus information conundrum, let us say that we have a data
element of the integer 102. That integer can represent a systolic blood pressure
measurement, a heart rate, a body temperature, and so on. The validity, reliability and
clinical significance of this data element necessarily depends upon the context (that
includes, e.g., units of measurement, patient position, point of measurement, device
used, demographic data from the patient, etc.).
There are certain elements of a database management system that lend to this
context, but this feature is a part of the software layer, not the actual persistence of the
data. While it is often conceived that table names and column names in, for example,
SQL databases, provide this functionality, it is important to consider situations when the
data needs to be exchanged with an application that uses a hierarchical or object or
XML database. In this case, there is no standard way to do this translation. In fact, even
between applications based on SQL databases, the migration of data must be performed
on a case-by-case basis where the table and column names are not the same and a
custom translation must be built for every case [Huang et al. 2005], [Stitt 1995].
Certainly the development of standardized HL7 messages was a big leap
forward in information exchange in healthcare [Tracy and Dougherty 2002], [Hettinger
and Brazile 1994]. Two big drawbacks to this approach are that the message formats
had to be made very generic in order to accommodate systems that may or may not have
all of the data elements in a specific message and this exchange process requires a
manual mapping on both the sending and receiving ends for each message to their local
database. An entire industry developed around providing these mapping engines and the
expertise to setup the complex mapping routines. Even if the development of the
message formats were defined by domain experts, the mapping is typically performed
by software experts, not clinicians [Hiroi et al. 2007], [Lin et al. 2006], [Ma et al.
2005], [McDonald et al. 2003], [Kinsey et al. 2000], [Ma 1995].
It is important to highlight that information should be universally available
inside the healthcare sector, from professionals at the point of care up to
epidemiologists, policy makers and economists [Hartz and John 2009], [Iglehart 2004].
Nevertheless, the immense amount of standardized data sets built into applications
published on literature are almost entirely made based on the interests of specific
research projects [Elisa and Heimar 2006], [Tierney et al. 2006], [Innes et al. 2001],
[Studnicki et al. 2001]. It is not possible for every healthcare application to incorporate
all the “minimum data sets” being requested by these competing interests. Even if it
happened, the real semantic context of collected data that results in information is in the
software of the original application, not in the data itself [Mead, 2006].
Completely controlled environments like the U.S. Veterans Administration and
Kaiser-Permanente have demonstrated that a centrally controlled approach to HIS
development and management using current approaches does work and does reduce
adverse events as well as improve access to patient information at the point of care
[Protti and Groen 2008], [Raymond 2005]. The evidence exists that interoperable HIS
are a benefit to those organizations as well as to their patient population [Brennan et al.
2008]. However, even in countries with nationalized healthcare services it is still a
challenge to mandate a common data set for all HIS that meets the needs of all
healthcare information users [Halamka et al. 2005], [Suselj and Cuber 1998]. So, we
believe that what is needed is a new way of thinking about complex information
systems. The objective of this paper is to present an implementation of an information
model that addresses those challenges in healthcare information.
As an analogy, the idea of Lego® blocks is suggested. A box of these blocks contains
various sizes and shapes of components designed to work together to form objects. The
kit comes with instructions so that a certain sub-group of the blocks can be used to build
a specific object (e.g., a truck). Some of those same blocks can be used in combination
with additional ones, to build another specific object (e.g., a helicopter) according to a
different set of instructions. There are other instructions to build various other objects
all based on the same set of common blocks, all according to the restrictions described
in the instructions.
In healthcare information, we define a Common Reference Model (CRM) as the
common set of healthcare information blocks implemented in software that represent
clinical concepts as an information instance, and the instructions about the
representation of each clinical concept, as Clinical Knowledge Model (CKM).
If the CRM is abstract enough then it could be implemented in any object-
oriented language on any hardware platform. It therefore provides the freedom of use
and creativity to meet the needs and desires of application developers. The CKM units
(instructions) can be described by the domain experts for each concept because they
exist as constraint descriptions of the very broad based CRM, not existing in software
themselves, but simply informing the software about what they are representing. In fact,
the CKM units describe themselves completely with their entire semantic context. Since
they exist as data instances and not lines of software code, they can be exchanged with
other applications based on the CRM without any loss of semantics, and be persisted in
any type of data management system. CKM units can also be queried with a
standardized query language so that even the queries only have to be built one time,
since the data structure is standardized. Additionally, for every change either of the
healthcare science or management, a new version of a specific CKM unit can be
adapted from the existing version, with no need to change the software nor is there any
loss to the semantic integrity of the existing data.
The openEHR specifications are designed following those principles, and it has
been proven in multiple programming languages using multiple persistence layers and
hardware platforms, and also provides definitions for service and support layers in order
to allow the development of real healthcare applications [Beale and Heard 2007].
The Open Source Health Information Platform (OSHIP) is the reference
implementation of the openEHR specifications release 1.0.2 in Python language,
heavily dependent on the Zope Component Architecture (ZCA) version 3.4.x. It is a
library combined with other open source components designed to facilitate the creation
of interoperable healthcare applications. The persistence (storage) layer is completely
independent of the reference model and component libraries. Any type of database can
be used, such as object database or any SQL databases, but a native Python object
database such as the Zope Object Database (ZODB) provides transparent object
persistence, which eliminates the issues of mapping objects to SQL tables. OSHIP itself
uses the ZODB for the Archetype Server and Terminology Server. The current status of
development of the project can be found on http://launchpad.net/oship.
The approach for designing HIS here presented requires the adoption of a new way of
thinking, but not letting go of sound software engineering principles. In fact, this
approach enforces the sound principles of separation of software and data. Of course
high quality, rapidly delivered information is needed by healthcare decision makers; but
the mainstream approach on this matter leads to wasted time and effort and even poor
quality results due to a lack of interoperability across applications causing
misinterpretation of existing data. The context currently lies in the software, where it
can’t be exchanged. There is a need to return it to the data, where it belongs.
Beale T, Heard S (eds.). openEHR Architecture Overview Revision 1.1. London, 2007;
Brennan DM, Holtz BE, Chumbler NR, Kobb R, Rabinowitz T. Visioning technology
for the future of telehealth. Telemed J E Health 2008; 14: 982-5.
Covitz PA, Hartel F, Schaefer C, De Coronado S, Fragoso G, Sahni H, Gustafson S,
Buetow KH. caCORE: a common infrastructure for cancer informatics.
Bioinformatics 2003; 19: 2404-12.
Degoulet P, Sauquet D, Jaulent MC, Zapletal E, Lavril M. Rationale and design
considerations for a semantic mediator in health information systems. Methods Inf
Med 1998; 37: 518-26.
Elisa R, Heimar M. Nursing minimum data set: A literature review. Stud Health
Technol Inform 2006; 122: 734-7.
Engel K, Blobel B, Pharow P. Standards for enabling health informatics
interoperability. Stud Health Technol Inform 2006; 124: 145-50.
Ghosh T. Structuration and sensemaking: frameworks for understanding the
management of health information systems in the ICU. Stud Health Technol Inform
2007; 130: 45-57.
Halamka J, Aranow M, Ascenzo C, Bates D, Debor G, Glaser J, Goroll A, Stowe J,
Tripathi M, Vineyard G. Health care IT collaboration in Massachusetts: the
experience of creating regional connectivity. J Am Med Inform Assoc 2005; 12:
Hartz S, John J. Public health policy decisions on medical innovations: What role can
early economic evaluation play? Health Policy 2009; 89: 184-92.
Hettinger BJ, Brazile RP. Health Level Seven (HL7): standard for healthcare electronic
data transmissions. Comput Nurs 1994; 12: 13-6.
Hiroi K, Ido K, Yang W, Nakaya J. Interface analysis between GSVML and HL7
version 3. J Biomed Inform 2007; 40: 527-38.
Hovenga EJS. Importance of achieving semantic interoperability for national health
information systems. Texto & contexto enferm 2008; 17: 158-167.
Huang EW, Wang DW, Liou DM. Development of a deterministic XML schema by
resolving structure ambiguity of HL7 messages. Comput Methods Programs Biomed
2005; 80: 1-15.
Iglehart JK. Global health policy and free access to information. Health Aff (Millwood)
2004; 23: 7-8.
Innes G, Murray M, Grafstein E. A consensus-based process to define standard national
data elements for a Canadian emergency department information system. CJEM
2001; 3: 277-84.
Kanoui H, Joubert M, Maury G. A semantic-based kernel for advanced health
information systems. Med Inform Internet Med 2000; 25: 19-43.
Kinsey TV, Horton MC, Lewis TE. Interfacing the PACS and the HIS: results of a 5-
year implementation. Radiographics 2000; 20: 883-91.
Lin IC, Hsu HM, Liu CT. A HL7 transformer application for vaccination data report. Int
J Electron Healthc 2006; 2: 117-31.
Lopez DM, Blobel BG. A development framework for semantically interoperable health
information systems. Int J Med Inform 2009; 78: 83-103.
Ma C, Frankel H, Beale T, Heard S. EHR query language (EQL) - a query language for
archetype-based health records. Stud Health Technol Inform 2007; 129(Pt 1):
Ma H, Rolka H, Mandl K, Buckeridge D, Fleischauer A, Pavlin J. Implementation of
laboratory order data in BioSense Early Event Detection and Situation Awareness
System. MMWR 2005; 54(Suppl): 27-30.
Ma H. Mapping clause of Arden Syntax with HL7 and ASTM E 1238-88 standard. Int J
Biomed Comput 1995; 38: 9-21.
McDonald CJ, Huff SM, Suico JG, Hill G, Leavelle D, Aller R, Forrey A, Mercer K,
DeMoor G, Hook J, Williams W, Case J, Maloney P. LOINC, a universal standard
for identifying laboratory observations: a 5-year update. Clin Chem 2003; 49:
Mead CN. Data interchange standards in healthcare IT - computable semantic
interoperability: now possible but still difficult, do we really need a better
mousetrap? J Healthc Inf Manag 2006; 20: 71-8.
Norgall T, Blobel B, Pharow P. Personal health - the future care paradigm. Stud Health
Technol Inform 2006; 121: 299-306.
Nykanen P, Karimaa E. Success and failure factors in the regional health information
system design process - results from a constructive evaluation study. Methods Inf
Med 2006; 45: 85-9.
Protti D, Groen P. Implementation of the Veterans Health Administration VistA clinical
information system around the world. Healthc Q 2008; 11: 83-9.
Raymond B. The Kaiser Permanente IT transformation. Healthc Financ Manage 2005;
Rector AL, Rogers J, Roberts A, Wroe C. Scale and context: issues in ontologies to link
health- and bio-informatics. Proc AMIA Symp 2002; 642-6.
Rossi Mori A, Consorti F. Exploiting the terminological approach from CEN/TC251
and GALEN to support semantic interoperability of healthcare record systems. Int J
Med Inform 1998; 48: 111-24.
Ruland CM, Bakken S. Representing patient preference-related concepts for inclusion
in electronic health records. J Biomed Inform 2001; 34: 415-22.
Schuurman N, Leszczynski A. A method to map heterogeneity between near but non-
equivalent semantic attributes in multiple health data registries. Health Informatics J
2008; 14: 39-57.
Starr P, Starr S. Reinventing vital statistics. The impact of changes in information
technology, welfare policy, and health care. Public Health Rep 1995; 110: 534-44.
Stitt FW. A standards-based clinical information system for HIV/AIDS. Medinfo; 8(Pt
1): 402, 1995.
Studnicki J, Luther SL, Kromrey J, Myers B. A minimum data set and empirical model
for population health status assessment. Am J Prev Med 2001; 20: 40-9.
Suselj M, Cuber S. Smartcard - a vehicle of access to the health care services. Stud
Health Technol Inform 1998; 52(Pt 1): 31-5.
Tierney WM, Beck EJ, Gardner RM, Musick B, Shields M, Shiyonga NM, Spohr MH.
Viewpoint: a pragmatic approach to constructing a minimum data set for care of
patients with HIV in developing countries. J Am Med Inform Assoc 2006; 13:
Tracy WR, Dougherty M. HL7 standard shapes content, exchange of patient
information. J AHIMA 2002; 73: 48-51.
Wikimedia Foundation, Inc, 2009. “Health informatics”. In: Wikipedia®, the free
encyclopedia. Available on: http://en.wikipedia.org/wiki/Medical_informatics.
Accessed in Jan 14, 2009.
Xu Y, D'Alessio L, Jaulent MC, Sauquet D, Spahni S, Degoulet P. Integrating medical
applications in an open architecture through generic and reusable components. Stud
Health Technol Inform 2001; 84(Pt 1): 63-7.