2010 last papers

Clinical Document Data Warehouse (CD2W)
Josep Vilalta Marzo (a), Diego Kaminker (b), Josep M. Picas Vidal (c), M. Lluisa Bernard Antoranz (d), Cristina Siles (c), Rafael Rosa Prat (a)
(a) Vico Open Modeling S.L., (b) Kern Information Technology SRL, (c) Hospital de la Santa Creu i Sant Pau, (d) Institut Català de la Salut

Abstract

This paper describes the development of the Clinical Document Data Warehouse project (CD2W) and its implementation using CDA R2 documents (Clinical Document Architecture Release 2).
The main objective of this project was to prototype a web-based portal allowing access to a clinical document repository and a data warehouse with clinical information about patients from several medical organizations in Barcelona, Spain (a major hospital and 4 primary care centers), giving access to clinical patient information for primary care and leveraging the same standardized information to populate the data warehouse (secondary use).
The project was developed during the first half of 2009 under the general direction of the Hospital de la Santa Creu i Sant Pau (HSCSP).
Due to the prototype nature of the project, the scope was limited to patients with Congestive Heart Disease (CHD) who consented to the use of their information for clinical research by the HSCSP, and to the vocabularies currently used by the providers (ICD9 and other local terminologies).
The data warehouse was developed using HL7 RIM basic concepts.
The project mission was to improve patient care with the use of global standards, open technology, low exploitation cost and ease of use, based on CDA R2 documents.
During this project, we used the SCRUM agile methodology, allowing a scalable, progressive and incremental software development process.

Introduction – Business Case

Solving the questions that arise when a patient presents several clinical issues is one of the greatest challenges for healthcare providers.
Fast clinical decision making at the point of care creates the need to have all relevant and up-to-date clinical information available to support the process.
Usually, we do not have efficient filtering that flags the critical issues on which to focus our attention. We encounter scenarios with data overload, demanding a big effort to synthesize useful information, or with sparse data, demanding the use of imagination to connect the pieces and reach any conclusion.
Another problem is that our only available source of clinical information is usually the organization supporting the healthcare provider. Information generated through other channels where the patient was attended is usually brought by the patient in several paper formats. Consolidation of relevant and up-to-date information from distinct healthcare organizations under different authorities is a pending issue, waiting for an agreement among the healthcare authorities and for the harmonization of different database schemes.
Nowadays, there are several ongoing projects with the shared goal of consolidating clinical information, both at the national (Spain) level and at the autonomous community level.
These projects also share a great level of complexity and high costs to achieve their goals. This small project, CD2W, aspires to help in achieving this consolidation of clinical information, with a focus on easing the task for healthcare providers: organizations administering huge databases that are difficult to integrate and that have no reference information model available.

Materials and Methods

Hospital de la Santa Creu i Sant Pau is a high complexity hospital which dates back six centuries, making it the oldest hospital in Spain. Healthcare is centered on Barcelona but extends to the rest of Catalonia. The center plays a prominent role in Spain and is internationally renowned.
The hospital has distinguished itself in the healthcare provided in many fields, making it a reference centre in several specialties. The center handles over 34,000 admissions each year and more than 150,000 emergencies. Approximately 300,000 people are seen at the ambulatory services annually and the Day Hospital attends over 60,000 users. There are 71 day hospital beds, 634 hospitalization beds and 19 surgical rooms.
Teaching and training programmes at the Hospital de la Santa Creu i Sant Pau cover many levels, comprising the UAB (Universitat Autonoma de Barcelona) Faculty of Medicine Teaching Unit, the University School of Nursing, participation in the State Residency Programmes to train specialists, masters and doctorate courses, continuing education, etc. In the field of research, Hospital de la Santa Creu i Sant Pau is one of the most prominent centers in Spain, as can be appreciated from the volume of papers published and their impact factor, the number and quality of projects which receive funding and the grants awarded. The Hospital de la Santa Creu i Sant Pau is governed by the Patronat de la Fundació de Gestió Sanitària (FGSHSCSP), a board with representatives from the Regional Catalan Government (Generalitat de Catalunya), the City Hall of Barcelona and the Archbishopric of Barcelona.

The goals for this project were:
1. Define a scheme to integrate information from disjoint information platforms.
2. Implement a process for periodical data and document exchange with minimal workload implications for the primary care centers.
3. Generate a clinical data store enabling the coexistence of clinical documents and relational data in a longitudinal patient healthcare record.

4. Create a simple user interface to ease fast queries to the relevant clinical information about a patient.
5. Restrict access to clinical information to professionals authorized by the participating organizations (HSCSP and ICS).
6. Use open technology for design, development and implementation of the data warehouse, minimizing exploitation costs, and use global interoperability standards to enable universal access.
7. Evaluate the impact of normalizing data from different organizations and code systems.
8. Evaluate the usability and added value of CD2W for physicians' healthcare decisions.
9. Evaluate results of this prototype to study the possible extension of the access to the patients or to other professionals.
10. Evaluate the results for a new project with a broader scope.

Implementation, Methodology and Tools

The project has five main design aspects:
1. Clinical Data Warehouse Design
2. Standard Document Design
3. Data Warehouse Population Process Design
4. User Interface Design
5. Technological Implementation
Let's review them:

1. Clinical Data Warehouse Design

The model for CD2W was derived from the RIM base classes (role, entity, act, participation) [3] following a three-step methodology:
a. Development of a conceptual framework, to be discussed with the CD2W stakeholders, including which were the measures or facts, and how the information should be classified (dimensions).

Figure 1 – CD2W conceptual model

b. Then, on a more technical level, a domain analysis model was generated.

Figure 2 – CD2W Domain Analysis

c. Finally, the derivation of a data warehouse model to store facts, dimensions and supporting standard clinical documents. RIM-wise, facts are acts, and main dimensions are entities and roles and their attributes.

2. Standard Document Design

The documents were the 'lingua franca' between disparate systems used by the participating organizations (primary care centers, hospital EHR and discharge system) [1].
We designed two different clinical document templates, one for the evolution note from the primary care centers and one for the discharge note from the hospital discharge system.
The templates shared the same information at the header level, but differed in their section contents.
Since this data warehouse was intended for secondary use, patient and physician information was de-identified [2] (names, identifications and addresses were removed or replaced).
The information generated by the local provider applications was transformed into clinical documents using an XSL, and these standard documents were processed and stored into the CD2W.
We tested our mapping, process and query interface with 16125 clinical documents from the hospital and primary care centers.
In order to guide the development team, a table was built with the main required elements from each primary care center and their location inside the standard clinical document.

Figure 4 – Transformation Table

3. Data Warehouse Population Process

The process to populate the data warehouse included several steps.
[At the primary care center]
1. Select encounters for patients suffering from CHD with signed consents.
2. Create a basic, shallow XML for each primary care center encounter.
[At the data processing center]
3. Create a CDA conformant document for each instance, using the mapping defined for the center.
4. Populate the CD2W database with the information from each document header, and the document itself.

This process was triggered periodically by the primary care centers and HSCSP.

4. User Interface Design

The user interface design was use-case based.
Identified use cases were:

A. App Administration

Application setup and parametrization
− Define healthcare agent
− Define healthcare agent type
− Define healthcare agent role
− Define app parameters
− Define service catalog
− Define service

B. Data Generation Process

Use cases related to the DW population
− Login
− User validation
− Batch load
− Stat Processing
− Document Processing
− Update audit log

C. Queries

Data access and retrieval from the users
− Retrieve app parameters
− Query control panel (figure 3)
− Query Patient Monitor (figure 4)
− Query Encounter Monitor
− Query Conditions Monitor
− Query Healthcare Centers
− Query Healthcare Professionals
− Query Healthcare Services
− Query Stats (figure 5,6)
− Query audit logs
− Browse CDA R2 document (figure 7)

The following figures illustrate the main user interface forms for CD2W:

Figure 3 – CD2W Control Panel

Figure 7- CDA R2 Document for an encounter

5. Technological Implementation
The system was implemented using an open source Apache
application service, a MySQL database, PHP development
environment and the amCharts data analysis and graphing
component.
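
As an illustration of the population process and of the idea that acts are facts while roles and entities are dimensions, the following minimal sketch de-identifies a document header and stores the header fields together with the document itself. It is not the project's code: it uses Python with the standard sqlite3 module as a stand-in for the PHP/MySQL stack, and the table names, the pseudonym scheme and the sample XML fragment (which is far from a conformant CDA R2 instance) are assumptions made for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

NS = {"cda": "urn:hl7-org:v3"}  # XML namespace used by CDA R2 instances

# Hypothetical, heavily simplified CDA-like document (not a conformant instance).
SAMPLE_DOC = """<ClinicalDocument xmlns="urn:hl7-org:v3">
  <id extension="doc-001"/>
  <effectiveTime value="20090315"/>
  <recordTarget><patientRole>
    <id extension="12345678Z"/>
    <addr>Carrer Major 1, Barcelona</addr>
    <patient><name>Jane Doe</name></patient>
  </patientRole></recordTarget>
</ClinicalDocument>"""


def deidentify(root, pseudonym):
    """Replace the patient id and drop name and address (illustrative rules only)."""
    role = root.find("cda:recordTarget/cda:patientRole", NS)
    role.find("cda:id", NS).set("extension", pseudonym)
    role.remove(role.find("cda:addr", NS))
    patient = role.find("cda:patient", NS)
    patient.remove(patient.find("cda:name", NS))
    return root


def load_document(conn, xml_text, pseudonym):
    """Store header fields as relational data next to the document itself."""
    root = deidentify(ET.fromstring(xml_text), pseudonym)
    doc_id = root.find("cda:id", NS).get("extension")
    act_time = root.find("cda:effectiveTime", NS).get("value")
    conn.execute(
        "INSERT OR IGNORE INTO dim_patient_role(pseudonym) VALUES (?)", (pseudonym,)
    )
    conn.execute(
        "INSERT INTO fact_act(doc_id, act_time, patient_pseudonym, document_xml) "
        "VALUES (?, ?, ?, ?)",
        (doc_id, act_time, pseudonym, ET.tostring(root, encoding="unicode")),
    )


# Assumed schema: one fact table for acts, one dimension table for patient roles.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_patient_role (pseudonym TEXT PRIMARY KEY);
CREATE TABLE fact_act (
    doc_id TEXT PRIMARY KEY,
    act_time TEXT,
    patient_pseudonym TEXT REFERENCES dim_patient_role(pseudonym),
    document_xml TEXT
);
""")
load_document(conn, SAMPLE_DOC, "P-0001")
row = conn.execute(
    "SELECT doc_id, act_time, patient_pseudonym, document_xml FROM fact_act"
).fetchone()
```

Keeping the de-identified XML in the same row as the header fields is one simple way to let relational queries and document browsing coexist; a real deployment would need a fuller set of dimensions and a vetted de-identification policy.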

Evaluation/Assessment/Lessons Learned
The HL7 Reference Information Model was very useful in modeling our specific domain and in generating a scheme to integrate all participating applications into an information model.

The process for exchange could be established, but we needed to educate the participating centers on the use of CDA R2 and the rationale for asking for each piece of information.
Nevertheless, we needed to bridge their data model to CDA R2 by providing them with a customized XSL for each center.
Coexistence of documents with the relational information needed to explore the embedded information was possible, although we need to test with more volume.
The user interface was enough for our pilot users from three centers to achieve their data exploration and verification needs. The use of open technologies and standards was a key factor in minimizing development time. Our normalizing efforts were not finished; we ended up using ICD-9 and internal ICS vocabularies.

Figure 4 – CD2W Patient Query

Figure 5 – Services for one patient

Future Plans

Extending this tool to make it accessible to patients and other professionals will be studied, but current policies make it very difficult to gain access to patient data, even for this approved project. We also want to explore using native XML open source databases. A project with a greater scope will be studied, and the whole approach and the generated model are suitable for other domains.

Figure 6- Temporal series for a patient

Acknowledgements

Spain Ministry of Health for Scholarship FIS Dossier P105/230.

Bibliography

[1] HL7 Clinical Document Architecture, Release 2
Robert H Dolin et al
JAMIA 2006;13:30-39 doi:10.1197/jamia.M1888

[2] Ley Orgánica 15/1999, de 13 de diciembre, de Protección de
Datos de Carácter Personal
(BOE núm. 298, de 14-12-1999, pp. 43088-43099)

[3] The Design of the HL7 RIM-based Sharing
Components for Clinical Information Systems
Wei-Yi Yang, Li-Hui Lee, Hsiao-Li Gien, Hsing-Yi Chu, Yi-Ting
Chou, and Der-Ming Liou
World Academy of Science, Engineering and Technology 53 2009

Contact
Josep Vilalta Marzo
Francesc Layret 24, Badalona, Barcelona, Spain
Email: jvilalta@vico.org

Improving the Usability of HL7 Information Models by Automatic Filtering

Antonio Villegas and Antoni Olivé
Services and Information Systems Engineering Department
Universitat Politècnica de Catalunya
Barcelona, Spain
Email: {avillegas, olive}@essi.upc.edu

Josep Vilalta
HL7 Education & e-Learning Services
HL7 Spain (Health Level Seven International)
Barcelona, Spain
Email: jvilalta@vico.org

Abstract—The amount of knowledge represented in the Health Level 7 International (HL7) information models is very large. The sheer size of those models makes them very useful for the communities for which they are developed. However, the size of the models and their overall organization makes it difficult to manually extract knowledge from them.
We propose to extract that knowledge by using a novel filtering method that we have developed. Our method is based on the concept of class interest as a combination of class importance and class closeness. The application of our method automatically obtains a filtered information model of the whole HL7 models according to the user preferences. We show that the use of a prototype tool that implements that method and produces such filtered model improves the usability of the HL7 models due to its high precision and low computational time.
Keywords-Usability, Health Level Seven International, HL7, Models, Filtering, UML

I. INTRODUCTION
Health Level Seven International (HL7) is a not-for-profit, ANSI-accredited standards developing organization dedicated to providing a comprehensive framework and related standards for the exchange, integration, sharing, and retrieval of electronic health information that supports clinical practice and the management, delivery and evaluation of health services [1].
HL7 develops specifications, the most widely used being a messaging standard that enables disparate healthcare applications to exchange key sets of clinical and administrative data. The HL7 standard specifications are unified by shared reference models of the healthcare and technical domains [2], [3].
The amount of knowledge represented in the HL7 information models is very large and continuously improved. The sheer size of those models makes them very useful to the communities for which they were developed: HL7 international affiliates with more than fifty HL7 active working groups (Structured Documents, Clinical Decision Support, Clinical Genomics...), large integrated healthcare delivery networks, government agencies and other organizations that use those models for the development of their enterprise information architecture of health systems [4], [5].
However, the size of HL7 information models and their organization makes it very difficult for those communities to manually extract knowledge from them. This problem is shared by other large models [6].
Currently, there is a lack of computer support to make those models usable for the goal of knowledge extraction. In this paper, we propose to extract that knowledge by using a novel filtering method that we have developed, and we show that the use of our prototype implementation of that method improves the usability of HL7 information models.
The structure of the paper is as follows. Section II introduces the HL7 models and describes the main UML constructs used to build them. Section III describes the concept of class importance and references the methods that can be used to compute it. Section IV describes the concept of class interest with respect to a filter set of classes and explains how to compute it. Section V presents our model filtering method. Section VI evaluates the use of the method in the context of the HL7 models. Finally, Section VII summarizes the conclusions and points out future work.

II. HL7 INFORMATION MODELS

Types of Models
The HL7 information models comprise three types of models. Each of the model types is based on the UML, although the concrete notation used differs depending on the model type. Also, the models differ from each other in terms of their information content, scope, and intended use. The following types of information models are defined:
• Reference Information Model (RIM) - The RIM is the information model that encompasses the HL7 domain of interest as a whole. The RIM is a coherent, shared information model that is the source for the data content of all HL7 interoperability artifacts: V2.x messages and XML clinical documents CDA R2 [3].
• Domain Message Information Model (D-MIM) - A D-MIM is a refined subset of the RIM that includes a set of classes, attributes and relationships that can be used to create messages and structured clinical documents for a particular domain (a particular area of interest in healthcare). There are predefined D-MIMs for a set of over 15 universal domains, such as Accounting and Billing, Care Provision, Claims and Reimbursement, and so on.

Figure 1. Sample of HL7 RIM refinements related to ActAppointment class

• Refined Message Information Model (R-MIM) - The R-MIM is a subset of a D-MIM that is used to express the information content for a message/document or set of messages/documents with annotations and refinements that are message/document specific. The content of an R-MIM is drawn from the D-MIM for the specific domain in which the R-MIM is used.

Structure of the HL7 Information Models
The RIM, D-MIM and R-MIM models can be analyzed as if they were built using, in a particular way, a small subset of the constructs provided by the UML [7]. Figure 1 illustrates the main UML constructs used with a very small fragment of the RIM and of one D-MIM. The RIM comprises six backbone classes: Act, Participation, Entity, Role, ActRelationship and RoleLink. Figure 1 shows the first four of these classes. Each one has a number of attributes with a defined multiplicity. Surprisingly, there are only eight main associations between the RIM classes, all of them binary and with their corresponding multiplicities. Figure 1 shows four of these associations.
Each of the RIM classes has many subclasses, although only a few of them are explicitly shown in the diagrams of the HL7 RIM specification. There are many specialization/generalization relationships (called IsA relationships, e.g. Organization IsA Entity) in the HL7 models. The number of RIM classes and subclasses is over 2,500. Figure 1 shows seven subclasses of four of the backbone RIM classes and seven IsA relationships.
D-MIM models refine the RIM in three ways:
1) The participants of one of the eight main associations defined between RIM classes are refined in the subclasses. This is the refinement most often used in the HL7 models. Note that it is not allowed to add new associations.
2) The multiplicities of an association defined between RIM classes are strengthened in the subclasses.
3) The multiplicity of an attribute of a RIM class is strengthened in a subclass. An optional attribute in a RIM class can be made mandatory or not allowed in a subclass. Note that it is not allowed to add new attributes.
R-MIM models refine D-MIM models in the same way.
In all cases, the three kinds of refinements can be expressed using UML constructs.
Figure 1 shows a few refinements related to the ActAppointment class. The instances of this class are appointments (a particular kind of Act). There may be several kinds of participations in an appointment. Figure 1 shows only two of them: PerformerOfActAppointment and SubjectOfActAppointment. To indicate that when the act is an appointment then the participations must be instances of PerformerOfActAppointment or of SubjectOfActAppointment, we redefine the association Participation-Act as shown in the figure. Note that redefinition is a UML construct, which is very useful in situations like this one. The redefinition of the association Role-Participation is similar. The overall semantics of these redefinitions is that the performer of an appointment is a Person that plays the role AssignedPerson, and that the subject of an appointment is a Person that plays the role Patient.
Sometimes, the UML redefinition construct does not allow the graphical representation of the strengthening of association multiplicities. In these cases, the redefinition must be formally captured by OCL invariants. For example, in Figure 1, the refinement of act in SubjectOfActAppointment also implies that an instance of ActAppointment is associated with a non-empty set (1..*) of SubjectOfActAppointment. However, this cannot be expressed graphically, and an OCL invariant must be used instead [8].
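
To make the multiplicity constraint concrete, the sketch below checks it on in-memory objects. The Python classes are simplified stand-ins for the model classes, and both the attribute name `participations` and the OCL expression quoted in the comment are assumptions for illustration; the source does not give the actual invariant text or association end names.

```python
from dataclasses import dataclass, field


@dataclass
class Participation:
    """Stand-in for the RIM Participation backbone class."""


@dataclass
class SubjectOfActAppointment(Participation):
    """Participation kind that links an appointment to its subject."""


@dataclass
class PerformerOfActAppointment(Participation):
    """Participation kind that links an appointment to its performer."""


@dataclass
class ActAppointment:
    # Hypothetical association end; the real model's role names may differ.
    participations: list = field(default_factory=list)

    def satisfies_subject_invariant(self) -> bool:
        # Roughly equivalent to an OCL invariant of the form (assumed wording):
        #   context ActAppointment inv:
        #     self.participation
        #       ->select(p | p.oclIsKindOf(SubjectOfActAppointment))->notEmpty()
        return any(
            isinstance(p, SubjectOfActAppointment) for p in self.participations
        )


without_subject = ActAppointment([PerformerOfActAppointment()])
with_subject = ActAppointment(
    [PerformerOfActAppointment(), SubjectOfActAppointment()]
)
```

Here `without_subject` violates the 1..* multiplicity while `with_subject` satisfies it, which is the distinction the diagram alone cannot express.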

Figure 1 also shows the redefinitions of associations player-playedRole and scoper-scopedRole between Entity and Role. The player and the scoper of an AssignedPerson and of Patient must be a Person and an Organization, respectively.

III. IMPORTANCE OF HL7 CLASSES
Our filtering method is based on the concept of class importance. The importance of a class is a real number that measures the relative importance of that class in a model. We will see in the next section that we use that importance to select which classes are shown to the users.
There exist different kinds of methods to compute the importance of classes in the literature. The simplest family of methods is that based on occurrence counting [9]–[11], where the importance of a class is equal to the number of characteristics the class has represented in the model. These methods are class centered in the sense that the importance of a class depends only on the information the class has. Therefore, the more information about a class, the more important it will be.
Another family of methods are those based on link analysis [11], [12], where the importance of a class is defined as a combination of the importance of the classes that are connected to it with associations and/or IsA relationships. Such recursive definition results in an equation system and indicates that the more important the classes connected to a class are, the more important such class will be. In these methods the importance is shared through connections, changing from a class centered philosophy to a more interconnected approach of the importance. Iterative methods are required to solve the importance equation system, which increases the computational cost of this kind of methods.
Finally, there are some methods that even use the information about the existing instances of the classes and the associations of the model. Therefore, the importance they compute takes into account the structural part of the model but also the data that the classes instantiate. The problem with this family of instance-dependent methods [13], [14] is that without instances the method cannot be used.
As an example, Table I shows the 10 most important classes of the HL7 models¹ computed using the CEntityRank importance algorithm (see 3.6 of [15]). To compute this importance, the method takes into account the classes, the IsA relationships between them, the attributes and their multiplicities, the associations and their multiplicities, the association redefinitions and the OCL invariants.
The in-depth study of the computation of the importance of classes is beyond the scope of this paper. A review of methods to compute the importance taking into account different levels of knowledge is given in [15].
The filtering method described in the next sections can be used in connection with any of the existing methods for computing the importance of classes.

Table I
TOP-10 MOST IMPORTANT CLASSES.

Rank  Class                 Importance
1     Act                   7.51
2     Role                  5.11
3     ActRelationship       4.03
4     Participation         3.67
5     Entity                3.5
6     Observation           2.64
7     InfrastructureRoot    1.81
8     Organization          1.72
9     RoleLink              1.59
10    FinancialTransaction  1.54

¹ The results have been obtained taking into account the RIM and the following D-MIM models: Laboratory, Account and Billing, Scheduling and Medical Records [1].

IV. INTEREST OF HL7 CLASSES
The importance of a class is an absolute metric that depends only on the whole set of HL7 models. The metric is useful when a user wants to know which are the most important classes, but it is of little use when the user is interested in a specific subset of classes, independently from their importance. What is needed then is a metric that measures the interest of a class with respect to such set, that we call filter set.
A filter set FS of classes is a non-empty set of classes from the HL7 models. The filter set comprises the minimum set of classes in which a user is interested at a particular moment. For example, if the user wants to see what is the knowledge the models have about classes Patient and ActAppointment, then she defines FS = {Patient, ActAppointment}. We will see in the next section that starting from this filter set, our filtering method retrieves the knowledge represented in the models about Patient and ActAppointment that is likely to be of more interest to the user.
Additionally, it is possible to define a set of classes not to be considered in the filtering method. We call such set the rejection set RS.
Intuitively, the interest to a user of a class c with respect to a filter set FS should take into account both the absolute importance of c (as explained in the previous section) and a closeness measure of c with regard to the classes in FS. For this reason, we define:

$$\Phi(c, FS) = \alpha \times \Psi(c) + (1 - \alpha) \times \Omega(c, FS) \qquad (1)$$

where Φ(c, FS) is the interest of class c with respect to FS, Ψ(c) the absolute importance of class c, and Ω(c, FS) is the closeness of class c with respect to FS.
Note that α is a balancing parameter in the range [0,1] to set the preference between closeness and importance for the retrieved knowledge. An α > 0.5 benefits importance against closeness while an α < 0.5 does the opposite. The default α value is set to 0.5 and can be modified by the user.
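
The effect of the balancing parameter in (1) can be seen in a few lines of code. This is a minimal sketch rather than the authors' tool: the function name is ours, and the two input values are illustrative numbers assumed to lie on a common [0, 1] scale.

```python
def interest(importance: float, closeness: float, alpha: float = 0.5) -> float:
    """Equation (1): weighted mix of importance (Psi) and closeness (Omega)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * importance + (1.0 - alpha) * closeness


# A class that is not very important (0.2) but close to the filter set (0.8).
balanced = interest(0.2, 0.8)                      # default alpha = 0.5
importance_heavy = interest(0.2, 0.8, alpha=0.9)   # favours importance
closeness_heavy = interest(0.2, 0.8, alpha=0.1)    # favours closeness
```

With these inputs the interest drops as α moves towards 1, because the class scores low on importance, and rises as α moves towards 0.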
There may be several ways to compute the closeness
Ω(c, FS) of class c with respect to the classes of FS.
Intuitively, the closeness of class c should be directly related
to the inverse of the distance of c to the filter set FS. For
this reason, we define:
$$\Omega(c, FS) = \frac{|FS|}{\sum_{c' \in FS} d(c, c')} \qquad (2)$$

where |FS| is the number of classes of FS and d(c, c′) is the minimum distance between a class c and a class c′ belonging to the filter set FS. Intuitively, those classes that are closer to more classes of FS will have a greater closeness Ω(c, FS).
We assume that a pair of classes c, c′ are directly connected to each other if there is a direct association (or redefinition of association) between them or if one class is a direct subclass of the other. For these cases, d(c, c′) = 1. Otherwise, when c, c′ are not directly connected, d(c, c′) is defined as the length of the shortest path between them traversing associations and/or ascending/descending through class hierarchies.

Figure 2. Method Overview.
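
The distance and closeness definitions can be sketched with a breadth-first search over an undirected graph whose edges stand for associations, redefinitions and IsA relationships. The code is an illustration under stated assumptions: the helper names are ours, the four-class graph is a toy fragment chosen for the example rather than the real HL7 model, and returning 0 for unreachable classes is our own convention.

```python
from collections import deque


def build_graph(edges):
    """Undirected adjacency sets from (class, class) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def distances_from(graph, source):
    """Breadth-first search: shortest path length from source to each reachable class."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def closeness(graph, candidate, filter_set):
    """Equation (2): |FS| divided by the sum of distances from candidate to FS."""
    if candidate in filter_set:
        raise ValueError("closeness is only defined for classes outside FS")
    total = 0
    for member in filter_set:
        d = distances_from(graph, member).get(candidate)
        if d is None:
            return 0.0  # assumption: unreachable classes get closeness 0
        total += d
    return len(filter_set) / total


# Toy fragment (hypothetical edges, not the full model).
graph = build_graph([
    ("Patient", "SubjectOfActAppointment"),
    ("SubjectOfActAppointment", "ActAppointment"),
    ("Patient", "Person"),
])
fs = {"Patient", "ActAppointment"}
omega_subject = closeness(graph, "SubjectOfActAppointment", fs)  # distances 1 and 1
omega_person = closeness(graph, "Person", fs)                    # distances 1 and 3
```

Running one search per class of the filter set, rather than one per candidate, keeps the number of traversals equal to |FS|; the sketch recomputes them per call only for brevity.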
As an example, Table II shows the top-10 classes with the greatest value of interest when the user defines FS = {Patient, ActAppointment} and α = 0.5.
Results in Table II indicate that included within the top-10 there are classes that are directly connected to all members of the filter set FS = {Patient, ActAppointment}, as in the case of SubjectOfActAppointment (Ω(SubjectOfActAppointment, FS) = 1.0), but also classes that are not directly connected to any class of FS (although they are closer).

V. FILTERING HL7 INFORMATION MODELS
We have developed a method for filtering large models, and we have used the HL7 models as a case study for developing and experimenting with the method, and its associated tool. The method consists of four consecutive steps. The characteristics of each step are detailed below.
Figure 2 presents an overview of the method and steps. Intuitively, from a small subset of classes selected by the user the method automatically obtains a filtered information model with knowledge of interest.

Step 1: Setting the User Preferences
The first step of the method consists of preparing the required information to filter the HL7 information models according to the user preferences. Basically, the user focuses on a set of classes (filter set) she is interested in and our method surrounds them with additional knowledge from the HL7 models. Therefore, it is mandatory for the user to select a non-empty initial filter set FS. An example […] to obtain knowledge about patient and appoint[…] HL7 can be FS = {Patient, ActAppointment}.
In the same way, the user can specify a rejec[…] (may be empty) with those classes that have n[…] her.
In addition to the filter set, the user can decid[…] of knowledge she wants to obtain by indicating […] of additional classes (Cmax) the method has t[…] include in the filtered information model.
Apart from that, the user has the possibility to […] importance method (see Section III) wants to b[…] following step. Also, she can include her prefe[…] closeness and importance by setting a value for t[…] parameter α (see (1) in Section IV).
Note that RS, Cmax, the importance meth[…] parameter α have default values (RS = ∅, […] the default importance method is CEntityRa[…] α = 0.5) and therefore are all optional.
The user interaction is required only in this […]

Table II
MOST INTERESTING CLASSES WITH REGARD TO FS = {Patient, ActAppointment}.

Rank  Class (c)                       Ψ(c)  d(c, Patient)  d(c, ActAppointment)  Ω(c, FS)  Φ(c, FS)
1     SubjectOfActAppointment         0.11  1              1                     1.0       0.7003
2     Organization                    1.72  1              3                     0.5       0.3552
3     Person                          1.22  1              3                     0.5       0.3537
4     ServiceDeliveryLocation         0.79  2              2                     0.5       0.3524
5     AssignedPerson                  0.72  2              2                     0.5       0.3522
6     ManufacturedDevice              0.55  2              2                     0.5       0.3517
7     LocationOfActAppointment        0.26  3              1                     0.5       0.3508
8     ReusableDeviceOfActAppointment  0.19  3              1                     0.5       0.3506
9     SubjectOfAccountEvent           0.13  1              3                     0.5       0.3504
10    AuthorOfActAppointment          0.12  3              1                     0.5       0.3503
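
A ranking such as Table II can be cut down to the Cmax best classes with a sort and a slice. The sketch below is only an illustration: the function name is ours, the rows are the first five of the table, and breaking ties by closeness, then importance, then class name is one possible deterministic rule, not something the table prescribes. The Φ values are taken from the table as printed rather than recomputed, since the text does not state how Ψ is scaled before it is combined with Ω.

```python
# (class, Psi, Omega, Phi) rows copied from the first five lines of Table II.
TABLE_II = [
    ("SubjectOfActAppointment", 0.11, 1.0, 0.7003),
    ("Organization", 1.72, 0.5, 0.3552),
    ("Person", 1.22, 0.5, 0.3537),
    ("ServiceDeliveryLocation", 0.79, 0.5, 0.3524),
    ("AssignedPerson", 0.72, 0.5, 0.3522),
]


def select_top(rows, c_max, rejection_set=frozenset()):
    """Return the names of the c_max classes with the highest interest.

    Classes in rejection_set are skipped. Ties on interest are broken by
    closeness, then importance, then name (an assumed, deterministic rule).
    """
    candidates = [row for row in rows if row[0] not in rejection_set]
    ranked = sorted(candidates, key=lambda r: (-r[3], -r[2], -r[1], r[0]))
    return [row[0] for row in ranked[:c_max]]


top_three = select_top(TABLE_II, c_max=3)
top_three_without_org = select_top(TABLE_II, c_max=3, rejection_set={"Organization"})
```

Excluding a class through the rejection set simply promotes the next class in the ranking, as the second call shows.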

On the other hand, to compute the closeness Ω(c, FS) of an HL7 class with regard to the filter set FS it is required to know the minimum distances between classes in the HL7 models (see (2) in Section IV). However, it is only necessary to compute the distance from each class in the filter set to any class out of FS, which requires a lower computational cost. Note that the method computes the closeness only for those classes that are out of the filter set.

Step 3: Select Interest Set
The third step of the method consists in computing the interest (Φ) for each class out of the FS. As previously shown in (1) of Section IV, the interest Φ(c, FS) of a candidate class c to be included in the output model is a linear combination of the importance Ψ(c) and the closeness Ω(c, FS) taking into account the balancing parameter α.
Note that if a non-empty rejection set RS was defined in the first step of our method, those classes included in such set will not be considered for the final result, nor will their interest Φ be computed.
The interest Φ produces a sorted ranking of HL7 classes and the method selects the top classes of that ranking until reaching the selected limit Cmax specified in the first step. We call such set of classes the Interest Set. The second column of Table II shows the classes that belong to the Interest Set according to FS = {Patient, ActAppointment} when Cmax = 10.
In case two or more classes get the same interest, our method is non-deterministic: it might select any of those. Some enhancements can be made to try to avoid selecting classes in a random manner, like prioritizing the classes with a higher value of closeness or importance (or any other measure) in case of ties.

Step 4: Compute Filtered Information Model
Finally, the last step of the method obtains the Interest Set of classes from the previous step and puts it together with the classes of the filter set FS in order to create a filtered information model with the classes of both sets.
The main goal of this step consists in filtering information from the whole HL7 information models involving classes in the filtered model. To achieve this goal, the method explores the associations, redefinitions of associations, and generalization/specialization relationships in the HL7 information models that are defined between those classes and includes them in the filtered model to obtain a connected model. The filtered information model for FS = {Patient, ActAppointment} and the previous Interest Set is shown in Figure 3.
Our method also takes into account associations that are specified between superclasses of classes included in the filtered information model, and brings them down to connect such subclasses. An example of that behaviour is the association between Participation and ActAppointment in Figure 3. Such association is originally defined between Participation and Act (see Figure 1). Given that ActAppointment is a subclass of Act, such association is descended to the context of ActAppointment to indicate that there exists the connection with Participation although Act was not included in the Interest Set.
When descending an association, there is the case that such association could be repeated. Figure 3 shows the association between Participation and ActAppointment. Note that Participation is not a member of the Interest Set (see Table II). However, Participation has been included in the filtered information model as an auxiliary class (marked in Figure 3 with a light grey color). The rationale is that such association should be descended between each of the five subclasses (SubjectOfActAppointment, AuthorOfActAppointment, ReusableDeviceOfActAppointment, LocationOfActAppointment and SubjectOfAccountEvent) of Participation present in the Interest Set and ActAppointment, which is not a UML compliant situation.
To avoid repeated associations our method finds the lowest common parent (LCP) for the previous subclasses, which in this case is Participation, includes it in the filtered information model as an auxiliary class, and descends the

Figure 3. Filtered Information Model for FS = {Patient, ActAppointment}.
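
The selection of the Interest Set described in Step 3 can be illustrated with a minimal sketch. It assumes that the linear combination has the form Φ = α·Ψ + (1 − α)·Ω, which is only one possible reading of "linear combination with a balancing parameter α"; the exact expression is the one given in (1) of Section IV. The function name, the parameter names and the tie-breaking order (closeness, then importance, then class name) are hypothetical choices made for the illustration, not part of the prototype tool.

```python
from typing import Dict, List, Set


def select_interest_set(
    importance: Dict[str, float],   # Psi(c) for every class c
    closeness: Dict[str, float],    # Omega(c, FS) for every class outside FS
    filter_set: Set[str],           # FS
    rejection_set: Set[str],        # RS
    alpha: float,                   # balancing parameter
    c_max: int,                     # maximum size of the Interest Set
) -> List[str]:
    """Rank the candidate classes by interest and keep the top c_max."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")

    # Classes in FS or RS are never candidates; their interest is not computed.
    candidates = [
        c for c in importance
        if c not in filter_set and c not in rejection_set
    ]

    def interest(c: str) -> float:
        # ASSUMPTION: Phi = alpha * Psi + (1 - alpha) * Omega.
        return alpha * importance[c] + (1.0 - alpha) * closeness[c]

    # Deterministic tie-breaking (an illustrative choice): higher closeness
    # first, then higher importance, then alphabetical order.
    ranked = sorted(
        candidates,
        key=lambda c: (-interest(c), -closeness[c], -importance[c], c),
    )
    return ranked[:c_max]
```

With this ordering, two classes with the same interest are always returned in the same order, which removes the non-determinism mentioned in Step 3 at the cost of fixing one particular priority among the measures.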

association to such LCP class. The same situation occurs for RoleClassAssociative and RoleChoice, which are LCP classes included as auxiliary classes in the filtered information model of Figure 3.

Besides, if there are two classes in the filtered information model such that one is an indirect subclass of the other in the HL7 models, our method creates an IsA relationship between them in the filtered information model (marked as indirect) to indicate such knowledge. Figure 3 shows that the five subclasses of Participation and the four of RoleClassAssociative are indirect subclasses by drawing those IsA relationships in light grey. In the case of RoleChoice, its subclasses are directly connected to it by means of IsA relationships (drawn in ordinary black).

Finally, the filtered information model presented in Figure 3 shows information about two HL7 domains: the Scheduling domain and the Account and Billing domain. By using our filtering method, a user who wanted to know about patients and appointments discovers that patients are also related to account events. This way, the user can easily compose another filter set, like FS = {Patient, SubjectOfAccountEvent}, to get more knowledge about them in a new iteration of our method.

VI. EVALUATION

Our filtering method and prototype tool provide support to the task of extracting knowledge from the HL7 models, which has normally been done manually or with little support.

Finding a measure that reflects the ability of our method to satisfy the user is a complicated task. However, there exists related work [16], [17] about some measurable quantities in the field of information retrieval that can be applied to our context:

• The ability of the method to withhold non-relevant knowledge (precision)
• The interval between the request being made and the answer being given (time)

Precision Analysis

A correct method must retrieve the relevant knowledge according to the user preferences. The precision of a method is defined as the percentage of the knowledge presented to the user that is relevant.

In our context, we use the concept of precision applied to HL7 universal domains (specified with D-MIMs). Each domain contains a main class which is the central point of knowledge for the users interested in such domain. The other classes in the domain make up the relevant knowledge related to the main class.

HL7 professionals interested in a particular domain decide about the knowledge to incorporate in it through ballots. Thus, a common situation for a user is to focus on the main class of a domain and to navigate through the D-MIM to understand its related knowledge.

To know the precision of our method, we simulate the generation of a D-MIM from its main class. We define a single-class filter set with such class and set Cmax to the size of the domain. This way, we obtain a filtered information model with the same number of classes as such domain.

In one iteration of our method, we obtain two groups of classes within the resulting filtered information model: the classes relevant to the user, that is, the ones that were originally defined in the D-MIM by experts, and the non-relevant ones. The precision of the result is defined as the fraction of the relevant classes over the total Cmax.

To refine the obtained result, the non-relevant classes are included in a rejection set RS and the method is executed

[Figure 4 plots: left panel "Precision", iterations 0 to 30; right panel "Precision (Zoom Iterations 1−5)"; x-axis: Iterations; y-axis: Precision (%), 40 to 100; series: Medical Records, Scheduling, Account and Billing, Laboratory.]

Figure 4. Precision analysis for HL7 domains.
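
The iterative precision experiment of this section can be sketched as follows. The sketch assumes a hypothetical callable `filter_fn(filter_set, rejection_set, c_max)` that stands in for one run of the filtering method and returns the classes of the resulting filtered information model; its name, signature and the `max_iterations` cap are assumptions made for the illustration and do not describe the prototype tool.

```python
from typing import Callable, List, Set

# Hypothetical stand-in for one execution of the filtering method.
FilterFn = Callable[[Set[str], Set[str], int], List[str]]


def precision_per_iteration(
    filter_fn: FilterFn,
    main_class: str,
    domain_classes: Set[str],      # classes defined in the D-MIM by experts
    max_iterations: int = 30,
) -> List[float]:
    """Return the precision obtained at each iteration of the experiment."""
    filter_set = {main_class}          # single-class filter set
    rejection_set: Set[str] = set()    # RS starts empty
    c_max = len(domain_classes)        # Cmax is set to the size of the domain
    history: List[float] = []

    for _ in range(max_iterations):
        result = filter_fn(filter_set, rejection_set, c_max)
        relevant = [c for c in result if c in domain_classes]
        # Precision: fraction of relevant classes over the total Cmax.
        history.append(len(relevant) / c_max)

        non_relevant = set(result) - domain_classes
        if not non_relevant:
            break                      # nothing left to reject
        # Reject the non-relevant classes and run the method again.
        rejection_set |= non_relevant

    return history
```

The length of the returned list is the number of iterations that were executed, which is the quantity plotted on the horizontal axis of Figure 4.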

again taking into account RS. It is expected that the filtered information model resulting from this step will have a greater precision. In this manner, at each iteration the classes that are not relevant to the user are rejected, and we know that in a finite number of steps our filtering method will obtain all the classes of the original domain. The smaller the number of iterations required to obtain such domain, the better the method.

Figure 4 shows the number of iterations needed to reach the maximum precision for four of the HL7 domains. Note that the right side of Figure 4 zooms in on the first five iterations. The test reveals that to reach more than 80% of the relevant classes of a domain, only three iterations are required.

Time Analysis

It is clear that a good method does not only require precision, but also needs to present the results in a time that is acceptable to the user.

To find the time spent by our method, it is only necessary to record the time lapse between the request of knowledge, i.e. once a filter set FS has been indicated by the user, and the receipt of the filtered information model.

It is expected that as we increase the size of the filter set, the time will increase linearly. Our method computes the distances from each class in the filter set to all the other classes. This computation requires the same time (on average) for each class in the filter set. Therefore, the more classes we have in a filter set, the more time our method spends computing distances.

In our experimentation, we set our prototype tool to apply the filtering method several times with an increasing number of classes in the filter set. The average results for sizes from a single-class filter set up to a 40-class filter set are presented in Figure 5.

According to the expected use of our method, having a filter set FS of 40 classes is not a common situation (although possible). Filter sets of up to 10 classes are more realistic, in which case the average time does not exceed one second.

[Figure 5 plot: "Average Time"; x-axis: Filter Set Size, 0 to 40; y-axis: Time (s), 0.0 to 4.0.]

Figure 5. Time analysis for different sizes of FS.

VII. CONCLUSIONS

HL7 information models are very large. The wealth of knowledge they contain makes them very useful to their potential target audience. However, the size and the organization of these models make it difficult to manually extract knowledge from them. This task is basic for the improvement of services provided by HL7 affiliates, vendors and other organizations that use those models for the development of health systems.

What is needed is a tool that makes HL7 models more usable for that task. We have presented a method that makes it easier to automatically extract knowledge from the HL7 models. The input to our method is the set of classes the user is interested in. The method computes the interest of each class with respect to that set as a combination of its importance and closeness. Finally, the method selects the most interesting classes from those models, including the knowledge defined for them in the original models (e.g. associations,

redefinition of associations, IsA relationships).

The experiments we have done show that the proposed method and its associated tool provide an easier way to extract knowledge from the models. Concretely, our prototype tool recovers more than 80% of the knowledge of a D-MIM in three iterations, with an average time per iteration that for common uses does not exceed one second.

We plan to continue our work along three directions. The first is to include all HL7 models into our tool to give full support to all HL7 communities. Currently we have four D-MIMs. Experimentation with the full set of models will allow us to improve the method.

We also plan to experiment with the latest definition and nomenclature of HL7 models published by HL7 International. Basically, it specifies a new level on top of the RIM model that consists of a domain analysis model (DAM) to describe business processes and use cases, and a localized information model (LIM) at the bottom of the model types to adapt the R-MIMs to locale-specific requirements for structure and terminology. Taking these two new models into account is a challenge that will improve our work.

Finally, another research area to explore consists in generating traceability links from the elements in the filtered model to the original models, so that it is easy to find out the origin of each element. Keeping such backward links improves the integration of different models in an interoperability context. Also, our method and tool implementing traceability could be used as an aid in the design of implementation guides for HL7 interoperability artifacts (HL7 V3 messaging and CDA R2 documents).

ACKNOWLEDGMENT

The authors want to thank Diego Kaminker, HL7 Education WG co-chair and HL7 International Mentoring Committee co-chair, Carles Gallego, current Chair of HL7 Spain, and Dr. Joan Guanyabens, former Chair of HL7 Spain, for their collaboration.

We would also like to thank the people of the GMC group for their useful comments on previous drafts of this paper. This work has been partly supported by the Ministerio de Ciencia y Tecnología under the project TIN2008-00444, Grupo Consolidado.

REFERENCES

[1] Health Level Seven International, “HL7 web,” Feb. 2010. [Online]. Available: http://www.hl7.org

[2] R. Dolin, L. Alschuler, C. Beebe, P. Biron, S. Boyer, D. Essin, E. Kimber, T. Lincoln, and J. Mattison, “The HL7 clinical document architecture,” Journal of the American Medical Informatics Association, vol. 8, no. 6, pp. 552–569, 2001.

[3] R. Dolin, L. Alschuler, S. Boyer, C. Beebe, F. Behlen, P. Biron, and A. Shabo, “HL7 clinical document architecture, release 2,” Journal of the American Medical Informatics Association, vol. 13, no. 1, pp. 30–39, 2006.

[4] J. Conesa, V. C. Storey, and V. Sugumaran, “Usability of upper level ontologies: The case of ResearchCyc,” Data & Knowledge Engineering, vol. 69, no. 4, pp. 343–356, 2010.

[5] A. Danko, R. Kennedy, R. Haskell, I. Androwich, P. Button, C. Correia, S. Grobe, M. Harris, S. Matney, and D. Russler, “Modeling nursing interventions in the act class of HL7 RIM Version 3,” Journal of Biomedical Informatics, vol. 36, no. 4-5, pp. 294–303, 2003.

[6] J. Lyman, S. Pelletier, K. Scully, J. Boyd, J. Dalton, S. Tropello, and C. Egyhazy, “Applying the HL7 reference information model to a clinical data warehouse,” in IEEE International Conference on Systems, Man and Cybernetics, vol. 5, 2003, pp. 4249–4255.

[7] OMG, Unified Modeling Language: Superstructure, version 2.1.1, Object Management Group, February 2007.

[8] OMG, Object Constraint Language, version 2.0, Object Management Group, May 2006.

[9] S. Castano, V. De Antonellis, M. G. Fugini, and B. Pernici, “Conceptual schema analysis: techniques and applications,” ACM Transactions on Database Systems, vol. 23, no. 3, pp. 286–333, 1998.

[10] D. L. Moody and A. Flitman, “A methodology for clustering entity relationship models - a human information processing approach,” in Conceptual Modeling - ER 1999, 18th International Conference on Conceptual Modeling, ser. Lecture Notes in Computer Science, vol. 1728. Springer, 1999, pp. 114–130.

[11] Y. Tzitzikas, D. Kotzinos, and Y. Theoharis, “On Ranking RDF Schema Elements (and its Application in Visualization),” Journal of Universal Computer Science, vol. 13, no. 12, pp. 1854–1880, 2007.

[12] Y. Tzitzikas and J.-L. Hainaut, “How to tame a very large ER diagram (using link analysis and force-directed drawing algorithms),” in Conceptual Modeling - ER 2005, 24th International Conference on Conceptual Modeling, ser. Lecture Notes in Computer Science, vol. 3716. Springer, 2005, pp. 144–159.

[13] C. Yu and H. V. Jagadish, “Schema summarization,” in VLDB 2006, 32nd International Conference on Very Large Data Bases, 2006, pp. 319–330.

[14] X. Yang, C. M. Procopiuc, and D. Srivastava, “Summarizing relational databases,” in VLDB 2009, 35th International Conference on Very Large Data Bases, 2009, pp. 634–645.

[15] A. Villegas and A. Olivé, “On computing the importance of entity types in large conceptual schemas,” in Advances in Conceptual Modeling - Challenging Perspectives, ER 2009 Workshops, ser. Lecture Notes in Computer Science, vol. 5833. Springer, 2009, pp. 22–32.

[16] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval. Addison Wesley, 1999.

[17] C. Van Rijsbergen, “Information Retrieval,” Cataloging & Classification Quarterly, vol. 22, no. 3, 1996. [Online]. Available: http://www.dcs.gla.ac.uk/Keith/Preface.html
