Clinical Document Data Warehouse (CD2W)
           Josep Vilalta Marzo a , Diego Kaminker b , Josep M. Picas Vidal c, , M....
Improving the Usability of HL7 Information Models by Automatic Filtering

                  Antonio Villegas and Antoni O...
Precision                                                                                            Precision (Zoom Itera...
redefinition of associations, IsA relationships).                        [4] J. Conesa, V. C. Storey, and V. Sugumaran, “Us...

2010 last papers


Published on

Last papers published with regard to: UML, HL7 CDA, HL7 RIM, Clinical Datawarehouse and Semantic Interoperability.

Published in: Health & Medicine, Technology


  1. Clinical Document Data Warehouse (CD2W)

Josep Vilalta Marzo (a), Diego Kaminker (b), Josep M. Picas Vidal (c), M. Lluisa Bernard Antoranz (d), Cristina Siles (c), Rafael Rosa Prat (a)
(a) Vico Open Modeling S.L., (b) Kern Information Technology SRL, (c) Hospital de la Santa Creu i Sant Pau, (d) Institut Català de la Salut

Abstract

This paper shows the development of the Clinical Document Data Warehouse project (CD2W) and its implementation using CDA R2 documents (Clinical Document Architecture Release 2).

The main objective of this project was to prototype a web-based portal allowing access to a clinical document repository and a data warehouse with clinical information about patients from several medical organizations in Barcelona, Spain (a major hospital and 4 primary care centers), giving access to clinical patient information for primary care and leveraging the same standardized information to populate the data warehouse (secondary use). The project was developed during the first half of 2009 under the general direction of the Hospital de la Santa Creu i Sant Pau (HSCSP).

Due to the prototype nature of the project, the scope was limited to patients with Congestive Heart Disease (CHD) who consented to the use of their information for clinical research by the HSCSP, and to the vocabularies currently used by the providers (ICD9 and other local terminologies). The data warehouse was developed using HL7 RIM basic concepts.

The project mission was to improve patient care with the use of global standards, open technology, low operating cost and ease of use. CDA R2 documents.

During this project, we used the SCRUM agile methodology, allowing a scalable, progressive and incremental software development process.

Introduction – Business Case

Solving the questions that arise when a patient presents several clinical issues is one of the greatest challenges for healthcare providers. Fast clinical decision making at the healthcare location creates the need to have all relevant and up-to-date clinical information available to support the process.

Usually, we do not have efficient filtering that flags the critical issues on which to focus our attention. We encounter scenarios with data overload, demanding a big effort to synthesize useful information, or with sparse data, demanding the use of imagination to connect the pieces and reach any conclusion.

Another problem is that usually our only available source of clinical information is the organization supporting the healthcare provider. Information generated by other channels where the patient was attended is usually brought by the patient in several paper formats. Consolidation of relevant and up-to-date information from distinct healthcare organizations under different authorities is a pending issue, waiting for an agreement between the healthcare authorities and the harmonization of different database schemes.

Nowadays, there are several on-going projects with the shared goal of consolidating clinical information, both at the national level in Spain and at the autonomous community level. These projects also share a great level of complexity and cost. This small project, CD2W, aspires to help in achieving this consolidation of clinical information, with a focus on easing the task for the healthcare providers: organizations administering huge databases that are difficult to integrate and that have no reference information model available.

Materials and Methods

Hospital de la Santa Creu i Sant Pau is a high complexity hospital which dates back six centuries, making it the oldest hospital in Spain. Healthcare is centered on Barcelona but extends to the rest of Catalonia. The center plays a prominent role in Spain and is internationally renowned.

The hospital has distinguished itself in the healthcare provided in many fields, making it a reference centre in several specialties. The center attends over 34,000 admissions each year and more than 150,000 emergencies. Approximately 300,000 people are visited at the ambulatory services annually and the Day Hospital attends over 60,000 users. There are 71 day hospital beds, 634 hospitalization beds and 19 surgical rooms.

Teaching and training programmes at the Hospital de la Santa Creu i Sant Pau cover many levels, comprising the UAB (Universitat Autonoma de Barcelona) Faculty of Medicine Teaching Unit, the University School of Nursing, participation in the State Residency Programmes to train specialists, masters and doctorate courses, continuing education, etc. In the field of research, Hospital de la Santa Creu i Sant Pau is one of the most prominent centers in Spain, as can be appreciated from the volume of papers published and their impact factor, the number and quality of projects which receive funding and the grants awarded. The Hospital de la Santa Creu i Sant Pau is governed by the Patronat de la Fundació de Gestió Sanitària (FGSHSCSP), a board with representatives from the Regional Catalan Government (Generalitat de Catalunya), the City Hall of Barcelona and the Archbishopric of Barcelona.

The goals for this project were:

1. Define a scheme to integrate information from disjoint information platforms.
2. Implement a process for periodical data and document exchange with minimal workload implications for the primary care centers.
3. Generate a clinical data store enabling the coexistence of clinical documents and relational data in a longitudinal patient healthcare record.
  2. 4. Create a simple user interface to ease fast queries on the relevant clinical information about a patient.
5. Restrict access to clinical information to professionals authorized by the participating organizations (HSP and ICS).
6. Use open technology for design, development and implementation of the data warehouse, minimizing operating costs, and use global interoperability standards to enable universal access.
7. Evaluate the impact of normalizing data from different organizations and code systems.
8. Evaluate the usability of CD2W and the value it adds for physicians making healthcare decisions.
9. Evaluate the results of this prototype to study the possible extension of access to patients or to other professionals.
10. Evaluate the results for a new project with a broader scope.

Implementation, Methodology and Tools

The project has five main design aspects:

1. Clinical Datawarehouse Design
2. Standard Document Design
3. Datawarehouse Population Process Design
4. User Interface Design
5. Technological implementation

Let's review them:

1. Clinical Datawarehouse Design

The model for CD2W was derived from the RIM base classes (role, entity, act, participation) [3] following a three-step methodology:

a. Development of a conceptual framework, to be discussed with the CD2W stakeholders, including which were the measures or facts, and how the information should be classified (dimensions).

Figure 1 – CD2W conceptual model

b. Then, on a more technical level, a domain analysis model was generated.

Figure 2 – CD2W Domain Analysis

c. And finally, the derivation of a datawarehouse model to store facts, dimensions and supporting standard clinical documents. RIM-wise, facts are acts, and the main dimensions are entities and roles and their attributes.

2. Standard Document Design

The documents were the 'lingua franca' between the disparate systems used by the participating organizations (primary care centers, Hospital EHR and discharge system) [1].

We designed two different clinical document templates, one for the evolution note from the primary care centers and one for the discharge note from the Hospital discharge system. The templates shared the same information at the header level, but differed in their section contents.

Since this data warehouse was intended for secondary use, patient and physician information was de-identified [2] (names, identifications and addresses were removed or replaced).

The information generated by the local provider applications was transformed into clinical documents using an XSL, and these standard documents were processed and stored in the CD2W.

We tested our mapping, process and query interface with 16125 clinical documents from the hospital and primary care centers.
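The de-identification step mentioned for the standard documents (names, identifications and addresses removed or replaced) can be illustrated with a minimal Python sketch. It is not the project's XSL; it assumes a CDA R2 header in the `urn:hl7-org:v3` namespace, handles only the patient (`recordTarget`) and not physicians, and the function name `deidentify`, the sample document and the pseudonym format are hypothetical.

```python
import xml.etree.ElementTree as ET

HL7_NS = "urn:hl7-org:v3"            # namespace of CDA R2 documents
NS = {"hl7": HL7_NS}
ET.register_namespace("", HL7_NS)    # serialize without an "ns0:" prefix

# Hypothetical, heavily shortened document; the OID "1.2.3" is a placeholder.
SAMPLE = """<ClinicalDocument xmlns="urn:hl7-org:v3">
  <recordTarget><patientRole>
    <id root="1.2.3" extension="12345"/>
    <addr><city>Barcelona</city></addr>
    <patient><name><given>Maria</given></name></patient>
  </patientRole></recordTarget>
</ClinicalDocument>"""


def deidentify(cda_xml: str, pseudonym: str) -> str:
    """Replace patient identifiers and drop patient names and addresses."""
    root = ET.fromstring(cda_xml)
    for role in root.findall(".//hl7:recordTarget/hl7:patientRole", NS):
        for id_el in role.findall("hl7:id", NS):
            id_el.set("extension", pseudonym)      # identification replaced
        for addr in role.findall("hl7:addr", NS):
            role.remove(addr)                      # address removed
        for patient in role.findall("hl7:patient", NS):
            for name in patient.findall("hl7:name", NS):
                patient.remove(name)               # name removed
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(deidentify(SAMPLE, "P-0001"))
```

Replacing the identifier with a stable pseudonym, rather than deleting it, keeps documents of the same patient linkable, which a longitudinal record needs.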
  3. In order to guide the development team, a table was built with the main required elements from each primary care center and their location inside the standard clinical document.

Figure 4 – Transformation Table

3. Data warehouse population process

The process to populate the data warehouse included several steps.

[At the primary care center]
1. Select encounters for patients suffering from CHD with signed consents.
2. Create a basic, shallow XML for each primary care center encounter.

[At the data processing center]
3. Create a CDA-conformant document using the defined mapping for the center, for each instance.
4. Populate the CD2W database with the information from each document header, and the document itself.

This process was triggered periodically by the primary care centers and HSP.

4. User interface design

The user interface design was use-case based. Identified use cases were:

A. App Administration
Application setup and parametrization
− Define healthcare agent
− Define healthcare agent type
− Define healthcare agent role
− Define app parameters
− Define service catalog
− Define service

B. Data Generation Process
Use cases related to the DW population
− Login
− User validation
− Batch load
− Stat Processing
− Document Processing
− Update audit log

C. Queries
Data access and retrieval from the users
− Retrieve app parameters
− Query control panel (figure 3)
− Query Patient Monitor (figure 4)
− Query Encounter Monitor
− Query Conditions Monitor
− Query Healthcare Centers
− Query Healthcare Professionals
− Query Healthcare Services
− Query Stats (figure 5,6)
− Query audit logs
− Browse CDA R2 document (figure 7)

The following figures illustrate the main user interface forms for CD2W:

Figure 3 – CD2W Control Panel
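Step 4 of the population process described above (storing the header information together with the document itself) can be sketched as follows. This is an illustration under stated assumptions, not the project's code: SQLite stands in for the MySQL database, and the table name `document`, its columns and the sample document are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

NS = {"hl7": "urn:hl7-org:v3"}  # namespace of CDA R2 documents

# Hypothetical, heavily shortened CDA header; the OID "1.2.3" is a placeholder.
SAMPLE = """<ClinicalDocument xmlns="urn:hl7-org:v3">
  <id root="1.2.3" extension="DOC-1"/>
  <effectiveTime value="20090315"/>
  <recordTarget><patientRole>
    <id root="1.2.3" extension="P-0001"/>
  </patientRole></recordTarget>
</ClinicalDocument>"""


def init_schema(conn: sqlite3.Connection) -> None:
    """Create one table holding header fields next to the full document."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS document ("
        "doc_id TEXT PRIMARY KEY, patient_id TEXT, "
        "effective_time TEXT, body TEXT)"
    )


def load_document(conn: sqlite3.Connection, cda_xml: str) -> None:
    """Extract header fields and store them with the document text."""
    root = ET.fromstring(cda_xml)
    doc_id = root.find("hl7:id", NS).get("extension")
    effective = root.find("hl7:effectiveTime", NS).get("value")
    patient = root.find(
        "hl7:recordTarget/hl7:patientRole/hl7:id", NS
    ).get("extension")
    conn.execute(
        "INSERT INTO document (doc_id, patient_id, effective_time, body) "
        "VALUES (?, ?, ?, ?)",
        (doc_id, patient, effective, cda_xml),
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    init_schema(conn)
    load_document(conn, SAMPLE)
    print(conn.execute(
        "SELECT doc_id, patient_id, effective_time FROM document"
    ).fetchall())
```

Keeping the relational header fields and the original XML in the same row is one simple way to let queries run on the relational data while the full document remains available for browsing.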
  4. Figure 4 – CD2W Patient Query

Figure 5 – Services for one patient

Figure 6 – Temporal series for a patient

Figure 7 – CDA R2 Document for an encounter

5. Technological Implementation

The system was implemented using an open source Apache application server, a MySQL database, the PHP development environment and the amCharts data analysis and graphing component.

Evaluation/Assessment/Lessons Learned

The HL7 Reference Information Model was very useful as an aid for modeling our specific domain and for generating a scheme to integrate all participating applications into an information model.

The exchange process could be established, but we needed to educate the participating centers on the use of CDA R2 and on the rationale for requesting each piece of information. Nevertheless, we needed to bridge their data model to CDA R2 by providing them with a customized XSL for each center.

Coexistence of documents with the relational information needed to explore the embedded information was possible, although we need to test with more volume.

The user interface was sufficient for our pilot users from three centers to meet their data exploration and verification needs.

The use of open technologies and standards was a key factor in minimizing development time.

Our normalizing efforts were not finished; we ended up using ICD-9 and internal ICS vocabularies.

Future Plans

Extension of this tool to make it accessible to patients and other professionals will be studied, but current policies make it very difficult to gain access to patient data, even for this approved project.

We also want to explore using native XML open source databases.

A project with a greater scope will be studied, but the whole approach and the generated model are suitable for other domains.
  5. Acknowledgements

Spain Ministry of Health for Scholarship FIS Dossier P105/230.

Bibliography

[1] Robert H. Dolin et al. HL7 Clinical Document Architecture, Release 2. JAMIA 2006;13:30-39. doi:10.1197/jamia.M1888
[2] Ley Orgánica 15/1999, de 13 de diciembre, de Protección de Datos de Carácter Personal (BOE núm. 298, de 14-12-1999, pp. 43088-43099)
[3] Wei-Yi Yang, Li-Hui Lee, Hsiao-Li Gien, Hsing-Yi Chu, Yi-Ting Chou, and Der-Ming Liou. The Design of the HL7 RIM-based Sharing Components for Clinical Information Systems. World Academy of Science, Engineering and Technology 53, 2009

Contact

Josep Vilalta Marzo
Francesc Layret 24, Badalona, Barcelona, Spain
Email:
  6. Improving the Usability of HL7 Information Models by Automatic Filtering

Antonio Villegas and Antoni Olivé
Services and Information Systems Engineering Department
Universitat Politècnica de Catalunya
Barcelona, Spain
Email: {avillegas, olive}

Josep Vilalta
HL7 Education & e-Learning Services
HL7 Spain (Health Level Seven International)
Barcelona, Spain
Email:

Abstract. The amount of knowledge represented in the Health Level 7 International (HL7) information models is very large. The sheer size of those models makes them very useful for the communities for which they are developed. However, the size of the models and their overall organization makes it difficult to manually extract knowledge from them.

We propose to extract that knowledge by using a novel filtering method that we have developed. Our method is based on the concept of class interest as a combination of class importance and class closeness. The application of our method automatically obtains a filtered information model of the whole HL7 models according to the user preferences. We show that the use of a prototype tool that implements that method and produces such filtered model improves the usability of the HL7 models due to its high precision and low computational time.

Keywords: Usability, Health Level Seven International, HL7, Models, Filtering, UML

I. INTRODUCTION

Health Level Seven International (HL7) is a not-for-profit, ANSI-accredited standards developing organization dedicated to providing a comprehensive framework and related standards for the exchange, integration, sharing, and retrieval of electronic health information that supports clinical practice and the management, delivery and evaluation of health services [1].

HL7 develops specifications, the most widely used being a messaging standard that enables disparate healthcare applications to exchange key sets of clinical and administrative data. The HL7 standard specifications are unified by shared reference models of the healthcare and technical domains [2], [3].

The amount of knowledge represented in the HL7 information models is very large and continuously improved. The sheer size of those models makes them very useful to the communities for which they were developed: HL7 international affiliates with more than fifty HL7 active working groups (Structured Documents, Clinical Decision Support, Clinical Genomics...), large integrated healthcare delivery networks, government agencies and other organizations that use those models for the development of their enterprise information architecture of health systems [4], [5].

However, the size of HL7 information models and their organization makes it very difficult for those communities to manually extract knowledge from them. This problem is shared by other large models [6].

Currently, there is a lack of computer support to make those models usable for the goal of knowledge extraction. In this paper, we propose to extract that knowledge by using a novel filtering method that we have developed, and we show that the use of our prototype implementation of that method improves the usability of HL7 information models.

The structure of the paper is as follows. Section II introduces the HL7 models and describes the main UML constructs used to build them. Section III describes the concept of class importance and references the methods that can be used to compute it. Section IV describes the concept of class interest with respect to a filter set of classes and explains how to compute it. Section V presents our model filtering method. Section VI evaluates the use of the method in the context of the HL7 models. Finally, Section VII summarizes the conclusions and points out future work.

II. HL7 INFORMATION MODELS

Types of Models

The HL7 information models comprise three types of models. Each of the model types is based on the UML, although the concrete notation used differs depending on the model type. Also, the models differ from each other in terms of their information content, scope, and intended use. The following types of information models are defined:

  • Reference Information Model (RIM) - The RIM is the information model that encompasses the HL7 domain of interest as a whole. The RIM is a coherent, shared information model that is the source for the data content of all HL7 interoperability artifacts: V2.x messages and XML clinical documents CDA R2 [3].
  • Domain Message Information Model (D-MIM) - A D-MIM is a refined subset of the RIM that includes a set of classes, attributes and relationships that can be used to create messages and structured clinical documents for a particular domain (a particular area of interest in healthcare). There are predefined D-MIMs for a set of over 15 universal domains, such as Accounting and Billing, Care Provision, Claims and Reimbursement, and so on.
  7. Figure 1. Sample of HL7 RIM refinements related to ActAppointment class

  • Refined Message Information Model (R-MIM) - The R-MIM is a subset of a D-MIM that is used to express the information content for a message/document or set of messages/documents with annotations and refinements that are message/document specific. The content of an R-MIM is drawn from the D-MIM for the specific domain in which the R-MIM is used.

Structure of the HL7 Information Models

The RIM, D-MIM and R-MIM models can be analyzed as if they were built using, in a particular way, a small subset of the constructs provided by the UML [7]. Figure 1 illustrates the main UML constructs used with a very small fragment of the RIM and of one D-MIM. The RIM comprises six backbone classes: Act, Participation, Entity, Role, ActRelationship and RoleLink. Figure 1 shows the first four of these classes. Each one has a number of attributes with a defined multiplicity. Surprisingly, there are only eight main associations between the RIM classes, all of them binary and with their corresponding multiplicities. Figure 1 shows four of these associations.

Each of the RIM classes has many subclasses, although only a few of them are explicitly shown in the diagrams of the HL7 RIM specification. There are many specialization/generalization relationships (called IsA relationships, e.g. Organization IsA Entity) in the HL7 models. The number of RIM classes and subclasses is over 2,500. Figure 1 shows seven subclasses of four of the backbone RIM classes and seven IsA relationships.

D-MIM models refine the RIM in three ways:

1) The participants of one of the eight main associations defined between RIM classes are refined in the subclasses. This is the refinement most often used in the HL7 models. Note that it is not allowed to add new associations.
2) The multiplicities of an association defined between RIM classes are strengthened in the subclasses.
3) The multiplicity of an attribute of a RIM class is strengthened in a subclass. An optional attribute in a RIM class can be made mandatory or not allowed in a subclass. Note that it is not allowed to add new attributes.

R-MIM models refine D-MIM models in the same way. In all cases, the three kinds of refinements can be expressed using UML constructs.

Figure 1 shows a few refinements related to the ActAppointment class. The instances of this class are appointments (a particular kind of Act). There may be several kinds of participations in an appointment. Figure 1 shows only two of them: PerformerOfActAppointment and SubjectOfActAppointment. To indicate that when the act is an appointment then the participations must be instances of PerformerOfActAppointment or of SubjectOfActAppointment, we redefine the association Participation-Act as shown in the figure. Note that redefinition is a UML construct, which is very useful in situations like this one. The redefinition of the association Role-Participation is similar. The overall semantics of these redefinitions is that the performer of an appointment is a Person that plays the role AssignedPerson, and that the subject of an appointment is a Person that plays the role Patient.

Sometimes, the UML redefinition construct does not allow the graphical representation of the strengthening of association multiplicities. In these cases, the redefinition must be formally captured by OCL invariants. For example, in Figure 1, the refinement of act in SubjectOfActAppointment also implies that an instance of ActAppointment is associated with a non-empty set (1..*) of SubjectOfActAppointment. However, this cannot be expressed graphically, and an OCL invariant must be used instead [8].
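To make the two constraints in the ActAppointment example concrete, the following Python sketch checks them on in-memory objects. It is only an analogue of what a redefinition plus an OCL invariant would state, not OCL and not HL7 tooling; the class names follow Figure 1 as described in the text, restricting participations to the two kinds named there is a simplification, and the function `satisfies_refinements` is hypothetical.

```python
class Participation:
    """Backbone RIM class (attributes omitted in this sketch)."""


class SubjectOfActAppointment(Participation):
    """Participation whose role is played by a Patient."""


class PerformerOfActAppointment(Participation):
    """Participation whose role is played by an AssignedPerson."""


class ActAppointment:
    """A particular kind of Act, holding its participations."""

    def __init__(self, participations):
        self.participations = list(participations)


def satisfies_refinements(appointment: ActAppointment) -> bool:
    """Check the redefinition and the 1..* multiplicity constraint."""
    allowed = (SubjectOfActAppointment, PerformerOfActAppointment)
    # Redefinition: every participation is one of the refined kinds.
    only_allowed = all(isinstance(p, allowed) for p in appointment.participations)
    # Invariant: at least one SubjectOfActAppointment (multiplicity 1..*).
    has_subject = any(
        isinstance(p, SubjectOfActAppointment) for p in appointment.participations
    )
    return only_allowed and has_subject


if __name__ == "__main__":
    ok = ActAppointment([SubjectOfActAppointment(), PerformerOfActAppointment()])
    print(satisfies_refinements(ok))
```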
  8. Figure 1 also shows the redefinitions of associations player-playedRole and scoper-scopedRole between Entity and Role. The player and the scoper of an AssignedPerson and of Patient must be a Person and an Organization, respectively.

III. IMPORTANCE OF HL7 CLASSES

Our filtering method is based on the concept of class importance. The importance of a class is a real number that measures the relative importance of that class in a model. We will see in the next section that we use that importance to select which classes are shown to the users.

There exist different kinds of methods to compute the importance of classes in the literature. The simplest family of methods is that based on occurrence counting [9]–[11], where the importance of a class is equal to the number of characteristics the class has represented in the model. These methods are class centered in the sense that the importance of a class depends only on the information the class has. Therefore, the more information about a class, the more important it will be.

Another family of methods are those based on link analysis [11], [12], where the importance of a class is defined as a combination of the importance of the classes that are connected to it with associations and/or IsA relationships. Such recursive definition results in an equation system and indicates that the more important the classes connected to a class are, the more important such class will be. In these methods the importance is shared through connections, changing from a class centered philosophy to a more interconnected approach of the importance. Iterative methods are required to solve the importance equation system, which increases the computational cost of this kind of methods.

Finally, there are some methods that even use the information about the existing instances of the classes and the associations of the model. Therefore, the importance they compute takes into account the structural part of the model but also the data that the classes instantiate. The problem with this family of instance-dependent methods [13], [14] is that without instances the method cannot be used.

As an example, Table I shows the 10 most important classes of the HL7 models¹ computed using the CEntityRank importance algorithm (see 3.6 of [15]). To compute this importance, the method takes into account the classes, the IsA relationships between them, the attributes and their multiplicities, the associations and their multiplicities, the association redefinitions and the OCL invariants.

The in-depth study of the computation of the importance of classes is beyond the scope of this paper. A review of methods to compute the importance taking into account different levels of knowledge is given in [15].

¹ The results have been obtained taking into account the RIM and the following D-MIM models: Laboratory, Account and Billing, Scheduling and Medical Records [1].

Table I
TOP-10 MOST IMPORTANT CLASSES.

Rank  Class                 Importance
1     Act                   7.51
2     Role                  5.11
3     ActRelationship       4.03
4     Participation         3.67
5     Entity                3.5
6     Observation           2.64
7     InfrastructureRoot    1.81
8     Organization          1.72
9     RoleLink              1.59
10    FinancialTransaction  1.54

The filtering method described in the next sections can be used in connection with any of the existing methods for computing the importance of classes.

IV. INTEREST OF HL7 CLASSES

The importance of a class is an absolute metric that depends only on the whole set of HL7 models. The metric is useful when a user wants to know which are the most important classes, but it is of little use when the user is interested in a specific subset of classes, independently from their importance. What is needed then is a metric that measures the interest of a class with respect to such set, that we call filter set.

A filter set FS of classes is a non-empty set of classes from the HL7 models. The filter set comprises the minimum set of classes in which a user is interested at a particular moment. For example, if the user wants to see what is the knowledge the models have about classes Patient and ActAppointment, then she defines FS = {Patient, ActAppointment}. We will see in the next section that starting from this filter set, our filtering method retrieves the knowledge represented in the models about Patient and ActAppointment that is likely to be of more interest to the user.

Additionally, it is possible to define a set of classes not to be considered in the filtering method. We call such set the rejection set RS.

Intuitively, the interest to a user of a class c with respect to a filter set FS should take into account both the absolute importance of c (as explained in the previous section) and a closeness measure of c with regard to the classes in FS. For this reason, we define:

Φ(c, FS) = α × Ψ(c) + (1 − α) × Ω(c, FS)    (1)

where Φ(c, FS) is the interest of class c with respect to FS, Ψ(c) the absolute importance of class c, and Ω(c, FS) is the closeness of class c with respect to FS.

Note that α is a balancing parameter in the range [0,1] to set the preference between closeness and importance for
  9. the retrieved knowledge. An α > 0.5 benefits importance against closeness while an α < 0.5 does the opposite. The default α value is set to 0.5 and can be modified by the user.

There may be several ways to compute the closeness Ω(c, FS) of class c with respect to the classes of FS. Intuitively, the closeness of class c should be directly related to the inverse of the distance of c to the filter set FS. For this reason, we define:

$$\Omega(c, FS) = \frac{|FS|}{\sum_{c' \in FS} d(c, c')} \qquad (2)$$

where |FS| is the number of classes of FS and d(c, c') is the minimum distance between a class c and a class c' belonging to the filter set FS. Intuitively, those classes that are closer to more classes of FS will have a greater closeness Ω(c, FS).

We assume that a pair of classes c, c' are directly connected to each other if there is a direct association (or redefinition of association) between them or if one class is a direct subclass of the other. For these cases, d(c, c') = 1. Otherwise, when c, c' are not directly connected, d(c, c') is defined as the length of the shortest path between them traversing associations and/or ascending/descending through class hierarchies.

As an example, Table II shows the top-10 classes with a greater value of interest when the user defines FS = {Patient, ActAppointment} and α = 0.5. Results in Table II indicate that included within the top-10 there are classes that are directly connected to all members of the filter set FS = {Patient, ActAppointment}, as in the case of SubjectOfActAppointment (Ω(SubjectOfActAppointment, FS) = 1.0), but also classes that are not directly connected to any class of FS (although they are closer).

Figure 2. Method Overview.

V. FILTERING HL7 INFORMATION MODELS

We have developed a method for filtering large models, and we have used the HL7 models as a case study for developing and experimenting with the method, and its associated tool. The method consists of four consecutive steps. The characteristics of each step are detailed below. Figure 2 presents an overview of the method and steps. Intuitively, from a small subset of classes selected by the user the method automatically obtains a filtered information model with knowledge of interest.

Step 1: Setting the User Preferences

The first step of the method consists of preparing the required information to filter the HL7 information models according to the user preferences. Basically, the user focuses on a set of classes (filter set) she is interested in and our method surrounds them with additional knowledge from the HL7 models. Therefore, it is mandatory for the user to select a non-empty initial filter set FS. An example […] to obtain knowledge about patient and appoint[…] HL7 can be FS = {Patient, ActAppointment}.

In the same way, the user can specify a rejec[…] (may be empty) with those classes that have n[…] her.

In addition to the filter set, the user can decid[…] knowledge she wants to obtain by indicating […] of additional classes (Cmax) the method has t[…] of […] include in the filtered information model.

Apart from that, the user has the possibility to […] importance method (see Section III) wants to b[…] following step. Also, she can include her prefe[…] closeness and importance by setting a value for t[…] parameter α (see (1) in Section IV).

Note that RS, Cmax, the importance meth[…] parameter α have default values (RS = ∅, […] the default importance method is CEntityRa[…] α = 0.5) and therefore are all optional.

The user interaction is required only in this
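Equations (1) and (2) can be sketched in Python as follows. This is an illustration, not the authors' tool: the toy graph is a hand-made fragment chosen so that its shortest-path distances agree with the d(c, Patient) and d(c, ActAppointment) columns of Table II, and the helper names (`build_graph`, `distances_from`, `closeness`, `interest`) are hypothetical. The Φ column of Table II is not reproduced, because the scaling applied to Ψ(c) is not stated in this section.

```python
from collections import deque

# Hand-made fragment: an edge means "directly connected" (d = 1).
EDGES = [
    ("Patient", "SubjectOfActAppointment"),
    ("SubjectOfActAppointment", "ActAppointment"),
    ("Person", "Patient"),
    ("Organization", "Patient"),
    ("LocationOfActAppointment", "ActAppointment"),
]


def build_graph(edges):
    """Undirected adjacency sets from a list of class pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def distances_from(graph, source):
    """Shortest-path length from source to every reachable class (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def closeness(graph, filter_set):
    """Equation (2): |FS| divided by the sum of distances to FS members."""
    per_member = [distances_from(graph, member) for member in filter_set]
    result = {}
    for c in graph:
        # Only classes outside FS that can reach every member are scored.
        if c not in filter_set and all(c in d for d in per_member):
            result[c] = len(filter_set) / sum(d[c] for d in per_member)
    return result


def interest(importance, closeness_value, alpha=0.5):
    """Equation (1): linear combination balanced by alpha in [0, 1]."""
    return alpha * importance + (1 - alpha) * closeness_value


if __name__ == "__main__":
    omega = closeness(build_graph(EDGES), {"Patient", "ActAppointment"})
    for cls, value in sorted(omega.items()):
        print(cls, value)
```

Running one breadth-first search per member of FS, rather than one per class in the model, matches the observation in the text that only distances from the filter set to the other classes are needed.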
Table II
MOST INTERESTING CLASSES WITH REGARD TO FS = {Patient, ActAppointment}.

Rank  Class (c)                       Ψ(c)  d(c, Patient)  d(c, ActAppointment)  Ω(c, FS)  Φ(c, FS)
1     SubjectOfActAppointment         0.11  1              1                     1.0       0.7003
2     Organization                    1.72  1              3                     0.5       0.3552
3     Person                          1.22  1              3                     0.5       0.3537
4     ServiceDeliveryLocation         0.79  2              2                     0.5       0.3524
5     AssignedPerson                  0.72  2              2                     0.5       0.3522
6     ManufacturedDevice              0.55  2              2                     0.5       0.3517
7     LocationOfActAppointment        0.26  3              1                     0.5       0.3508
8     ReusableDeviceOfActAppointment  0.19  3              1                     0.5       0.3506
9     SubjectOfAccountEvent           0.13  1              3                     0.5       0.3504
10    AuthorOfActAppointment          0.12  3              1                     0.5       0.3503

On the other hand, to compute the closeness Ω(c, FS) of an HL7 class with regard to the filter set FS, it is required to know the minimum distances between classes in the HL7 models (see (2) in Section IV). However, it is only necessary to compute the distance from each class in the filter set to any class out of FS, which requires a lower computational cost. Note that the method computes the closeness only for those classes that are out of the filter set.

Step 3: Select Interest Set

The third step of the method consists in computing the interest (Φ) for each class out of the FS. As previously shown in (1) of Section IV, the interest Φ(c, FS) of a candidate class c to be included in the output model is a linear combination of the importance Ψ(c) and the closeness Ω(c, FS), taking into account the balancing parameter α. Note that if a non-empty rejection set RS was defined in the first step of our method, the classes included in that set will not be considered for the final result, nor will their interest Φ be computed.

The interest Φ produces a sorted ranking of HL7 classes, and the method selects the top classes of that ranking until reaching the limit Cmax specified in the first step. We call such set of classes the Interest Set. The second column of Table II shows the classes that belong to the Interest Set according to FS = {Patient, ActAppointment} when Cmax = 10.

In case two or more classes get the same interest, our method is non-deterministic: it might select any of those. Some enhancements can be made to avoid selecting classes in a random manner, like prioritizing the classes with a higher value of closeness or importance (or any other measure) in case of ties.

Step 4: Compute Filtered Information Model

Finally, the last step of the method obtains the Interest Set of classes from the previous step and puts it together with the classes of the filter set FS in order to create a filtered information model with the classes of both sets.

The main goal of this step consists in filtering information from the whole HL7 information models involving classes in the filtered model. To achieve this goal, the method explores the associations, redefinitions of associations, and generalization/specialization relationships in the HL7 information models that are defined between those classes and includes them in the filtered model to obtain a connected model. The filtered information model for FS = {Patient, ActAppointment} and the previous Interest Set is shown in Figure 3.

Our method also takes into account associations that are specified between superclasses of classes included in the filtered information model, and brings them down to connect such subclasses. An example of that behaviour is the association between Participation and ActAppointment in Figure 3. Such association is originally defined between Participation and Act (see Figure 1). Given that ActAppointment is a subclass of Act, the association is descended to the context of ActAppointment to indicate that the connection with Participation exists although Act was not included in the Interest Set.

When descending an association, there exists the case that such association could be repeated. Figure 3 shows the association between Participation and ActAppointment. Note that Participation is not a member of the Interest Set (see Table II). However, Participation has been included in the filtered information model as an auxiliary class (marked in Figure 3 with a light grey color). The rationale is that such association should otherwise be descended between each of the five subclasses of Participation present in the Interest Set (SubjectOfActAppointment, AuthorOfActAppointment, ReusableDeviceOfActAppointment, LocationOfActAppointment and SubjectOfAccountEvent) and ActAppointment, which is not a UML-compliant situation.

To avoid repeated associations, our method finds the lowest common parent (LCP) for the previous subclasses, which in this case is Participation, includes it in the filtered information model as an auxiliary class, and descends the
association to such LCP class. The same situation occurs for RoleClassAssociative and RoleChoice, which are LCP classes included as auxiliary in the filtered information model of Figure 3.

Figure 3. Filtered Information Model for FS = {Patient, ActAppointment}.

Besides, if there are two classes in the filtered information model such that one is an indirect subclass of the other in the HL7 models, our method creates an IsA relationship between them in the filtered information model (marked as indirect) to indicate such knowledge. Figure 3 shows that the five subclasses of Participation and the four subclasses of RoleClassAssociative are indirect subclasses by marking those IsA relationships in a light gray color. In the case of RoleChoice, its subclasses are directly connected to it by means of IsA relationships (marked with ordinary black color).

Finally, the filtered information model presented in Figure 3 shows information about two HL7 domains: the Scheduling domain and the Account and Billing domain. By using our filtering method, a user who wanted to know about patients and appointments discovers that patients are also related to account events. This way, the user can easily compose another filter set like FS = {Patient, SubjectOfAccountEvent} to get more knowledge about them in a new iteration of our method.

VI. EVALUATION

Our filtering method and prototype tool provide support for the task of extracting knowledge from the HL7 models, which has normally been done manually or with little support.

Finding a measure that reflects the ability of our method to satisfy the user is a complicated task. However, there exists related work [16], [17] about some measurable quantities in the field of information retrieval that can be applied to our context:

• The ability of the method to withhold non-relevant knowledge (precision)
• The interval between the request being made and the answer being given (time)

Precision Analysis

A correct method must retrieve the relevant knowledge according to the user preferences. The precision of a method is defined as the percentage of relevant knowledge presented to the user.

In our context, we use the concept of precision applied to HL7 universal domains (specified with D-MIMs). Each domain contains a main class which is the central point of knowledge for the users interested in such domain. The other classes present in the domain make up the relevant knowledge related to the main class.

HL7 professionals interested in a particular domain decide about the knowledge to incorporate in it through ballots. Thus, a common situation for a user is to focus on the main class of a domain and to navigate through the D-MIM to understand its related knowledge.

To know the precision of our method, we simulate the generation of a D-MIM from its main class. We define a single-class filter set with such class and set Cmax to the size of the domain. This way, we obtain a filtered information model with the same number of classes as such domain.

In one iteration of our method, we obtain two groups of classes within the resulting filtered information model: the classes relevant to the user, that is, the ones that were originally defined in the D-MIM by experts, and the non-relevant ones. The precision of the result is defined as the fraction of the relevant classes over the total Cmax.

To refine the obtained result, the non-relevant classes are included in a rejection set RS and the method is executed
again taking into account RS. It is expected that the filtered information model resulting from this step will have a greater precision. In this manner, at each iteration the classes that are not relevant to the user are rejected, and we know that in a finite number of steps our filtering method will obtain all the classes of the original domain. The smaller the number of iterations required to obtain such domain, the better the method.

Figure 4 shows the number of iterations needed to reach the maximum precision for four of the HL7 domains. Note that the right side of Figure 4 zooms in on the first five iterations. The test reveals that to reach more than 80% of the relevant classes of a domain, only three iterations are required.

Figure 4. Precision analysis for HL7 domains. [Two panels, "Precision" and "Precision (Zoom Iterations 1−5)"; x-axis: Iterations; y-axis: Precision (%); series: Medical Records, Scheduling, Account and Billing, Laboratory.]

Time Analysis

It is clear that a good method does not only require precision, but it also needs to present the results in a time that is acceptable to the user.

To find the time spent by our method, it is only necessary to record the time lapse between the request for knowledge, i.e. once a filter set FS has been indicated by the user, and the receipt of the filtered information model.

Figure 5. Time analysis for different sizes of FS. [Panel "Average Time"; x-axis: Filter Set Size; y-axis: Time (s).]

It is expected that as we increase the size of the filter set, the time will increase linearly. Our method computes the distances from each class in the filter set to all the remaining classes. This computation requires the same time (on average) for each class in the filter set. Therefore, the more classes we have in a filter set, the more time our method spends computing distances.

In our experimentation, we set our prototype tool to apply the filtering method several times with an increasing number of classes in the filter set. The average results for sizes from a single-class filter set up to a 40-class filter set are presented in Figure 5.

According to the expected use of our method, having a filter set FS of 40 classes is not a common situation (although possible). Sizes of filter sets up to 10 classes are more realistic, in which case the average time does not exceed one second.

VII. CONCLUSIONS

HL7 information models are very large. The wealth of knowledge they contain makes them very useful to their potential target audience. However, the size and the organization of these models make it difficult to manually extract knowledge from them. This task is basic for the improvement of services provided by HL7 affiliates, vendors and other organizations that use those models for the development of health systems.

What is needed is a tool that makes HL7 models more usable for that task. We have presented a method that makes it easier to automatically extract knowledge from the HL7 models. The input to our method is the set of classes the user is interested in. The method computes the interest of each class with respect to that set as a combination of its importance and closeness. Finally, the method selects the most interesting classes from those models, including their defined knowledge in the original models (e.g. associations,
redefinition of associations, IsA relationships).

The experiments we have done clearly show that the proposed method and its associated tool provide an easier way to extract knowledge from the models. Concretely, our prototype tool recovers more than 80% of the knowledge of a D-MIM in three iterations, with an average time per iteration that for common uses does not exceed one second.

We plan to continue our work along three directions. The first is to include all HL7 models in our tool to give full support to all HL7 communities. Currently we have four D-MIMs. Experimentation with the full set of models will allow us to improve the method.

We also plan to experiment with the latest definition and nomenclature of HL7 models published by HL7 International. Basically, it specifies a new level on top of the RIM model that consists of a domain analysis model (DAM) to describe business processes and use cases, and a localized information model (LIM) at the bottom of the model types to adapt the R-MIMs to locale-specific requirements for structure and terminology. Taking these two new models into account is a challenge that will improve our work.

Finally, another research area to explore consists in generating traceability links from the elements in the filtered model to the original models, so that it is easy to find out the origin of each element. Keeping such backward links improves the integration of different models in an interoperability context. Also, our method and tool implementing traceability could be used as an aid in the design of implementation guides for HL7 interoperability artifacts (HL7 V3 messaging and CDA R2 documents).

ACKNOWLEDGMENT

The authors want to thank Diego Kaminker, HL7 Education WG co-chair and HL7 International Mentoring Committee co-chair, Carles Gallego, current Chair of HL7 Spain, and Dr. Joan Guanyabens, former Chair of HL7 Spain, for their collaboration.

We would also like to thank the people of the GMC group for their useful comments on previous drafts of this paper. This work has been partly supported by the Ministerio de Ciencia y Tecnología under the project TIN2008-00444, Grupo Consolidado.

REFERENCES

[1] Health Level Seven International, “HL7 web,” feb 2010. [Online]. Available:

[2] R. Dolin, L. Alschuler, C. Beebe, P. Biron, S. Boyer, D. Essin, E. Kimber, T. Lincoln, and J. Mattison, “The HL7 clinical document architecture,” Journal of the American Medical Informatics Association, vol. 8, no. 6, pp. 552–569, 2001.

[3] R. Dolin, L. Alschuler, S. Boyer, C. Beebe, F. Behlen, P. Biron, and A. Shabo, “HL7 clinical document architecture, release 2,” Journal of the American Medical Informatics Association, vol. 13, no. 1, pp. 30–39, 2006.

[4] J. Conesa, V. C. Storey, and V. Sugumaran, “Usability of upper level ontologies: The case of researchcyc,” Data & Knowledge Engineering, vol. 69, no. 4, pp. 343–356, 2010.

[5] A. Danko, R. Kennedy, R. Haskell, I. Androwich, P. Button, C. Correia, S. Grobe, M. Harris, S. Matney, and D. Russler, “Modeling nursing interventions in the act class of HL7 RIM Version 3,” Journal of Biomedical Informatics, vol. 36, no. 4-5, pp. 294–303, 2003.

[6] J. Lyman, S. Pelletier, K. Scully, J. Boyd, J. Dalton, S. Tropello, and C. Egyhazy, “Applying the HL7 reference information model to a clinical data warehouse,” in IEEE International Conference on Systems, Man and Cybernetics, vol. 5, 2003, pp. 4249–4255.

[7] OMG, Unified Modeling Language: Superstructure, version 2.1.1, Object Management Group, February 2007.

[8] OMG, Object Constraint Language, version 2.0, Object Management Group, May 2006.

[9] S. Castano, V. De Antonellis, M. G. Fugini, and B. Pernici, “Conceptual schema analysis: techniques and applications,” ACM Transactions on Database Systems, vol. 23, no. 3, pp. 286–333, 1998.

[10] D. L. Moody and A. Flitman, “A methodology for clustering entity relationship models - a human information processing approach,” in Conceptual Modeling - ER 1999, 18th International Conference on Conceptual Modeling, ser. Lecture Notes in Computer Science, vol. 1728. Springer, 1999, pp. 114–130.

[11] Y. Tzitzikas, D. Kotzinos, and Y. Theoharis, “On Ranking RDF Schema Elements (and its Application in Visualization),” Journal of Universal Computer Science, vol. 13, no. 12, pp. 1854–1880, 2007.

[12] Y. Tzitzikas and J.-L. Hainaut, “How to tame a very large er diagram (using link analysis and force-directed drawing algorithms),” in Conceptual Modeling - ER 2005, 24th International Conference on Conceptual Modeling, ser. Lecture Notes in Computer Science, vol. 3716. Springer, 2005, pp. 144–159.

[13] C. Yu and H. V. Jagadish, “Schema summarization,” in VLDB 2006, 32nd International Conference on Very Large Data Bases, 2006, pp. 319–330.

[14] X. Yang, C. M. Procopiuc, and D. Srivastava, “Summarizing relational databases,” in VLDB 2009, 35th International Conference on Very Large Data Bases, 2009, pp. 634–645.

[15] A. Villegas and A. Olivé, “On computing the importance of entity types in large conceptual schemas,” in Advances in Conceptual Modeling - Challenging Perspectives, ER 2009 Workshops, ser. Lecture Notes in Computer Science, vol. 5833. Springer, 2009, pp. 22–32.

[16] R. Baeza-Yates and B. Ribeiro-Neto, Modern Information Retrieval. Addison Wesley, 1999.

[17] C. Van Rijsbergen, “Information Retrieval,” Cataloging & Classification Quarterly, vol. 22, no. 3, 1996. [Online]. Available: