Abstract. Enhancing the outputs of Clinical Decision Support (CDS) systems is a permanent concern for many research communities, which have to deal with an abundance of entities, data, structures, methods, applications, tools, and so on. In the past few decades, several technologies have been theorized and standardized that could help researchers obtain better results. The paper presents a method to enrich the inputs of CDS systems through a semantic integration of several medical knowledge sources using the Topic Maps standard, in order to obtain more refined medical recommendations. Future research directions and challenges are summarized and conclusions are drawn.
2. 338 D. Dragu, V. Gomoi, and V. Stoicu-Tivadar
patient” [4]. This data model is to be redefined in the current research activity using a semantic technology, with the aim of obtaining an enhanced integration of sources.
Data, information and knowledge have to be organized and represented in such a way that both humans and computers are able to understand their meaning. The Topic Maps (TM) technology is well suited to this problem, being able to conceptualize and represent any subject [5] in a way that is both computer-understandable and human-understandable. TM is an ISO standard, ISO/IEC 13250:2003 Topic Maps, and it was designed to let users represent their knowledge and rapidly find the information they need. It was developed “as an answer to the problem of how to automate merging of (digital) back-of-books indexes” [6].
The methods and technologies used in this approach should make it possible to achieve the desired results, enabling enhanced interoperability not only for CDS systems but also for many other healthcare information systems that rely on the vMR data model to manage their own informational structures.
2 Mapping vMR to TM
2.1 Technologies - Theoretical Considerations
Finding a common data model for organizing medical knowledge was the main concern of the research activity. The requirements stated that this data model should be able to represent a collection of informational constructs used in Clinical Decision Support systems, and should allow this collection to be updated with information from various sources. The Virtual Medical Record (vMR) is a solution provided by the HL7 organization, and even though it is not yet a standard, it is likely to be standardized [7].
The analysis of the most important semantic technologies revealed that the TM standard is a suitable candidate for representing such a model. By declaring the identity of the subjects, users can link the formal representation of the vMR data model to other data and knowledge sources [5]. Other reasons also led to the decision to use TM technology:
• flexibility, through an open vocabulary;
• the ability to quickly find information, offering “very good support for full-text searching, complex queries, and also providing an excellent basis for natural language querying” [8];
• extensibility, through the merging of distinct topic maps and topics;
• the ability to acquire and represent knowledge in a form that computers can process.
TM is thus a technology able to acquire, represent, store, manage and retrieve knowledge. Besides all these, by linking its internal constructs with external references (Fig. 1), a topic map can be connected with other topic maps, and even with representations of knowledge built with other semantic technologies. It also offers a way to disseminate the contained knowledge widely.
3. Achieving Semantic Integration of Medical Knowledge 339
Fig. 1. The TAO¹ of Topic Maps [6]
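To make the TAO (topics, associations, occurrences) concrete, the sketch below models the three constructs as plain Python data classes. It is a simplified, hypothetical illustration that borrows TMDM terminology (subject identifiers, role types, occurrence types); it is not the data structure of Topincs or any other TM engine, and all names and example.org URIs are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """A topic: the machine-side proxy for one subject."""
    names: list[str]
    # URIs of subject indicators (e.g. web pages describing the subject).
    subject_identifiers: set[str] = field(default_factory=set)


@dataclass
class Occurrence:
    """Links a topic to a piece of information about its subject."""
    topic: Topic
    occurrence_type: str
    value: str  # a literal, or the URI of an external resource


@dataclass
class Association:
    """An n-ary relationship in which each topic plays a typed role."""
    association_type: str
    # Simplification: one player per role type (TMDM allows more).
    roles: dict[str, Topic]


patient = Topic(["John Doe"], {"http://example.org/patients/42"})
condition = Topic(["Type 2 diabetes mellitus"], {"http://example.org/conditions/t2dm"})
clinician = Topic(["Dr. Smith"])

# One ternary association relating three topics at once.
diagnosis = Association(
    "diagnosis",
    {"patient": patient, "condition": condition, "diagnosed-by": clinician},
)
birth_date = Occurrence(patient, "date-of-birth", "1970-05-01")
external_page = Occurrence(condition, "web-page", "http://example.org/docs/t2dm")

print(len(diagnosis.roles))  # 3
```

The web-page occurrence shows the link to an external reference mentioned in the text; the association shows why n-ary relationships need typed roles rather than a fixed subject/object pair.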
2.2 Tools
The practical part of the current research implements the specifications of the technologies presented above, seeking a high level of medical data integration. The whole system consists of two distinct applications working together: one deals with knowledge organization, and the other is responsible for converting incoming informational constructs into new representations compliant with the Topic Maps Data Model (TMDM) [5].
The tool used to represent vMR in accordance with TM specifications is Topincs, “a software for rapid development of web databases”, which makes it possible to represent any domain “by modeling a TM schema” (Fig. 2). It also gives developers and users opportunities “to transform, aggregate, and modify the data” [9]. Other reasons for choosing Topincs are that “any technological jargon is hidden from the user” [9], that it runs on top of the AMP (short for Apache, MySQL, PHP) stack, that it is open source, and that good documentation is provided. Topincs is not just a TM engine; it also lets users control the behavior of their applications through PHP scripts. The most important concepts used in the programming process are the tobject (a PHP object for accessing topics), domain classes, services, triggers and access filters. The Topincs application hosts a web database, currently under development, which will be enriched with a package of services that support the connection with the C# application.
¹ Short for Topic-Association-Occurrence.
Fig. 2. TM schema for vMR data model
Topincs runs on top of Apache 2, MySQL and PHP 5, so these are also among the tools used in this project. Since PHP scripts are used to model the output of the database, the system necessarily runs on a server.
The C# programming language on the .NET platform will be used to develop the mediator, a conversion module that has to find and identify representations of the same subject within files written in accordance with different data models. A set of conversion algorithms is to be developed, and this work will be the subject of a subsequent paper. In this context, special attention is given to data validation, to ensure the correctness of the resulting TM constructs, whose values have to match the data types specified in the TM schema.
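The validation step can be pictured as checking each candidate occurrence value against the datatype that the schema declares for its occurrence type. The sketch below is a hypothetical Python illustration (the mediator itself is planned in C#): the `SCHEMA_DATATYPES` table, the occurrence type names and the choice of three XSD datatypes are assumptions, and the checks only approximate the XSD lexical rules.

```python
from datetime import date
from decimal import Decimal, InvalidOperation

XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical excerpt of a TM schema: occurrence type -> declared datatype.
SCHEMA_DATATYPES = {
    "date-of-birth": XSD + "date",
    "body-weight-kg": XSD + "decimal",
    "note": XSD + "string",
}


def is_xsd_date(value: str) -> bool:
    """Accept ISO 8601 calendar dates such as 1970-05-01."""
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


def is_xsd_decimal(value: str) -> bool:
    """Accept finite decimal numbers such as '72.5'; reject 'NaN' or free text."""
    try:
        return Decimal(value).is_finite()
    except InvalidOperation:
        return False


CHECKERS = {
    XSD + "date": is_xsd_date,
    XSD + "decimal": is_xsd_decimal,
    XSD + "string": lambda value: True,
}


def validate_occurrence(occurrence_type: str, value: str) -> bool:
    """True if value conforms to the datatype the schema declares for its type."""
    datatype = SCHEMA_DATATYPES.get(occurrence_type)
    if datatype is None:
        return False  # unknown occurrence type: reject instead of guessing
    return CHECKERS[datatype](value)


print(validate_occurrence("date-of-birth", "1970-05-01"))  # True
print(validate_occurrence("body-weight-kg", "seventy"))    # False
```

Rejecting unknown occurrence types, rather than storing them as strings, keeps every stored construct traceable to a type declared in the schema.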
2.3 Implementation
System Overview
After the initial phase of the research, several questions arose regarding the implementation of this approach:
• what is the best way to represent vMR terms with TM?
• which algorithms will drive the conversion processes?
• which tools will be used?
• can the initial requirements be fulfilled?
• how much effort will the whole project require?
• why should developers use the final product?
• is there a simpler way to do the same thing?
In answering these questions, we found that this research field requires knowledge from different domains and will consume considerable human resources. There are also other ways to achieve the same goal, but this approach seems to fit the needs of the domain better, following the current trend in information and computer technology and taking healthcare a step further. We therefore decided to design and develop an application which relies on a web database, created and maintained with Topincs, and a conversion module, written in C#, that should automate the acquisition of data from different types of sources (Fig. 3). The rest of this section briefly describes some of the past steps, the current activity and some directions for further development.
Fig. 3. System overview
The main objective of the current research is to improve the integration of medical data by using TMDM. The immediate requirement for the new data model is that it be easy to assimilate for users familiar with the original data model, without their having to know TMDM unless they want to change its behavior or rewrite the data model. This implies that the new data model must offer a high level of usability. In order to meet these objectives, the following requirements were written:
• the result of the data translation between models can be determined;
• a common vocabulary must be developed;
• data can be translated from one model into the other and vice versa;
• the result of data translation from one model into the other and vice versa is an
informational construct with the same content and semantics;
• the translation guide uses common terms of the two models;
• the limits of the translation mechanism will be specified.
Other requirements may become imperative as the research progresses.
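Several of the requirements above (translation in both directions, preservation of content and semantics, a common vocabulary, explicit limits) can be expressed together as a round-trip check: converting a record to TM constructs and back must reproduce the original. The sketch below assumes a hypothetical, flattened vMR-like record and a hypothetical vocabulary that reuses the vMR field names as TM occurrence types; real vMR structures are considerably richer.

```python
# Hypothetical common vocabulary: vMR-like field name -> TM occurrence type.
# Reusing the vMR denominations keeps the TM model familiar to vMR users.
VOCABULARY = {"gender": "gender", "birthTime": "birthTime", "race": "race"}
REVERSE = {tm_type: vmr_name for vmr_name, tm_type in VOCABULARY.items()}


def vmr_to_tm(record: dict[str, str], subject_identifier: str) -> dict:
    """Translate a flat vMR-like record into a TM-like topic with occurrences."""
    unmapped = sorted(set(record) - set(VOCABULARY))
    if unmapped:
        # Limits of the translation are reported explicitly, never dropped silently.
        raise ValueError(f"no TM occurrence type for fields: {unmapped}")
    return {
        "subject_identifiers": [subject_identifier],
        "occurrences": [(VOCABULARY[name], value) for name, value in record.items()],
    }


def tm_to_vmr(topic: dict) -> dict[str, str]:
    """Translate the TM-like topic back into a flat vMR-like record."""
    return {REVERSE[occ_type]: value for occ_type, value in topic["occurrences"]}


record = {"gender": "F", "birthTime": "19700501"}
topic = vmr_to_tm(record, "http://example.org/patients/42")
assert tm_to_vmr(topic) == record  # same content after the round trip
```

Raising an error for unmapped fields is one way to make the limits of the translation mechanism visible; a production converter might instead collect them for later review.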
Creating TM Ontology
The first step in the development of the topic map is to elaborate the TM ontology, for which we used the methodology described in [10]. The most important actions were to identify the boundaries of the domain and to discover and collect the relevant documentation. The latest informative ballot of the vMR domain analysis model was downloaded, and the most suitable tools were identified.
The main concern of the second step, the analysis phase, was to highlight the aspects involved in using a common data model for CDS, and to find the
list of questions that the topic map should be able to answer. The concepts within the domain and the relationships between them were also defined. One important requirement for this project was to use, for the new TM constructs, the same names as the vMR terms wherever possible, in order to achieve a rapid integration of the new data model.
The third step was to sketch the TM ontology, by listing the new concepts and setting up the topic types, association types, occurrence types and some example instances of those concept types. Together, these concepts form a TM ontology as defined in [10]. Based on this TM ontology, a web database was designed and developed with Topincs. For a better understanding of this part of the work, the next two paragraphs briefly introduce some of the concepts we used. It should also be noted that two different activities are involved: the design of the TM ontology and the conception of a TM schema.
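An ontology sketch of this kind can be written down as a small set of declared types against which instances and associations are checked. The fragment below is hypothetical: the type names only mimic vMR-style denominations and are not the authors' actual ontology, and the two helper functions are illustrative.

```python
# Hypothetical fragment of a TM ontology; names reuse vMR-style terms.
ONTOLOGY = {
    "topic_types": {"EvaluatedPerson", "Problem", "ObservationResult"},
    "occurrence_types": {"gender", "birthTime", "problemStatus"},
    # association type -> the role types it connects
    "association_types": {"has-problem": {"person", "problem"}},
}


def undeclared_topic_types(instances, ontology):
    """Return (name, type) pairs whose topic type the ontology does not declare."""
    declared = ontology["topic_types"]
    return [(name, t) for name, t in instances if t not in declared]


def association_is_well_formed(assoc_type, role_types, ontology):
    """An association must use a declared type and exactly its declared roles."""
    expected = ontology["association_types"].get(assoc_type)
    return expected is not None and set(role_types) == expected


instances = [
    ("John Doe", "EvaluatedPerson"),
    ("Hypertension", "Problem"),
    ("Blood pressure 140/90", "Observation"),  # typo: not a declared type
]
print(undeclared_topic_types(instances, ONTOLOGY))
# [('Blood pressure 140/90', 'Observation')]
print(association_is_well_formed("has-problem", ["person", "problem"], ONTOLOGY))  # True
```

Keeping the ontology as data, separate from the checking code, mirrors the distinction the text draws between designing the ontology and building the schema that enforces it.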
An instance of the TM technology is called a topic map, and it may exist in various forms: text files, XML files, databases, or even in the human mind. For the current application we decided to manage data, information and knowledge with a web database. The development of that web database started with the creation of a schema: all topic types, association types, role types and occurrence types were represented following the rules stated in the TMDM, and constraints were defined in accordance with the Topic Maps Constraint Language (TMCL) [11]. The result is the TM schema of the represented domain, which controls the behavior of the whole database.
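Among other things, TMCL can state how many occurrences of a given type a topic of a given type may carry (the topic-occurrence constraint, with its card-min and card-max values). The sketch below checks that kind of constraint over a topic given as a list of occurrence types. The constraint values and type names are hypothetical, and a TMCL-aware engine such as Topincs enforces its own schema rather than code like this.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OccurrenceConstraint:
    """Simplified analogue of a TMCL topic-occurrence constraint."""
    topic_type: str
    occurrence_type: str
    card_min: int = 0
    card_max: Optional[int] = None  # None means unbounded


# Hypothetical schema fragment: an EvaluatedPerson has exactly one
# birthTime and at most one gender.
CONSTRAINTS = [
    OccurrenceConstraint("EvaluatedPerson", "birthTime", card_min=1, card_max=1),
    OccurrenceConstraint("EvaluatedPerson", "gender", card_max=1),
]


def violations(topic_type: str, occurrence_types: list[str]) -> list[str]:
    """Report cardinality violations for the occurrences of one topic."""
    counts = Counter(occurrence_types)
    problems = []
    for c in CONSTRAINTS:
        if c.topic_type != topic_type:
            continue
        n = counts[c.occurrence_type]
        if n < c.card_min:
            problems.append(f"{c.occurrence_type}: {n} < card-min {c.card_min}")
        if c.card_max is not None and n > c.card_max:
            problems.append(f"{c.occurrence_type}: {n} > card-max {c.card_max}")
    return problems


print(violations("EvaluatedPerson", ["gender", "gender"]))
# ['birthTime: 0 < card-min 1', 'gender: 2 > card-max 1']
```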
In order to meet the requirements of the project, a package of Topincs services is to be written. These services are parameterized PHP scripts that model the response of the database, allowing custom queries and further automation of the integration processes within the current project. To develop Topincs services, a PHP object called tobject is used to gain access to the topics. According to the TMDM specification, a topic is a conceptualization of a subject, which “can be anything whatsoever, regardless of whether it exists or has any other specific characteristics, about which anything whatsoever may be asserted by any means whatsoever” [5]. The tobject has a virtual programming interface derived from the constraints within the TM schema and from the serialization names, which are formal denominations set by the designer of the topic map for the schema types mentioned above. The programming interface of the tobject exposes “methods that make sense given the constraints of the topic type” [12]. Programmers can add new custom methods to this interface using domain classes. All these features give programmers and knowledge engineers the opportunity to rapidly develop powerful applications based on a very flexible data model.
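The idea of a programming interface derived from the schema can be illustrated outside PHP. The Python sketch below is a hypothetical analogue, not the Topincs tobject API: accessors are resolved dynamically from a schema that maps serialization names to maximum cardinalities, so only methods that make sense for the topic type exist, and a subclass stands in for a domain class that adds a custom method.

```python
class SchemaTopic:
    """Hypothetical analogue of a schema-driven topic object (not Topincs's API)."""

    # topic type -> {serialization name: max cardinality, None = unbounded}
    SCHEMA = {"EvaluatedPerson": {"gender": 1, "birthTime": 1, "alias": None}}

    def __init__(self, topic_type: str, occurrences: dict[str, list[str]]):
        self._type = topic_type
        self._occurrences = occurrences

    def __getattr__(self, name: str):
        # Called only when normal lookup fails, e.g. for "get_gender".
        if name.startswith("_"):
            raise AttributeError(name)
        allowed = self.SCHEMA.get(self._type, {})
        prefix, _, ser_name = name.partition("_")
        if prefix != "get" or ser_name not in allowed:
            raise AttributeError(f"{self._type} has no method {name!r}")
        values = self._occurrences.get(ser_name, [])
        if allowed[ser_name] == 1:
            # Single-valued in the schema: return the value itself (or None).
            return lambda: values[0] if values else None
        return lambda: list(values)


class EvaluatedPerson(SchemaTopic):
    """Plays the part of a domain class by adding a custom method."""

    def __init__(self, occurrences: dict[str, list[str]]):
        super().__init__("EvaluatedPerson", occurrences)

    def birth_year(self) -> int:
        return int(self.get_birthTime()[:4])


person = EvaluatedPerson(
    {"gender": ["F"], "birthTime": ["19700501"], "alias": ["Jane", "J. D."]}
)
print(person.get_gender(), person.birth_year(), person.get_alias())
# F 1970 ['Jane', 'J. D.']
```

Calling a method the schema does not allow, such as `person.get_weight()`, raises `AttributeError`, which is the Python counterpart of exposing only the methods that make sense for the topic type.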
The next step in the development of the topic map is to refine it; in this case, refinement started with the creation of the first definitions and classifications, and it will continue throughout the implementation process. The immediate actions within the research activity are to populate the topic map and to design and develop the integration algorithms that will drive the supervised and/or automatic acquisition process.
Integrating Knowledge
To automate the data acquisition process, a mediator is under development. It has to identify and convert into TM as many pieces of information as possible from data models that use an XML-based syntax (Fig. 4).
The application waits for XML files to be sent to the server and checks whether their tags carry any known identifiers or correspond to any of the predefined templates. The database has to be able to provide identifiers for all elements of the topic map ontology, and these will be used to define and/or verify the identity of elements in the incoming messages. Identifying the subjects of representations is a process that must not only comply with the TMDM specifications, but also ensure automatic data flows between different types of information representations and the vMR-TM. Also, as the list of subject identifiers grows, the chances of discovering new information fragments about represented subjects increase, which should lead to a more accurate response to the requirements of users in the CDS domain.
Fig. 4. Example of a potential incoming XML file
If any tags match exposed identifiers, their attributes are checked to see whether their subjects already have a defined topic. If topics about the same subject exist, they are updated; otherwise, they are created. If there is no match, the user interface provides a way to supervise the identification of unknown terms (Fig. 5). This manual step should gradually become unnecessary as the identification algorithms advance.
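The matching loop described above can be sketched as follows. Everything in it is a hypothetical illustration: the `KNOWN_TAGS` registry, the in-memory topic store keyed by subject identifier, and the convention that an element's `id` attribute holds its subject identifier are assumptions, whereas the planned mediator is written in C# and works against Topincs services.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry of exposed identifiers: known tag -> TM topic type.
KNOWN_TAGS = {
    "patient": "http://example.org/vmr-tm/EvaluatedPerson",
    "problem": "http://example.org/vmr-tm/Problem",
}


def integrate(xml_text: str, store: dict, pending: list) -> None:
    """Create or update topics for known tags; queue the rest for supervision."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if elem is root:
            continue  # the envelope element itself carries no subject
        topic_type = KNOWN_TAGS.get(elem.tag)
        subject = elem.get("id")  # assumption: "id" holds the subject identifier
        if topic_type is None or subject is None:
            pending.append(elem.tag)  # left for supervised identification
            continue
        # Create the topic if none exists for this subject yet,
        # then update it with the remaining attributes in either case.
        topic = store.setdefault(subject, {"type": topic_type, "data": {}})
        topic["data"].update({k: v for k, v in elem.attrib.items() if k != "id"})


store, pending = {}, []
integrate(
    '<message><patient id="http://example.org/patients/42" gender="F"/>'
    '<problem id="http://example.org/problems/7" status="active"/>'
    '<allergy code="X1"/></message>',
    store, pending,
)
integrate('<message><patient id="http://example.org/patients/42" '
          'birthTime="19700501"/></message>', store, pending)
print(len(store), pending)  # 2 ['allergy']
```

The second message updates the existing patient topic instead of creating a duplicate, while the unknown `allergy` tag ends up in the queue that the supervision interface would present to a user.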
The identification process first relies on the identification methods for topic items described in TMDM, and this capability will be enhanced with other algorithms. The TMDM stipulates that “any information resource can become a subject indicator”, and defines this concept as “an information resource that is referred to from a topic map in an attempt to unambiguously identify the subject represented by a topic to a human being” [5]. For example, the subject indicator may be a web page about a given subject, and the subject identifier is then the URI of that web page. Attaching a URI to a topic as a subject identifier does not require the TM application to follow any of the contained URIs; the application only compares them as strings, in order to discover and merge distinct representations of the same subject. According to the TMDM, the result of the merging process is a single topic which aggregates the properties of the two topics representing the same subject, eliminating redundant informational constructs and gathering all data about a subject in a single location. Also, it is not a critical event for a topic map if the web page at the specified URI does not work or does not even exist: unique identification can still be done and the merging process will work fine.
Fig. 5. Supervised identification of a tag
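The merging rule can be illustrated with a small sketch. Following the behaviour described above, subject identifiers are compared purely as strings, and topics that share one are merged into a single topic carrying the union of their names, identifiers and occurrences. The plain-dictionary topic representation is a simplifying assumption, and TMDM's additional merge triggers (item identifiers, subject locators) are omitted.

```python
def merge_topics(topics: list[dict]) -> list[dict]:
    """Merge topics that share a subject identifier (plain string equality)."""
    merged: list[dict] = []
    for topic in topics:
        current = {
            "names": set(topic["names"]),
            "sids": set(topic["sids"]),
            "occurrences": set(topic["occurrences"]),
        }
        # Absorb every previously merged topic sharing an identifier; topics in
        # `merged` never share identifiers, so chains (A~B, B~C) collapse too.
        remaining = []
        for other in merged:
            if other["sids"] & current["sids"]:
                for key in current:
                    current[key] |= other[key]
            else:
                remaining.append(other)
        merged = remaining + [current]
    return merged


hypertension = {
    "names": ["Hypertension"],
    "sids": ["http://example.org/psi/hypertension"],
    "occurrences": [("icd-10", "I10")],
}
high_bp = {
    "names": ["High blood pressure"],
    "sids": ["http://example.org/psi/hypertension", "http://example.org/local/htn"],
    "occurrences": [("source", "system B")],
}
asthma = {"names": ["Asthma"], "sids": ["http://example.org/psi/asthma"], "occurrences": []}

result = merge_topics([hypertension, high_bp, asthma])
print(len(result))  # 2
```

Because the comparison is a plain string match, nothing is fetched from the URIs, which is why a broken or missing web page does not prevent merging; by the same token, two spellings of the same URI that differ in any character are not merged by this sketch.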
There are still issues with the unique identification of a subject, ranging from deliberate or accidental actions that may occur during this process to misunderstanding of the represented domain, but these could be managed and mostly solved by following the recommendations of the Organization for the Advancement of Structured Information Standards (OASIS) on the use of published subjects (Fig. 6). The OASIS document lists the requirements, recommendations and methods for adopting published subject indicators (PSIs) and published subject identifiers (PSIDs), providing directions to follow in order to benefit from “an open, scaleable, URI-based method of identifying subjects of discourse” [13].
Fig. 6. Using a PSI to identify a subject [13]
3 Further Work and Expected Results
Future research focuses on completing the TM schema according to the vMR specifications and on designing services that meet the project requirements. Based on these services, and to support the automation of information acquisition, several templates will be written, with the aim of improving the exchange of information flows.
The conversion module, written in C#, is designed to identify the conceptualizations of the same subject in different data models and to represent them in accordance with TMDM, using the newly defined vocabulary and the services mentioned above. At this stage of development, the application has limitations that should be taken into account until the identification algorithms are complete. In this context, a new method for the subject identification process is to be developed, designed especially to fit traditional RDBMS and EHR systems. This work should extend the identification features provided by the TM standard with methods for supervised identification, relying on probabilistic calculations over interchange formats and data models relevant to healthcare.
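The planned supervised identification is only outlined here. As one possible reading, the sketch below ranks known topic names by string similarity to an unknown tag and hands the best candidates, with their scores, to a human supervisor. It deliberately swaps in the `SequenceMatcher.ratio()` score from Python's `difflib` as a stand-in for the probabilistic calculus the authors intend to develop; the `threshold` and `limit` values, the normalization rule and the candidate names are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(term: str) -> str:
    """Compare case-insensitively and ignore underscores and hyphens."""
    return term.lower().replace("_", "").replace("-", "")


def rank_candidates(tag: str, known_names: list[str],
                    threshold: float = 0.6, limit: int = 3) -> list[tuple[str, float]]:
    """Suggest known topic names for an unknown tag, best match first.

    The scores only order the suggestions; a human supervisor still
    confirms or rejects each identification.
    """
    scored = []
    for name in known_names:
        score = SequenceMatcher(None, normalize(tag), normalize(name)).ratio()
        if score >= threshold:
            scored.append((name, round(score, 2)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


known = ["birthTime", "gender", "ObservationResult", "Problem"]
print(rank_candidates("birth_time", known))  # [('birthTime', 1.0)]
```

A string-similarity score captures only spelling variants; recognizing synonyms or codes from different terminologies would need richer evidence, which is where a probabilistic model of the kind the text announces would come in.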
The expected results of the current research activity are:
• the achievement of a flexible data model which can be used to obtain accurate
medical recommendations;
• the development of a set of conversion algorithms which will allow automatic acquisition and integration of medical data from TM instances and other data models.
4 Conclusions
The novelty of this approach lies in the design of a new vocabulary based on the vMR data model and compliant with TMDM, allowing the further integration of virtually any kind of (digital) medical data, information and knowledge. The resulting topic map serves as a medical knowledge base which links its content to relevant resources on the web. It allows users to navigate through a vast medical knowledge base and to represent their own medical ontologies. It can also be viewed as an interface between different types of healthcare data models and CDS systems, whose inference engines could use the content of the topic map to extract more accurate medical recommendations.
The package of services adds custom capabilities to the application and also illustrates how to programmatically model the response of a topic map developed with Topincs.
Compared with other approaches, this one relies on a semantic technology with an open vocabulary and n-ary associations, which gives it a high expressive power. Some references suggest that concept maps, mind maps or RDF are technologies similar to TM, but a closer look shows that they are rather different. TM can be viewed as an envelope for almost all other knowledge representation and classification technologies, being able to represent anything from simple classifications, indexes, taxonomies, thesauri and folksonomies to highly complex ontologies [15]. Because of its special characteristics, and despite the issues involved in creating ontological representations, TM seems to be one of the most suitable technologies to acquire, preserve, represent, manage and disseminate knowledge. TM is a powerful data model that provides flexibility, extensibility, understandability, findability, integration and assimilation for the represented domains.
Like other knowledge representation methods, this standard has not only benefits but also issues, such as how to find the most important subjects in the targeted domains and how to conceptualize them, or the need for human resources for hand coding and maintenance. It is also strongly recommended that every ontological assertion be made or supported by specialists in the represented field.
Acknowledgement. “This work was partially supported by the strategic grant
POSDRU 107/1.5/S/77265, inside POSDRU Romania 2007-2013 co-financed by the
European Social Fund – Investing in People, and the strategic grant
POSDRU/88/1.5/S/50783, Project ID50783 (2009), co-financed by the European
Social Fund – Investing in People, within the Sectorial Operational Programme
Human Resources Development 2007-2013.”
References
1. Berners-Lee, T., Hendler, J., Lassila, O.: The Semantic Web: A new form of Web content
that is meaningful to computers will unleash a revolution of new possibilities. Scientific
American (2001)
2. Health Level Seven International, Inc., http://www.hl7.org
3. Foemig, F., Blobel, B.: Semantic Interoperability between Health Communication Stan-
dards through Formal Ontologies. In: Proceedings of MIE 2009 – The XXII-nd Interna-
tional Congress of the European Federation for Medical Informatics, pp. 200–204 (2009),
doi:10.3233/978-1-60750-044-5-26
4. Kawamoto, K.: Virtual Medical Record (vMR) for Clinical Decision Support – Domain
Analysis Model – HL7 Project #184 Informative Ballot (2011),
http://wiki.hl7.org/images/7/71/HL7vMR_vMR_Domain_Analysis_Model_Release_1.pdf
(accessed May 27, 2012)
5. ISO 13250: Topic Maps, Topic Maps — Data Model,
http://www.isotopicmaps.org/sam/sam-model/2008-06-03/ (accessed May 29, 2012)
6. Pepper, S.: Topic Maps, 3rd edn. Encyclopedia of Library and Information Sciences
(2009), doi:10.1081/E-ELIS3-120044331
7. Kawamoto, K., Del Fiol, G., Strasberg, H.R., Hulse, N., Curtis, C., Cimino, J.J., et al.:
Multi-national, multi-institutional analysis of clinical decision support data needs to inform
development of the HL7 Virtual Medical Record standard. In: AMIA Annu. Symp. Proc.
2010, pp. 377–381 (2010)
8. Presutti, V., Garshol, L.M., Vitali, F., Pepper, S., Gessa, N.: Towards the definition of
guidelines for RDF and Topic Maps interoperability. In: Proceedings of the 5th Interna-
tional Workshop on Knowledge Markup and Semantic Annotation, SEMANNOT 2005,
Colocated with the 4th International Semantic Web Conference, ISWC 2005, CEUR-WS
Proceedings, Galway, Ireland, vol. 185, pp. 83–88 (2005)
9. Cerny, R.: Topincs: A Software for Rapid Development of Web Databases. In: Proceed-
ings of the International Conference on Knowledge Management and Information Sharing,
KMIS 2011, pp. 187–194. SciTePress (2011) ISBN 978-989-8425-81-2
10. Garshol, L.M.: Towards a Methodology for Developing Topic Maps Ontologies. In:
Maicher, L., Sigel, A., Garshol, L.M. (eds.) TMRA 2006. LNCS (LNAI), vol. 4438, pp.
20–31. Springer, Heidelberg (2007)
11. ISO/IEC JTC1/SC34, Information Technology - Document Description and Processing
Languages, Topic Maps Constraint Language,
http://www.itscj.ipsj.or.jp/sc34/open/1053.pdf
(accessed May 23, 2012)
12. Topincs manual, http://www.cerny-online.com/topincs/manual/programming
(accessed May 26, 2012)
13. Pepper, S.: Published Subjects: Introduction and Basic Requirements, OASIS TC
Recommendation (2003),
http://www.oasis-open.org/committees/download.php/3050/pubsubj-pt1-1.02-cs.pdf
(accessed May 21, 2012)
14. Park, J., Hunting, S.: XML Topic Maps: Creating and Using Topic Maps for the Web, 2nd
edn., p. 25. Addison-Wesley Professional (2003) ISBN 0-201-74960-2
15. Garshol, L.M.: Metadata? Thesauri? Taxonomies? Topic Maps! Making Sense of it all.
Journal of Information Science 30(4), 378–391 (2004)