Pcori2013 (23)

Platform for Patient Centric Collaborative
Research
Dadong Wan, Sophia Cao, Karthik Gomadam,
Accenture Technology Labs, San Jose, CA.
{ dadong.wan, sophia.cao, karthik.gomadam }@accenture.com

1 Abstract
The Affordable Care Act is perhaps the most significant “face-lift” in the U.S.
healthcare system since the introduction of Medicare and Medicaid. Key focus
areas of ACA include evidence based care and pay for performance. Patient
engagement is at the heart of both of these focus areas. However, finding relevant
patients to engage with medical providers is an important challenge. In this
paper, we describe a solution to this problem that leverages patient
data available in online health communities and seeks to match the patients in
these communities to relevant projects. Our solution can be applied to data
from any patient community and patients can engage with researchers from
within the communities they are already a part of. We believe that this approach
will help researchers find highly relevant patients and will enable patient centric,
dynamic and responsive research.

2 Introduction
The Affordable Care Act is perhaps the most significant “face-lift” in the U.S.
healthcare system since the introduction of Medicare and Medicaid. Key focus
areas of ACA include evidence based care and pay for performance. Patient
engagement is at the heart of both of these focus areas. For example, researchers
who want to study the effectiveness of levetiracetam, lamotrigine, or
oxcarbazepine on pediatric epilepsy patients should engage with the patients and
their caregivers. Such engagement would allow them to validate both their care
planning process for helping patients manage their conditions and their
treatment plans. These validations will help providers analyze and optimize
their performance in the pay for performance age. However, finding relevant
patients to engage with medical providers is a non-trivial problem. In the above
example, providers will need to recruit patients who are children, have epilepsy,
and are prescribed levetiracetam, lamotrigine, or oxcarbazepine.
In this paper, we propose a solution to address this problem using the data
from online health communities. Our past experience developing
applications to match patients and clinical trial investigators showed us
that patients will not flock to recruitment platforms and that any meaningful
solution should “fish where the fish are”. We realized that patient communities
such as PatientsLikeMe and MedHelp have millions of patients who are sharing
information about their medical conditions, medications, and their experience
in managing their conditions. We developed a solution that takes advantage of
this patient data, allowing researchers to find patients from these communities.
We apply semantic and text mining algorithms to analyze patient conversations
in these communities to build rich patient profiles that capture their medical
conditions, medications, and demographic information. We build similar profiles
for research projects (listed at PCORI.org). We then match and rank the project
and the patient profiles to find the most relevant patients.
One challenge in matching projects with patients based on patient
conversations is that different participants (researchers, patients, caregivers)
describe the same thing in different ways. For example, a researcher will use
“diabetes mellitus” while a patient might say “type 2”. Using semantic Web
technologies (UMLS ontologies, OpenCalais entity extractor, semantic type
matching) allows us to overcome this problem.
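The vocabulary gap described above can be illustrated with a minimal sketch. It assumes a small hand-written synonym table in place of a real UMLS lookup; the table, the concept identifier, and the normalize function are hypothetical and chosen only for illustration.

```python
# Hypothetical synonym table. A real system would resolve terms against UMLS;
# the concept identifier below is a placeholder, not a UMLS code.
SYNONYMS = {
    "diabetes mellitus": "concept:diabetes",
    "type 2": "concept:diabetes",
}


def normalize(terms):
    """Map free-text terms to concept identifiers, dropping unknown terms."""
    concepts = set()
    for term in terms:
        key = term.lower().strip()
        if key in SYNONYMS:
            concepts.add(SYNONYMS[key])
    return concepts


researcher_terms = ["Diabetes Mellitus"]
patient_terms = ["type 2", "tired all the time"]

# Both sides resolve to the same concept even though the wording differs.
shared = normalize(researcher_terms) & normalize(patient_terms)
```

Once both vocabularies are mapped to shared concepts, matching reduces to comparing sets of concept identifiers rather than raw strings.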
We have built a prototype (available at: http://bit.ly/pccr_acn)
that demonstrates the effectiveness of our approach in finding patients. Due
to privacy concerns, we were not able to integrate with existing online
communities. Instead, we developed a sample online community, MeMed (available at:
http://bit.ly/me_med_), and created posts similar to those found in existing
communities. Our prototype allows users to add PCORI projects and find
matching patients in MeMed.

3 Overview of the PCCR Platform
In this section we briefly describe the PCCR platform. We begin by describing
the main models in the system. These are illustrated in Figure 1.
1. Investigators: captures the information about the investigators who are
seeking participants for their projects. We model the institution and the
areas of interest for an investigator. The areas of interest of an investigator
are automatically created by analyzing their projects.
2. Projects: Each investigator can have multiple projects. Each project has
a title, description, goals, project type that captures the nature of the
project, the medical conditions and medications of interest described in
the project, and the expected outcomes. Our matching algorithm matches
participants across these different dimensions and calculates a match score.
The patients are ranked based on this match score.
3. Patients: We extract patient profiles based on their conversations / partic-
ipation in existing online health communities. We identify and use their
demographic, socio-economic, and medical information in creating their
profile.
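The three models listed above could be sketched as plain data classes. This is an illustrative assumption about their shape, not the platform's actual schema; all class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Project:
    title: str
    description: str
    goals: str = ""
    project_type: str = ""  # nature of the project, e.g. "Therapeutic"
    conditions: List[str] = field(default_factory=list)
    medications: List[str] = field(default_factory=list)
    expected_outcomes: List[str] = field(default_factory=list)


@dataclass
class Investigator:
    name: str
    institution: str
    projects: List[Project] = field(default_factory=list)

    @property
    def areas_of_interest(self) -> List[str]:
        # Derived from the investigator's projects rather than entered by hand.
        return sorted({c for p in self.projects for c in p.conditions})


@dataclass
class Patient:
    name: str
    age: Optional[int] = None
    gender: Optional[str] = None
    location: Optional[str] = None
    economic_status: Optional[str] = None
    conditions: List[str] = field(default_factory=list)
    medications: List[str] = field(default_factory=list)


investigator = Investigator(
    name="Example Investigator",
    institution="Example University",
    projects=[
        Project(title="Study A", description="...", conditions=["epilepsy"]),
        Project(title="Study B", description="...", conditions=["diabetes", "epilepsy"]),
    ],
)
```

Here the areas of interest are computed as the union of the conditions across an investigator's projects, which is one simple way to realize the automatic derivation described in item 1.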

[Figure 1 content, recovered from the extracted diagram]
Investigators: Name; Organization; Areas of interest; Project history
Patients: Name; Age; Gender; Location; Economic status; Race; Areas of interest;
  Medical conditions / stage; Activity
Projects (Project 1, Project 2, ..., Project k), each with a project definition:
  Statement
  Goals
  Type: Preventive; Diagnostic; Therapeutic; Palliative; Health Delivery
  Medical conditions: Medical condition; Condition stage; Medication
  Demographics: Age; Gender; Economic; Region; Race
  Outcome: Trial; Tests; Studies; Surveys
Legend: PR = Patient response; UC = User comment

Figure 1: PCCR Matching Platform - Data Definitions

[Figure 2 content, recovered from the extracted diagram]
Project Title & Description -> Big Data & Multidimensional Semantic Analysis -> Rich Project Profile
Patient Communities -> Big Data & Multidimensional Semantic Analysis -> Rich Patient Profile
Rich Project Profile + Rich Patient Profile -> Multidimensional Semantic Match Engine
  -> Matched Participants Across Online Patient Communities

Figure 2: PCCR Matching Platform - Data Flow

At the heart of the PCCR platform is our matching engine, which uses the
researcher and patient profiles to identify relevant patients for a project.
Figure 2 illustrates the data flow of our matching engine.
The two main components of the matching engine are the researcher profile
generator and the patient profile generator.
The researcher profile generator takes as input the textual description of a
research project. For the purposes of this challenge, we use the descriptions of
funded PCORI projects. This description is passed through a semantic analyzer.
The semantic analyzer is built using concepts in RxNorm and SNOMED and
identifies medical terminologies and concepts in the description, along with their
semantic types. In addition to the semantic analyzer, the description is also sent
to the OpenCalais Web API for entity identification. A final list of entities and types
is created by combining the output of the semantic analyzer and OpenCalais.
The demographic analyzer module extracts demographic information (such as
age group of target population, gender, and location information). We use
textual cues to identify expected outcomes. Figure 3 illustrates the entities
identified from the description of a PCORI project on epilepsy.

Figure 3: Example output of semantic analysis

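Combining the output of two extractors into one list of entities and types could look like the following minimal sketch. The sample outputs and type labels are invented for illustration and are not actual output of the semantic analyzer or OpenCalais; merge_entities is a hypothetical helper.

```python
def merge_entities(*entity_lists):
    """Union (name, type) pairs from several extractors.

    Names are compared case-insensitively; every type reported for a name
    is kept, so disagreeing extractors do not overwrite each other.
    """
    merged = {}
    for entities in entity_lists:
        for name, sem_type in entities:
            merged.setdefault(name.lower(), set()).add(sem_type)
    return {name: sorted(types) for name, types in merged.items()}


# Invented sample outputs from two extractors for one project description.
analyzer_out = [
    ("Epilepsy", "Disease or Syndrome"),
    ("levetiracetam", "Pharmacologic Substance"),
]
second_extractor_out = [
    ("epilepsy", "MedicalCondition"),
    ("children", "Population"),
]

entities = merge_entities(analyzer_out, second_extractor_out)
```

Keeping all reported types per entity, instead of picking one, leaves the choice of which type system to trust to the later matching step.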
The patient profile generator also uses the semantic analyzer and the
demographic analyzer. However, given the volume of patient data, we needed to
adopt a more scalable approach, as semantic analysis can be expensive. We use
a Map-Reduce based solution consisting of a series of map and reduce jobs.
The first map job takes user profiles as input and uses the semantic analyzer
to identify entities and types. In parallel, another map job
extracts entities and types using OpenCalais. The respective reduce jobs
combine all the identified entities for a patient. We merge these lists to create a
semantic signature of the patient consisting of a collection of entities and their
types. The demographic and socio-economic information is identified in a similar way.
Combining all of the above information yields a rich patient profile. We store
the profile as a structured object in MongoDB.
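The map and reduce steps described above can be sketched in plain Python, without a Map-Reduce framework. The two keyword-based extractors below are hypothetical stand-ins for the semantic analyzer and OpenCalais, and the posts are invented; only the overall shape (two map jobs, per-patient reduce, merge) follows the text.

```python
from collections import defaultdict

# Hypothetical keyword vocabularies standing in for the two real extractors.
VOCAB_A = {"epilepsy": "condition", "levetiracetam": "medication"}
VOCAB_B = {"epilepsy": "condition", "seizure": "symptom"}


def keyword_extractor(vocab):
    """Build a toy extractor that returns (entity, type) pairs found in a text."""
    def extract(text):
        lowered = text.lower()
        return [(term, typ) for term, typ in vocab.items() if term in lowered]
    return extract


def map_posts(posts, extractor):
    """Map step: emit (patient_id, entity) for every entity found in a post."""
    for patient_id, text in posts:
        for entity in extractor(text):
            yield patient_id, entity


def reduce_by_patient(pairs):
    """Reduce step: collect all entities emitted for each patient."""
    grouped = defaultdict(set)
    for patient_id, entity in pairs:
        grouped[patient_id].add(entity)
    return grouped


def build_signatures(posts):
    """Run both map/reduce passes and merge the results per patient."""
    first = reduce_by_patient(map_posts(posts, keyword_extractor(VOCAB_A)))
    second = reduce_by_patient(map_posts(posts, keyword_extractor(VOCAB_B)))
    patient_ids = set(first) | set(second)
    return {
        pid: sorted(first.get(pid, set()) | second.get(pid, set()))
        for pid in patient_ids
    }


posts = [
    ("p1", "My son has epilepsy and takes levetiracetam."),
    ("p1", "He had a seizure last week."),
    ("p2", "Newly diagnosed with epilepsy."),
]
signatures = build_signatures(posts)
```

Because each map call depends only on a single post, the two passes can run in parallel and be distributed across workers; the reduce step is the only place where a patient's posts are brought together.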
The matching algorithm takes as input a rich project profile. For each of
the facets (medical condition, medication, and demographics), the
matching algorithm first finds the relevant patients using set containment
operators. We also use MongoDB’s geo querying to filter users by location, if the
project description mentions such a restriction. Further, we apply a
semantic similarity measure (based on Ted Pedersen’s UMLS Similarity project, available at
http://umls-similarity.sourceforge.net/) to compute the semantic
similarity of a patient profile to that of a project. All of these are then combined
to create a match score that is used in selecting and ranking patients.
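A minimal sketch of facet-wise set containment and a combined match score follows. The weights, the linear combination, and all names are assumptions made for illustration; the paper does not specify how the facet results are combined, and the semantic similarity and geo filtering steps are omitted here.

```python
def containment(required, available):
    """Fraction of required items present in available (1.0 if none required)."""
    required, available = set(required), set(available)
    if not required:
        return 1.0
    return len(required & available) / len(required)


# Hypothetical facet weights; a real system would tune or learn these.
WEIGHTS = {"conditions": 0.5, "medications": 0.3, "demographics": 0.2}


def match_score(project, patient):
    """Weighted sum of per-facet containment scores, in the range [0, 1]."""
    return sum(
        weight * containment(project.get(facet, ()), patient.get(facet, ()))
        for facet, weight in WEIGHTS.items()
    )


def rank(project, patients):
    """Return (score, patient id) pairs, best match first, ties broken by id."""
    scored = [(match_score(project, p), p["id"]) for p in patients]
    return sorted(scored, key=lambda item: (-item[0], item[1]))


project = {
    "conditions": {"epilepsy"},
    "medications": {"levetiracetam"},
    "demographics": {"child"},
}
patients = [
    {"id": "p3", "conditions": {"asthma"}, "medications": set(), "demographics": {"adult"}},
    {"id": "p1", "conditions": {"epilepsy"}, "medications": {"levetiracetam"}, "demographics": {"child"}},
    {"id": "p2", "conditions": {"epilepsy"}, "medications": set(), "demographics": {"adult"}},
]
ranking = rank(project, patients)
```

A semantic similarity term could be added to the same weighted sum, so that patients whose conditions are related to, but not identical with, the project's conditions still receive partial credit.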

4 Related Work
The techniques we have used in this paper build upon prior research in the
areas of the semantic Web, hierarchical object matching, and entity extraction. In
the context of patient matching for healthcare, the TrialX system [4] is very
relevant to our work. We also use our prior work in the area of faceted matching
and searching of unstructured documents [3] for facet extraction. We model our
similarity measurement technique on the approach described there. We also applied
the principles of hierarchical object matching discussed by Ganesan et al. in [2]
and Doan et al. in [1]. We also use the OpenCalais Web service [5] to semantically
enrich patient conversations and project descriptions and to extract relevant
entities.

5 Conclusions
In this paper, we describe our solution to the PCORI Healthcare 2.0 chal-
lenge. Our solution leverages existing patient data available in online health
communities and creates a rich semantic profile of the patients. We have also
developed techniques for creating multi-dimensional project profiles from their
textual descriptions. We have developed a semantic matching algorithm that
finds matching patients for research projects. The PCCR platform we have de-
veloped works for any patient community. Due to privacy concerns, we have
not used any online community data in our development or demonstration. In-
stead, we use data from a patient community that we prototyped and seeded
with posts. We evaluated our system and found that our approach has over
90% accuracy in finding patients who have the same or similar medical conditions.
The match rate when using demographics goes down to about 80%. We are
currently improving our demographic profiling and extraction technique. Our
approach builds on current ways patients share and interact on the Web today
and we believe that it can help researchers find very relevant patients leading
to more meaningful and productive engagements and outcomes.

References
[1] Anhai Doan, Pedro Domingos, and Alon Halevy. Learning to match the
schemas of data sources: A multistrategy approach. Machine Learning,
50(3):279–301, 2003.
[2] Prasanna Ganesan, Hector Garcia-Molina, and Jennifer Widom. Exploiting
hierarchical domain structure to compute similarity. ACM Transactions on
Information Systems (TOIS), 21(1):64–93, 2003.
[3] Karthik Gomadam, Ajith Ranabahu, Meenakshi Nagarajan, Amit P. Sheth,
and Kunal Verma. A faceted classification based approach to search and rank
Web APIs. In Web Services, 2008. ICWS ’08. IEEE International Conference
on, pages 177–184. IEEE, 2008.
[4] Chintan Patel, Sharib Khan, and Karthik Gomadam. TrialX: Using semantic
technologies to match patients to relevant clinical trials based on their
personal health records. Proc. of the International Semantic Web Conference
(ISWC), 2009.
[5] Thomson Reuters. OpenCalais, 2009.
