Domain-specific Multi-stage Query Language for Medical Document Repositories

To Advance Knowledge for Humanity
Aastha Madaan
University of Aizu
Domain Specific Multi-stage Query Language for Medical Document Repositories
11/12/16 VLDB Phd Workshop 2013

Introduction
- Specialized domains
  - Biomedical, agriculture, medical/healthcare
  - Insufficient → search-engine-style keyword search
  - Required → effective search and query mechanisms
- Medical domain
  - Medical professionals → specific technical articles (on particular topics/sub-topics)
  - General public → general information (about a disease or medicine)
- How can medical information be retrieved effectively?
  - Query → "general AIDS information" → medical search tool, such as PubMed
  - Result → thousands of documents on different aspects of AIDS, such as treatment, drug therapy, transmission, diagnosis, and history

Introduction (1)
- Medical information
  - Knowledge → evolved over tens of years
  - Contains → well-defined terms and processes
- Available on the Web
  - Patient-specific information
  - Knowledge-based information
  - Medical literature
  - Web documents
  - Patient-encounter recordings

Introduction (2)
Complexity → knowledge-based resources
- Heterogeneous end-user groups
  - Patients, researchers, doctors and other experts
- Varying information requirements
  - Patient treatment, self-diagnosis, general health information
- Structure → medical documents
  - Scientific papers, encyclopedias and other literature
  - Unique, well-defined

Introduction (3): Specialized Documents
Case of medical encyclopedias
- Comprehensive medical guide → patients and clinicians
- Authoritative source → NLM (National Library of Medicine)
- Paper-based resources → electronic format
- Examples → MedlinePlus, WebMD, A.D.A.M., Merriam-Webster Medical Dictionary

Introduction (4): Why Query
- External knowledge base → clinicians
  - Evidence-based medicine
  - During → different stages of point-of-care
    - Patient assessment plan
    - Treatment based on patient diagnosis
  - Improve quality of care
  - Authoritative information required
- Self-diagnosis → patients and their relatives
  - During → early appearance of symptoms, post-checkups
  - Enhance personal knowledge

The Underlying Structure
The hierarchical structure
- Topic of the document
  - Subtopics: Subtopic 1, Subtopic 2, ..., Subtopic n (each with its own content)
  - Miscellaneous/related content: Content topic 1, Content topic 2, ..., Content topic n (each with its own content)
- Flow of contents → organized stages of point-of-care
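
A minimal sketch of how such a topic → subtopic → content hierarchy could be held in memory and walked in document order. The `Node` class, the `walk` helper, and all labels and texts are illustrative assumptions, not part of the proposed system.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    """One node of the hierarchy: a topic, a subtopic, or a content block."""
    label: str
    text: str = ""  # content carried by this node, if any
    children: List["Node"] = field(default_factory=list)


def walk(node: Node, path: Tuple[str, ...] = ()) -> Iterator[Tuple[Tuple[str, ...], str]]:
    """Yield (path, text) for every node that carries content, in document order."""
    here = path + (node.label,)
    if node.text:
        yield here, node.text
    for child in node.children:
        yield from walk(child, here)


# Placeholder document: labels and texts are made up for illustration.
doc = Node("Topic", children=[
    Node("Subtopic 1", text="content of subtopic 1"),
    Node("Subtopic 2", text="content of subtopic 2"),
    Node("Related", children=[Node("Content topic 1", text="related content")]),
])

for path, text in walk(doc):
    print(" > ".join(path), "|", text)
```

Keeping the full path with each content block is what later allows a result to be returned together with its context rather than as a whole document.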

Introduction (5): End Users
- Variable
  a. Demographic characteristics
  b. Tasks/purpose
  c. Computer/domain expertise
- Practitioners and researchers
  - Well-versed → domain knowledge and terminologies
  - Require → precise, complete, accurate and timely results
- Patients and their relatives
  - NOT well-versed → domain knowledge and terminologies
  - Require → general information
- User groups: healthcare workers, specialized researchers, patients and their relatives

Medical Queries
- Evidence-based queries
  - Intent: diagnostic
  - Raised by: clinicians/experts
  - Target resources: online medical repositories (e.g. medical encyclopedia)
  - Example: "Cases where helicobacter bacteria cause peptic ulcer"
- Hypothesis-directed queries
  - Intent: non-diagnostic
  - Raised by: novice users/patients
  - Target resources: online medical repositories (e.g. medical encyclopedia)
  - Example: "Treatment in case of high fever and dizziness"

Query Flows
- Occurrence → evidence-based and hypothesis-directed queries
- Represent → stages of information seeking
- Comprise → varying levels of query complexity
[Figure: query flow in four numbered stages, 1 to 4]

Query: Find the chance of "Cancer Risk" in patients who show the symptom "Sleep Deprivation" and have been exposed to "Radiation" (but not to "Environmental Toxins") and who do not have a "Genetic Disorder".
Research Gap
[Figure: healthcare expert, paper-based resources, help needed]
- Results → large in number, irrelevant
- Failure → keyword search, domain-specific search tools
- Require → precise and easy-to-use database-style query methods
Key steps:
1. Schema → understandable by users
2. Identify → resources to query
3. Identify → granularity of results

Bridging the Gap
Aim: Query online medical information effectively
- Transform → document repository → user-level schema
- Enable → high-level query language
- Target audience → skilled and semi-skilled users
- Utilize → query capabilities of a database query language
- Facilitate → in-depth queries and granular results

Query the New Way
Traditional method
- Medical expert → keyword search → resource
- Results
  - Lack specificity
  - Long list of full documents
  - Trustworthiness of resources → unknown
Proposed method
- Medical expert → query method (user-level schema, high-level query language) → resource
- Results
  - Specific, granular
  - Segments of documents → query criteria
  - Trustworthy/authoritative sources only

Proposed Approach

Key Features
- User-level schema → offline process
  - Universal, concept-level schema
  - Attributes
    - Understandable → domain experts and novice users
    - Provide → granular, context-based results
  - Use → web segmentation algorithm, domain concepts
- Multi-stage query language → online process
  - Map multi-stage diagnostic process → step-by-step query flow
  - Interactive querying → continuous query refinement
    - View results → add concept → view results
  - Support → simple, medium, complex, recursive queries
  - Use → user-level schema
Outline
Two-step framework

Data Model
Tree-structured repository
[Figure: tree-structured repository with node labels H1, f1, f2]

Data Model (1)

Data Model (2): Schema
- Attributes → diagnostic concepts/terms
  - Stages of point-of-care
  - Do not change frequently

Data Model (3): An XML Document
- Document corresponding to "Aarskog Syndrome"
- Source → MedlinePlus Medical Encyclopedia
- Segments shown: Title, Causes, Symptoms, Treatment
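
A hedged sketch of what a segmented document might look like once converted to XML, and how its segments could be loaded into a flat record keyed by schema attribute. The element names (`document`, `title`, `segment`), the `name` attribute, and the `...` bodies are placeholders chosen for illustration; the actual XML layout used in the work is not shown on the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: element and attribute names are assumptions,
# and the "..." bodies stand in for the real segment text.
XML_DOC = """
<document>
  <title>Aarskog Syndrome</title>
  <segment name="Causes">...</segment>
  <segment name="Symptoms">...</segment>
  <segment name="Treatment">...</segment>
</document>
"""


def load_segments(xml_text):
    """Parse one document into {attribute name: segment text}."""
    root = ET.fromstring(xml_text.strip())
    record = {"Disease_name": root.findtext("title")}
    for seg in root.findall("segment"):
        record[seg.get("name")] = (seg.text or "").strip()
    return record


record = load_segments(XML_DOC)
print(list(record))
```

Each key of the resulting record corresponds to one attribute of the user-level schema, so a query can name "Symptoms" or "Treatment" directly instead of searching the full page.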

Data Model (4): Query Effort
- Query: Find whether "Oxygen therapy" works for the treatment of "Chronic Respiratory Failure" when the symptoms are "Lethargy" OR "Shortness of breath".
- Advanced keyword search → not possible
- Proposed method → easy to use

SELECT attribute = "Treatment"
WHERE
  Attribute "Disease_name" = "Chronic Respiratory Failure"
  AND Attribute "Treatment" = "Oxygen therapy"
  AND (Attribute "Symptoms" = "Lethargy"
       OR Attribute "Symptoms" = "Shortness of breath")

Result → result segment + context of the user query
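
How the OR between the two symptom conditions is grouped changes which documents match. The sketch below evaluates the query over two made-up records, once with the intended grouping and once as a flat chain in which AND binds tighter than OR (as it does in Python and in SQL). The record layout and the function names are assumptions for illustration only.

```python
def matches(rec):
    """Intended reading: disease AND treatment AND (symptom A OR symptom B)."""
    return (
        rec["Disease_name"] == "Chronic Respiratory Failure"
        and "Oxygen therapy" in rec["Treatment"]
        and ("Lethargy" in rec["Symptoms"]
             or "Shortness of breath" in rec["Symptoms"])
    )


def matches_ungrouped(rec):
    """Same conditions as a flat chain: the final OR stands on its own."""
    return (
        rec["Disease_name"] == "Chronic Respiratory Failure"
        and "Oxygen therapy" in rec["Treatment"]
        and "Lethargy" in rec["Symptoms"]
        or "Shortness of breath" in rec["Symptoms"]
    )


# Made-up records, not taken from any repository.
records = [
    {"Disease_name": "Chronic Respiratory Failure",
     "Treatment": ["Oxygen therapy"], "Symptoms": ["Lethargy"]},
    {"Disease_name": "Other condition",
     "Treatment": ["Rest"], "Symptoms": ["Shortness of breath"]},
]

grouped = [r["Disease_name"] for r in records if matches(r)]
ungrouped = [r["Disease_name"] for r in records if matches_ungrouped(r)]
print(grouped)
print(ungrouped)
```

The ungrouped form also returns the unrelated record, because its symptom alone satisfies the trailing OR. A query interface that builds the grouping for the user avoids this class of mistake.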

Data Model (5): Granular Results
[Figure: queried attributes/segments → query results (context, granular)]
- Each result is a segment, a combination of
  - Concept/context in the query
  - Item of concern (content enclosed in a segment)

An Example
Query: Find other symptoms where "chronic kidney failure" is caused by "anemia"
- Queried segment → Symptoms
- Segments in query → Causes = "anemia" and Disease_name = "chronic kidney failure"
- Result segment → Symptoms
  - Context → Disease_name = "chronic kidney failure" and Causes = "anemia"
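
The example above can be sketched as a function that returns only the queried segment, packaged with the context that selected it. The `query_segment` function, the list-valued record layout, and the placeholder symptom names are assumptions for illustration; they do not reproduce the actual implementation.

```python
def query_segment(records, target, conditions):
    """Return one result per matching record: the target segment plus the query context."""
    results = []
    for rec in records:
        # A record matches when every (attribute, value) condition holds.
        if all(value in rec.get(attr, []) for attr, value in conditions.items()):
            results.append({
                "context": dict(conditions),
                "segment": target,
                "content": rec.get(target, []),
            })
    return results


# Made-up records; "symptom A" etc. are placeholders, not medical facts.
records = [
    {"Disease_name": ["chronic kidney failure"], "Causes": ["anemia"],
     "Symptoms": ["symptom A", "symptom B"]},
    {"Disease_name": ["chronic kidney failure"], "Causes": ["other cause"],
     "Symptoms": ["symptom C"]},
]

results = query_segment(
    records,
    target="Symptoms",
    conditions={"Disease_name": "chronic kidney failure", "Causes": "anemia"},
)
print(results)
```

The result carries the Symptoms content only, not the whole document, and the attached context records why that segment was returned.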

Next Step → Multi-stage Query Language

Proposed Query Language (1)
- XQBE [Braga, 2005] → user-level schema
- Create queries → drag-and-drop interface
- Query: "Find cases where a person is inflicted with 'peptic ulcer' due to 'Helicobacter pylori bacteria'"
- Attributes → understandable by end users
  1. Case = disease_name; Value = ??
  2. Due to = Causes; Value = "Helicobacter pylori bacteria"
  3. Inflicted with = Symptoms; Value = "peptic ulcer"
- Query effort
  - Minimal learning curve
  - Computer expertise → not required

Proposed Query Language (2)
- Multi-stage query-by-concept
  - Concept → queryable attribute
    - Topic, sub-topic, medical concept
- Query effort
  - Dynamic selection of attributes
  - No computer expertise required
- Query process
  - An example: cases where fever is caused by Pneumonia and Tuberculosis
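
A minimal sketch of the multi-stage idea (view results → add concept → view results), assuming each stage simply narrows the previous stage's results. The `QuerySession` class, its method names, and the four sample cases are hypothetical; the real language is driven through a graphical interface rather than code.

```python
class QuerySession:
    """Keeps one result list per stage so the user can refine or step back."""

    def __init__(self, records):
        self.stages = [list(records)]  # stage 0: the whole repository

    def add_concept(self, attribute, value):
        """Narrow the latest results to records whose attribute contains value."""
        narrowed = [r for r in self.stages[-1] if value in r.get(attribute, [])]
        self.stages.append(narrowed)
        return narrowed

    def step_back(self):
        """Drop the latest refinement and return the previous stage's results."""
        if len(self.stages) > 1:
            self.stages.pop()
        return self.stages[-1]


# Made-up cases for illustration only.
records = [
    {"case": 1, "Symptoms": ["fever"], "Causes": ["Pneumonia", "Tuberculosis"]},
    {"case": 2, "Symptoms": ["fever"], "Causes": ["Pneumonia"]},
    {"case": 3, "Symptoms": ["fever"], "Causes": ["other cause"]},
    {"case": 4, "Symptoms": ["cough"], "Causes": ["Tuberculosis"]},
]

session = QuerySession(records)
session.add_concept("Symptoms", "fever")
session.add_concept("Causes", "Pneumonia")
final = session.add_concept("Causes", "Tuberculosis")

print([len(stage) for stage in session.stages])
print([r["case"] for r in final])
```

Storing every stage, rather than only the latest one, is one way to let the user inspect intermediate results and undo a refinement without re-running the whole query.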

Another Example
- Query: Find cases where three clinical concepts ("cough", "no sore-throat", and "no sterol injection") occur in the context of symptoms, along with a sub-concept ("non sterol injection at the left side")
- XQBE on specialized medical repositories
- Multi-stage query-by-concept query language

Evaluation Plan
Data sets
- Document repository
  - MedlinePlus → health topics (900+), encyclopedia (4000+), drugs (12000+)
- Set of queries
  - 50 test queries (multi-staged) → from a literature survey and consultation with medical users
Quantitative studies
- Evaluation metrics
  - Accuracy of segment extraction (schema creation) → precision and recall
  - Reduction in search space → query results
Qualitative studies
- Usability studies
  - Actual end users
- Query performance
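
The two quantitative metrics can be sketched as follows. Precision and recall use their standard set-based definitions; the search-space reduction formula (one minus the fraction of documents returned) is an assumption, since the slide does not define it. The segment labels and document counts are illustrative.

```python
def precision_recall(extracted, relevant):
    """Standard precision and recall of extracted segments against a gold set."""
    extracted, relevant = set(extracted), set(relevant)
    true_positives = len(extracted & relevant)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


def search_space_reduction(total_docs, returned):
    """Assumed definition: share of the repository the user no longer has to read."""
    if total_docs <= 0:
        raise ValueError("total_docs must be positive")
    return 1 - returned / total_docs


# Illustrative values: one spurious segment extracted, one gold segment missed.
extracted = {"Causes", "Symptoms", "Treatment", "Footer"}
gold = {"Causes", "Symptoms", "Treatment", "Prevention"}

precision, recall = precision_recall(extracted, gold)
reduction = search_space_reduction(total_docs=4000, returned=40)
print(precision, recall, reduction)
```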

Initial Achievements
- HTML documents → XML schema as per the proposed model
- XQuery on XML
- Integration with XQBE
- Query-by-concept → enumeration using paper and pencil

Challenges
- Scalability → schema extraction for similar repositories
- Understand and implement → query operations needed
- Understand → user characteristics to be considered
- User interface → query language

Related Work
- Domain-specific information retrieval [Yan, 2011]
  - Similarity- and popularity-based models → insufficient for domain experts
  - "Information granulation" needs to be considered in huge document repositories
- Form-based query interfaces [Jayapandian, 2009]
  - Easy to use
  - Limited access to the database
  - Complex queries → large number of forms
  - Varying medical concepts → large number of fields in forms
- Beyond single-page web search results [Varadarajan, 2008]
  - Provide granular results for the user's search
  - Return segments from multiple or related web documents as results
- High-level graphical query languages [Braga, 2005]
  - Easy to use and understand
  - Little or no programming effort required by the user
  - Common languages → QBE, XQBE

Summary and Conclusions
- Proposed → multi-stage query language
  1. Aim → making online medical information usable
  2. Transformation → user-level schema
  3. Facilitates → granular/context-based results
  4. Supports → healthcare experts
  5. Minimizes → learning curve for novice users
  6. Reduces → dependency on keyword-based searches
- Provide → web-user-level activity → healthcare experts → no or little programming effort

References (1)
[1] D. Braga, A. Campi, and S. Ceri. XQBE (XQuery By Example): A visual interface to the standard XML query language. ACM Trans. Database Syst., 30(2):398–443, June 2005.
[2] D. Cai, S. Yu, J.-R. Wen, and W.-Y. Ma. Extracting content structure for web pages based on visual representation. In Proceedings of the 5th APWeb, pages 406–417. Springer-Verlag, 2003.
[3] M.-A. Cartright, R. W. White, and E. Horvitz. Intentions and attention in exploratory health search. In Proceedings of the 34th Intl. ACM SIGIR conference, pages 65–74, New York, NY, USA, 2011. ACM.
[4] S. Cohen, Y. Kanza, Y. Kogan, W. Nutt, Y. Sagiv, and A. Serebrenik. EquiX: a search and query language for XML. Journal of the American Society for Information Science and Technology, 53:2002, 2000.
[5] S. M. Freire, E. Sundvall, D. Karlsson, and P. Lambrix. Performance of XML Databases for Epidemiological Queries in Archetype-Based EHRs. In Proceedings Scandinavian Conference on Health Informatics 2012, volume 70 of Linköping Electronic Conference Proceedings, pages 51–57. Linköping University Electronic Press, 2012.
[6] M. Gschwandtner, M. Kritz, and C. Boyer. Requirements of the health professional research. In Technical Report D8.1.2. Khresmoi Project, 2011.
[7] A. Hanbury. Medical information retrieval, an instance of domain. In SIGIR'12. ACM, August 2012.
[8] S. Hunt, J. J. Cimino, and D. E. Koziol. A comparison of clinicians' access to online knowledge resources using two types of information retrieval applications in an academic hospital setting. J Med Libr Assoc, 101(1):26–31, 2013.
[9] http://www.who.int/classifications/icd/en/, 2011.
[10] M. Jayapandian and H. V. Jagadish. Automating the design and construction of query forms. ICDE, page 125, 2006.
[11] F. Li and H. V. Jagadish. Usability, databases, and HCI. IEEE Data Eng. Bull., 35(3):37–45, 2012.
[12] http://loinc.org/, 2011.
[13] A. Marian and W. Wang. Flexible querying of personal information. IEEE Data Eng. Bull., 32(2):20–27, 2009.
[14] http://www.nlm.nih.gov/bsd/pmresources.html, 2011.
[15] http://www.nlm.nih.gov/medlineplus/, 2009.
[16] http://www.linkedin.com/groups/Choice-OpenEHR-persistence-layer-144276.S.208531138?qid=208adbca-fc26-4ada-bf02-7efe5a9e5661&trk=group_most_recent_rich-0-b-ttl&goback=%2Egmr_144276, 2013.

References (2)
[17] http://www.ncbi.nlm.nih.gov/pubmed, 2011.
[18] S. A. Rahman, S. Bhalla, and T. Hashimoto. Query-by-object interface for information requirement elicitation in m-commerce. Int. J. Hum. Comput. Interaction, 20(2):135–160, 2006.
[19] X. Yan, R. Y. K. Lau, D. Song, X. Li, and J. Ma. Toward a semantic granularity model for domain-specific information retrieval. ACM Trans. on Information Systems, 29(3), July 2011.
[20] S. Sachdeva and S. Bhalla. Implementing high-level query language interfaces for archetype-based electronic health records database. In COMAD, 2009.
[21] http://www.ihtsdo.org/snomed-ct/, 2011.
[22] R. Varadarajan, V. Hristidis, and T. Li. Beyond single-page web search results. IEEE Transactions on Knowledge and Data Engineering, 20(3):411–424, 2008.
[23] A. Yasir, M. Kumara Swamy, P. Krishna Reddy, and S. Bhalla. Enhanced query-by-object approach for information requirement elicitation in large databases. In Big Data Analytics, volume 7678 of Lecture Notes in Computer Science, pages 26–41. Springer, 2012.
[24] M. Jayapandian and H. V. Jagadish. Automating the design and construction of query forms. IEEE Trans. Knowl. Data Eng., 21(10):1389–1402, 2009.

Questions
