1. MWRI WIP February 2014, Harry Hochheiser, harryh@pitt.edu
User tools for Biomedical Informatics:
the Human Side of the
Fundamental Theorem
Harry Hochheiser
University of Pittsburgh School of Medicine
Department of Biomedical Informatics
harryh@pitt.edu
2. NLM Training Conference June 2014, Harry Hochheiser, harryh@pitt.edu
• Human + Computer > Human iff
• Value(Computer) > Cost(Computer)
• all too often, this does not hold
Hochheiser's perspective on
biomedical informatics
• Informatics tools must
• Support researcher’s tasks and goals.
• Take care of the “stupid” work
3.
GRADS: Genomic Research In Alpha-1 Antitrypsin
Deficiency Syndrome and Sarcoidosis
• Alpha-1 antitrypsin deficiency
• “genetic predisposition to early onset pulmonary emphysema and airway
obstructions” (GRADS MOP)
• Mutation in SERPINA1 gene - codes for alpha-1 antitrypsin
• Genotypes PiMM (normal), PiMS (80% serum level), PiSS/PiMZ (60%),
PiSZ (40%), PiZZ (20%)
• Sarcoidosis
• “systemic disease characterized by the formation of granulomatous lesions,
especially in the lungs, liver, skin, and lymph nodes, with a heterogeneous set
of clinical manifestations and a variable course” (GRADS MOP)
• No specific genetic cause
• Infection may play a role…
4.
GRADS Goals
Use ‘omics data to characterize phenotypes
gene expression
miRNA expression
microbiome
~600 patients (400 sarcoidosis, 200 A1AT, distributed across
phenotypic/genotypic groups), 7 centers
detailed clinical data
lung CT
‘omics, etc.
5.
GRADS Data sharing Goals
• Integrative exploration of clinical and ‘omic data
• Identify cohorts suitable for analysis
• Are there enough participants to ask my questions?
• Which genes/miRNAs/microbes might be “interesting”?
• How do clinical data relate to ‘omic data?
• Web-based interactive filters and exploration
• Coordinated histogram widgets as both input and output
• Initially, GRADS clinical centers
• eventually, broader community
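One way to read "coordinated histogram widgets as both input and output" is sketched below in Python. This is an illustrative assumption, not the GRADS implementation: the records, field names (`age`, `fev1_pct`), and bin widths are hypothetical. Each histogram is drawn over the participants that pass the range filters on every other field, so brushing one histogram (input) updates the others (output), and the cohort size answers "are there enough participants?".

```python
from collections import Counter

# Hypothetical participant records; fields and values are illustrative only.
participants = [
    {"id": 1, "age": 34, "fev1_pct": 82},
    {"id": 2, "age": 47, "fev1_pct": 65},
    {"id": 3, "age": 52, "fev1_pct": 58},
    {"id": 4, "age": 61, "fev1_pct": 91},
    {"id": 5, "age": 45, "fev1_pct": 40},
    {"id": 6, "age": 38, "fev1_pct": 77},
]


def passes(record, filters, skip=None):
    """True if the record satisfies every range filter except the one on `skip`."""
    return all(
        low <= record[field] <= high
        for field, (low, high) in filters.items()
        if field != skip
    )


def histogram(records, field, bin_width, filters):
    """Bin `field` over the records that pass the filters on all other fields."""
    counts = Counter()
    for record in records:
        if passes(record, filters, skip=field):
            counts[(record[field] // bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))


def cohort(records, filters):
    """IDs of the participants that pass every active filter."""
    return [record["id"] for record in records if passes(record, filters)]


# Brushing the age histogram to 40-60 acts as input...
filters = {"age": (40, 60)}
# ...and the FEV1 histogram and cohort list update as output.
print(histogram(participants, "fev1_pct", 20, filters))
print(cohort(participants, filters))
```

Leaving a field's own filter out of its histogram keeps the full distribution visible while the user adjusts the selected range; that is a design choice of this sketch, not something the slides specify.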
10.
Research Challenges
• Algorithmic enhancements
• Data retrieval and management
• Calculation of “interesting” genes
• GPU-based calculation
• Additional user facilities?
• statistical comparison of subgroups?
11.
Interactive Search and Review of Clinical Records with
Multi-Layered Semantic Annotations
• Challenge: retrospective chart review for clinical research
• Quality assessment
• measuring guideline adherence for colonoscopy
• Cohort identification
• patients who may have had adverse reactions
• Use Natural Language Processing to extract relevant variables
• But… researchers need to review findings and correct
mistakes.
• Ultimate goal: bridge gap between NLP and clinical research
12.
NLP Chart Review Visualization
13.
Word Tree Visualization
Wattenberg and Viégas, 2008, implementation from https://github.com/silverasm/wordtree
Patterns in the text can help facilitate review of NLP results.
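The data structure behind a word tree can be sketched in a few lines of Python. This is a minimal illustration, not the code from the wordtree repository cited above: the sentences are made up, the tokenization is a naive whitespace split, and `build_word_tree` and `render` are hypothetical helper names. Every occurrence of a root term contributes the words that follow it, and shared prefixes are merged into counted branches.

```python
def build_word_tree(sentences, root):
    """Merge the words following each occurrence of `root` into a counted tree."""
    tree = {"count": 0, "children": {}}
    for sentence in sentences:
        words = sentence.lower().split()  # naive tokenization (assumption)
        for i, word in enumerate(words):
            if word != root:
                continue
            tree["count"] += 1
            node = tree
            for following in words[i + 1:]:
                child = node["children"].setdefault(
                    following, {"count": 0, "children": {}}
                )
                child["count"] += 1
                node = child
    return tree


def render(node, label, depth=0):
    """Indented text lines for the tree, most frequent branches first."""
    lines = [f"{'  ' * depth}{label} ({node['count']})"]
    ordered = sorted(node["children"].items(), key=lambda kv: (-kv[1]["count"], kv[0]))
    for word, child in ordered:
        lines.extend(render(child, word, depth + 1))
    return lines


# Made-up report fragments, for illustration only.
sentences = [
    "biopsy of polyp taken",
    "biopsy of mass taken",
    "no biopsy performed",
]
tree = build_word_tree(sentences, "biopsy")
print("\n".join(render(tree, "biopsy")))
```

Branches with high counts show phrasing that recurs across reports, which is the kind of pattern a reviewer could use to check many NLP results at once.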
16.
Next Steps
Interfaces for handling suggested revisions to NLP models:
Selecting spans
Changing variable assignments
Submitting changes
Reviewing modified variable assignments
Assessments
Usability studies
Empirical studies
How much training is needed to “seed” expert review?
17.
Monarch Initiative:
Using cross-species phenotypes to
explore disease
(some slides courtesy of M. Haendel)
Problem: Clinical and model phenotypes are described differently
18.
OWLSim: Phenotype similarity across
patients or organisms
https://code.google.com/p/owltools/wiki/OwlSim
Statistical details available on demand
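The slide does not give the statistics, so the following Python sketch illustrates only the general idea of ontology-based phenotype similarity and should not be read as OWLSim's actual algorithm. It assumes a Jaccard score over ancestor closures: each profile is expanded with all ancestors of its terms, so two profiles that share no exact term can still match through a common parent. The toy ontology and term names are hypothetical.

```python
# Hypothetical toy ontology: each term maps to its parent terms.
PARENTS = {
    "phenotype": [],
    "respiratory phenotype": ["phenotype"],
    "skin phenotype": ["phenotype"],
    "abnormal lung morphology": ["respiratory phenotype"],
    "emphysema": ["abnormal lung morphology"],
    "lung granuloma": ["abnormal lung morphology"],
    "skin lesion": ["skin phenotype"],
}


def ancestors(term):
    """The term itself plus every term reachable through parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def profile_closure(profile):
    """Union of the ancestor sets of all terms in a phenotype profile."""
    closure = set()
    for term in profile:
        closure |= ancestors(term)
    return closure


def jaccard_similarity(profile_a, profile_b):
    """Shared closure terms divided by all closure terms (0.0 if both are empty)."""
    a, b = profile_closure(profile_a), profile_closure(profile_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


patient = ["emphysema"]
mouse_model = ["lung granuloma"]
print(jaccard_similarity(patient, mouse_model))
```

In this toy ontology the two lung profiles score higher against each other than either does against a skin-only profile, even though no profile shares an exact term with another.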
20.
The Monarch Infrastructure
21.
Visualization Challenges:
How to explain the inferences driven by ontological calculations?
How to integrate multiple data types to aid interpretation?
Pathways
Gene expression
Protein-protein interaction
…
How to compare across phenotype profiles?
22.
Undiagnosed Disease Program:
Comparing Phenotype Profiles
24.
Phenotype Profile - Model Views
25.
Other challenges
Process support - search and interpretation as an ongoing
activity
Reducing bias - how do we avoid cherry-picking and ensure thorough
investigation?
Navigating semantic chains
phenotypes -> networks -> genes -> model
26.
Closing thoughts…
• The hard problems are not technical
• Collaboration required…
27.
Acknowledgments
GRADS:
U. Pittsburgh: Steve Wisniewski, Mike Becich, Scott O’Neal, Bill Shirey, Becky Boes, Sahawut Wesaratchakit
Yale: Naftali Kaminski
Support: NHLBI U01HL112707
Monarch:
U. Pittsburgh: Chuck Borromeo, Becky Boes, Jeremy Espino
OHSU: Melissa Haendel, Nicole Vasilevsky, Matt Brush
NIH-UDP: Murat Sincan, David Adams, Neal Boerkoel, Amanda Links, Bill Gahl
LBNL: Nicole Washington, Suzanna Lewis, Chris Mungall
+ colleagues at Sanger, Charité, Toronto, and JAX
UCSD: Anita Bandrowski, Amarnath Gupta, Jeff Grethe, Maryann Martone, Trish Whetzel
Support: NIH Office of Director: 1R24OD011883, NIH-UDP: HHSN2682013
Interactive Search and Review of Clinical Records with Multi-Layered Semantic
Annotations:
U. Pittsburgh: Janyce Wiebe, Rebecca Hwa, Alex Conrad, Phuong Pham, Lanfei Shi, Gaurav Trivedi
U. Utah: Wendy Chapman, Danielle Mowery
Support: NLM 7R01LM010964
Other Support:
Addressing Gaps in Clinically Useful Evidence on Drug-Drug Interactions (R. Boyce, NLM: 1R01LM011838)
Cancer Deep Phenotype Extraction from Electronic Medical Records (R. Crowley & G. Savova, NCI: 1U24CA184407)
Quantifying Electronic Medical Record Usability to Improve Clinical Workflow (Z. Agha, AHRQ: 5R01HS021290)