2. Biomedical Informatics
The use of informatics for the improvement of biomedical
research and clinical care
Friedman “A "fundamental theorem" of biomedical
informatics. JAMIA 2009
Shortliffe & Blois “The Computer Meets Medicine and
Biology: Emergence of a Discipline” in Shortliffe & Cimino, eds.,
“Biomedical Informatics: Computer Applications in Health Care and Biomedicine
Harry Hochheiser, harryh@pitt.edu Texas A&M Health Sciences Center February 6, 2013
3. What is “Translational” research?
●
My research is translational...
●
..because I want to get funding?
http://www.ncats.nih.gov/research/cts/cts.html
Harry Hochheiser, harryh@pitt.edu Texas A&M Health Sciences Center February 6, 2013
4. Co-Clinical Trials
Nardella, Lunardi, Pataik, and Cantley
The APL Paradigm
and the “Co-Clinical Trial” Project
Cancer Discovery 2011
Acute Proyelocytic Leukemia
●
●
13 years of work – cloning, description, transgenics, trials
●
Successful therapy now approved for clinical use..
●
Co-clinical trials – synchronize human and animal trials.
Harry Hochheiser, harryh@pitt.edu Texas A&M Health Sciences Center February 6, 2013
5. Measuring Success
●
How is success of translational science measured?
●
Cures? Therapies?
●
Citations mapping model organism studies to T2, T3, T4
success?
●
How do we assess impact that doesn't play out in the usual
3-5 year grant cycle?
Harry Hochheiser, harryh@pitt.edu Texas A&M Health Sciences Center February 6, 2013
6. FaceBase
http://www.facebase.org
National Institute of Dental and Craniofacial Research
Five-year initial phase 2009-2014
“..systematically compile the biological instructions to construct the middle
region of the human face and precisely define the genetics underlying its
common developmental disorders, such as cleft lip and palate”
10 Projects: U-flavor contracts
Data Management and Coordination Hub
– “One-stop access to craniofacial research data”
– “Allow scientists to more rapidly and effectively generate
hypotheses and accelerate the pace of their research”.
Harry Hochheiser, harryh@pitt.edu Texas A&M Health Sciences Center February 6, 2013
8. FaceBase Data
●
10 projects
●
4 organisms: mouse, zebrafish, chick, human
●
Varying developmental time points
●
Data types
●
Expression: microarray, ChIP-Seq, miRNA,
●
Images: 3D MRI, OPT, microMRI, microCT, 3D human facial
mesh
●
Sequences: Genotypes.GWAS, CNV
●
Software: 3D facial image analysis
●
Strains: Cre recobminase drivers
●
1 Data management and coordination hub
Harry Hochheiser, harryh@pitt.edu Texas A&M Health Sciences Center February 6, 2013
9. What does it mean to effectively
share data?
Diversity of data
Diversity of projects, goals, etc.
.
Challenges are both technical and social/organizational
Harry Hochheiser, harryh@pitt.edu Texas A&M Health Sciences Center February 6, 2013
10. FaceBase Hub Technical Challenges
Sequences
Data Diversity Images..
miRNA microarrays
Anatomy Phenotype
Integration UCSC Genome Browser Between datasets
Tools Searching Browsing
Metadata
Genotypes RNA-Seq
Data Volume
“Controlled Access” Human Data Genotypes Facial Images
Harry Hochheiser, harryh@pitt.edu Texas A&M Health Sciences Center February 6, 2013
11. Metadata: Theory
●
Well-defined metadata fields/attributes
●
Controlled vocabularies provide consistent terminology for
each field
●
Link to appropriate resources as needed: NCBI, MGI, etc..
●
Additional attributes specific to each data type
●
Consistent metadata supports search, navigation...
Harry Hochheiser, harryh@pitt.edu Texas A&M Health Sciences Center February 6, 2013
12. Metadata in practice
“What's the difference between mutation and Genotype?”
“Uhh. I'll have to get back to you on that.”
“Strain name? Is “C57B6J” the same as “C57Bl/6J?”
“We're lucky to have any information at all when it comes to
these mice...”
“The official name of the strain is ____, but everybody
calls it ____”
Harry Hochheiser, harryh@pitt.edu Texas A&M Health Sciences Center February 6, 2013
13. Biomedical Ontologies might solve
some terminology and structure
problems
Human Anatomy: EHDAA, FMA
Mouse Anatomy: EMAP, MA
Mouse/Human Phenotypes:
MP/HPO
Genes: GO
Data Models: MIAME, MiXX,
ISA-TAB, OBI
Harry Hochheiser, harryh@pitt.edu Texas A&M Health Sciences Center February 6, 2013
14. Metadata Possibilties:
Ontologies & Data Models
Ontologies, Data models, etc.
User Practice
Grand Canyon NPS
http://www.fotopedia.com/items/fickr-
7553734530
Harry Hochheiser, harryh@pitt.edu Texas A&M Health Sciences Center February 6, 2013
15. Implications
Good metadata is vital for search, retrieval, data sharing,
reuse
Curation effort can compensate for some inadequacies, but
– EC Curation Effort, Mq Metadata Quality
– EC = f( 1/Mq)
– Possibly non-linear
–
This doesn't scale
Harry Hochheiser, harryh@pitt.edu Texas A&M Health Sciences Center February 6, 2013
16. Search tools
●
Search + Link: current practice in bioinformatics
repositories
●
Text search over metadata
●
Linear results list
●
Results pages with many links
●
Advantages
●
Easy, fast
●
Disadvantages
●
Relationships between items in result list are not clear
●
No overviews to help with higher-level meta-understanding
Harry Hochheiser, harryh@pitt.edu Texas A&M Health Sciences Center February 6, 2013
17. The Ontology of Craniofacial
Development and Malformation
Extend existing ontologies to provide richer models of
craniofacial anatomy, development, and malformation
Foundational model of Anatomy (FMA), Human Phenotype
Ontology (HPO), Mouse Anatomy (MA)..
Add human developmental description
Human-Mouse links
https://www.facebase.org/content/ocdm
But... better models don't guarantee usage
Harry Hochheiser, harryh@pitt.edu Texas A&M Health Sciences Center February 6, 2013
18. The tool gap
“Tools have advanced to the point of being able to support
users fairly successfully in finding and reading off data
(e.g. to classify and find multidimensional relationships of
interest) but not in being able to interactively explore
these complex relationships in context to infer causal
explanations and build convincing biological stories amid
uncertainty.”
B. Mirel Supporting cognition in systems biology analysis: findings on users' processes and design
implications (2009) J. Biomedical Discovery & Collaboration
• Need: Tools that effectively leverage combination of
human intellect and computing power
Harry Hochheiser, harryh@pitt.edu Texas A&M Health Sciences Center February 6, 2013
19. Information Visualization
●
Interactive displays of high-dimensional data sets
●
Coordinated views facilitate comparison across dimensions and
●
Multi-scale
●
Overview
●
Zoom/filter
●
Full details on requested items
●
Rapid, incremental, reversible queries
●
Avoid 0-hit or million-hit queries.
Harry Hochheiser, harryh@pitt.edu Texas A&M Health Sciences Center February 6, 2013
20. Toward integrative tools
How do we build
connections?
Genotypes
How do we get
Morphometry users to think
differently
about the data?
Images
Expression
Sequence
Human Mouse Zebrafish
Harry Hochheiser, harryh@pitt.edu Texas A&M Health Sciences Center February 6, 2013
21. Information Visualization
●
Interactive displays of high-dimensional data sets
●
Coordinated views facilitate comparison across dimensions and
●
Multi-scale
●
Overview
●
Zoom/filter
●
Full details on requested items
●
Rapid, incremental, reversible queries
●
Avoid 0-hit or million-hit queries.
Harry Hochheiser, harryh@pitt.edu Texas A&M Health Sciences Center February 6, 2013
22. Technology Probe:
Connecting related data items
• Single search, multiple
data types
• Links between items
indicate connections
• Images or other details
on demand for direct
access to data
• Rapid response supports
interactivity, exploration,
serendipitous discovery
Harry Hochheiser, harryh@pitt.edu Texas A&M Health Sciences Center February 6, 2013
23. Technology Probe: Gene Atlas
• Genes displayed
on timeline,
indicating when
active
• Images display
expression
localization
• Large image for
detailed views.
• Mouse-over
coordinated links
Harry Hochheiser, harryh@pitt.edu Texas A&M Health Sciences Center February 6, 2013
24. Vision vs. reality
●
Data is still coming in
●
Gene atlas requires coordinated analysis ..
●
What can we do with data that is available?
Harry Hochheiser, harryh@pitt.edu Texas A&M Health Sciences Center February 6, 2013
25. Technology Probe: Gene Atlas
• Genes displayed
on timeline,
indicating when
active
• Images display
expression
localization
• Large image for
detailed views.
• Mouse-over
coordinated links
Harry Hochheiser, harryh@pitt.edu Texas A&M Health Sciences Center February 6, 2013
27. Timeline view
Approximate alignment of datasets on common timeline
Harry Hochheiser, harryh@pitt.edu Texas A&M Health Sciences Center February 6, 2013
28. 3D Facial Meshes
Two datasets
– Weinberg & Marazita:
Caucasian facial norms
– Spritz: Tanzanian
Identifiable data -access only
upon DAC approval
Query tool: identify subsets
for further analysis.
Harry Hochheiser, harryh@pitt.edu Texas A&M Health Sciences Center February 6, 2013
29. UCSC Genome Browser Mirror
●
Genomebrowser.facebase.org
●
> 30 tracks, mouse mm9
●
RNA-seq
●
ChIP-seq
●
Various anatomic
locations, time
points, etc.
●
Human CNV hg19
Harry Hochheiser, harryh@pitt.edu Texas A&M Health Sciences Center February 6, 2013
30. Imaging
●
microMRI,
●
MicroCT
●
OPT
●
Interactive Browsers?
Harry Hochheiser, harryh@pitt.edu Texas A&M Health Sciences Center February 6, 2013
31. Fishface
Atlas of zebrafish
craniofacial
development
C. Kimmel, et al.
Harry Hochheiser, harryh@pitt.edu Texas A&M Health Sciences Center February 6, 2013
32. Current Status
As of 4 February 2013
> 200 datasets
> 100 in queue awaiting curation or investigator approval
Continuing to refine metadata and data organization
Expecting large infux of data over the next year.
Harry Hochheiser, harryh@pitt.edu Texas A&M Health Sciences Center February 6, 2013
33. Toward Greater Integration
Currently Ideally
Metadata Metadata Metadata Metadata
Data Data Data Data
Links via metadata: Links via data:
Comparable organism, stage, Commonly expressed genes,
Gene, etc.. Sequences,
Pathways
“Easy” for similar data
Expression, Sequences
Harder for diverse data
Images?
Harry Hochheiser, harryh@pitt.edu Texas A&M Health Sciences Center February 6, 2013
34. FaceBase as a model
●
Diverse researchers and projects
●
Focus: specific clinical domain
●
Goal: collect and coordinated data – community resource
●
Outstanding challenges?
●
Is this even the right idea?
Harry Hochheiser, harryh@pitt.edu Texas A&M Health Sciences Center February 6, 2013
35. Build it and they (may) come?
●
Speculative claim: assemble good data and it will be of use
●
How to evaluate this?
●
How to compare to n R01s that might otherwise have been
funded? (question asked by program officer)
●
ENCODE controversy
“..the lesson I learned from ENCODE is that projects like
ENCODE are not a good idea.”
M. Eisen http://www.michaeleisen.org/blog/?p=1179
Harry Hochheiser, harryh@pitt.edu Texas A&M Health Sciences Center February 6, 2013
36. Three key factors
Tools Incentives
Outreach
Harry Hochheiser, harryh@pitt.edu Texas A&M Health Sciences Center February 6, 2013
37. Tools
Desperately needed!
Search tools – finding data
Annotation Tools – describing data
– Using ontologies
– Common data models
Minimize the effort required to provide the high-quality
metadata needed for translational bioinformatics.
Harry Hochheiser, harryh@pitt.edu Texas A&M Health Sciences Center February 6, 2013
38. Perhaps the worst
bioinformatics data management tool
Harry Hochheiser, harryh@pitt.edu Texas A&M Health Sciences Center February 6, 2013
39. Perhaps the best
bioinformatics data management tool
Harry Hochheiser, harryh@pitt.edu Texas A&M Health Sciences Center February 6, 2013
40. Pros and Cons of Spreadsheets
Alternatives:
Cons:
– ad-hoc data models
– ad-hoc data fields
– No consistency, provenance...
Pros:
– Ubiquitous
– Simple Analytic tools: more rigor,
less generality
– Flexible
– Familiar
Experiment Metadata
Harry Hochheiser, harryh@pitt.edu Texas A&M Health Sciences Center February 6, 2013
41. Usability and Ontologies
Can we at least get the terms right?
Maguire et al. 2012
Harry Hochheiser, harryh@pitt.edu Texas A&M Health Sciences Center February 6, 2013
42. Beyond Autocomplete
●
Substring autocomplete –
state of art for finding
terms in biomedical
ontologies
●
Cons: no context, room
for specialization or
generalization?
●
Can we build a tool that
will easily support both
search and navigation?
Foundational Model Explorer
http://fme.biostr.washington.edu/FME/index.html
Harry Hochheiser, harryh@pitt.edu Texas A&M Health Sciences Center February 6, 2013
43. Why not let the curators do it?
●
Expense
●
Accuracy
●
Speed
●
Scientists know their data best
Claim – To be sustainable, data sharing must be self-curated.
But.. why should anyone bother?
Harry Hochheiser, harryh@pitt.edu Texas A&M Health Sciences Center February 6, 2013
44. Scientific incentives...
Where does
Traditionally.... Data sharing fit in?
Research
Results
Funding Papers Data
Sharing
?
Harry Hochheiser, harryh@pitt.edu Texas A&M Health Sciences Center February 6, 2013
45. How to promote sharing..
1. Better tools → reduced effort
Eannotation < Ecollection + epsilon
2. Recognize effort: alternative models of academic credit
ImpactStory...
Altmetrics: Value all research products
(H. Piwowar, Nature 1/9/13, doi:10.1038/493159a)
“For all new grant applications from 14 January, the US National Science Foundation
(NSF) asks a principal investigator to list his or her research “products” rather than
“publications” in the biographical sketch section.”
Harry Hochheiser, harryh@pitt.edu Texas A&M Health Sciences Center February 6, 2013
46. Related Efforts
●
Genomics Research in Alpha-1 Antitrypsin Deficiency and
Sarcoidosis (GRADS) – microbiome, expression, phenotype
sharing
●
LAMHDI/MONARCH – User tools for ontologically-driven
discovery of model genes for human diseases
●
Research Social Network Collaboration Finding tool – using
VIVO and related networking tools to find collaborators..
Harry Hochheiser, harryh@pitt.edu Texas A&M Health Sciences Center February 6, 2013
47. Acknowledgments
FaceBase: Mary Marazita, Jeff Murray, Yang Chai, Bruce Arnonow, Kristin Artinger, Teri
Beaty, Jim Brinkley, David Clouthier, Michael Cunningham, Michael Dixon, Leah Rae
Donahue, Scott E. Fraser, Benedikt Hallgrimsson, Junichi Iwata, Ophir Klein, Stephen
Murray, Fernando Pardo-Manuel de Villena, John Postlethwait, Steven Potter, Lina
Shapiro, Richard Spritz, Axel Visel, Seth Weinberg, Paul Trainor
U. Pittsburgh: Mike Becich, Becky Boes, Chuck Borromeo, Lance Kennelty, Annette Krag-
Jensen, Tom Maher, Johnson Paul, Linda Schmandt, Shiyi Shen, Bill Shirey, Cristy Spino,
Mike Stefanko, Justin Stickel
OCDM :James Brinkley; Jose Leonardo Mejino; Landon Detwiler; Ravensara Travillian; Melissa
Clarkson; Timothy Cox; Carrie Heike; Michael Cunningham; Linda Shapiro
NIDCR: Steven Scholnick, Emily Harris, Jeannine Helm, Nadya Lumelsky, Lillian Shum
Support: NIH Grants U01 DE020057, 3U01DE020050-03S1
Harry Hochheiser, harryh@pitt.edu Texas A&M Health Sciences Center February 6, 2013