The real world of ontologies and
phenotype representation:
perspectives from the
Neuroscience Information
Framework
Maryann Martone, Ph.D.
University of California, San Diego
“Neural Choreography”
“A grand challenge in neuroscience is to elucidate brain function in relation
to its multiple layers of organization that operate at different spatial and
temporal scales. Central to this effort is tackling “neural choreography” --
the integrated functioning of neurons into brain circuits-- Neural
choreography cannot be understood via a purely reductionist approach.
Rather, it entails the convergent use of analytical and synthetic tools to
gather, analyze and mine information from each level of analysis, and
capture the emergence of new layers of function (or dysfunction) as we
move from studying genes and proteins, to cells, circuits, thought, and
behavior....
However, the neuroscience community is not yet fully engaged in exploiting the
rich array of data currently available, nor is it adequately poised to capitalize
on the forthcoming data explosion.”
Akil et al., Science, Feb 11, 2011
“Data choreography”
 In that same issue of Science
 Asked peer reviewers from last year about the availability and use of
data
 About half of those polled store their data only in their
laboratories—not an ideal long-term solution.
 Many bemoaned the lack of common metadata and archives as a
main impediment to using and storing data, and most of the
respondents have no funding to support archiving
 And even where accessible, much data in many fields is too poorly
organized to enable it to be efficiently used.
 “...it is a growing challenge to ensure that data produced during the
course of reported research are appropriately described, standardized,
archived, and available to all.” Lead Science editorial (Science, 11
February 2011, Vol. 331, no. 6018, p. 649)
 NIF is an initiative of the NIH Blueprint consortium of institutes
 What types of resources (data, tools, materials, services) are
available to the neuroscience community?
 How many are there?
 What domains do they cover? What domains do they not cover?
 Where are they?
 Web sites
 Databases
 Literature
 Supplementary material
 Who uses them?
 Who creates them?
 How can we find them?
 How can we make them better in the future? http://neuinfo.org
• PDF files
• Desk drawers
In an ideal world...
We’d like to be able to find:
 What is known:
 What is the average diameter of a Purkinje neuron?
 Is GRM1 expressed in cerebral cortex?
 What are the projections of hippocampus?
 What genes have been found to be upregulated in
chronic drug abuse in adults?
 Is alpha synuclein in the striatum?
 What studies used my polyclonal antibody against
GABA in humans?
 What rat strains have been used most extensively in
research during the last 20 years?
 What is not known:
 Connections among data
 Gaps in knowledge
Without some sort of framework, very difficult to
Required components:
– Query interface
– Search strategies
– Data sources
– Infrastructure
– Results display
– Why did I get this
result?
– Analysis tools
The Neuroscience Information Framework: Discovery and
utilization of web-based resources for neuroscience
 A portal for finding and
using neuroscience
resources
 A consistent framework for
describing resources
 Provides simultaneous
search of multiple types of
information, organized by
category
 Supported by an expansive
ontology for neuroscience
 Utilizes advanced
technologies to search the
“hidden web”
http://neuinfo.org
UCSD, Yale, Caltech, George Mason, Washington Univ.
Supported by NIH Blueprint
Literature
Database
Federation
Registry
We need more databases !?
•NIF Registry: A
catalog of
neuroscience-relevant
resources
•> 5000 currently
listed
•> 2000 databases
•And we are finding
more every day
NIF must work with ecosystem as
it is today
 NIF was one of the first projects to attempt data integration in
the neurosciences on a large scale
 NIF is supported by a contract that specified the number of
resources to be added per year
 Designed to be populated rapidly; set up process for progressive refinement
 No budget was allocated to retrofit existing resources; had to work with
them in their current state
 We designed a system that required little to no cooperation or work from
providers
 NIF was required to assemble (not create) ontologies very fast and to provide a
platform through which the community could view, comment and add
 NIF is enriched by ontologies but does not depend on them
 Took advantage of community ontologies
 But needed to take a very pragmatic and aggressive approach to incorporating and using them
 Neurolex semantic wiki
What are the connections of the
hippocampus?
Hippocampus OR “Cornu Ammonis” OR “Ammon’s horn”
Query expansion: Synonyms and related concepts
Boolean queries
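The query expansion on this slide can be sketched in a few lines. This is a minimal illustration, not NIF's implementation: the SYNONYMS table and the expand_query function are hypothetical names, and the only synonyms included are the ones shown on the slide.

```python
# Hypothetical synonym table; only the hippocampus entry comes from the slide.
SYNONYMS = {
    "hippocampus": ["Cornu Ammonis", "Ammon's horn"],
}


def expand_query(term: str) -> str:
    """Expand a search term into a Boolean OR query over its known synonyms."""
    alternatives = [term] + SYNONYMS.get(term.lower(), [])
    # Quote multi-word phrases so that they are matched as a unit.
    quoted = [f'"{a}"' if " " in a else a for a in alternatives]
    return " OR ".join(quoted)


print(expand_query("Hippocampus"))
```

A term with no entry in the table is passed through unchanged, so the expansion only ever widens a search and never blocks one.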
Data sources
categorized by
“data type” and
level of nervous
system
Common views
across multiple
sources
Tutorials for using
full resource when
getting there from
NIF
Link back to
record in
original
source
Imminent: NIF 5.0
 NIF 5.0 about
to be released
 New design
 New query
features
 New analytics
What do you mean by data?
Databases come in many shapes and sizes
 Primary data:
 Data available for
reanalysis, e.g., microarray data
sets from GEO; brain images from
XNAT; microscopic images
(CCDB/CIL)
 Secondary data
 Data features extracted through
data processing and sometimes
normalization, e.g., brain structure
volumes (IBVD), gene expression
levels (Allen Brain Atlas); brain
connectivity statements (BAMS)
 Tertiary data
 Claims and assertions about the
meaning of data
 E.g., gene
upregulation/downregulation,
 Registries:
 Metadata
 Pointers to data sets or
materials stored elsewhere
 Data aggregators
 Aggregate data of the same
type from multiple sources,
e.g., Cell Image Library,
SUMSdb, Brede
 Single source
 Data acquired within a single
context, e.g., Allen Brain Atlas
Researchers are producing a variety of
information resources using a multitude of
technologies
Exploration: Where is alpha synuclein?
•Spatially:
•Gene
•Protein
•Subcellular
•Cellular
•Regional
•Organism
•Semantically:
•Gene regulation networks
•Protein pathways
•Cellular local connectivity
•Regional connectivity
•Who is studying it?
•Who is funding its study?
Networks exist across scales; all important in the nervous system
 Set of modular ontologies
 86, 000 + distinct concepts +
synonyms
 Bridge files between modules
 Expressed in OWL-DL language
 Currently supports OWL 2
 Tries to follow OBO community
best practices
 Standardized to the same
upper level ontologies
 e.g., Basic Formal Ontology
(BFO), OBO Relations
Ontology (OBO-RO),
 Imports existing community
ontologies
 e.g., ChEBI, GO, PRO,
DOID, OBI, etc.
 Retains identifiers in
most recent additions
but reflects history
Covers major domains of neuroscience:
Organisms, Brain Regions, Cells,
Molecules, Subcellular parts, Diseases,
Nervous system functions, Techniques
NIFSTD Ontologies
Fahim Imam, William Bug
“Search computing”: Query by concept
What genes are upregulated by drugs of abuse in the
adult mouse? (show me the data!)
Morphine
Increased
expression
Adult Mouse
Reasonable standards make it easy to search for and compare results
Diseases of nervous system
New: Data analytics
NIF is in a unique position to answer questions about the neuroscience
ecosystem using new analytics tools
Neurodegenerative
Seizure disorders
Neoplastic disease of nervous system
NIH Reporter
NIF data federated sources
Results are organized within a common
framework
Connects to
Synapsed with
Synapsed by
Input region
innervates
Axon innervates
Projects to
Cellular contact
Subcellular contact
Source site
Target site
Each resource implements a different, though related model;
systems are complex and difficult to learn, in many cases
NIF Concept Mapper
The scourge of neuroanatomical nomenclature:
Importance of NIF semantic framework
•NIF Connectivity: 7 databases containing connectivity primary data or claims
from literature on connectivity between brain regions
•Brain Architecture Management System (rodent)
•Temporal-lobe.com (rodent)
•ConnectomeWiki (human)
•Brain Maps (various)
•CoCoMac (primate cortex)
•UCLA Multimodal database (Human fMRI)
•Avian Brain Connectivity Database (Bird)
•Total: 1800 unique brain terms (excluding Avian)
•Number of exact terms used in > 1 database: 42
•Number of synonym matches: 99
•Number of 1st order partonomy matches: 385
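The three kinds of match counted above (exact, synonym, first-order partonomy) can be sketched as a small classifier. This is an illustration under assumptions: the SYNONYMS and PART_OF tables are hypothetical stand-ins for what an ontology such as NIFSTD would supply, and "first-order partonomy" is read here as one term being the direct part-of parent of the other.

```python
# Hypothetical lookup tables standing in for ontology content.
SYNONYMS = {"ammon's horn": "hippocampus", "cornu ammonis": "hippocampus"}
PART_OF = {
    "dentate gyrus": "hippocampal formation",
    "hippocampus": "hippocampal formation",
}


def normalize(term):
    """Lower-case the term and collapse runs of whitespace."""
    return " ".join(term.lower().split())


def canonical(term):
    """Map a term to its preferred label if it is a known synonym."""
    t = normalize(term)
    return SYNONYMS.get(t, t)


def match_type(term_a, term_b):
    """Classify a pair of terms as 'exact', 'synonym', 'partonomy' or None."""
    a, b = normalize(term_a), normalize(term_b)
    if a == b:
        return "exact"
    ca, cb = canonical(a), canonical(b)
    if ca == cb:
        return "synonym"
    # First-order partonomy: one term is the direct parent of the other.
    if PART_OF.get(ca) == cb or PART_OF.get(cb) == ca:
        return "partonomy"
    return None


print(match_type("Ammon's horn", "Hippocampus"))
```

Running every pair of terms from two databases through match_type and tallying the results gives counts of the kind reported on the slide.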
Why so many names?
 The brain is perhaps unique among major organ systems in the
multiplicity of naming schemes for its major and minor regions.
 The brain has been divided based on topology of major
features, cyto- and myelo-architecture, developmental
boundaries, supposed evolutionary origins, histochemistry, gene
expression and functional criteria.
 The gross anatomy of the brain reflects the underlying networks
only superficially, and thus any parcellation reflects a somewhat
arbitrary division based on one or more of these criteria.
The “activation map” images that commonly accompany brain imaging papers can be
misleading to inexperienced readers, by seeming to suggest that the boundaries between
“activated” and “unactivated” patches of cortex are unambiguous and sharp. Instead, as
most researchers are aware, the apparent sharp boundaries are subject to the choice of
threshold applied to the statistical tests that generate the image. What, then, justifies
dividing the cortex into regions with boundaries based on this fuzzy, mutable measure of
functional profile?
(Saxe et al., 2010, p. 39).
Brainmaps.org
Program on Ontologies for Neural
Structures
 International Neuroinformatics Coordinating Facility
 Structural Lexicon Task Force
 Defining brain structures
 Translate among terminologies
 Neuronal Registry Task Force
 Consistent naming scheme for neurons
 Knowledge base of neuron properties
 Representation and Deployment Task Force
 Formal representation
 Also interacts with Digital Atlasing Task Force
http://incf.org
NeuroLex Wiki
http://neurolex.org Stephen Larson
•Provide a simple framework
for defining the concepts
required
•Light weight semantics
•Good teaching tool for
learning about
semantic integration
and the benefits of a
consistent semantic
framework
•Community based:
•Anyone can contribute
their terms, concepts,
things
•Anyone can edit
•Anyone can link
•Accessible: searched by
Google
•Building an extensive cross-
disciplinary knowledge base
for neuroscience
Demo D03
Defining nervous system structures
Parcellation scheme: Set of parcels
occupying part or all of an anatomical
entity that has been delineated using a
common approach or set of criteria,
often in a single study. A parcellation
scheme for any given individual entity
may include gaps, transitional zones, or
regions of uncertainty. A parcellation
scheme derived from a set of individuals
registered to a common target (atlas)
may be probabilistic and include overlap
of parcels in regions that reflect
individual variability or imperfections in
alignment.
14 parcellation schemes currently represented in Neurolex
Documentation available
INCF task force on
ontologies
Basic model: do not conflate conceptual
structures with parcels
Regional part of
nervous system
Functional part of
nervous system
Parcel
overlaps
overlaps overlaps
Parcel Parcel
Neuroscientists have many different parcellation schemes because they have many different
ways of classifying brain structures, and the techniques for matching them are imperfect
Linking semantics to space: INCF Atlasing
www.neurolex.org
Link to spatial
representation in
scalable brain
atlas
Waxholm space
Seth Ruffins, Alan Ruttenberg, Rembrandt Bakker
Neurons in Neurolex
 International
Neuroinformatics
Coordinating Facility (INCF)
building a knowledge base of
neurons and their properties
via the Neurolex Wiki
 Led by Dr. Gordon Shepherd
 Consistent and parseable
naming scheme
 Knowledge is readily
accessible, editable and
computable
 While structure is imposed,
don’t worry too much about
the upper level classes of the
ontology
Stephen Larson
A KNOWLEDGE BASE OF NEURONAL PROPERTIES
Additional semantics added in NIFSTD by ontology engineer
Concept-based search: search by meaning
 Search Google: GABAergic neuron
 Search NIF: GABAergic neuron
 NIF automatically searches for types of
GABAergic neurons
Types of GABAergic
neurons
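Concept-based search of this kind can be sketched as a walk down the subclass hierarchy. This is a minimal illustration, not NIF's code: the SUBCLASSES table is a hypothetical fragment (a real system would read it from the ontology), and descendants and concept_query are names made up for the example.

```python
# Hypothetical fragment of a subclass hierarchy (parent -> direct subclasses).
SUBCLASSES = {
    "GABAergic neuron": ["Purkinje cell", "Cerebellar Golgi cell",
                         "Striatal medium spiny neuron"],
}


def descendants(concept, subclasses):
    """Return the concept plus all of its transitive subclasses."""
    found, stack = [], [concept]
    while stack:
        current = stack.pop()
        if current not in found:
            found.append(current)
            stack.extend(subclasses.get(current, []))
    return found


def concept_query(concept, subclasses):
    """Build an OR query covering the concept and every subclass of it."""
    return " OR ".join(f'"{c}"' for c in descendants(concept, subclasses))


print(concept_query("GABAergic neuron", SUBCLASSES))
```

A plain keyword search only matches records that contain the literal string; the expanded query also finds records that name a specific cell type without ever saying "GABAergic".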
Challenges of multiscale neurodegenerative
disease phenotypes
•Neurodegenerative diseases target very specific cell
populations
•Model systems only replicate a subset of features of the
disease
•Related phenotypes occur across anatomical scales
•Different vocabularies are used by different communities
not
not
Midbrain degenerated
Substantia nigra decreased
in volume
Substantia nigra pars
compacta atrophied
Loss of SNpc dopaminergic
neurons
Degeneration of nigrostriatal
terminals
Tyrosine-hydroxylase containing
neurons degenerate
Approach: Use ontologies to provide necessary
knowledge for matching related phenotypes
Sarah Maynard, Chris Mungall,
Suzie Lewis, Fahim Imam
Midbrain
Substantia nigra
Substantia nigra pars
compacta
Substantia nigra pars
compacta dopamine
cell
Dopamine
Neuron cell
soma
Neuron (CL)
Part of neuron
(GO)
Small molecule
(ChEBI)
Atrophied
Decreased
volume
Fewer in
number
Degenerate
Decreased in magnitude
relative to some normal
Has part
Has part
Is part
of
Has part
Has part
Is a
Is a Is a
Is a
Entities
Qualities
NIFSTD/PKB
OBO ontology
Alzheimer’s
disease
Human
(birnlex_516)
Neocortex pyramidal
neuron
Increased
number of
Lipofuscin
has part
inheres in inheres in
towards
EQ Representation of Phenotypes in Neurodegenerative
Disease: PATO and NIFSTD
Instance: Human with
Alzheimer’s disease 050
Phenotype
birnlex_2087_56
inheres in
about
Chris Mungall, Suzanna Lewis
Structured annotation
model implemented in WIB
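The entity-quality (EQ) pattern in this diagram can be sketched as a small data structure. This is one reading of the slide, not the actual schema: the class and field names (EQStatement, PhenotypeAnnotation, bearer) are hypothetical, and plain labels are used in place of ontology identifiers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EQStatement:
    entity: str                    # what the quality inheres in
    quality: str                   # PATO-style quality label
    towards: Optional[str] = None  # second entity, for relational qualities

    def describe(self) -> str:
        text = f"{self.quality} inheres in {self.entity}"
        if self.towards:
            text += f" towards {self.towards}"
        return text


@dataclass(frozen=True)
class PhenotypeAnnotation:
    bearer: str               # the organism instance the phenotype is about
    statement: EQStatement


# Labels taken from the slide; how they are wired together is an assumption.
lipofuscin = EQStatement(
    entity="Neocortex pyramidal neuron",
    quality="Increased number of",
    towards="Lipofuscin",
)
annotation = PhenotypeAnnotation("Human with Alzheimer's disease", lipofuscin)
print(annotation.statement.describe())
```

Keeping entity, quality and the optional "towards" term as separate fields is what lets each part be matched against its own ontology later.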
OBD: Ontology based database
 Provides a user
interface for matching
organisms based on
similarity of
phenotypes
 Based on EQ model
 Uses knowledge in the
ontology to compute
similarity scores and
other statistical
measures like
information content
http://www.berkeleybop.org/pkb/
Chris Mungall, Suzanna Lewis, Lawrence Berkeley
Labs
Thalamus
Cellular
inclusion
Midline nuclear
group
Lewy Body
Paracentral
nucleus
Cellular
inclusion
Computes common subsumers and information
content among phenotypes
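Common subsumers and information content can be sketched on a toy hierarchy. Everything here is an assumption made for illustration: the PARENTS graph and the annotation counts are invented, and information content is taken as the negative log of the fraction of annotations at or below a class, which is one common definition and not necessarily the one OBD uses.

```python
import math

# Toy hierarchy (class -> direct parents); the relations are illustrative only.
PARENTS = {
    "Lewy body in paracentral nucleus": ["Cellular inclusion in paracentral nucleus"],
    "Cellular inclusion in paracentral nucleus": ["Cellular inclusion in thalamus"],
    "Cellular inclusion in midline nuclear group": ["Cellular inclusion in thalamus"],
    "Cellular inclusion in thalamus": ["Phenotype"],
}

# Made-up number of annotations attached directly to each class.
DIRECT_COUNTS = {
    "Lewy body in paracentral nucleus": 1,
    "Cellular inclusion in midline nuclear group": 1,
    "Phenotype": 6,
}


def ancestors(term):
    """Return the term together with every class that subsumes it."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def common_subsumers(a, b):
    """Classes that subsume both phenotypes."""
    return ancestors(a) & ancestors(b)


def information_content(term, direct_counts):
    """-log2 of the fraction of annotations at or below the term.

    Assumes at least one annotation sits at or below the term.
    """
    total = sum(direct_counts.values())
    at_or_below = sum(n for t, n in direct_counts.items() if term in ancestors(t))
    return -math.log2(at_or_below / total)


def most_informative_common_subsumer(a, b, direct_counts):
    """The shared class with the highest information content."""
    return max(common_subsumers(a, b),
               key=lambda t: information_content(t, direct_counts))


print(most_informative_common_subsumer(
    "Lewy body in paracentral nucleus",
    "Cellular inclusion in midline nuclear group",
    DIRECT_COUNTS,
))
```

The rarer the shared class, the higher its information content, so two phenotypes that meet at a specific class score as more similar than two that only meet at the root.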
*B6CBA-TgN (HDexon1)62) that express exon1 of the human mutant HD gene- Li et al., J
Neurosci, 21(21):8473-8481
PhenoSim: What organism is most similar to a human
with Huntington’s disease?
Putamen atrophied
Globus pallidus neuropil
degenerate
Part of basal ganglia
decreased in
magnitude
Fewer neostriatum
medium spiny neurons in
putamen
Neurons in striatum
degenerate
Neuron in striatum
decreased in
magnitude
Increased number of
astrocytes in caudate
nucleus
Neurons in striatum
degenerate
Nervous system cell
change in number in
striatum
Progressive enrichment
Understanding and comparing phenotypes will be enriched through community
knowledge bases like Neurolex
Looking forward to continuing this as part of the Monarch project with Melissa
Haendel, Chris Mungall and Suzie Lewis
Top Down vs Bottom up
Top-down ontology construction
• A select few authors have write privileges
• Maximizes consistency of terms with each other (automated consistency
checking)
• Making changes requires approval and re-publishing
•Works best when domain to be organized has: small corpus, formal categories,
stable entities, restricted entities, clear edges.
•Works best with participants who are: expert catalogers, coordinated users, expert
users, people with authoritative source of judgment
Bottom-up ontology construction
• Multiple participants can edit the ontology instantly (many eyes to correct errors)
• Semantics are limited to what is convenient for the domain
• Not a replacement for top-down construction; sometimes necessary to increase flexibility
• Necessary when domain has: large corpus, no formal categories, no clear edges
•Necessary when participants are: uncoordinated users, amateur users, naïve catalogers
• Neuroscience is a domain that is less formal and neuroscientists are more uncoordinated
NIFSTD
NEUROLEX
Important for Ontologists to define community contribution model
It’s a messy ecosystem (and that’s OK)
NIF favors a hybrid, tiered,
federated system
 Domain knowledge
 Ontologies
 Claims about results
 Virtuoso RDF triples
 Data
 Data federation
 Workflows
 Narrative
 Full text access
Neuron Brain part Disease
Organism Gene
Caudate projects to SNpc
Grm1 is upregulated in chronic cocaine
Betz cells degenerate in ALS
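Claims like the three above fit a subject-predicate-object shape. The sketch below uses plain Python tuples to show the idea; it is not RDF and makes no claim about how NIF's Virtuoso store is organized. The Triple type and the match function are hypothetical names.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# The three example claims from the slide.
CLAIMS = [
    Triple("Caudate", "projects to", "SNpc"),
    Triple("Grm1", "is upregulated in", "chronic cocaine"),
    Triple("Betz cells", "degenerate in", "ALS"),
]


def match(claims, subject=None, predicate=None, obj=None):
    """Return claims matching every field that is given (None is a wildcard)."""
    return [
        c for c in claims
        if (subject is None or c.subject == subject)
        and (predicate is None or c.predicate == predicate)
        and (obj is None or c.obj == obj)
    ]


print(match(CLAIMS, predicate="projects to"))
```

Because every claim has the same three-part shape, statements about neurons, brain parts, genes and diseases can sit in one store and be queried with one pattern.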
Musings from the NIF
 No one can be stopped from doing what they need to do
 Every resource is resource limited: few have enough time,
money, staff or expertise required to do everything they would
like
 If the market can support 11 MRI databases, fine
 Some consolidation, coordination is warranted though
 Big, broad and messy beats small, narrow and neat
 Without trying to integrate a lot of data, we will not know what needs to be done
 A lot can be done with messy data; neatness helps though
 Progressive refinement; addition of complexity through layers
 Be flexible and opportunistic
 A single optimal technology/container for all types of scientific data and
information does not exist; technology is changing
 Think globally; act locally:
 No source, not even NIF, is THE source; we are all a source
Grabbing the long tail of small
data
 Analysis of NIF shows
multiple databases with
similar scope and content
 Many contain partially
overlapping data
 Data “flows” from one
resource to the next
 Data is reinterpreted,
reanalyzed or added to
 Is duplication good or bad?
Same data: different analysis
Chronic vs acute
morphine in striatum
 Drug Related Gene database:
extracted statements from
figures, tables and supplementary
data from published article
 Gemma: Reanalyzed microarray
results from GEO using different
algorithms
 Both provide results of increased
or decreased expression as a
function of experimental
paradigm
 4 strains of mice
 3 conditions: chronic morphine,
acute morphine, saline
Mined NIF for all references to GEO
IDs: found a small number where the
same dataset was represented in two
or more databases
http://www.chibi.ubc.ca/Gemma/home.html
How easy was it to compare?
 Gemma: Gene ID + Gene Symbol
 DRG: Gene name + Probe ID
 Gemma: Increased expression/decreased expression
 DRG: Increased expression/decreased expression
 But...Gemma presented results relative to baseline chronic morphine; DRG with
respect to saline, so direction of change is opposite in the 2 databases
 Analysis:
 1370 statements from Gemma regarding gene expression as a function of
chronic morphine
 617 were consistent with DRG; over half of the claims of the paper were not
confirmed in this analysis
 Results for 1 gene were opposite in DRG and Gemma
 45 did not have enough information provided in the paper to make a judgment
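The baseline problem described above can be sketched as a normalization step followed by a tally. This is an illustration under assumptions: the gene names are placeholders, the row layout (gene, direction, baseline) is hypothetical, and genes are assumed to have already been mapped to a shared identifier, which the slide notes was itself not trivial.

```python
FLIP = {"increased": "decreased", "decreased": "increased"}


def to_saline_baseline(direction, baseline):
    """Re-express a direction of change relative to saline.

    A change reported relative to chronic morphine is the mirror image of the
    same change reported relative to saline, so it is flipped.
    """
    if baseline == "saline":
        return direction
    if baseline == "chronic morphine":
        return FLIP[direction]
    raise ValueError(f"unknown baseline: {baseline}")


def compare(gemma_rows, drg_rows):
    """Tally consistent, opposite and unmatched claims across two sources."""
    drg = {gene: to_saline_baseline(d, b) for gene, d, b in drg_rows}
    tally = {"consistent": 0, "opposite": 0, "unmatched": 0}
    for gene, direction, baseline in gemma_rows:
        if gene not in drg:
            tally["unmatched"] += 1
        elif to_saline_baseline(direction, baseline) == drg[gene]:
            tally["consistent"] += 1
        else:
            tally["opposite"] += 1
    return tally


# Placeholder rows: (gene, direction, baseline the direction is relative to).
gemma = [
    ("GeneA", "decreased", "chronic morphine"),
    ("GeneB", "increased", "chronic morphine"),
    ("GeneC", "increased", "chronic morphine"),
]
drg = [
    ("GeneA", "increased", "saline"),
    ("GeneB", "increased", "saline"),
]
print(compare(gemma, drg))
```

Without the flip, GeneA would be counted as a contradiction even though both sources report the same biology.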
NIF annotation
standard
Beware of False Dichotomies
 Top-down vs bottom up
 Light weight vs heavy weight
 “Chaotic Nihilists and Semantic Idealists”
 Text mining vs annotation
 Curators vs scientists
 Human vs machine
 DOIs vs URIs
http://www.datanami.com/datanami/2013-02-05/chaotic_nihilists_and_semantic_idealists.html
NIF team (past and present)
Jeff Grethe, UCSD, Co Investigator, Interim PI
Amarnath Gupta, UCSD, Co Investigator
Anita Bandrowski, NIF Project Leader
Gordon Shepherd, Yale University
Perry Miller
Luis Marenco
Rixin Wang
David Van Essen, Washington University
Erin Reid
Paul Sternberg, Caltech
Arun Rangarajan
Hans Michael Muller
Yuling Li
Giorgio Ascoli, George Mason University
Sridevi Polavarum
Fahim Imam, NIF Ontology Engineer
Larry Lui
Andrea Arnaud Stagg
Jonathan Cachat
Jennifer Lawrence
Lee Hornbrook
Binh Ngo
Vadim Astakhov
Xufei Qian
Chris Condit
Mark Ellisman
Stephen Larson
Willie Wong
Tim Clark, Harvard University
Paolo Ciccarese
Karen Skinner, NIH, Program Officer

Where are the Data? Perspectives from the Neuroscience Information Framework.
Neuroscience Information Framework
 

Similar to The real world of ontologies and phenotype representation: perspectives from the Neuroscience Information Framework (20)

RDAP14: Maryann Martone, Keynote, The Neuroscience Information Framework
RDAP14: Maryann Martone, Keynote, The Neuroscience Information FrameworkRDAP14: Maryann Martone, Keynote, The Neuroscience Information Framework
RDAP14: Maryann Martone, Keynote, The Neuroscience Information Framework
 
Data analysis & integration challenges in genomics
Data analysis & integration challenges in genomicsData analysis & integration challenges in genomics
Data analysis & integration challenges in genomics
 
Publishing for the 21st Century: Experiences from the NEUROSCIENCE INFORMATIO...
Publishing for the 21st Century: Experiences from the NEUROSCIENCE INFORMATIO...Publishing for the 21st Century: Experiences from the NEUROSCIENCE INFORMATIO...
Publishing for the 21st Century: Experiences from the NEUROSCIENCE INFORMATIO...
 
An Up-to-date Knowledge Base and Focused Exploration System for Human Perform...
An Up-to-date Knowledge Base and Focused Exploration System for Human Perform...An Up-to-date Knowledge Base and Focused Exploration System for Human Perform...
An Up-to-date Knowledge Base and Focused Exploration System for Human Perform...
 
A knowledge capture framework for domain specific search systems
A knowledge capture framework for domain specific search systemsA knowledge capture framework for domain specific search systems
A knowledge capture framework for domain specific search systems
 
Knowledge graph construction for research & medicine
Knowledge graph construction for research & medicineKnowledge graph construction for research & medicine
Knowledge graph construction for research & medicine
 
The Uniform Resource Layer
The Uniform Resource LayerThe Uniform Resource Layer
The Uniform Resource Layer
 
Connected Data for Machine Learning | Paul Groth
Connected Data for Machine Learning | Paul GrothConnected Data for Machine Learning | Paul Groth
Connected Data for Machine Learning | Paul Groth
 
NIFSTD and NeuroLex: A Comprehensive Ontology Development Based on Multiple B...
NIFSTD and NeuroLex: A Comprehensive Ontology Development Based on Multiple B...NIFSTD and NeuroLex: A Comprehensive Ontology Development Based on Multiple B...
NIFSTD and NeuroLex: A Comprehensive Ontology Development Based on Multiple B...
 
NIFSTD: A Comprehensive Ontology for Neuroscience
NIFSTD: A Comprehensive Ontology for NeuroscienceNIFSTD: A Comprehensive Ontology for Neuroscience
NIFSTD: A Comprehensive Ontology for Neuroscience
 
A01-Openness in knowledge-based systems
A01-Openness in knowledge-based systemsA01-Openness in knowledge-based systems
A01-Openness in knowledge-based systems
 
A Deep Survey of the Digital Resource Landscape
A Deep Survey of the Digital Resource LandscapeA Deep Survey of the Digital Resource Landscape
A Deep Survey of the Digital Resource Landscape
 
Neuroscience Information Framework Ontologies: Nerve cells in Neurolex and NI...
Neuroscience Information Framework Ontologies: Nerve cells in Neurolex and NI...Neuroscience Information Framework Ontologies: Nerve cells in Neurolex and NI...
Neuroscience Information Framework Ontologies: Nerve cells in Neurolex and NI...
 
Amia tb-review-08
Amia tb-review-08Amia tb-review-08
Amia tb-review-08
 
Open repositories for neuroimaging research
Open repositories for neuroimaging researchOpen repositories for neuroimaging research
Open repositories for neuroimaging research
 
Phyloinformatics and the Semantic Web
Phyloinformatics and the Semantic WebPhyloinformatics and the Semantic Web
Phyloinformatics and the Semantic Web
 
Neurosciences Information Framework (NIF): An example of community Cyberinfra...
Neurosciences Information Framework (NIF): An example of community Cyberinfra...Neurosciences Information Framework (NIF): An example of community Cyberinfra...
Neurosciences Information Framework (NIF): An example of community Cyberinfra...
 
Genome data management
Genome data managementGenome data management
Genome data management
 
Diversity and Depth: Implementing AI across many long tail domains
Diversity and Depth: Implementing AI across many long tail domainsDiversity and Depth: Implementing AI across many long tail domains
Diversity and Depth: Implementing AI across many long tail domains
 
Where are the Data? Perspectives from the Neuroscience Information Framework.
Where are the Data? Perspectives from the Neuroscience Information Framework. Where are the Data? Perspectives from the Neuroscience Information Framework.
Where are the Data? Perspectives from the Neuroscience Information Framework.
 

Recently uploaded

Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
Wouter Lemaire
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
Zilliz
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
Matthew Sinclair
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
Things to Consider When Choosing a Website Developer for your Website | FODUU
Things to Consider When Choosing a Website Developer for your Website | FODUUThings to Consider When Choosing a Website Developer for your Website | FODUU
Things to Consider When Choosing a Website Developer for your Website | FODUU
FODUU
 
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdfAI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
Techgropse Pvt.Ltd.
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
Aftab Hussain
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
SitimaJohn
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
Zilliz
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
Mariano Tinti
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
Zilliz
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
DianaGray10
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 

Recently uploaded (20)

Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
UI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentationUI5 Controls simplified - UI5con2024 presentation
UI5 Controls simplified - UI5con2024 presentation
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
Things to Consider When Choosing a Website Developer for your Website | FODUU
Things to Consider When Choosing a Website Developer for your Website | FODUUThings to Consider When Choosing a Website Developer for your Website | FODUU
Things to Consider When Choosing a Website Developer for your Website | FODUU
 
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdfAI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdf
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxOcean lotus Threat actors project by John Sitima 2024 (1).pptx
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
 
Fueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte WebinarFueling AI with Great Data with Airbyte Webinar
Fueling AI with Great Data with Airbyte Webinar
 
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 
Mariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceXMariano G Tinti - Decoding SpaceX
Mariano G Tinti - Decoding SpaceX
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Infrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI modelsInfrastructure Challenges in Scaling RAG with Custom AI models
Infrastructure Challenges in Scaling RAG with Custom AI models
 
UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6UiPath Test Automation using UiPath Test Suite series, part 6
UiPath Test Automation using UiPath Test Suite series, part 6
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 

The real world of ontologies and phenotype representation: perspectives from the Neuroscience Information Framework

  • 1. The real world of ontologies and phenotype representation: perspectives from the Neuroscience Information Framework Maryann Martone, Ph. D. University of California, San Diego
  • 2. “Neural Choreography” “A grand challenge in neuroscience is to elucidate brain function in relation to its multiple layers of organization that operate at different spatial and temporal scales. Central to this effort is tackling “neural choreography” -- the integrated functioning of neurons into brain circuits-- Neural choreography cannot be understood via a purely reductionist approach. Rather, it entails the convergent use of analytical and synthetic tools to gather, analyze and mine information from each level of analysis, and capture the emergence of new layers of function (or dysfunction) as we move from studying genes and proteins, to cells, circuits, thought, and behavior.... However, the neuroscience community is not yet fully engaged in exploiting the rich array of data currently available, nor is it adequately poised to capitalize on the forthcoming data explosion.” Akil et al., Science, Feb 11, 2011
  • 3. “Data choreography” • In that same issue of Science • Asked peer reviewers from last year about the availability and use of data • About half of those polled store their data only in their laboratories, not an ideal long-term solution. • Many bemoaned the lack of common metadata and archives as a main impediment to using and storing data, and most of the respondents have no funding to support archiving • And even where accessible, much data in many fields is too poorly organized to enable it to be efficiently used. • “...it is a growing challenge to ensure that data produced during the course of reported research are appropriately described, standardized, archived, and available to all.” Lead Science editorial (Science, 11 February 2011, Vol. 331, no. 6018, p. 649)
  • 4. • NIF is an initiative of the NIH Blueprint consortium of institutes • What types of resources (data, tools, materials, services) are available to the neuroscience community? • How many are there? • What domains do they cover? What domains do they not cover? • Where are they? • Web sites • Databases • Literature • Supplementary material • PDF files • Desk drawers • Who uses them? • Who creates them? • How can we find them? • How can we make them better in the future? http://neuinfo.org
  • 5. In an ideal world... We’d like to be able to find: • What is known: • What is the average diameter of a Purkinje neuron? • Is GRM1 expressed in cerebral cortex? • What are the projections of hippocampus? • What genes have been found to be upregulated in chronic drug abuse in adults? • Is alpha synuclein in the striatum? • What studies used my polyclonal antibody against GABA in humans? • What rat strains have been used most extensively in research during the last 20 years? • What is not known: • Connections among data • Gaps in knowledge Without some sort of framework, very difficult to Required components: – Query interface – Search strategies – Data sources – Infrastructure – Results display – Why did I get this result? – Analysis tools
  • 6. The Neuroscience Information Framework: Discovery and utilization of web-based resources for neuroscience • A portal for finding and using neuroscience resources • A consistent framework for describing resources • Provides simultaneous search of multiple types of information, organized by category • Supported by an expansive ontology for neuroscience • Utilizes advanced technologies to search the “hidden web” http://neuinfo.org UCSD, Yale, CalTech, George Mason, Washington Univ Supported by NIH Blueprint Literature Database Federation Registry
  • 7. We need more databases!? • NIF Registry: A catalog of neuroscience-relevant resources • > 5000 currently listed • > 2000 databases • And we are finding more every day
  • 8. NIF must work with the ecosystem as it is today • NIF was one of the first projects to attempt data integration in the neurosciences on a large scale • NIF is supported by a contract that specified the number of resources to be added per year • Designed to be populated rapidly; set up process for progressive refinement • No budget was allocated to retrofit existing resources; had to work with them in their current state • We designed a system that required little to no cooperation or work from providers • NIF was required to assemble (not create) ontologies very fast and to provide a platform through which the community could view, comment and add • NIF is enriched by ontologies but does not depend on them • Took advantage of community ontologies • But needed to take a very pragmatic and aggressive approach to incorporating and using them • Neurolex semantic wiki
  • 9. What are the connections of the hippocampus? Hippocampus OR “Cornu Ammonis” OR “Ammon’s horn” Query expansion: Synonyms and related concepts Boolean queries Data sources categorized by “data type” and level of nervous system Common views across multiple sources Tutorials for using full resource when getting there from NIF Link back to record in original source
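The query expansion on slide 9 can be illustrated with a minimal sketch. The synonym table and the `expand_query` function below are hypothetical and only mirror the hippocampus example from the slide; they are not NIF's actual vocabulary or query API.

```python
# Hypothetical synonym table; a real vocabulary such as NIFSTD is far larger.
SYNONYMS = {
    "hippocampus": ["Cornu Ammonis", "Ammon's horn"],
}


def expand_query(term, synonyms=SYNONYMS):
    """Return a Boolean OR query covering the term and its known synonyms."""
    alternatives = [term] + synonyms.get(term.lower(), [])
    # Quote every alternative so multi-word names are searched as phrases.
    return " OR ".join('"{}"'.format(a) for a in alternatives)


print(expand_query("Hippocampus"))
```

A term with no entry in the table is passed through unchanged as a single quoted phrase, so the expansion step never narrows the user's original query.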
  • 10. Imminent: NIF 5.0 • NIF 5.0 about to be released • New design • New query features • New analytics
  • 11. What do you mean by data? Databases come in many shapes and sizes • Primary data: • Data available for reanalysis, e.g., microarray data sets from GEO; brain images from XNAT; microscopic images (CCDB/CIL) • Secondary data: • Data features extracted through data processing and sometimes normalization, e.g., brain structure volumes (IBVD), gene expression levels (Allen Brain Atlas); brain connectivity statements (BAMS) • Tertiary data: • Claims and assertions about the meaning of data • E.g., gene upregulation/downregulation • Registries: • Metadata • Pointers to data sets or materials stored elsewhere • Data aggregators: • Aggregate data of the same type from multiple sources, e.g., Cell Image Library, SUMSdb, Brede • Single source: • Data acquired within a single context, e.g., Allen Brain Atlas Researchers are producing a variety of information resources using a multitude of technologies
  • 12. Exploration: Where is alpha synuclein? • Spatially: • Gene • Protein • Subcellular • Cellular • Regional • Organism • Semantically: • Gene regulation networks • Protein pathways • Cellular local connectivity • Regional connectivity • Who is studying it? • Who is funding its study? Networks exist across scales; all important in the nervous system
  • 13. NIFSTD Ontologies • Set of modular ontologies • 86,000+ distinct concepts + synonyms • Bridge files between modules • Expressed in OWL-DL language • Currently supports OWL 2 • Tries to follow OBO community best practices • Standardized to the same upper level ontologies • e.g., Basic Formal Ontology (BFO), OBO Relations Ontology (OBO-RO) • Imports existing community ontologies • e.g., CHEBI, GO, PRO, DOID, OBI etc. • Retains identifiers in most recent additions but reflects history Covers major domains of neuroscience: Organisms, Brain Regions, Cells, Molecules, Subcellular parts, Diseases, Nervous system functions, Techniques Fahim Imam, William Bug
  • 14. “Search computing”: Query by concept What genes are upregulated by drugs of abuse in the adult mouse? (show me the data!) Morphine Increased expression Adult Mouse Reasonable standards make it easy to search for and compare results
  • 15. Diseases of nervous system New: Data analytics NIF is in a unique position to answer questions about the neuroscience ecosystem using new analytics tools Neurodegenerative Seizure disorders Neoplastic disease of nervous system NIH Reporter NIF data federated sources
  • 16. Results are organized within a common framework Connects to Synapsed with Synapsed by Input region innervates Axon innervates Projects to Cellular contact Subcellular contact Source site Target site Each resource implements a different, though related model; systems are complex and difficult to learn, in many cases
  • 18. The scourge of neuroanatomical nomenclature: Importance of NIF semantic framework • NIF Connectivity: 7 databases containing connectivity primary data or claims from literature on connectivity between brain regions • Brain Architecture Management System (rodent) • Temporal lobe.com (rodent) • Connectome Wiki (human) • Brain Maps (various) • CoCoMac (primate cortex) • UCLA Multimodal database (Human fMRI) • Avian Brain Connectivity Database (Bird) • Total: 1800 unique brain terms (excluding Avian) • Number of exact terms used in > 1 database: 42 • Number of synonym matches: 99 • Number of 1st order partonomy matches: 385
  • 19. Why so many names? • The brain is perhaps unique among major organ systems in the multiplicity of naming schemes for its major and minor regions. • The brain has been divided based on topology of major features, cyto- and myelo-architecture, developmental boundaries, supposed evolutionary origins, histochemistry, gene expression and functional criteria. • The gross anatomy of the brain reflects the underlying networks only superficially, and thus any parcellation reflects a somewhat arbitrary division based on one or more of these criteria. The “activation map” images that commonly accompany brain imaging papers can be misleading to inexperienced readers, by seeming to suggest that the boundaries between “activated” and “unactivated” patches of cortex are unambiguous and sharp. Instead, as most researchers are aware, the apparent sharp boundaries are subject to the choice of threshold applied to the statistical tests that generate the image. What, then, justifies dividing the cortex into regions with boundaries based on this fuzzy, mutable measure of functional profile? (Saxe et al., 2010, p. 39). Brainmaps.org
  • 20. Program on Ontologies for Neural Structures • International Neuroinformatics Coordinating Facility • Structural Lexicon Task Force • Defining brain structures • Translate among terminologies • Neuronal Registry Task Force • Consistent naming scheme for neurons • Knowledge base of neuron properties • Representation and Deployment Task Force • Formal representation • Also interacts with Digital Atlasing Task Force http://incf.org
  • 21. NeuroLex Wiki http://neurolex.org Stephen Larson • Provide a simple framework for defining the concepts required • Light weight semantics • Good teaching tool for learning about semantic integration and the benefits of a consistent semantic framework • Community based: • Anyone can contribute their terms, concepts, things • Anyone can edit • Anyone can link • Accessible: searched by Google • Building an extensive cross-disciplinary knowledge base for neuroscience Demo D03
  • 22. Defining nervous system structures Parcellation scheme: Set of parcels occupying part or all of an anatomical entity that has been delineated using a common approach or set of criteria, often in a single study. A parcellation scheme for any given individual entity may include gaps, transitional zones, or regions of uncertainty. A parcellation scheme derived from a set of individuals registered to a common target (atlas) may be probabilistic and include overlap of parcels in regions that reflect individual variability or imperfections in alignment. 14 parcellation schemes currently represented in Neurolex Documentation available INCF task force on ontologies
  • 23. Basic model: do not conflate conceptual structures with parcels Regional part of nervous system Functional part of nervous system Parcel overlaps overlaps overlaps Parcel Parcel Neuroscientists have a lot of different parcellation schemes because they have a lot of different ways of classifying brain structures and techniques to match them are imperfect
  • 24. Linking semantics to space: INCF Atlasing www.neurolex.org Link to spatial representation in scalable brain atlas Waxholm space Seth Ruffins, Alan Ruttenberg, Rembrandt Bakker
  • 25. Neurons in Neurolex • International Neuroinformatics Coordinating Facility (INCF) building a knowledge base of neurons and their properties via the Neurolex Wiki • Led by Dr. Gordon Shepherd • Consistent and parseable naming scheme • Knowledge is readily accessible, editable and computable • While structure is imposed, don’t worry too much about the upper level classes of the ontology Stephen Larson
  • 26. A KNOWLEDGE BASE OF NEURONAL PROPERTIES Additional semantics added in NIFSTD by ontology engineer
  • 27. Concept-based search: search by meaning • Search Google: GABAergic neuron • Search NIF: GABAergic neuron • NIF automatically searches for types of GABAergic neurons Types of GABAergic neurons
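Concept-based search as described on slide 27 amounts to expanding a query term to all of its subclasses before searching. The sketch below assumes a tiny, hand-written is-a table; the table contents and the function names `descendants` and `concept_query` are illustrative only and do not reflect NIFSTD's actual class hierarchy or NIF's search code.

```python
# Hypothetical is-a fragment (child -> parent), for illustration only.
SUBCLASS_OF = {
    "Purkinje cell": "GABAergic neuron",
    "Basket cell": "GABAergic neuron",
    "Cerebellar basket cell": "Basket cell",
    "Pyramidal cell": "Glutamatergic neuron",
}


def descendants(concept, subclass_of=SUBCLASS_OF):
    """Collect every direct and indirect subclass of a concept."""
    found = []
    frontier = [concept]
    while frontier:
        parent = frontier.pop()
        for child, direct_parent in subclass_of.items():
            if direct_parent == parent and child not in found:
                found.append(child)
                frontier.append(child)
    return sorted(found)


def concept_query(concept):
    """Search terms for a concept: the concept itself plus all its subclasses."""
    return [concept] + descendants(concept)


print(concept_query("GABAergic neuron"))
```

A plain keyword search would only match the literal string "GABAergic neuron"; the expanded list also retrieves records that mention a specific cell type without ever naming its neurotransmitter.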
  • 28. Challenges of multiscale neurodegenerative disease phenotypes • Neurodegenerative diseases target very specific cell populations • Model systems only replicate a subset of features of the disease • Related phenotypes occur across anatomical scales • Different vocabularies are used by different communities Midbrain degenerated Substantia nigra decreased in volume Substantia nigra pars compacta atrophied Loss of SNpc dopaminergic neurons Degeneration of nigrostriatal terminals Tyrosine-hydroxylase containing neurons degenerate
  • 29. Approach: Use ontologies to provide necessary knowledge for matching related phenotypes Sarah Maynard, Chris Mungall, Suzie Lewis, Fahim Imam Midbrain Substantia nigra Substantia nigra pars compacta Substantia nigra pars compacta dopamine cell Dopamine Neuron cell soma Neuron (CL) Part of neuron (GO) Small molecule (ChEBI) Atrophied Decreased volume Fewer in number Degenerate Decreased in magnitude relative to some normal Has part Has part Is part of Has part Has part Is a Is a Is a Is a Entities Qualities NIFSTD/PKB OBO ontology
  • 30. Alzheimer’s disease Human (birnlex_516) Neocortex pyramidal neuron Increased number of Lipofuscin has part inheres in inheres in towards EQ Representation of Phenotypes in Neurodegenerative Disease: PATO and NIFSTD Instance: Human with Alzheimer’s disease 050 Phenotype birnlex_2087_56 inheres in about Chris Mungall, Suzanna Lewis Structured annotation model implemented in WIB
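The entity-quality (EQ) representation on slide 30 can be pictured as a small record type. This is a minimal sketch: the class and field names are assumptions made for illustration, not the schema used by WIB or the phenotype knowledge base, and the labels are taken from the slide rather than from ontology identifiers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EQStatement:
    """One entity-quality (EQ) phenotype statement (hypothetical structure)."""
    entity: str                    # what the quality inheres in
    quality: str                   # quality label, e.g. a PATO term
    towards: Optional[str] = None  # optional related entity

    def describe(self) -> str:
        text = f"{self.entity}: {self.quality}"
        if self.towards:
            text += f" {self.towards}"
        return text


@dataclass(frozen=True)
class AnnotatedOrganism:
    """An organism instance together with its EQ phenotype statements."""
    label: str
    phenotypes: tuple


human_ad = AnnotatedOrganism(
    label="Human with Alzheimer's disease",
    phenotypes=(
        EQStatement("Neocortex pyramidal neuron", "increased number of", "Lipofuscin"),
    ),
)

for phenotype in human_ad.phenotypes:
    print(phenotype.describe())
```

Keeping entity, quality, and related entity in separate fields is what lets each part be mapped to a different ontology and compared independently.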
  • 31. OBD: Ontology based database • Provides a user interface for matching organisms based on similarity of phenotypes • Based on EQ model • Uses knowledge in the ontology to compute similarity scores and other statistical measures like information content http://www.berkeleybop.org/pkb/ Chris Mungall, Suzanna Lewis, Lawrence Berkeley Labs
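To make "information content" on slide 31 concrete, here is a minimal sketch of one common formulation: the information content of a term is the negative log of the fraction of annotated organisms that imply it, and two terms are scored by their most informative shared ancestor (a Resnik-style measure). The slides do not state which measure OBD uses, so this is a swapped-in textbook technique; the ontology fragment, the annotations, and all function names are hypothetical.

```python
import math

# Hypothetical fragment (child -> parents), mixing is-a and part-of for brevity.
PARENTS = {
    "substantia nigra pars compacta": ["substantia nigra"],
    "substantia nigra": ["midbrain"],
    "midbrain": ["brain"],
    "putamen": ["striatum"],
    "striatum": ["brain"],
    "brain": [],
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(annotations):
    """IC(t) = -log2(fraction of annotated organisms whose terms imply t)."""
    total = len(annotations)
    counts = {}
    for terms in annotations.values():
        implied = set()
        for term in terms:
            implied |= ancestors(term)
        for term in implied:
            counts[term] = counts.get(term, 0) + 1
    return {term: -math.log2(n / total) for term, n in counts.items()}


def resnik(a, b, ic):
    """Score two terms by the IC of their most informative common ancestor."""
    common = ancestors(a) & ancestors(b)
    return max(ic.get(term, 0.0) for term in common)


# Made-up annotations, one affected structure per organism.
ANNOTATIONS = {
    "organism_1": ["substantia nigra pars compacta"],
    "organism_2": ["substantia nigra"],
    "organism_3": ["putamen"],
    "organism_4": ["striatum"],
}
ic = information_content(ANNOTATIONS)
print(resnik("substantia nigra pars compacta", "substantia nigra", ic))
print(resnik("substantia nigra pars compacta", "putamen", ic))
```

Terms that share only a very general ancestor such as "brain" score near zero, while terms that meet at a rarely annotated structure score higher, which is how differently worded phenotypes can still be recognized as related.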
  • 33. *B6CBA-TgN (HDexon1)62) that express exon1 of the human mutant HD gene - Li et al., J Neurosci, 21(21):8473-8481 PhenoSim: What organism is most similar to a human with Huntington’s disease? Putamen atrophied Globus pallidus neuropil degenerate Part of basal ganglia decreased in magnitude Fewer neostriatum medium spiny neurons in putamen Neurons in striatum degenerate Neuron in striatum decreased in magnitude Increased number of astrocytes in caudate nucleus Neurons in striatum degenerate Nervous system cell change in number in striatum
  • 34. Progressive enrichment Understanding and comparing phenotypes will be enriched through community knowledge bases like Neurolex Looking forward to continuing this as part of the Monarch project with Melissa Haendel, Chris Mungall and Suzie Lewis
  • 35. Top Down vs Bottom up Top-down ontology construction • A select few authors have write privileges • Maximizes consistency of terms with each other (automated consistency checking) • Making changes requires approval and re-publishing • Works best when domain to be organized has: small corpus, formal categories, stable entities, restricted entities, clear edges. • Works best with participants who are: expert catalogers, coordinated users, expert users, people with authoritative source of judgment Bottom-up ontology construction • Multiple participants can edit the ontology instantly (many eyes to correct errors) • Semantics are limited to what is convenient for the domain • Not a replacement for top-down construction; sometimes necessary to increase flexibility • Necessary when domain has: large corpus, no formal categories, no clear edges • Necessary when participants are: uncoordinated users, amateur users, naïve catalogers • Neuroscience is a domain that is less formal and neuroscientists are more uncoordinated NIFSTD NEUROLEX Important for Ontologists to define community contribution model
  • 36. It’s a messy ecosystem (and that’s OK) NIF favors a hybrid, tiered, federated system  Domain knowledge  Ontologies  Claims about results  Virtuoso RDF triples  Data  Data federation  Workflows  Narrative  Full text access Neuron Brain part Disease Organism Gene Caudate projects to Snpc Grm1 is upregulated in chronic cocaine Betz cells degenerate in ALS
  • 37. Musings from the NIF  No one can be stopped from doing what they need to do  Every resource is resource limited: few have enough time, money, staff or expertise required to do everything they would like  If the market can support 11 MRI databases, fine  Some consolidation, coordination is warranted though  Big, broad and messy beats small, narrow and neat  Without trying to integrate a lot of data, we will not know what needs to be done  A lot can be done with messy data; neatness helps though  Progressive refinement; addition of complexity through layers  Be flexible and opportunistic  A single optimal technology/container for all types of scientific data and information does not exist; technology is changing  Think globally; act locally:  No source, not even NIF, is THE source; we are all a source
  • 38. Grabbing the long tail of small data  Analysis of NIF shows multiple databases with similar scope and content  Many contain partially overlapping data  Data “flows” from one resource to the next  Data is reinterpreted, reanalyzed or added to  Is duplication good or bad?
  • 39. Same data: different analysis Chronic vs acute morphine in striatum  Drug Related Gene database: extracted statements from figures, tables and supplementary data from published article  Gemma: Reanalyzed microarray results from GEO using different algorithms  Both provide results of increased or decreased expression as a function of experimental paradigm  4 strains of mice  3 conditions: chronic morphine, acute morphine, saline Mined NIF for all references to GEO ID’s: found small number where the same dataset was represented in two or more databases http://www.chibi.ubc.ca/Gemma/home.html
  • 40. How easy was it to compare?  Gemma: Gene ID + Gene Symbol  DRG: Gene name + Probe ID  Gemma: Increased expression/decreased expression  DRG: Increased expression/decreased expression  But... Gemma presented results relative to baseline chronic morphine; DRG with respect to saline, so direction of change is opposite in the 2 databases  Analysis:  1370 statements from Gemma regarding gene expression as a function of chronic morphine  617 were consistent with DRG; over half of the claims of the paper were not confirmed in this analysis  Results for 1 gene were opposite in DRG and Gemma  45 did not have enough information provided in the paper to make a judgment NIF annotation standard
  • 41. Beware of False Dichotomies  Top-down vs bottom up  Light weight vs heavy weight  “Chaotic Nihilists and Semantic Idealists”  Text mining vs annotation  Curators vs scientists  Human vs machine  DOIs vs URIs http://www.datanami.com/datanami/2013-02-05/chaotic_nihilists_and_semantic_idealists.html
  • 42. NIF team (past and present) Jeff Grethe, UCSD, Co Investigator, Interim PI Amarnath Gupta, UCSD, Co Investigator Anita Bandrowski, NIF Project Leader Gordon Shepherd, Yale University Perry Miller Luis Marenco Rixin Wang David Van Essen, Washington University Erin Reid Paul Sternberg, CalTech Arun Rangarajan Hans Michael Muller Yuling Li Giorgio Ascoli, George Mason University Sridevi Polavarum Fahim Imam, NIF Ontology Engineer Larry Lui Andrea Arnaud Stagg Jonathan Cachat Jennifer Lawrence Lee Hornbrook Binh Ngo Vadim Astakhov Xufei Qian Chris Condit Mark Ellisman Stephen Larson Willie Wong Tim Clark, Harvard University Paolo Ciccarese Karen Skinner, NIH, Program Officer
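
The baseline mismatch described on slide 40 can be made concrete with a small sketch. It assumes a two-condition contrast in which each database reports only "increased" or "decreased", with Gemma reporting relative to chronic morphine and DRG relative to saline, as the slide states. The record layout, function names and gene entries are hypothetical and are not taken from either database.

```python
"""Toy reconciliation of expression calls reported against different baselines.

All records and field names are hypothetical; they only illustrate the
baseline flip described on slide 40.
"""

FLIP = {"increased": "decreased", "decreased": "increased"}


def to_baseline(direction, baseline, target_baseline):
    """Re-express a two-condition expression call relative to target_baseline.

    In a contrast between two conditions, swapping which one is the baseline
    reverses the direction of change.
    """
    if direction not in FLIP:
        raise ValueError(f"unknown direction: {direction!r}")
    return direction if baseline == target_baseline else FLIP[direction]


def compare(gemma_calls, drg_calls, target_baseline="saline"):
    """Classify each Gemma gene as consistent, opposite, or missing from DRG."""
    result = {}
    for gene, (direction, baseline) in gemma_calls.items():
        if gene not in drg_calls:
            result[gene] = "missing"
            continue
        drg_direction, drg_baseline = drg_calls[gene]
        a = to_baseline(direction, baseline, target_baseline)
        b = to_baseline(drg_direction, drg_baseline, target_baseline)
        result[gene] = "consistent" if a == b else "opposite"
    return result


# Hypothetical calls: gene -> (direction, baseline condition)
gemma = {
    "GeneA": ("decreased", "chronic morphine"),
    "GeneB": ("increased", "chronic morphine"),
    "GeneC": ("increased", "chronic morphine"),
}
drg = {
    "GeneA": ("increased", "saline"),
    "GeneB": ("increased", "saline"),
}

summary = compare(gemma, drg)
```

The sketch assumes the gene identifiers have already been mapped to a common key. The slide notes that this was itself a hurdle, since Gemma uses gene ID and symbol while DRG uses gene name and probe ID.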

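Slide 31 mentions similarity scores based on information content. A minimal sketch of one common formulation follows: the information content of a term is -log2 of the fraction of annotated organisms that carry the term or one of its descendants, and two phenotype terms are scored by the information content of their most informative common ancestor. This is not necessarily how OBD computes its scores; the toy hierarchy, the annotations and the function names are all assumptions made for illustration.

```python
import math

# Hypothetical is_a hierarchy: child -> parents. Not taken from NIFSTD or PATO.
PARENTS = {
    "neuron degenerate": ["cell abnormal"],
    "neuron decreased number": ["cell abnormal"],
    "astrocyte increased number": ["cell abnormal"],
    "cell abnormal": [],
}

# Hypothetical direct annotations: organism -> phenotype terms.
ANNOTATIONS = {
    "human HD": {"neuron degenerate", "astrocyte increased number"},
    "mouse model 1": {"neuron degenerate"},
    "mouse model 2": {"neuron decreased number"},
    "mouse model 3": {"astrocyte increased number"},
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """-log2 of the fraction of organisms annotated to term or a descendant."""
    hits = sum(
        1 for terms in ANNOTATIONS.values()
        if any(term in ancestors(t) for t in terms)
    )
    if hits == 0:
        raise ValueError(f"no annotations for term: {term!r}")
    # log2(N / hits) is the same quantity as -log2(hits / N).
    return math.log2(len(ANNOTATIONS) / hits)


def similarity(term_a, term_b):
    """Information content of the most informative common ancestor."""
    common = ancestors(term_a) & ancestors(term_b)
    return max((information_content(t) for t in common), default=0.0)
```

In this toy setup a direct match on a specific term scores higher than an indirect match that only meets at a general ancestor, because a general term annotated to every organism carries no information.
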
Editor's Notes

  1. Doesn’t do it well; doesn’t organize the results in a domain specific way; doesn’t search across it. For use as content goal: dynamic inventory for deep coverage of neuroscience data: Genes -> Systems
  2. What animal models show
  3. NIFSTD and PATO ontologies served as building blocks for a phenotype model. The ontologies provide relationships between neuroscience-related terms, give structure to qualities, and allow related qualities to show relationships.
  4. Need an interface to explore and ask questions. Cannot view as a graph. Need to be able to ask a question not in SPARQL and get an answer. Need a better interface to put things in. Discuss Neurolex and PKB. Doesn’t have to be a perfect interface, but has to allow a domain expert to ask and answer questions.
  5. Indirect matches that match due to hierarchies. NOTE: should make diagram in the style of previous slides (not screenshot)
  6. In validating our results, we see three types of matches. The first are direct matches. NOTE: should make diagram in the style of previous slides (not screenshot)