How do we know what we don't
know? Exploring the data and
knowledge space through the
Neuroscience Information
Framework
Maryann E. Martone, Ph. D.
University of California, San Diego
Building Analytics for Integrated Neuroscience Data
Ontario Brain Institute May 28-29, 2014
We say this to each other all the
time, but we set up systems for
scholarly advancement and
communication that are the
antithesis of integration
Whole brain data
(20 um
microscopic MRI)
Mosaic LM
images (1 GB+)
Conventional LM
images
Individual cell
morphologies
EM volumes &
reconstructions
Solved molecular
structures
No single technology serves
these all equally well.
Multiple data types;
multiple scales; multiple
databases
A data integration problem
• NIF is an initiative of the NIH Blueprint consortium of institutes
– What types of resources (data, tools, materials, services) are available to the
neuroscience community?
– How many are there?
– What domains do they cover? What domains do they not cover?
– Where are they?
• Web sites
• Databases
• Literature
• Supplementary material
– Who uses them?
– Who creates them?
– How can we find them?
– How can we make them better in the future?
http://neuinfo.org
• PDF files
• Desk drawers
NIF has been
surveying,
cataloging and
tracking the
neuroscience
resource
landscape since
< 2008
Old Model: Single type of content; single
mode of distribution
Scholar
Library
Scholar
Publisher
Systems for cataloging, metadata standards, and citation in
place
Scholar
Consumer
Libraries
Data Repositories
Code Repositories
Community
databases/platforms
OA
Curators
Social
Networks
Social
Networks
Social
Networks
Peer Reviewers
Narrative
Workflows
Data
Models
Multimedia
Nanopublications
Code
The duality of modern scholarship
Observation: Those who build information systems from the
machine side don’t understand the requirements of the
human very well
Those who build information systems from the human side,
don’t understand requirements of machines very well
Scholarship requires the ability to cite and track usage of
scholarly artifacts. In our current mode of working, there is no
way to track artifacts as they move through the ecosystem; no
way to incrementally add human expertise
NIF: A New Type of Entity for New Modes of
Scientific Dissemination
• NIF’s mission is to maximize the awareness of, access to
and utility of research resources produced worldwide to
enable better science and promote efficient use
– NIF unites neuroscience information without respect to domain,
funding agency, institute or community
– NIF is like a “Pub Med” for all biomedical resources and a “Pub
Med Central” for databases
– Makes them searchable from a single interface
– Practical and cost-effective; tries to be sensible
– Learned a lot about effective data sharing
The Neuroscience Information Framework provides a rich data
source for understanding the current resource landscape
But we have Google!
• Current web is
designed to share
documents
– Documents are
unstructured data
• Much of the content
of digital resources is
part of the “hidden
web”
• Wikipedia: The Deep Web
(also called Deepnet, the
invisible Web, DarkNet,
Undernet or the hidden
Web) refers to World
Wide Web content that is
not part of the Surface
Web, which is indexed by
standard search engines.
Surveying the resource
landscape
~3000 databases
and datasets
Populate broadly and quickly with minimum
overhead to resource providers
•NIF curators
•Nomination by the community
•Semi-automated text mining
pipelines
NIF Registry
Requires no special skills
Site map available for
local hosting
•NIF Data Federation
•DISCO interop (Yale)
•Requires some
programming skill
•But designed for quick
ingestion
Bandrowski et al., Database, 2012
Data Federation: Deep search
http://neuinfo.org
With the thousands of databases and other information sources
available, simple descriptive metadata will not suffice
Subthalamus
Data about the subthalamus
http://neuinfo.org
NIF unifies look, feel and access
What do you mean by data?
Databases come in many shapes and sizes
• Primary data:
– Data available for reanalysis, e.g.,
microarray data sets from GEO;
brain images from XNAT;
microscopic images (CCDB/CIL)
• Secondary data
– Data features extracted through
data processing and sometimes
normalization, e.g., brain structure
volumes (IBVD), gene expression
levels (Allen Brain Atlas); brain
connectivity statements (BAMS)
• Tertiary data
– Claims and assertions about the
meaning of data
• E.g., gene
upregulation/downregulation,
brain activation as a function of
task
• Registries:
– Metadata
– Pointers to data sets or
materials stored elsewhere
• Data aggregators
– Aggregate data of the same
type from multiple sources,
e.g., Cell Image Library,
SUMSdb, Brede
• Single source
– Data acquired within a single
context, e.g., Allen Brain Atlas
Researchers are producing a variety of
information artifacts using a multitude of
technologies; many duplicate effort and
content
[Figure: Data Federation Growth, Jun-08 to May-13. Axes: number of federated databases; number of federated records (millions, log scale)]
NIF searches the largest collation of
neuroscience-relevant data on the web
DISCO
Purkinje
Cell
Axon
Terminal
Axon
Dendritic
Tree
Dendritic
Spine
Dendrite
Cell body
Cerebellar
cortex
Bringing knowledge to data: Ontologies as framework
There is little obvious connection between
data sets taken at different scales using
different microscopies without an explicit
representation of the biological objects that
the data represent
NIF Semantic Framework: NIFSTD ontology
• NIF uses ontologies to help navigate across and unify neuroscience
resources
• Ontologies are built from community ontologies, allowing cross-integration with
other domains
NIFSTD
Organism
NS Function
Molecule
Investigation
Subcellular
structure
Macromolecule Gene
Molecule Descriptors
Techniques
Reagent Protocols
Cell
Resource Instrument
Dysfunction Quality
Anatomical
Structure
NIF Ontologies provide standards for integration of diverse data;
available through NIF vocabulary services
NIF links neuroscience to other domains via
community ontologies
• NIF Subcellular = Gene Ontology Cell Component
• NIF Anatomy = UBERON cross-species ontology
(Includes FMA and Neuronames)
• NIF Disease = Disease Ontology
• NIF Organism = NCBI Taxonomy
• NIF Molecule = Chemicals of Biological Interest
(CHEBI); Protein Ontology
• NIF Cell/Investigation/Function = Developed largely
by neuroscience community
Use of ontology identifiers within data sources creates linkage across databases and
across domains; the more they are used, the better they become
Neurolex: > 1 million triples
Dr. Yi Zeng: Chinese neural knowledge base
NIF Cell Graph
This is your brain on computers
Concept-based search: Query by meaning
NIF provides formal definitions of many neuroscience terms
= brain region without a blood brain
barrier
Ontologies as a data integration framework
•NIF Connectivity: 7 databases containing connectivity primary data or claims
from literature on connectivity between brain regions
•Brain Architecture Management System (rodent)
•Temporal lobe.com (rodent)
•Connectome Wiki (human)
•Brain Maps (various)
•CoCoMac (primate cortex)
•UCLA Multimodal database (Human fMRI)
•Avian Brain Connectivity Database (Bird)
•Total: 1800 unique brain terms (excluding Avian)
•Number of exact terms used in > 1 database: 42
•Number of synonym matches: 99
•Number of 1st order partonomy matches: 385
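The three kinds of match counted above (exact term, synonym, first-order partonomy) can be sketched as follows. This is a minimal illustration, not NIF's actual procedure: the two toy databases, the synonym table, and the part-of table are hypothetical stand-ins for what an ontology such as NIFSTD would supply.

```python
from collections import defaultdict

# Hypothetical toy inputs (assumptions for illustration only).
DATABASE_TERMS = {
    "db_a": {"Striatum", "Subthalamic nucleus", "Dentate gyrus"},
    "db_b": {"striatum", "nucleus of Luys", "Hippocampal formation"},
}
SYNONYMS = {"nucleus of luys": "subthalamic nucleus"}    # synonym -> preferred label
PART_OF = {"dentate gyrus": "hippocampal formation"}     # part -> direct parent


def match_report(databases):
    """Return terms shared by more than one database, split by match type."""
    raw = defaultdict(set)      # label as written (case-folded) -> databases
    canon = defaultdict(set)    # preferred label -> databases
    parent = defaultdict(set)   # label lifted one partonomy level -> databases
    for db, terms in databases.items():
        for term in terms:
            label = term.lower().strip()
            preferred = SYNONYMS.get(label, label)
            raw[label].add(db)
            canon[preferred].add(db)
            parent[PART_OF.get(preferred, preferred)].add(db)

    def shared(table):
        return {t for t, dbs in table.items() if len(dbs) > 1}

    exact = shared(raw)
    synonym = shared(canon) - exact
    partonomy = shared(parent) - exact - synonym
    return exact, synonym, partonomy


exact, synonym, partonomy = match_report(DATABASE_TERMS)
print(exact, synonym, partonomy)
```

Each level only reports matches that the previous level missed, which is why the counts on the slide grow as more ontology knowledge is applied.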
Building a knowledge space for
neuroscience: Neurolex.org
http://neurolex.org
•Semantic MediaWiki
•Provide a simple interface
for defining the concepts
required
•Light weight semantics
•Community based:
•Anyone can contribute their
terms, concepts, things
•Anyone can edit
•Anyone can link
•Accessible: searched by Google
•Growing into a significant
knowledge base for
neuroscience
•33,000 concepts
200,000
edits
150
contributors
Larson and Martone Frontiers in Neuroinformatics, 2013
“When I use a word...it means what I choose it
to mean”
Formalization lets us develop
metrics for the precision of the
terms we use
Mapping the known unknowns
Comprehensive ontologies provide an accounting of what we
think we know
Where are the data relative to what we think we know?
Striatum
Hypothalamus
Olfactory bulb
Cerebral cortex
Brain
Brainregion
Data source
0
1-10
11-100
>100
Open World-Closed World: Mapping the knowledge - data space
Data Sources
NIF lets us ask: where isn’t there data? What isn’t studied? Why?
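A minimal sketch of how such a coverage map could be computed: start from the full list of ontology terms (the closed world of what we think we know), count the distinct data sources that mention each term, and bin the counts as in the legend. The function names and sample records are hypothetical, and the top bin is assumed to mean more than 100 sources.

```python
def bin_label(n_sources):
    """Map a count of data sources to the legend bins used on the slide."""
    if n_sources == 0:
        return "0"
    if n_sources <= 10:
        return "1-10"
    if n_sources <= 100:
        return "11-100"
    return ">100"


def coverage_map(ontology_terms, records):
    """records: iterable of (data_source, brain_region) pairs.

    Every ontology term appears in the result, so regions with no data
    show up explicitly as "0" instead of silently disappearing.
    """
    sources = {term: set() for term in ontology_terms}
    for source, region in records:
        if region in sources:
            sources[region].add(source)
    return {term: bin_label(len(found)) for term, found in sources.items()}


# Hypothetical example records, not real NIF counts.
terms = ["striatum", "tail of caudate"]
records = [("db1", "striatum"), ("db2", "striatum"), ("db1", "striatum")]
print(coverage_map(terms, records))
```

Seeding the map from the ontology rather than from the data is the design choice that makes the "negative space" visible.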
Forebrain
Midbrain
Hindbrain
0
1-10
11-100
>100
Data Sources
Open World-Closed World: Mapping the knowledge - data space
Junk brain regions?
SW Oh et al. Nature 000, 1-8 (2014) doi:10.1038/nature13186
Adult mouse brain connectivity matrix: revenge of the
midbrain
The tale of the tail
“Human neuroimaging typically is performed on a whole brain basis.
However, for several reasons tail of the caudate activity can easily be missed.
•One reason is limitations in the normalization algorithms, that typically are
optimized to maximize accuracy for cortical rather than subcortical
structures. ...
•A second reason is that standard neuroimaging atlases such as the Harvard-
Oxford structural atlas used with neuroimaging analysis programs such as
FreeSurfer truncate the caudate at the body, and completely exclude the
tail...
•A final reason is that the tail of the caudate is close to the hippocampus, and
could be misidentified as such especially in tasks involving learning and
memory.
Therefore, the tail of the caudate may be recruited in additional cognitive
tasks, but yet not have been properly identified and reported in the
neuroimaging literature”
Seger CA. The visual corticostriatal loop through the tail of the caudate: circuitry and function. Front
Syst Neurosci. 2013 Dec 6;7:104. doi: 10.3389/fnsys.2013.00104. eCollection 2013.
fMRI Cerebellum
When results contradict a current theory, they may be ignored
“The Data Homunculus”
Funding drives representation in the data space
NIF Reports: Male
vs Female circa 2012
Gender bias
When data is not
made available, the
data space is an
incomplete record
of what is available
How much information makes it into
the data space?
∞
What is easily machine
processable and accessible
What is potentially knowable
What is known:
Literature, images, human
knowledge
Unstructured; Natural
language processing,
entity recognition,
image processing and
analysis; paywalls; file
drawers
Abstracts vs full
text vs tables etc
Estimates that > 50% scientific output is not recovered
Chan et al. Lancet, 383, 2014
Data sharing in the long tail of neurosciences
A place for my data
NIF lists over 350 data repositories, i.e., resources that accept data
contributions from the community
“Empty Archives”
Repository | Type of Data | Date started | Host | Public data | Comments
CARMEN | neuroscience / electrophysiology | 2008 | Newcastle University; United Kingdom | 100 | Requires account
INCF Dataspace | various | 2012 | International Neuroinformatics Coordinating Facility | ? |
Open Source Brain | models | 2014 | University College London | 47 | Cells and Networks; 23 (Technology showcases)
XNAT Central | Neuroimaging | 2010 | Washington University School of Medicine in St. Louis; Missouri; USA | 34 | States 370 projects, 3804 subjects, and 5172 imaging sessions. 123 were visible but do not all appear to be public. 34 public data were listed under “Recent”
Open Connectome | Serial electron Microscopy and Magnetic Resonance (graphs) | 2011 | Johns Hopkins University; Maryland; USA | 9 | 9, 7 - image projects; 19 - graphs
UCSF DataShare | biomedical including neuroimaging, MRI, cognitive impairment, dementia, aging | 2011 | University of California at San Francisco; California; USA | 15 |
BrainLiner | various functional data | 2011 | ATR; Kyoto; Japan | 10 |
ModelDB | neuron models | 1996 | Yale University; Connecticut; USA | 875 |
NeuroMorpho | digitally reconstructed neurons | 2006 | George Mason University; Virginia; USA | 10004 |
Cell Image Library/Cell Centered Database | images, videos, and animations of cell | 2002 CCDB; 2010 CIL | American Society for Cell Biology / University of California at San Diego; California; USA | 10,360 | The CCDB had 450 data sets when it merged with CIL. CIL also contains large imaging data sets that are not counted as separate images
CRCNS | computational neuroscience datasets | 2008 | University of California at Berkeley; California; USA | 38 |
OpenfMRI | fMRI | 2012 | University of Texas at Austin; Texas; USA | 22 |
NeuroMorpho.org =
10,000 neuronal
reconstructions
from ~200 labs
Cell Image Library =
10,000 image sets
from 1500
individuals
“I finally gave NeuroMorpho my data so they would stop
Attitudes towards data sharing
“Pry it from my cold, dead
fingers”
“Done”
“You can have it if you really
want”
•Lack of time and resources
•Lack of incentives
•Fear of being scooped
•Fear of being criticized
•Fear that data will be misused
•Data sharing is a waste of time
Always
Never
Reasons for not making data available
Tenopir, C. et al. Data sharing by scientists: practices and perceptions. PLoS One 6,
e21101, doi:10.1371/journal.pone.0021101 (2011)
Many make data
available via web sites
or via supplementary
material
Multivariate analysis of the SCI syndrome using data from two research sites.
Ferguson AR, Irvine K-A, Gensel JC, Nielson JL, et al. (2013) Derivation of Multivariate Syndromic Outcome Metrics for Consistent Testing across Multiple
Models of Cervical Spinal Cord Injury in Rats. PLoS ONE 8(3): e59712. doi:10.1371/journal.pone.0059712
http://www.plosone.org/article/info:doi/10.1371/journal.pone.0059712
Incentives: New solutions
• New journals
for data,
where focus
is on data not
results
• Data must be
deposited in a
recognized
repository
– Persistent
identifier
assigned
• Standards for
metadata and
data types
Nature Scientific Data
Incentives: Data citations
• Many groups are
developing
guidelines for
creating a system
of citation for data
used in a study
• First step for
providing an
incentive system
for data sharing
• Currently, very
difficult to track
use of data in
articles
http://www.force11.org/datacitation
“Sound, reproducible scholarship rests upon
a foundation of robust, accessible data. Data
should be considered legitimate, citable
products of research. Data citation, like the
citation of other evidence and sources, is
good research practice.”
-Joint Declaration of Data Citation
Principles
Future of Research Communications and e-Scholarship; FORCE11
1. Importance
2. Credit and attribution
3. Evidence
4. Unique Identification
5. Access
6. Persistence
7. Specificity and verifiability
8. Interoperability and
flexibility
Unique ID’s for all! Resource Identification
Initiative
• It is currently impossible to
query the biomedical
literature to find out what
research resources have
been used to produce the
results of a study
-authors don’t provide enough
information to
unambiguously identify
key research resources
• Impossible to find all
studies that used a
resource
• Critical for reproducibility
and data mining
• Critical for trouble-
shooting
http://www.force11.org/resource_identification_initiative
Faulty Antibodies Continue to Enter US and
European Markets, Warns Top Clinical
Chemistry Researcher-Genome Web Daily,
October 11, 2013
Resource Identification Initiative
• Have authors supply
appropriate identifiers for
key resources used within
a study such that they
are:
– Machine processable (i.e.,
unique identifier that
resolves to a single
resource)
– Outside of the paywall
– Uniform across journals
and publishers
Launched February 2014: > 30 journals
participating
Anita Bandrowski, Nicole Vasilevsky,
Matthew Brush, Melissa Haendel and
the RINL group
Pilot Project
• Have authors identify 3 different
types of research resources:
– Software tools and databases
– Antibodies
– Genetically modified animals
• Include RRID in methods section
• RRID=RRID:Accession number
– Just a string at this point
• Voluntary for authors
• Journals did not have to modify
their submission system
• Journals have flexibility in
implementation. Send request to
author at:
– Submission
– During review
– After acceptance
Sources: NIF Registry, NIF Antibody Registry, Model Organism Databases
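Because an RRID is just a string in the methods section, a simple pattern match is enough to pull identifiers out of text and answer "what studies used X?". The sketch below is illustrative only: the accepted character set, the sample sentences, and the identifier AB_0000000 are assumptions, not part of the initiative's specification.

```python
import re
from collections import defaultdict

# Assumed shape: the literal prefix "RRID:" followed by an accession string
# made of letters, digits, underscores, and hyphens.
RRID_PATTERN = re.compile(r"RRID:\s*([A-Za-z0-9_-]+)")


def extract_rrids(methods_text):
    """Return every accession string that follows an 'RRID:' prefix."""
    return RRID_PATTERN.findall(methods_text)


def index_by_resource(papers):
    """Build a resource -> set of paper ids index from methods sections."""
    index = defaultdict(set)
    for paper_id, text in papers.items():
        for rrid in extract_rrids(text):
            index[rrid].add(paper_id)
    return index


# Hypothetical methods text; AB_0000000 is a made-up placeholder identifier.
papers = {
    "paper-1": "Images were processed with a software tool (RRID:nif-0000-30467).",
    "paper-2": "Analysis used RRID:nif-0000-30467 and an antibody (RRID:AB_0000000).",
}
index = index_by_resource(papers)
print(sorted(index["nif-0000-30467"]))  # prints ['paper-1', 'paper-2']
```

The same index, inverted, lists every resource used in a given paper.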
Resource Identification Portal: Aggregates
accession numbers from >10 different
databases that are the authorities for
registering research resources
First results are in the literature
Google Scholar: Search RRID; select since 2014
What studies used X?
To date:
•30 articles have appeared
•2 articles have disappeared, i.e.,
the RRID’s were removed at
copyediting
•195 RRID’s were reported
•14 were in error (about 7%)
•> 200 antibodies were added
•> 75 software tools/databases
were added
•A resolver service has been
created
•3rd party tools are being created
to provide linkage between
resources and papers
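The error rate follows directly from the two counts reported on this slide (195 RRIDs reported, 14 in error). A quick check:

```python
# Counts taken from the slide above.
rrids_reported = 195
rrids_in_error = 14


def error_rate(errors, total):
    """Fraction of reported identifiers that were in error."""
    return errors / total


print(f"{error_rate(rrids_in_error, rrids_reported):.1%}")  # prints 7.2%
```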
RRID:nif-0000-30467
Authors did not deliberately leave out identifying information; they
just hadn’t thought about it
What have we learned?
Utopia plug-in: Steve Pettifer
•Authors are willing to
adopt new types of
citations and citation
styles; you just have to
ask
•RRID = usage of
research resource
•Ideal: resolved by
search engines without
requiring specialized
citation services
•Citation drives
registration
•Clear role for
repositories as
authorities
Digital objects are a new beast
RRID: Provides foundation for establishing an
alerting service for research resources
Trust: Not just
who produced it
but what
produced it
Community
database:
beginning
Community
database:
End
Register your resource to NIF!
“How do I share my
data/tool?”
“There is no database
for my data”
Institutional
repositories
Cloud
INCF: Global
infrastructure
Government
Education
Industry
NIF provides the “glue” for a functioning ecosystem of data and tools
Tool repositories
Standards
Brokering
Archiving
Article
Code
Blogs
Workflows
Data
Persistent Identifiers Portals
Persistent Identifiers
Persistent Identifiers
Unique and persistent identifiers and a system for
referencing them allow an ecosystem to function
An ecosystem for research objects: the social network of
research resources
Data
Data
Code
Code
Blogs
Blogs
Workflows
Workflows
Portals
Portals
Search engines
Musings from the NIF
• Analytics let us take a global view of data
– By bringing in a knowledge framework, we can look at positive and negative space
• Well-populated data resources are critical to moving analytics forward
– Comprehensive, i.e. they have most of the data that are available
– Much can be learned even from messy data, but reasonable standards help
– Active outreach is required
• Technological barriers to widespread data sharing are diminishing
– Best practices are emerging
– General and focused repositories are available, although sustainability of these is a problem
• There is a lot of neuroscience data available, but a culture of routine data sharing
does not yet exist in neuroscience
– But encouraging signs that it is largely due to lack of time and means, not lack of desire
– It is up to us to change the incentive system to support the best science possible
• Most scientists are not adept at managing or curating their own data
– Role for repositories and data curators
• Pieces of a functioning ecosystem are in place
– Think about how you fit into the ecosystem
NIF team (past and present)
Jeff Grethe, UCSD, Co Investigator, Co-PI
Amarnath Gupta, UCSD, Co Investigator
Anita Bandrowski, NIF Project Leader
Gordon Shepherd, Yale University
Perry Miller
Luis Marenco
Rixin Wang
David Van Essen, Washington University
Erin Reid
Paul Sternberg, Cal Tech
Arun Rangarajan
Hans Michael Muller
Yuling Li
Giorgio Ascoli, George Mason University
Sridevi Polavarum
Yueling Li, UCSD
Trish Whetzel, UCSD
Fahim Imam
Larry Lui
Andrea Arnaud Stagg
Jonathan Cachat
Svetlana Sulima
Burak Ozyrt
Davis Banks
Vadim Astakhov
Xufei Qian
Chris Condit
Mark Ellisman
Stephen Larson
Willie Wong
Tim Clark, Harvard University
Paolo Ciccarese
Karen Skinner, NIH, Program Officer
(retired)
Jonathan Pollock, NIH, Program Officer
And my colleagues in Monarch, dkNet, 3DVC, Force 11
Melissa Haendel, OHSU**
Nicole Vasilevsky
Matthew Brush
**Monarch and
Resource
Identification
Initiative
Creating an on-line knowledge space for
neuroscience
Pages are related through properties
Red Links: Information is missing (or misspelled)
Neurolex Neuron
• Led by Dr. Gordon
Shepherd
• > 30 world wide
experts
• Simple set of
properties
• Consistent naming
scheme
• Integrated with
Structural Lexicon
• Used for annotation in
other resources, e.g.,
NeuroElectro
Location of Cell Soma
Location of dendrites
Location of local axon
arbor
Analysis of Red Links in the Neuron Registry
• INCF Project
– Neuron Registry
– > 30 experts
worldwide
– Fill out neuron
pages in Neurolex
Wiki
– Led by Dr. Gordon
Shepherd
[Figure: number of red links (total, easy fixes, hard fixes) for each property: soma location, dendrite location, axon location]
Social networks and community sites let us learn things from the
collective behavior of contributors: INCF/HBP Knowledge Space
Structural Lexicon in Neurolex
Brain
Region
Brain
Parcel
•Trans-species
•“Stateless”, i.e. no universal defining
criteria
•General structures and partonomies
based on Neuroanatomy 101
Partially overlaps
e.g., Hippocampus, Dentate gyrus
•Species specific
•Specific reference
•Defining criteria
•Sometimes partonomy;
sometimes not
e.g., Hippocampus of ABA2009
Standards support diversity
Is there a framework for neuroscience?
• Of the ~ 4000 columns
that NIF queries,
~1300 map to one of
our core categories:
– Organism
– Anatomical structure
– Cell
– Molecule
– Function
– Dysfunction
– Technique
• 30-50% of NIF’s
queries autocomplete
• When NIF combines
multiple sources, a set
of common fields
emerges
– >Basic information
models/semantic
models exist for
certain types of
entities
Biomedical science does have a conceptual framework
What would a 21st century platform for
scholarship look like?
D
K
Macroinformatics
NIF: Sensors and monitors for the resource ecosystem
Exposing knowledge to the web
Because they are static URL’s, Wikis are searchable by
Google
NIF provides a rich source of information on
digital resources
• Analytics let us take a global view of data
– By bringing in a knowledge framework, we can look at positive and negative space
• Well-populated data resources are critical to moving analytics forward
– Comprehensive, i.e. they have most of the data that are available
– Much can be learned even from messy data, but reasonable standards help
– Active outreach is required
• Technological barriers to widespread data sharing are diminishing
– Best practices are emerging
– General and focused repositories are available, although sustainability of these is a
problem
• There is a lot of neuroscience data available, but a culture of routine data sharing
does not yet exist in neuroscience
– But encouraging signs that it is largely due to lack of time and means, not lack of
agreement
• Most scientists are not adept at managing or curating their own data
– Role for repositories and data curators
• Pieces of a functioning ecosystem are in place; think globally
Not just science, but data policy should be data driven
Same data: different analysis
• Gemma: Gene ID + Gene Symbol
• DRG: Gene name + Probe ID
• Gemma presented results relative to baseline chronic
morphine; DRG with respect to saline, so direction of change is
opposite in the 2 databases
Chronic vs acute morphine in striatum
• Analysis:
•1370 statements from Gemma regarding gene expression as a function of chronic
morphine
•617 were consistent with DRG, so over half of the claims of the paper were not
confirmed in this analysis
•Results for 1 gene were opposite in DRG and Gemma
•45 did not have enough information provided in the paper to make a judgment
Relatively simple standards would make it easier to
perform comparisons across the ecosystem
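One such simple standard is to record the baseline explicitly with every expression statement, so that a statement can be re-expressed against a common baseline before databases are compared. The sketch below assumes a hypothetical four-field statement (gene, baseline, comparison condition, direction); it is not the format used by Gemma or DRG, and "GeneA" is a placeholder.

```python
FLIP = {"up": "down", "down": "up"}


def relative_to(statement, baseline):
    """Re-express a (gene, baseline, condition, direction) statement
    against the requested baseline, flipping direction when needed."""
    gene, stated_baseline, condition, direction = statement
    if stated_baseline == baseline:
        return statement
    if condition == baseline:
        return (gene, baseline, stated_baseline, FLIP[direction])
    raise ValueError(f"statement does not involve baseline {baseline!r}")


def consistent(a, b, baseline):
    """True when two statements agree once expressed against one baseline."""
    return relative_to(a, baseline) == relative_to(b, baseline)


# Hypothetical statements about the same gene from two databases.
from_db_1 = ("GeneA", "chronic morphine", "saline", "down")
from_db_2 = ("GeneA", "saline", "chronic morphine", "up")
print(consistent(from_db_1, from_db_2, baseline="saline"))  # prints True
```

Without the baseline field, the two statements above look contradictory even though they describe the same change.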
Musings from the NIF
• Every resource is resource limited: few have enough time, money, staff or
expertise required to do everything they would like
– If the market can support 11 MRI databases, fine
– Some consolidation, coordination is warranted
– How can industry help support the data space? How can they take them even further?
– Don’t let the data space become fractured
• Big, broad and messy beats small, narrow and neat
– Without trying to integrate a lot of data, we will not know what needs to be done
– Progressive refinement; addition of complexity through layers
• Be flexible and opportunistic: assume all will change
– A single optimal technology/container for all types of scientific data and information does not
exist; technology is changing
• Think globally; act locally:
– No source, not even NIF, is THE source; we are all a source
– System and culture to be able to learn from everything
– Cooperative model for biomedicine

Inauguration Function - Ohio Center of Excellence in Knowledge-Enabled Comput...Artificial Intelligence Institute at UofSC
 
What is it about the human brain that makes us smarter than other animals.pdf
What is it about the human brain that makes us smarter than other animals.pdfWhat is it about the human brain that makes us smarter than other animals.pdf
What is it about the human brain that makes us smarter than other animals.pdfRazaAliKhan10
 
Diversity and Depth: Implementing AI across many long tail domains
Diversity and Depth: Implementing AI across many long tail domainsDiversity and Depth: Implementing AI across many long tail domains
Diversity and Depth: Implementing AI across many long tail domainsPaul Groth
 
EiTESAL eHealth Conference 14&15 May 2017
EiTESAL eHealth Conference 14&15 May 2017 EiTESAL eHealth Conference 14&15 May 2017
EiTESAL eHealth Conference 14&15 May 2017 EITESANGO
 
eLearning, Interactive Hypermedia, Neuroscience Icalt06 presentation
eLearning, Interactive Hypermedia, Neuroscience Icalt06 presentationeLearning, Interactive Hypermedia, Neuroscience Icalt06 presentation
eLearning, Interactive Hypermedia, Neuroscience Icalt06 presentationJaved Alam
 
Where are the Data? Perspectives from the Neuroscience Information Framework.
Where are the Data? Perspectives from the Neuroscience Information Framework. Where are the Data? Perspectives from the Neuroscience Information Framework.
Where are the Data? Perspectives from the Neuroscience Information Framework. Neuroscience Information Framework
 
Introduction to Neural Networks.pptx
Introduction to Neural Networks.pptxIntroduction to Neural Networks.pptx
Introduction to Neural Networks.pptxSowmiyaBaskar4
 

Similar to How do we know what we don't know?  Exploring the data and knowledge space through the Neuroscience Information Framework (20)

RDAP14: Maryann Martone, Keynote, The Neuroscience Information Framework
RDAP14: Maryann Martone, Keynote, The Neuroscience Information FrameworkRDAP14: Maryann Martone, Keynote, The Neuroscience Information Framework
RDAP14: Maryann Martone, Keynote, The Neuroscience Information Framework
 
The real world of ontologies and phenotype representation: perspectives from...
The real world of ontologies and phenotype representation:  perspectives from...The real world of ontologies and phenotype representation:  perspectives from...
The real world of ontologies and phenotype representation: perspectives from...
 
Open repositories for neuroimaging research
Open repositories for neuroimaging researchOpen repositories for neuroimaging research
Open repositories for neuroimaging research
 
INCF 2013 - Uniform Resource Layer
INCF 2013 - Uniform Resource LayerINCF 2013 - Uniform Resource Layer
INCF 2013 - Uniform Resource Layer
 
The Uniform Resource Layer
The Uniform Resource LayerThe Uniform Resource Layer
The Uniform Resource Layer
 
Neurosciences Information Framework (NIF): An example of community Cyberinfra...
Neurosciences Information Framework (NIF): An example of community Cyberinfra...Neurosciences Information Framework (NIF): An example of community Cyberinfra...
Neurosciences Information Framework (NIF): An example of community Cyberinfra...
 
Paul Groth
Paul GrothPaul Groth
Paul Groth
 
Publishing for the 21st Century: Experiences from the NEUROSCIENCE INFORMATIO...
Publishing for the 21st Century: Experiences from the NEUROSCIENCE INFORMATIO...Publishing for the 21st Century: Experiences from the NEUROSCIENCE INFORMATIO...
Publishing for the 21st Century: Experiences from the NEUROSCIENCE INFORMATIO...
 
Knowledge graph construction for research & medicine
Knowledge graph construction for research & medicineKnowledge graph construction for research & medicine
Knowledge graph construction for research & medicine
 
A Deep Survey of the Digital Resource Landscape
A Deep Survey of the Digital Resource LandscapeA Deep Survey of the Digital Resource Landscape
A Deep Survey of the Digital Resource Landscape
 
Inauguration Function - Ohio Center of Excellence in Knowledge-Enabled Comput...
Inauguration Function - Ohio Center of Excellence in Knowledge-Enabled Comput...Inauguration Function - Ohio Center of Excellence in Knowledge-Enabled Comput...
Inauguration Function - Ohio Center of Excellence in Knowledge-Enabled Comput...
 
NIFSTD: A Comprehensive Ontology for Neuroscience
NIFSTD: A Comprehensive Ontology for NeuroscienceNIFSTD: A Comprehensive Ontology for Neuroscience
NIFSTD: A Comprehensive Ontology for Neuroscience
 
What is it about the human brain that makes us smarter than other animals.pdf
What is it about the human brain that makes us smarter than other animals.pdfWhat is it about the human brain that makes us smarter than other animals.pdf
What is it about the human brain that makes us smarter than other animals.pdf
 
Diversity and Depth: Implementing AI across many long tail domains
Diversity and Depth: Implementing AI across many long tail domainsDiversity and Depth: Implementing AI across many long tail domains
Diversity and Depth: Implementing AI across many long tail domains
 
EiTESAL eHealth Conference 14&15 May 2017
EiTESAL eHealth Conference 14&15 May 2017 EiTESAL eHealth Conference 14&15 May 2017
EiTESAL eHealth Conference 14&15 May 2017
 
eLearning, Interactive Hypermedia, Neuroscience Icalt06 presentation
eLearning, Interactive Hypermedia, Neuroscience Icalt06 presentationeLearning, Interactive Hypermedia, Neuroscience Icalt06 presentation
eLearning, Interactive Hypermedia, Neuroscience Icalt06 presentation
 
What are Databases?
What are Databases?What are Databases?
What are Databases?
 
Where are the Data? Perspectives from the Neuroscience Information Framework.
Where are the Data? Perspectives from the Neuroscience Information Framework. Where are the Data? Perspectives from the Neuroscience Information Framework.
Where are the Data? Perspectives from the Neuroscience Information Framework.
 
Introduction to Neural Networks.pptx
Introduction to Neural Networks.pptxIntroduction to Neural Networks.pptx
Introduction to Neural Networks.pptx
 
Data 101: A Gentle Introduction
Data 101: A Gentle IntroductionData 101: A Gentle Introduction
Data 101: A Gentle Introduction
 

Recently uploaded

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 

Recently uploaded (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 

How do we know what we don't know?  Exploring the data and knowledge space through the Neuroscience Information Framework

  • 1. How do we know what we don't know? Exploring the data and knowledge space through the Neuroscience Information Framework. Maryann E. Martone, Ph.D., University of California, San Diego. Building Analytics for Integrated Neuroscience Data, Ontario Brain Institute, May 28-29, 2014
  • 2. We say this to each other all the time, but we set up systems for scholarly advancement and communication that are the antithesis of integration. Whole brain data (20 µm microscopic MRI); mosaic LM images (1 GB+); conventional LM images; individual cell morphologies; EM volumes & reconstructions; solved molecular structures. No single technology serves these all equally well. Multiple data types; multiple scales; multiple databases. A data integration problem
  • 3. NIF is an initiative of the NIH Blueprint consortium of institutes. What types of resources (data, tools, materials, services) are available to the neuroscience community? How many are there? What domains do they cover? What domains do they not cover? Where are they? (Web sites; databases; literature; supplementary material; PDF files; desk drawers.) Who uses them? Who creates them? How can we find them? How can we make them better in the future? NIF has been surveying, cataloging and tracking the neuroscience resource landscape since < 2008. http://neuinfo.org
  • 4. Old Model: Single type of content; single mode of distribution Scholar Library Scholar Publisher Systems for cataloging, metadata standards, and citation in place
  • 6. The duality of modern scholarship. Observation: Those who build information systems from the machine side don’t understand the requirements of the human very well. Those who build information systems from the human side don’t understand the requirements of machines very well. Scholarship requires the ability to cite and track usage of scholarly artifacts. In our current mode of working, there is no way to track artifacts as they move through the ecosystem; no way to incrementally add human expertise
  • 7. NIF: A New Type of Entity for New Modes of Scientific Dissemination. NIF’s mission is to maximize the awareness of, access to and utility of research resources produced worldwide to enable better science and promote efficient use. NIF unites neuroscience information without respect to domain, funding agency, institute or community. NIF is like a “PubMed” for all biomedical resources and a “PubMed Central” for databases. Makes them searchable from a single interface. Practical and cost-effective; tries to be sensible. Learned a lot about effective data sharing. The Neuroscience Information Framework provides a rich data source for understanding the current resource landscape
  • 8. But we have Google! • Current web is designed to share documents – Documents are unstructured data • Much of the content of digital resources is part of the “hidden web” • Wikipedia: The Deep Web (also called Deepnet, the invisible Web, DarkNet, Undernet or the hidden Web) refers to World Wide Web content that is not part of the Surface Web, which is indexed by standard search engines.
  • 9. Surveying the resource landscape ~3000 databases and datasets
  • 10. Populate broadly and quickly with minimum overhead to resource providers. NIF Registry: NIF curators; nomination by the community; semi-automated text mining pipelines; requires no special skills; site map available for local hosting. NIF Data Federation: DISCO interop (Yale); requires some programming skill, but designed for quick ingestion. Bandrowski et al., Database, 2012
  • 11. Data Federation: Deep search. With the thousands of databases and other information sources available, simple descriptive metadata will not suffice. Example: Subthalamus. http://neuinfo.org
  • 12. Data about the subthalamus http://neuinfo.org
  • 13. NIF unifies look, feel and access
  • 14. What do you mean by data? Databases come in many shapes and sizes. Primary data: data available for reanalysis, e.g., microarray data sets from GEO; brain images from XNAT; microscopic images (CCDB/CIL). Secondary data: data features extracted through data processing and sometimes normalization, e.g., brain structure volumes (IBVD), gene expression levels (Allen Brain Atlas); brain connectivity statements (BAMS). Tertiary data: claims and assertions about the meaning of data, e.g., gene upregulation/downregulation, brain activation as a function of task. Registries: metadata; pointers to data sets or materials stored elsewhere. Data aggregators: aggregate data of the same type from multiple sources, e.g., Cell Image Library, SUMSdb, Brede. Single source: data acquired within a single context, e.g., Allen Brain Atlas. Researchers are producing a variety of information artifacts using a multitude of technologies; many duplicate effort and content
  • 15. Data Federation Growth [Figure: number of federated databases and number of federated records (millions), Jun-08 to May-13]. NIF searches the largest collation of neuroscience-relevant data on the web. DISCO
  • 16. Bringing knowledge to data: Ontologies as framework. [Figure labels: Purkinje cell; axon terminal; axon; dendritic tree; dendritic spine; dendrite; cell body; cerebellar cortex.] There is little obvious connection between data sets taken at different scales using different microscopies without an explicit representation of the biological objects that the data represent
  • 17. NIF Semantic Framework: NIFSTD ontology. NIF uses ontologies to help navigate across and unify neuroscience resources. Ontologies are built from community ontologies → cross integration with other domains. [Figure: NIFSTD modules: Organism; NS Function; Molecule; Investigation; Subcellular structure; Macromolecule; Gene; Molecule Descriptors; Techniques; Reagent; Protocols; Cell; Resource; Instrument; Dysfunction; Quality; Anatomical Structure.] NIF Ontologies provide standards for integration of diverse data; available through NIF vocabulary services
  • 18. NIF links neuroscience to other domains via community ontologies. NIF Subcellular = Gene Ontology Cell Component. NIF Anatomy = UBERON cross-species ontology (includes FMA and Neuronames). NIF Disease = Disease Ontology. NIF Organism = NCBI Taxonomy. NIF Molecule = Chemicals of Biological Interest (CHEBI); Protein Ontology. NIF Cell/Investigation/Function = developed largely by neuroscience community. Use of ontology identifiers within data sources creates linkage across databases and across domains; the more they are used, the better they become
  • 19. Neurolex: > 1 million triples. Dr. Yi Zeng: Chinese neural knowledge base. NIF Cell Graph. This is your brain on computers
  • 20. Concept-based search: Query by meaning NIF provides formal definitions of many neuroscience terms = brain region without a blood brain barrier
  • 21. Ontologies as a data integration framework. NIF Connectivity: 7 databases containing connectivity primary data or claims from literature on connectivity between brain regions: Brain Architecture Management System (rodent); Temporal lobe.com (rodent); Connectome Wiki (human); Brain Maps (various); CoCoMac (primate cortex); UCLA Multimodal database (human fMRI); Avian Brain Connectivity Database (bird). Total: 1800 unique brain terms (excluding Avian). Number of exact terms used in > 1 database: 42. Number of synonym matches: 99. Number of 1st order partonomy matches: 385
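The three kinds of match counted on this slide (exact term, synonym, first-order partonomy) can be illustrated with a minimal Python sketch. The synonym and part-of tables below are hypothetical toy data, not NIFSTD content, and treating "exact" as a case-insensitive string match is an assumption; the slide does not describe how NIF computed its counts.

```python
from typing import Dict, Set

# Hypothetical mini-ontology (illustration only, not taken from NIFSTD):
# preferred label -> set of synonyms, and child label -> direct parent.
SYNONYMS: Dict[str, Set[str]] = {
    "striatum": {"neostriatum"},
    "caudate nucleus": {"caudate"},
}
PART_OF: Dict[str, str] = {
    "caudate nucleus": "striatum",
    "putamen": "striatum",
}


def canonical(term: str) -> str:
    """Map a term to its preferred label if it is a known label or synonym."""
    t = term.strip().lower()
    for label, synonyms in SYNONYMS.items():
        if t == label or t in synonyms:
            return label
    return t


def match_type(a: str, b: str) -> str:
    """Classify how two database terms relate: exact, synonym, partonomy, none."""
    if a.strip().lower() == b.strip().lower():
        return "exact"  # assumption: exact means same string, ignoring case
    ca, cb = canonical(a), canonical(b)
    if ca == cb:
        return "synonym"
    # First-order partonomy: one term is the direct parent of the other.
    if PART_OF.get(ca) == cb or PART_OF.get(cb) == ca:
        return "partonomy"
    return "none"


if __name__ == "__main__":
    for pair in [("Striatum", "striatum"), ("Neostriatum", "striatum"),
                 ("caudate", "Striatum"), ("putamen", "caudate")]:
        print(pair, "->", match_type(*pair))
```

Two terms that fail the exact test can still be linked through the ontology, which is the point the slide's counts make: most cross-database overlap only appears once synonyms and part-of relations are taken into account.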
  • 22. Building a knowledge space for neuroscience: Neurolex.org http://neurolex.org. Semantic MediaWiki. Provide a simple interface for defining the concepts required. Lightweight semantics. Community based: anyone can contribute their terms, concepts, things; anyone can edit; anyone can link. Accessible: searched by Google. Growing into a significant knowledge base for neuroscience: 33,000 concepts, 200,000 edits, 150 contributors. Larson and Martone, Frontiers in Neuroinformatics, 2013
  • 23. “When I use a word...it means what I choose it to mean” Formalization lets us develop metrics for the precision of the terms we use
  • 24. Mapping the known unknowns. Comprehensive ontologies provide an accounting of what we think we know. Where are the data relative to what we think we know? [Figure: brain regions (Striatum; Hypothalamus; Olfactory bulb; Cerebral cortex; Brain) by data source]
  • 25. Open World-Closed World: Mapping the knowledge - data space. [Figure legend: 0; 1-10; 11-100; >101. Axis: Data Sources] NIF lets us ask: where isn’t there data? What isn’t studied? Why?
  • 26. Open World-Closed World: Mapping the knowledge - data space. [Figure: Forebrain; Midbrain; Hindbrain by Data Sources. Legend: 0; 1-10; 11-100; >101] Junk brain regions?
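A rough Python sketch of how such a knowledge-data map could be built: list every region the ontology knows about, count what each data source holds for it, and bin the counts with the legend values from the slide. The slide does not state what is being counted, so counting records per (region, source) cell is an assumption, as is sending every count above 100 to the last bin; all region and source names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Record = Tuple[str, str]  # (brain region, data source); hypothetical record shape


def coverage_bin(n: int) -> str:
    """Bin a count using the legend labels shown on the slide."""
    if n < 0:
        raise ValueError("count cannot be negative")
    if n == 0:
        return "0"
    if n <= 10:
        return "1-10"
    if n <= 100:
        return "11-100"
    return ">101"  # slide's label; assumption: every count above 100 lands here


def coverage_matrix(regions: List[str], sources: List[str],
                    records: Iterable[Record]) -> Dict[Record, str]:
    """Closed-world view: every (region, source) cell gets a bin, even if empty."""
    counts = Counter(records)  # missing cells count as 0
    return {(r, s): coverage_bin(counts[(r, s)]) for r in regions for s in sources}


def unstudied(regions: List[str], sources: List[str],
              records: Iterable[Record]) -> List[str]:
    """Regions known to the ontology for which no source holds any record."""
    matrix = coverage_matrix(regions, sources, list(records))
    return [r for r in regions if all(matrix[(r, s)] == "0" for s in sources)]


if __name__ == "__main__":
    regions = ["striatum", "olfactory bulb"]
    sources = ["db_a", "db_b"]
    records = [("striatum", "db_a")] * 12 + [("striatum", "db_b")]
    print(coverage_matrix(regions, sources, records))
    print("no data for:", unstudied(regions, sources, records))
```

Starting from the ontology's list of regions rather than from the records is what lets the empty cells show up at all; a query over the data alone would never return a region nobody has studied.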
  • 27. SW Oh et al. Nature 000, 1-8 (2014) doi:10.1038/nature13186 Adult mouse brain connectivity matrix: revenge of the midbrain
  • 28. The tale of the tail. “Human neuroimaging typically is performed on a whole brain basis. However, for several reasons tail of the caudate activity can easily be missed. One reason is limitations in the normalization algorithms, that typically are optimized to maximize accuracy for cortical rather than subcortical structures. ... A second reason is that standard neuroimaging atlases such as the Harvard-Oxford structural atlas used with neuroimaging analysis programs such as FreeSurfer truncate the caudate at the body, and completely exclude the tail... A final reason is that the tail of the caudate is close to the hippocampus, and could be misidentified as such especially in tasks involving learning and memory. Therefore, the tail of the caudate may be recruited in additional cognitive tasks, but yet not have been properly identified and reported in the neuroimaging literature” Seger CA. The visual corticostriatal loop through the tail of the caudate: circuitry and function. Front Syst Neurosci. 2013 Dec 6;7:104. doi: 10.3389/fnsys.2013.00104. eCollection 2013.
  • 29. fMRI Cerebellum When results contradict a current theory, they may be ignored
  • 30. “The Data Homunculus” Funding drives representation in the data space
  • 31. NIF Reports: Male vs Female circa 2012 Gender bias When data is not made available, the data space is an incomplete record of what is available
  • 32. How much information makes it into the data space? ∞ What is easily machine processable and accessible What is potentially knowable What is known: Literature, images, human knowledge Unstructured; Natural language processing, entity recognition, image processing and analysis; paywalls; file drawers Abstracts vs full text vs tables etc Estimates that > 50% scientific output is not recovered Chan et al. Lancet, 383, 2014
  • 33. Data sharing in the long tail of neurosciences
  • 34. A place for my data. NIF lists over 350 data repositories (resources that accept data contributions from the community)
  • 35. “Empty Archives”
    Repository | Type of data | Date started | Host | Public data | Comments
    CARMEN | neuroscience / electrophysiology | 2008 | Newcastle University; United Kingdom | 100 | Requires account
    INCF Dataspace | various | 2012 | International Neuroinformatics Coordinating Facility | ? |
    Open Source Brain | models | 2014 | University College London | 47 | Cells and Networks; 23 (Technology showcases)
    XNAT Central | neuroimaging | 2010 | Washington University School of Medicine in St. Louis; Missouri; USA | 34 | States 370 projects, 3804 subjects, and 5172 imaging sessions. 123 were visible but do not all appear to be public. 34 public data were listed under “Recent”
    Open Connectome | serial electron microscopy and magnetic resonance (graphs) | 2011 | Johns Hopkins University; Maryland; USA | 9 | 9, 7 - image projects; 19 - graphs
    UCSF DataShare | biomedical including neuroimaging, MRI, cognitive impairment, dementia, aging | 2011 | University of California at San Francisco; California; USA | 15 |
    BrainLiner | various functional data | 2011 | ATR; Kyoto; Japan | 10 |
    ModelDB | neuron models | 1996 | Yale University; Connecticut; USA | 875 |
    NeuroMorpho | digitally reconstructed neurons | 2006 | George Mason University; Virginia; USA | 10004 |
    Cell Image Library/Cell Centered Database | images, videos, and animations of cell | 2002 (CCDB); 2010 (CIL) | American Society for Cell Biology / University of California at San Diego; California; USA | 10,360 | The CCDB had 450 data sets when it merged with CIL. CIL also contains large imaging data sets that are not counted as separate images
    CRCNS | computational neuroscience datasets | 2008 | University of California at Berkeley; California; USA | 38 |
    OpenfMRI | fMRI | 2012 | University of Texas at Austin; Texas; USA | 22 |
    NeuroMorpho.org = 10,000 neuronal reconstructions from ~200 labs. Cell Image Library = 10,000 image sets from 1500 individuals. “I finally gave NeuroMorpho my data so they would stop
  • 36. Attitudes towards data sharing: “Pry it from my cold, dead fingers”; “Done”; “You can have it if you really want” [scale: Always to Never]. Reasons for not making data available: lack of time and resources; lack of incentives; fear of being scooped; fear of being criticized; fear that data will be misused; data sharing is a waste of time. Tenopir, C. et al. Data sharing by scientists: practices and perceptions. PLoS One 6, e21101, doi:10.1371/journal.pone.0021101 (2011). Many make data available via web sites or via supplementary material
  • 37. Multivariate analysis of the SCI syndrome using data from two research sites. Ferguson AR, Irvine K-A, Gensel JC, Nielson JL, et al. (2013) Derivation of Multivariate Syndromic Outcome Metrics for Consistent Testing across Multiple Models of Cervical Spinal Cord Injury in Rats. PLoS ONE 8(3): e59712. doi:10.1371/journal.pone.0059712 http://www.plosone.org/article/info:doi/10.1371/journal.pone.0059712
  • 38. Incentives: New solutions • New journals for data, where the focus is on data, not results • Data must be deposited in a recognized repository – Persistent identifier assigned • Standards for metadata and data types Nature Scientific Data
  • 39. Incentives: Data citations • Many groups are developing guidelines for creating a system of citation for data used in a study • First step for providing an incentive system for data sharing • Currently, very difficult to track use of data in articles http://www.force11.org/datacitation “Sound, reproducible scholarship rests upon a foundation of robust, accessible data. Data should be considered legitimate, citable products of research. Data citation, like the citation of other evidence and sources, is good research practice.” -Joint Declaration of Data Citation Principles Future of Research Communications and e-Scholarship; FORCE11 1. Importance 2. Credit and attribution 3. Evidence 4. Unique Identification 5. Access 6. Persistence 7. Specificity and verifiability 8. Interoperability and flexibility
  • 40. Unique ID’s for all! Resource Identification Initiative • It is currently impossible to query the biomedical literature to find out what research resources have been used to produce the results of a study: authors don’t provide enough information to unambiguously identify key research resources • Impossible to find all studies that used a resource • Critical for reproducibility and data mining • Critical for troubleshooting http://www.force11.org/resource_identification_initiative Faulty Antibodies Continue to Enter US and European Markets, Warns Top Clinical Chemistry Researcher - Genome Web Daily, October 11, 2013
  • 41. Resource Identification Initiative • Have authors supply appropriate identifiers for key resources used within a study such that they are: – Machine processible (i.e., unique identifier that resolves to a single resource) – Outside of the paywall – Uniform across journals and publishers Launched February 2014: > 30 journals participating Anita Bandrowski, Nicole Vasilevsky, Matthew Brush, Melissa Haendel and the RINL group
  • 42. Pilot Project • Have authors identify 3 different types of research resources: – Software tools and databases – Antibodies – Genetically modified animals • Include RRID in methods section • RRID=RRID:Accession number – Just a string at this point • Voluntary for authors • Journals did not have to modify their submission system • Journals have flexibility in implementation. Send request to author at: – Submission – During review – After acceptance Sources: NIF Registry, NIF Antibody Registry, Model Organism Databases Resource Identification Portal: Aggregates accession numbers from >10 different databases that are the authorities for registering research resources
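Because the pilot treats an RRID as "just a string" placed in the methods section, third-party tools can pick it up with ordinary text matching. The following is a minimal sketch of that idea, not the initiative's own tooling: the regular expression is an assumed pattern for accession strings, the function name `extract_rrids` is hypothetical, and `AB_0000000` is a placeholder accession invented for illustration (only `nif-0000-30467` appears in these slides).

```python
import re
from typing import List

# Assumed pattern: the literal prefix "RRID:" followed by an accession made of
# letters, digits, underscores or hyphens. Real accessions may need a stricter rule.
RRID_PATTERN = re.compile(r"RRID:\s?([A-Za-z0-9_-]+)")


def extract_rrids(methods_text: str) -> List[str]:
    """Return the unique accessions cited as RRIDs, in order of first appearance."""
    found: List[str] = []
    for accession in RRID_PATTERN.findall(methods_text):
        if accession not in found:
            found.append(accession)
    return found


# Hypothetical methods text; AB_0000000 is a placeholder, not a real antibody ID.
sample = (
    "Analyses used a registered resource (RRID:nif-0000-30467) and a "
    "placeholder antibody (RRID:AB_0000000). The resource "
    "(RRID:nif-0000-30467) was accessed again for validation."
)
print(extract_rrids(sample))
```

A uniform prefix is what makes this workable across journals: the same pattern applies regardless of publisher or citation style.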
  • 43. First results are in the literature Google Scholar: Search RRID; select since 2014
  • 44. What studies used X? To date: •30 articles have appeared •2 articles have disappeared, i.e., the RRID’s were removed at copyediting •195 RRID’s were reported •14 were in error (14/195 ≈ 7%) •> 200 antibodies were added •> 75 software tools/databases were added •A resolver service has been created •3rd party tools are being created to provide linkage between resources and papers RRID:nif-0000-30467 Authors did not deliberately leave out identifying information; they just hadn’t thought about it
  • 45. What have we learned? Utopia plug-in: Steve Pettifer •Authors are willing to adopt new types of citations and citation styles; you just have to ask •RRID = usage of research resource •Ideal: resolved by search engines without requiring specialized citation services •Citation drives registration •Clear role for repositories as authorities
  • 46. Digital objects are a new beast RRID: Provides foundation for establishing an alerting service for research resources Trust: Not just who produced it but what produced it
  • 47. Community database: beginning Community database: End Register your resource to NIF! “How do I share my data/tool?” “There is no database for my data” 1 2 3 4 Institutional repositories Cloud INCF: Global infrastructure Government Education Industry NIF provides the “glue” for a functioning ecosystem of data and tools Tool repositories Standards Brokering Archiving
  • 48. An ecosystem for research objects: the social network of research resources. Unique and persistent identifiers and a system for referencing them allow an ecosystem to function. [Diagram: articles, code, blogs, workflows, data and portals linked through persistent identifiers and search engines]
  • 49. Musings from the NIF • Analytics let us take a global view of data – By bringing in a knowledge framework, we can look at positive and negative space • Well-populated data resources are critical to moving analytics forward – Comprehensive, i.e. they have most of the data that are available – Much can be learned even from messy data, but reasonable standards help – Active outreach is required • Technological barriers to widespread data sharing are diminishing – Best practices are emerging – General and focused repositories are available, although sustainability of these is a problem • There is a lot of neuroscience data available, but a culture of routine data sharing does not yet exist in neuroscience – But encouraging signs that it is largely due to lack of time and means, not lack of desire – It is up to us to change the incentive system to support the best science possible • Most scientists are not adept at managing or curating their own data – Role for repositories and data curators • Pieces of a functioning ecosystem are in place – Think about how you fit into the ecosystem
  • 50. NIF team (past and present) Jeff Grethe, UCSD, Co Investigator, Co-PI Amarnath Gupta, UCSD, Co Investigator Anita Bandrowski, NIF Project Leader Gordon Shepherd, Yale University Perry Miller Luis Marenco Rixin Wang David Van Essen, Washington University Erin Reid Paul Sternberg, Cal Tech Arun Rangarajan Hans Michael Muller Yuling Li Giorgio Ascoli, George Mason University Sridevi Polavarum Yueling Li, UCSD Trish Whetzel, UCSD Fahim Imam Larry Lui Andrea Arnaud Stagg Jonathan Cachat Svetlana Sulima Burak Ozyrt Davis Banks Vadim Astakhov Xufei Qian Chris Condit Mark Ellisman Stephen Larson Willie Wong Tim Clark, Harvard University Paolo Ciccarese Karen Skinner, NIH, Program Officer (retired) Jonathan Pollock, NIH, Program Officer And my colleagues in Monarch, dkNet, 3DVC, Force 11 Melissa Haendel, OHSU** Nicole Vasilevsky Matthew Brush **Monarch and Resource Identification Initiative
  • 51. Creating an on-line knowledge space for neuroscience
  • 52. Pages are related through properties Red Links: Information is missing (or misspelled)
  • 53. Neurolex Neuron • Led by Dr. Gordon Shepherd • > 30 world wide experts • Simple set of properties • Consistent naming scheme • Integrated with Structural Lexicon • Used for annotation in other resources, e.g., NeuroElectro
  • 54. Location of Cell Soma Location of dendrites Location of local axon arbor
  • 55. Analysis of Red Links in the Neuron Registry • INCF Project – Neuron Registry – > 30 experts worldwide – Fill out neuron pages in Neurolex Wiki – Led by Dr. Gordon Shepherd [Bar chart: number of red links (total, easy fixes, hard fixes) for soma location, dendrite location and axon location; axis 0 to 300] Social networks and community sites let us learn things from the collective behavior of contributors → INCF/HBP Knowledge Space
  • 56. Structural Lexicon in Neurolex. Brain Region: •Trans-species •“Stateless”, i.e. no universal defining criteria •General structures and partonomies based on Neuroanatomy 101; e.g., Hippocampus, Dentate gyrus. Brain Parcel: •Species specific •Specific reference •Defining criteria •Sometimes partonomy; sometimes not; e.g., Hippocampus of ABA2009. Relation between the two: a Brain Parcel partially overlaps a Brain Region
  • 58. Is there a framework for neuroscience? • Of the ~ 4000 columns that NIF queries, ~1300 map to one of our core categories: – Organism – Anatomical structure – Cell – Molecule – Function – Dysfunction – Technique • 30-50% of NIF’s queries autocomplete • When NIF combines multiple sources, a set of common fields emerges – >Basic information models/semantic models exist for certain types of entities Biomedical science does have a conceptual framework
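The column-to-category observation can be illustrated with a toy mapping. This is only a sketch under stated assumptions: the synonym sets, the column names and the helper `categorize` are all hypothetical, and NIF's actual mapping method is not described in these slides. Only the seven category names come from the slide.

```python
from typing import Optional

# The seven core categories come from the slide; the synonym sets are invented examples.
CORE_CATEGORIES = {
    "Organism": {"species", "organism", "strain"},
    "Anatomical structure": {"brain_region", "anatomy", "structure"},
    "Cell": {"cell_type", "neuron"},
    "Molecule": {"gene", "protein"},
    "Function": {"function"},
    "Dysfunction": {"disease", "disorder"},
    "Technique": {"technique", "method"},
}


def categorize(column_name: str) -> Optional[str]:
    """Map a source column name to a core category, or None if nothing matches."""
    key = column_name.strip().lower().replace(" ", "_")
    for category, synonyms in CORE_CATEGORIES.items():
        if key in synonyms:
            return category
    return None


# Hypothetical column names gathered from several sources.
columns = ["Species", "Brain Region", "Gene", "file_size", "Disease"]
mapped = {name: categorize(name) for name in columns}
coverage = sum(value is not None for value in mapped.values()) / len(columns)

print(mapped)
print(f"{coverage:.0%} of columns map to a core category")
```

With the figures on the slide, the same ratio is roughly 1300/4000, or about a third of all queried columns.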
  • 60. What would a 21st century platform for scholarship look like? D K Macroinformatics NIF: Sensors and monitors for the resource ecosystem
  • 61. Exposing knowledge to the web Because they are static URL’s, Wikis are searchable by Google
  • 63. NIF provides a rich source of information on digital resources • Analytics let us take a global view of data – By bringing in a knowledge framework, we can look at positive and negative space • Well-populated data resources are critical to moving analytics forward – Comprehensive, i.e. they have most of the data that are available – Much can be learned even from messy data, but reasonable standards help – Active outreach is required • Technological barriers to widespread data sharing are diminishing – Best practices are emerging – General and focused repositories are available, although sustainability of these is a problem • There is a lot of neuroscience data available, but a culture of routine data sharing does not yet exist in neuroscience – But encouraging signs that it is largely due to lack of time and means, not lack of agreement • Most scientists are not adept at managing or curating their own data – Role for repositories and data curators • Pieces of a functioning ecosystem are in place; think globally Not just science, but data policy should be data driven
  • 64. Same data: different analysis • Gemma: Gene ID + Gene Symbol • DRG: Gene name + Probe ID • Gemma presented results relative to baseline chronic morphine; DRG with respect to saline, so direction of change is opposite in the 2 databases Chronic vs acute morphine in striatum • Analysis: •1370 statements from Gemma regarding gene expression as a function of chronic morphine •617 were consistent with DRG → over half of the claims of the paper were not confirmed in this analysis •Results for 1 gene were opposite in DRG and Gemma •45 did not have enough information provided in the paper to make a judgment Relatively simple standards would make it easier to perform comparisons across the ecosystem
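The baseline mismatch described above can be handled by re-expressing each call against a common baseline before comparing. The sketch below assumes a simple two-condition design (chronic morphine vs. saline), so switching baseline just inverts "up" and "down". The gene names, record layout and function names are hypothetical and are not taken from Gemma or DRG.

```python
FLIP = {"up": "down", "down": "up"}


def relative_to(direction: str, baseline: str, target_baseline: str) -> str:
    """Re-express an up/down call against target_baseline (two-condition design assumed)."""
    return direction if baseline == target_baseline else FLIP[direction]


def compare(gemma: dict, drg: dict) -> dict:
    """Label each gene present in both sources as 'consistent' or 'opposite'."""
    results = {}
    for gene, (direction, baseline) in gemma.items():
        if gene not in drg:
            continue
        drg_direction, drg_baseline = drg[gene]
        aligned = relative_to(direction, baseline, drg_baseline)
        results[gene] = "consistent" if aligned == drg_direction else "opposite"
    return results


# Hypothetical records: gene -> (direction of change, baseline condition).
gemma = {"GENE_A": ("down", "chronic morphine"), "GENE_B": ("up", "chronic morphine")}
drg = {"GENE_A": ("up", "saline"), "GENE_B": ("up", "saline")}
print(compare(gemma, drg))

# Share of Gemma statements consistent with DRG, using the counts on the slide.
confirmed_share = 617 / 1370
print(f"{confirmed_share:.0%} consistent")
```

Recording the baseline alongside each result is the kind of relatively simple standard the slide argues for: without it, the two sources appear to disagree on every gene.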
  • 65. Musings from the NIF • Every resource is resource limited: few have enough time, money, staff or expertise required to do everything they would like – If the market can support 11 MRI databases, fine – Some consolidation, coordination is warranted – How can industry help support the data space? How can they take them even further? – Don’t let the data space become fractured • Big, broad and messy beats small, narrow and neat – Without trying to integrate a lot of data, we will not know what needs to be done – Progressive refinement; addition of complexity through layers • Be flexible and opportunistic: assume all will change – A single optimal technology/container for all types of scientific data and information does not exist; technology is changing • Think globally; act locally: – No source, not even NIF, is THE source; we are all a source – System and culture to be able to learn from everything – Cooperative model for biomedicine