Martone acs presentation

Surveying the Biomedical Resource
Landscape
Maryann E. Martone, Ph.D.
Professor Emeritus
University of California, San Diego
and
Director of Biosciences
Hypothesis

Biomedical Data Discovery Index
Mapping and building a
biomedical research
resource ecosystem

[Figure legend: Database, Software Application, Data Analysis Service, Topical Portal, Core Facility, Ontology, Software Resource; axis: Years]
NIF is an initiative of the
NIH Blueprint
consortium of institutes
– NIF has been tracking
and cataloging the
biomedical resource
landscape since 2008

NIF: A New Type of Entity for New Modes
of Scientific Dissemination
• NIF’s mission is to maximize the awareness of, access to and utility of
digital resources produced worldwide to enable better science and
promote efficient use
– NIF was one of the first attempts to unite neuroscience information without
respect to domain, funding agency, institute or community
– Confront the scale, dynamism of domain and fluidity of technology
– Thought about global search across independently maintained
resources
– NIF is a library for scholarly output that is a web-enabled resource and not a paper;
a PubMed and PubMed Central for things that aren’t articles
– Aggregates and tracks all the different databases, tools and resources now
produced by the scientific community
– Makes them searchable from a single interface
– Educate neuroscientists and students about effective data sharing
http://neuinfo.org

Organizing framework
and portal for data
dashed lines: mapping of
metadata, standards,
links to aggregators,
datasets
aggregators: repositories
or various indices whose
metadata are or can be
mapped into Commons
metadata
Data
Digital objects
A data discovery index for biomedicine
There is work for everyone (and more)
datamed.biocaddie.org (v0.5) alpha testing

Registry vs Data index: Metadata about
resource vs metadata/data in database
With the thousands of databases and other information sources
available, simple descriptive metadata will not suffice
Each source is
categorized
and presents
custom
facets;
integrated
views

SciCrunch: A “social network” for resources
• NIF is a general search engine
across neuroscience and
biomedicine
• Many communities want to
create more focused portals
• Own brand
• Own view
• How do we create a system that
satisfies community needs
without creating another silo?
• SciCrunch: Configurable portals
on top of shared resource pools

Breaking down silos: Community enrichment
Like a Mendeley for resources!

Semantic Information Framework
• Aggregate of community ontologies with some extensions for neuroscience
• Available as services through SciCrunch and BioPortal -> SciGraph, Neo4j-based
NIFSTD: Organism, Molecule, Investigation, Subcellular structure, Cell, Dysfunction, Quality, Anatomical Structure, NS Function
SciCrunch uses ontologies to enhance search and discovery but is not constrained by them

Knowledge gaps: If we can’t search you, do you exist?
~800 million records across ~200 databases or views
[Figure: Forebrain, Midbrain, Hindbrain by Data Sources; legend bins: 0, 1-10, 11-100, >101]

Creating a Data and Resource Discovery Environment
Domain Knowledge
• Ontologies
• Atlases/Maps
Claims, assertions
• Registries
• Annotation
• Models and simulations
• Analyses
Data
• Databases
• Data sets
• Derived data
Literature
Search and Discovery
We cannot shoe-horn everything into a single representation or system; instead, figure
out how information (data + knowledge) can flow between them. Knowledge is fluid
and will continually update.

e-Science Ecosystem
Digital world runs on globally unique and persistent identifiers; PIDs serve as a
“key” for identifying the same entity across different contexts
No resource provider is an island: ensure your objects are FAIR
[Figure labels: ORCID, RRID, DOI, PID, CDE; People, Research resources, Data, Ontology, Concepts, Protocols, Minimal Information Models, Metadata standards, Citation standards, Search and discovery, Translation, Digital, Non-digital, articles, software; Repositories and Registries; Repositories, Registries, Aggregators, Social platforms, Workflow platforms]

But… even our standards need standards
[Figure: one data set shown as Curated, Analyzed, and Mirrored, with identifier variants GSE13732, E-GEOD-13732, GEO:GSE13732]
Identifiers:
• Standardize formats
• Agreements on re-use and persistence
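A minimal sketch of what "standardize formats" could mean for the three spellings on this slide. The choice of `GEO:GSE<number>` as the canonical form and the function name `normalize_geo_series` are assumptions made for illustration, not an agreed community convention.

```python
import re

def normalize_geo_series(identifier):
    """Map known spellings of a GEO series accession to 'GEO:GSE<number>'.

    Returns None when the spelling is not recognized.
    The canonical form chosen here is an assumption for illustration.
    """
    s = identifier.strip().upper()
    match = (
        re.fullmatch(r"(?:GEO:)?GSE(\d+)", s)   # GSE13732 or GEO:GSE13732
        or re.fullmatch(r"E-GEOD-(\d+)", s)     # E-GEOD-13732
    )
    return f"GEO:GSE{match.group(1)}" if match else None

variants = ["GSE13732", "E-GEOD-13732", "GEO:GSE13732"]
canonical = {normalize_geo_series(v) for v in variants}
print(canonical)  # {'GEO:GSE13732'}
```

All three spellings collapse to one key, so a search index could treat them as the same entity. Unrecognized strings return None rather than being guessed at.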

Making research objects FAIR
– You (and the machine) have to be able to
find it
• Accessible through the web
• Annotations
• Stable links and unique identifiers
– You have to be able to use it
• Data type specified and in a usable form
– You have to know what the data mean
• Some semantics
• Context: Experimental metadata
– You have to be able to cite it
• Provenance: Where did the data come from?
Make your data FAIR: Findable, Accessible, Interoperable, Reusable
https://www.force11.org/group/fairgroup
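The requirements on this slide can be read as a checklist over a metadata record. The sketch below is only that, a checklist; it is not a FAIR assessment method from FORCE11, and every field name in it is hypothetical.

```python
# Hypothetical field names; each maps to one requirement listed on the slide.
REQUIREMENTS = {
    "url": "accessible through the web",
    "identifier": "stable link and unique identifier",
    "data_type": "data type specified and in a usable form",
    "semantics": "some semantics",
    "experimental_metadata": "context: experimental metadata",
    "provenance": "provenance: where the data came from",
}

def missing_requirements(record):
    """Return the requirements that a metadata record leaves empty or absent."""
    return [desc for field, desc in REQUIREMENTS.items() if not record.get(field)]

# Illustrative record with made-up values.
record = {
    "url": "https://example.org/datasets/42",
    "identifier": "doi:10.0000/example.42",
    "data_type": "CSV",
    "provenance": "",
}
for gap in missing_requirements(record):
    print("missing:", gap)
```

An empty string counts as missing, so a field that exists but says nothing does not pass.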

Resource Identification Initiative: Linking
resources to literature
• Have authors supply appropriate
identifiers for key resources used
within a study such that they are:
– Machine-processable (i.e., unique
identifier that resolves to a single
resource)
– Outside of the paywall
– Uniform across journals and publishers
• Pilot project: SciCrunch portal
serving identifiers for
– Software/databases (NIF RR)
– Antibodies (NIF AB Registry)
– Genetically modified organisms (NIF
aggregation)
Absolutely reliant on comprehensive registries to enforce uniqueness, persistence and
consistent metadata
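To illustrate why a uniform, machine-processable identifier matters, here is a minimal sketch that pulls RRIDs out of a methods paragraph. It assumes the `RRID:` prefix followed by an `AB_` (antibody) or `SCR_` (software/database) number; organism identifiers are not handled. The sample text, the tool name, and the ID numbers are made up.

```python
import re

# Assumed pattern: "RRID:" then a registry prefix and a numeric local ID.
# Only AB_ (antibodies) and SCR_ (software/databases) are handled here.
RRID_PATTERN = re.compile(r"RRID:\s*((?:AB|SCR)_\d+)")

def extract_rrids(text):
    """Return the distinct RRIDs cited in text, in order of first appearance."""
    seen = []
    for match in RRID_PATTERN.finditer(text):
        rrid = "RRID:" + match.group(1)
        if rrid not in seen:
            seen.append(rrid)
    return seen

# Made-up methods text with illustrative (not real) identifiers.
methods = (
    "Sections were stained with anti-GFAP (RRID:AB_0000001) and "
    "analyzed in ImageTool (RRID: SCR_0000002); a second batch used "
    "the same antibody (RRID:AB_0000001)."
)
print(extract_rrids(methods))  # ['RRID:AB_0000001', 'RRID:SCR_0000002']
```

Because the identifier is uniform across journals, the same few lines work on any article text. Confirming that each ID resolves to a single resource would still need a lookup against the registry, which this sketch does not attempt.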

What studies used...
Type RRID into
Google Scholar;
return a list of
papers that use
that resource
>700 papers
>90 journals
1000s of RRIDs

Resource IDs from NIF aggregated databases
• A single portal for
authors
• >15 authoritative
databases
• One search interface
• Not just my research
resource
• Thinking globally
about infrastructure
RII Portal
http://scicrunch.org/resources
Utilized NIF/SciCrunch infrastructure (NIH Blueprint; NIDDK)

Linking data to Literature: Joint Declaration of
Data Citation Principles
• Synthesis of data
citation principles
– >25 groups
participating
• Designed to be high
level and easy to
understand
• Supplemented with a
glossary, references
and examples
http://www.force11.org/datacitation
1. Importance
2. Credit and attribution
3. Evidence
4. Unique Identification
5. Access
6. Persistence
7. Specificity and
verifiability
8. Interoperability and
flexibility

From Principles to Practice
And you!
Data Citation Pilot

hypothes.is: Web annotation
• Works as an
independent layer
over any web page or
PDF (images, video
and data coming)
• Open source
• Standards based
• Easy-to-use
https://hypothes.is/annotating-all-knowledge/

Neuroscientist annotating her own paper to provide updates and additional
information
An interactive knowledge layer

Conclusions
• Investments in infrastructure, successful and unsuccessful,
have laid the foundations for a functioning ecosystem
• Comprehensive registries, repositories and aggregators play
a key role in providing stable and useful representations of
key digital entities
• Persistence is a social contract
• Population is key
• i.e., people and organizations are in the mix!
• Need to think globally across the workflow
• FORCE11 coordinating, collating and organizing principles
that govern flow of research objects within the ecosystem
• New technologies are constantly arising
