Methodologies for Long-Tail Data Sharing: What Have We Learned?
Maryann E. Martone, Ph.D., University of California, San Diego and Hypothesis
Jeffrey S. Grethe, Ph.D., University of California, San Diego
[Figure: resource types tracked by NIF over the years: Database, Software Application, Data Analysis Service, Topical Portal, Core Facility, Ontology, Software Resource]
NIF is an initiative of the NIH Blueprint consortium of institutes
– NIF has been tracking and cataloging the biomedical resource landscape since 2008
The current “Addictome”
NIF searches across:
• Resource Registry (13,000+)
• >200 deeply integrated data sources (>800 million records)
• Literature
Query: Addiction
The digital world runs on globally unique and persistent identifiers (PIDs), e.g., ORCID for people and RRIDs for research resources, alongside identifiers for data. PIDs serve as a “key” for identifying the same entity across different contexts.
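To illustrate the “key” idea, here is a minimal sketch, assuming each record carries its identifier under a hypothetical field named "pid": records from different sources are grouped under the PID they share. The records themselves are invented for illustration (RRID:AB_262137 is the antibody identifier used on the next slide; RRID:EXAMPLE_0001 is a placeholder).

```python
from collections import defaultdict

def link_by_pid(sources):
    """Group records from several sources under the persistent identifier they share.

    `sources` maps a source name to a list of records; each record is assumed
    to store its identifier under the hypothetical key "pid".
    """
    linked = defaultdict(list)
    for source_name, records in sources.items():
        for record in records:
            linked[record["pid"]].append((source_name, record))
    return dict(linked)

# Hypothetical records: a registry entry and methods-section mentions.
registry = [{"pid": "RRID:AB_262137", "name": "beta-actin mAb", "vendor": "Sigma-Aldrich"}]
literature = [
    {"pid": "RRID:AB_262137", "use": "immunoblotting", "dilution": "1:10,000"},
    {"pid": "RRID:EXAMPLE_0001", "use": "hypothetical unrelated entry"},
]

linked = link_by_pid({"registry": registry, "literature": literature})
for source_name, record in linked["RRID:AB_262137"]:
    print(source_name, record)
```

Because the PID is the join key, no fuzzy matching on names or vendor strings is needed; two records describe the same entity exactly when their identifiers are equal.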
e-Science Ecosystem
[Figure: e-Science ecosystem diagram. Components: people, research resources, concepts, protocols, non-digital resources, PIDs, DOI, metadata standards, minimal information models, CDEs, ontologies, translation, repositories and registries, aggregators (e.g., NIF, Monarch, NIH Data Discovery Index)]
eScience goal: Make data Findable, Accessible, Interoperable, and Re-usable (FAIR) for both humans and machines
Resource Identification Initiative: Supplying unique identifiers for key research resources
“The following antibodies were used for immunoblotting: β-actin mAb (1:10,000 dilution, Sigma-Aldrich)…”
vs.
“The following antibodies were used for immunoblotting: β-actin mAb (1:10,000 dilution, Sigma-Aldrich, RRID:AB_262137)…”
https://scicrunch.org/resolver/RRID:AB_262137
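The payoff of the second citation style is that it is machine-readable. A minimal sketch, assuming a simplified RRID pattern (registry prefix, underscore, accession; real RRID syntax varies across registries) and reusing the resolver URL form shown above, pulls RRIDs out of methods text and turns them into resolver links:

```python
import re

# Assumed, simplified pattern: "RRID:" + registry prefix + "_" + accession.
# Real RRIDs differ by registry, so treat this as illustrative only.
RRID_PATTERN = re.compile(r"RRID:\s?([A-Za-z]+_[A-Za-z0-9_:-]*[A-Za-z0-9])")
RESOLVER_BASE = "https://scicrunch.org/resolver/"

def extract_rrids(text):
    """Return RRIDs found in free text, in order, normalized to 'RRID:<id>'."""
    return ["RRID:" + m.group(1) for m in RRID_PATTERN.finditer(text)]

def resolver_url(rrid):
    """Build a resolver link following the URL form shown on the slide."""
    return RESOLVER_BASE + rrid

without_rrid = "immunoblotting: beta-actin mAb (1:10,000 dilution, Sigma-Aldrich)"
with_rrid = "immunoblotting: beta-actin mAb (1:10,000 dilution, Sigma-Aldrich, RRID:AB_262137)"

print(extract_rrids(without_rrid))  # []
for rrid in extract_rrids(with_rrid):
    print(resolver_url(rrid))  # https://scicrunch.org/resolver/RRID:AB_262137
```

The first sentence yields nothing a program can act on, while the second yields a resolvable identifier; the trailing character class keeps closing punctuation such as ")" or "." out of the match.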
Minimal Information Standards
http://precedings.nature.com/documents/1720/version/1
http://precedings.nature.com/documents/1720/version/1/files/npre20081720-1.pdf
A set of guidelines for reporting data that ensures the data can be easily verified, analysed and clearly interpreted by the wider scientific community. The recommendations also provide a foundation for structured databases, public repositories and development of data analysis tools.
https://en.wikipedia.org/wiki/Minimum_Information_Standards
MINI: Minimum Information about a Neuroscience
Investigation
[Figure: a minimal information model (MIM) composed of CDE 1, CDE 2, … CDE N, with a value set]
Common Data Elements
https://cde.nlm.nih.gov/home
http://www.nlm.nih.gov/cde/
A data element that is common to multiple datasets and is used to improve data quality and promote data sharing. CDEs usually describe the following data element properties: Name, Definition, Instructions, Provenance, Value Set.
Value Sets
The set of possible values or responses. A Value Set often includes concepts from established Vocabularies, Ontologies or Data Standards. A value set may also include a range of permissible values and indicate the required units. For a survey question, the value set may be a list of possible responses.
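The CDE properties and the two kinds of value set described above (an enumerated list of responses, or a permissible range with units) map naturally onto a small data structure. The sketch below is a hypothetical model for illustration, not the NLM CDE Repository's schema; the two example CDEs and all their field values are invented.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ValueSet:
    """Permissible values: either an enumerated set or a numeric range with units."""
    permissible_values: frozenset = frozenset()
    minimum: Optional[float] = None
    maximum: Optional[float] = None
    unit: Optional[str] = None

    def accepts(self, value):
        if self.permissible_values:
            return value in self.permissible_values
        # Range check; bool is rejected because it is a subclass of int.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        if self.minimum is not None and value < self.minimum:
            return False
        if self.maximum is not None and value > self.maximum:
            return False
        return True

@dataclass(frozen=True)
class CommonDataElement:
    """The five CDE properties listed above."""
    name: str
    definition: str
    instructions: str
    provenance: str
    value_set: ValueSet

    def validate(self, value):
        return self.value_set.accepts(value)

# Hypothetical CDEs, not taken from any real CDE repository.
age = CommonDataElement(
    name="Age at assessment",
    definition="Age of the participant when the assessment was performed.",
    instructions="Record in whole years.",
    provenance="Hypothetical example",
    value_set=ValueSet(minimum=0, maximum=120, unit="years"),
)
handedness = CommonDataElement(
    name="Handedness",
    definition="Self-reported dominant hand.",
    instructions="Select one response.",
    provenance="Hypothetical example",
    value_set=ValueSet(permissible_values=frozenset({"left", "right", "ambidextrous"})),
)

print(age.validate(34), age.validate(-2))                      # True False
print(handedness.validate("left"), handedness.validate("L"))  # True False
```

Two datasets that record handedness against the same value set can be pooled directly; one recording "L" and another "left" cannot, which is the data-quality argument for CDEs in concrete form.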
http://neurolex.org/wiki/Category:Hippocampus_CA1_pyramidal_cell
Neuroscience Information Framework
“a tool for analyzing and structuring information”
“a reduction in uncertainty”
• Ontologies are the major way that NIF searches for and organizes information
• Aggregate of community ontologies, e.g., Gene Ontology, ChEBI, Protein Ontology
• Still significant gaps for behavioral and physiological concepts and techniques
• Available as services through NIF so they can be built into applications
[Figure: NIFSTD ontology modules: Organism, Molecule, Macromolecule, Gene, Molecule Descriptors, Cell, Resource, Instrument, Dysfunction, Quality, Anatomical Structure, NS Function, Subcellular Structure, Investigation, Protocols, Reagent, Techniques]
[Figure panels: Concept-based query; Remove synonyms]
Ontologies and their relationships let us probe the data space for related concepts
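A concept-based query can be sketched as graph traversal: start from the query concept, walk its subclass links to collect descendants, and optionally add each concept's synonyms. The ontology fragment below is a toy example, not actual NIFSTD content or a NIF service API; the `include_synonyms` flag mimics the "remove synonyms" comparison.

```python
from collections import deque

# Toy ontology fragment for illustration only (not NIFSTD data).
SYNONYMS = {
    "neuron": {"nerve cell"},
    "pyramidal cell": {"pyramidal neuron"},
    "hippocampus ca1 pyramidal cell": {"ca1 pyramidal neuron"},
}
SUBCLASSES = {
    "neuron": {"pyramidal cell"},
    "pyramidal cell": {"hippocampus ca1 pyramidal cell"},
}

def expand_query(term, include_synonyms=True):
    """Expand a concept to itself plus all descendants (breadth-first over
    subclass links), optionally adding each concept's synonyms."""
    concepts, queue = set(), deque([term.lower()])
    while queue:
        concept = queue.popleft()
        if concept in concepts:
            continue
        concepts.add(concept)
        queue.extend(SUBCLASSES.get(concept, ()))
    terms = set(concepts)
    if include_synonyms:
        for concept in concepts:
            terms |= SYNONYMS.get(concept, set())
    return terms

print(sorted(expand_query("Pyramidal cell")))
# ['ca1 pyramidal neuron', 'hippocampus ca1 pyramidal cell', 'pyramidal cell', 'pyramidal neuron']
```

A plain keyword search for "pyramidal cell" would miss records annotated only as "CA1 pyramidal neuron"; the expanded term set reaches them through the ontology's relationships.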
What have we learned?
• The landscape is vibrant, dynamic and growing, but also littered
with abandoned and unrealized projects
• Data belongs in a data repository, not on your lab server
• People are important in this endeavor: Leaders, curators,
community engagement specialists
• Data and ontology resources become interesting when they
are comprehensive: populate!!!
• Assume that you will be resource limited and plan
accordingly: time, money, personnel
• Cost-benefit analysis: what to do now vs. later
• Technology will improve
• Don’t start from square one: resources exist to help; help support them
Extra Slides
Dimensions of FAIR data sharing
• Discoverability
– Data can be found
– Data set has an identifier and links are stable
• Accessibility
– Data can be accessed programmatically
– Access rights are clear
• Assessability
– Provenance is known
– Reliability can be determined
• Understandability
– The data can be understood
• Usability
– The data are actionable
– Data are not in a proprietary format
Goodman, A. et al. Ten simple rules for the care and feeding of scientific data. PLoS Comput Biol 10, e1003542, doi:10.1371/journal.pcbi.1003542 (2014)
Science as an open enterprise, Royal Society: https://royalsociety.org/policy/projects/science-public-enterprise/Report/
FORCE11: Future of Research Communications and e-Scholarship
• Resource Identification Initiative: https://www.force11.org/group/resource-identification-initiative
• FAIR Data Guiding Principles: https://www.force11.org/group/fairgroup/fairprinciples
• Data Citation Principles: https://www.force11.org/group/joint-declaration-data-citation-principles-final
• On creating machine-readable data citations: https://peerj.com/articles/cs-1/
• 10 Simple rules for design, provision, and reuse of persistent identifiers for life science data: https://zenodo.org/record/18003#.VeOxxLQjvyA
FORCE11.org: Grassroots organization dedicated to transforming scholarship through…
Mapping the data landscape: Anatomical framework
[Figure: number of data sources per brain region (forebrain, midbrain, hindbrain); legend: 0, 1-10, 11-100, >101 data sources]
~800 million records across ~200 databases or views
Phenomics assisted breeding in crop improvement
 
GBSN - Microbiology (Lab 4) Culture Media
GBSN - Microbiology (Lab 4) Culture MediaGBSN - Microbiology (Lab 4) Culture Media
GBSN - Microbiology (Lab 4) Culture Media
 
Introduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptxIntroduction to Mean Field Theory(MFT).pptx
Introduction to Mean Field Theory(MFT).pptx
 
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
Salas, V. (2024) "John of St. Thomas (Poinsot) on the Science of Sacred Theol...
 

Martone Grethe

  • 1. Methodologies for Long-Tail Data Sharing: What Have We Learned? Maryann E. Martone, Ph.D., University of California, San Diego and Hypothesis; Jeffrey S. Grethe, Ph.D., University of California, San Diego
  • 2. [Figure: resource types added to the NIF Registry by year; types shown: Database, Software Application, Data Analysis Service, Topical Portal, Core Facility, Ontology, Software Resource.] NIF is an initiative of the NIH Blueprint consortium of institutes. NIF has been tracking and cataloging the biomedical resource landscape since 2008.
  • 3. The current “Addictome” (query: addiction). NIF searches across: • Resource Registry (13,000+) • >200 deeply integrated data sources (>800 million records) • literature
  • 4. e-Science Ecosystem. The digital world runs on globally unique and persistent identifiers (PIDs); PIDs serve as a “key” for identifying the same entity across different contexts. eScience goal: Make data Findable, Accessible, Interoperable, Re-usable (FAIR) for both human and machine. [Diagram labels: People, Research resources, Data, Protocols, Concepts, Ontology, ORCID, RRID, DOI, PID, CDE, Metadata standards, Minimal Information Models, Repositories and Registries (e.g., NIF, Monarch), Aggregator, NIH Data Discovery Index, Translation, Non-digital]
  • 5. Resource Identification Initiative: Supplying unique identifiers for key research resources. Without an RRID: “The following antibodies were used for immunoblotting: β-actin mAb (1:10,000 dilution, Sigma-Aldrich)…” vs. with an RRID: “The following antibodies were used for immunoblotting: β-actin mAb (1:10,000 dilution, Sigma-Aldrich, RRID:AB_262137)…” Resolver: https://scicrunch.org/resolver/RRID:AB_262137
  • 6. Minimal Information Standards. http://precedings.nature.com/documents/1720/version/1 (PDF: http://precedings.nature.com/documents/1720/version/1/files/npre20081720-1.pdf). “A set of guidelines for reporting data that ensures the data can be easily verified, analysed and clearly interpreted by the wider scientific community. The recommendations also provide a foundation for structured databases, public repositories and development of data analysis tools.” (https://en.wikipedia.org/wiki/Minimum_Information_Standards) MINI: Minimum Information about a Neuroscience Investigation. [Diagram: MIM → CDE 1, CDE 2, … CDE N → Value Set]
  • 7. Common Data Elements (https://cde.nlm.nih.gov/home, http://www.nlm.nih.gov/cde/): A data element that is common to multiple datasets and is used to improve data quality and promote data sharing. CDEs usually describe the following data element properties: Name, Definition, Instructions, Provenance, Value Set.
  • 8. Value Sets: The set of possible values or responses. A Value Set often includes concepts from established Vocabularies, Ontologies or Data Standards. A value set may also include a range of permissible values and indicate the required units. For a survey question, the value set may be a list of possible responses. (http://neurolex.org/wiki/Category:Hippocampus_CA1_pyramidal_cell)
  • 9. Neuroscience Information Framework: “a tool for analyzing and structuring information”; “a reduction in uncertainty” • Ontologies are the main way that NIF searches for and organizes information • Aggregate of community ontologies, e.g., Gene Ontology, ChEBI, Protein Ontology • Still significant gaps for behavioral and physiological concepts and techniques • Available as services through NIF so they can be built into applications [Diagram: NIFSTD modules: Organism, Molecule, Macromolecule, Gene, Molecule Descriptors, Cell, Resource, Instrument, Dysfunction, Quality, Anatomical Structure, NS Function, Subcellular Structure, Investigation, Protocols, Reagent, Techniques]
  • 10. Concept-based query. [Figure labels: Concept-based query; Remove synonyms] Ontologies and their relationships let us probe the data space for related concepts.
  • 11. What have we learned? • The landscape is vibrant, dynamic and growing, but also littered with abandoned and unrealized projects • Data belongs in a data repository, not on your lab server • People are important in this endeavor: leaders, curators, community engagement specialists • Data and ontology resources become interesting when they are comprehensive: populate!!! • Assume that you will be resource-limited and plan accordingly: time, money, personnel • Do a cost-benefit analysis: what to do now vs. later • Technology will improve • Don’t start from square one: resources exist to help, so help support them
  • 13. Dimensions of FAIR data sharing • Discoverability – Data can be found – Data set has an identifier and links are stable • Accessibility – Data can be accessed programmatically – Access rights are clear • Assessability – Provenance is known – Reliability can be determined • Understandability – The data can be understood • Usability – The data are actionable – Data are not in a proprietary format. References: Goodman, A. et al. Ten simple rules for the care and feeding of scientific data. PLoS Comput Biol 10, e1003542, doi:10.1371/journal.pcbi.1003542 (2014); Science as an open enterprise, Royal Society: https://royalsociety.org/policy/projects/science-public-enterprise/Report/
  • 14. FORCE11: Future of Research Communications and e-Scholarship • Resource Identification Initiative: https://www.force11.org/group/resource-identification-initiative • FAIR Data Guiding Principles: https://www.force11.org/group/fairgroup/fairprinciples • Data Citation Principles: https://www.force11.org/group/joint-declaration-data-citation-principles-final • On creating machine-readable data citations: https://peerj.com/articles/cs-1/ • 10 Simple rules for design, provision, and reuse of persistent identifiers for life science data: https://zenodo.org/record/18003#.VeOxxLQjvyA • FORCE11.org: Grassroots organization dedicated to transforming scholarship through
  • 15. Mapping the data landscape: Anatomical framework. ~800 million records across ~200 databases or views. [Figure: data sources mapped onto Forebrain, Midbrain and Hindbrain; legend bins for number of data sources: 0, 1-10, 11-100, >101]
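
Slide 4 describes persistent identifiers as a “key” for recognizing the same entity across contexts. The Python below is a minimal sketch of that idea, not an NIF or ORCID API: it groups records from two hypothetical sources (a data repository and a resource registry) under one normalized ORCID iD. The field name `orcid`, the source names, the accepted prefixes and the records are all assumptions; 0000-0002-1825-0097 is the example iD that ORCID uses in its own documentation.

```python
from collections import defaultdict

# Accepted spellings of an ORCID iD; this prefix list is an assumption for the sketch.
ORCID_PREFIXES = ("https://orcid.org/", "http://orcid.org/", "orcid.org/")


def normalize_orcid(raw: str) -> str:
    """Reduce URL or bare forms of an ORCID iD to the bare 0000-0000-0000-000X form."""
    value = raw.strip()
    for prefix in ORCID_PREFIXES:
        if value.lower().startswith(prefix):
            value = value[len(prefix):]
            break
    # The final check character may be the letter X; store it in upper case.
    return value.upper()


def link_by_pid(*sources: tuple[str, list[dict]]) -> dict[str, list[tuple[str, dict]]]:
    """Group records from several sources under their shared, normalized PID."""
    linked: dict[str, list[tuple[str, dict]]] = defaultdict(list)
    for source_name, records in sources:
        for record in records:
            linked[normalize_orcid(record["orcid"])].append((source_name, record))
    return dict(linked)


if __name__ == "__main__":
    # Hypothetical records: the same person appearing in two different contexts.
    repository = [{"orcid": "https://orcid.org/0000-0002-1825-0097", "dataset": "example dataset"}]
    registry = [{"orcid": "0000-0002-1825-0097", "resource": "example tool"}]
    for pid, hits in link_by_pid(("repository", repository), ("registry", registry)).items():
        print(pid, [source for source, _ in hits])
        # prints: 0000-0002-1825-0097 ['repository', 'registry']
```

Without a shared identifier, the two records could only be matched on names, which are ambiguous; with the PID, linking reduces to an exact-key lookup once the spelling variants are normalized.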
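
Slide 5 contrasts two methods sentences; only the second can be mined by software and resolved to a single antibody record. As a minimal sketch, the Python below extracts RRIDs from methods text and builds resolver links using the URL pattern shown on the slide. The regular expression is a simplified assumption (a letter prefix, an underscore, then an accession) and will not cover every RRID form in use.

```python
import re

# Simplified, hypothetical RRID pattern: "RRID:" + letter prefix + "_" + accession,
# e.g. AB_262137. Real RRIDs use several prefix conventions, so this is approximate.
RRID_PATTERN = re.compile(r"RRID:\s*([A-Z][A-Za-z]*_[A-Za-z0-9:_-]*[A-Za-z0-9])")

# Resolver base taken from the example URL on slide 5.
RESOLVER_BASE = "https://scicrunch.org/resolver/RRID:"


def extract_rrids(text: str) -> list[str]:
    """Return the RRID accessions cited in a piece of methods text, in order."""
    return RRID_PATTERN.findall(text)


def resolver_url(accession: str) -> str:
    """Build a resolver link for one accession, following the slide's URL pattern."""
    return RESOLVER_BASE + accession


if __name__ == "__main__":
    without_rrid = "...mAb (1:10,000 dilution, Sigma-Aldrich)..."
    with_rrid = "...mAb (1:10,000 dilution, Sigma-Aldrich, RRID:AB_262137)..."
    print(extract_rrids(without_rrid))  # []
    for accession in extract_rrids(with_rrid):
        print(resolver_url(accession))  # https://scicrunch.org/resolver/RRID:AB_262137
```

The pattern requires the accession to end on a letter or digit so that trailing punctuation such as “)” or “…” is not captured.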
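
Slides 7 and 8 list what a CDE records (name, definition, instructions, provenance, value set) and what a value set may hold (enumerated responses or ontology concepts, or a permissible range with required units). The sketch below models those properties as Python dataclasses and checks a submitted value against the value set. The class and method names and the two example elements are hypothetical and are not taken from the NLM CDE Repository.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ValueSet:
    """Permissible values: an enumerated set, or a numeric range with a required unit."""
    allowed: frozenset = frozenset()   # enumerated responses or ontology concept labels
    minimum: Optional[float] = None    # inclusive lower bound for numeric values
    maximum: Optional[float] = None    # inclusive upper bound for numeric values
    unit: Optional[str] = None         # unit that numeric values must be reported in

    def accepts(self, value, unit: Optional[str] = None) -> bool:
        if self.allowed:
            # Enumerated value set: only the listed responses are valid.
            return value in self.allowed
        # Range value set: require a real number (not a bool) in the stated unit.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        if self.unit is not None and unit != self.unit:
            return False
        if self.minimum is not None and value < self.minimum:
            return False
        if self.maximum is not None and value > self.maximum:
            return False
        return True


@dataclass(frozen=True)
class CommonDataElement:
    """The CDE properties named on slide 7."""
    name: str
    definition: str
    instructions: str
    provenance: str
    value_set: ValueSet

    def validate(self, value, unit: Optional[str] = None) -> bool:
        return self.value_set.accepts(value, unit)


if __name__ == "__main__":
    # Hypothetical elements for illustration only.
    handedness = CommonDataElement(
        name="Handedness",
        definition="Hand preferred for writing.",
        instructions="Select one response.",
        provenance="Hypothetical example",
        value_set=ValueSet(allowed=frozenset({"left", "right", "ambidextrous"})),
    )
    age = CommonDataElement(
        name="Age at assessment",
        definition="Participant age when the assessment was made.",
        instructions="Report whole years.",
        provenance="Hypothetical example",
        value_set=ValueSet(minimum=0, maximum=120, unit="years"),
    )
    print(handedness.validate("left"))      # True
    print(handedness.validate("both"))      # False
    print(age.validate(34, unit="years"))   # True
    print(age.validate(34, unit="months"))  # False
```

Rejecting a value reported in the wrong unit, rather than converting it, keeps the sketch short; a production system might instead normalize units before validation.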
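
Slide 10's point, that ontology relationships let a query reach related concepts, can be sketched as query expansion: start from a concept, optionally add its synonyms, and walk its is-a subclasses. The toy ontology below is hypothetical and far smaller than NIFSTD, and the function names are assumptions rather than NIF's services; the flag for dropping synonyms is included only because the slide mentions removing them.

```python
from collections import deque

# Hypothetical toy ontology (not NIFSTD): synonyms per concept and is-a children.
SYNONYMS = {
    "pyramidal cell": ["pyramidal neuron"],
    "hippocampus CA1 pyramidal cell": ["CA1 pyramidal neuron"],
}
SUBCLASSES = {
    "pyramidal cell": ["hippocampus CA1 pyramidal cell", "neocortical pyramidal cell"],
}


def expand_query(concept: str, include_synonyms: bool = True,
                 include_subclasses: bool = True) -> list[str]:
    """Breadth-first walk from a concept, collecting labels (and synonyms) to search for."""
    terms: list[str] = []
    seen: set[str] = set()
    queue = deque([concept])
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        terms.append(current)
        if include_synonyms:
            terms.extend(SYNONYMS.get(current, []))
        if include_subclasses:
            queue.extend(SUBCLASSES.get(current, []))
    return terms


def to_boolean_query(terms: list[str]) -> str:
    """Join expanded terms into a quoted OR query string."""
    return " OR ".join(f'"{term}"' for term in terms)


if __name__ == "__main__":
    print(expand_query("pyramidal cell"))
    # ['pyramidal cell', 'pyramidal neuron', 'hippocampus CA1 pyramidal cell',
    #  'CA1 pyramidal neuron', 'neocortical pyramidal cell']
    print(to_boolean_query(expand_query("pyramidal cell", include_synonyms=False)))
    # "pyramidal cell" OR "hippocampus CA1 pyramidal cell" OR "neocortical pyramidal cell"
```

A search for the parent concept thus also retrieves records annotated only with a subclass term, which a plain keyword search would miss.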
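
The dimensions on slide 13 can be turned into a rough self-check on a dataset's metadata. In the sketch below, each dimension maps to a test on one or two metadata fields; the field names (`identifier`, `access_url`, `license`, `provenance`, `data_dictionary`, `format`) and the set of open formats are assumptions, since the slide names the dimensions but no schema. Criteria that need human judgment, such as whether reliability can be determined or whether the data are actionable, are not captured.

```python
from typing import Callable

# Hypothetical set of non-proprietary formats; illustrative, not exhaustive.
OPEN_FORMATS = {"csv", "tsv", "json", "xml", "hdf5"}

# One check per dimension on slide 13, each over hypothetical metadata fields.
CHECKS: dict[str, Callable[[dict], bool]] = {
    # Discoverability: the data set has an identifier.
    "Discoverability": lambda m: bool(m.get("identifier")),
    # Accessibility: a programmatic access point and clear access rights.
    "Accessibility": lambda m: bool(m.get("access_url")) and bool(m.get("license")),
    # Assessability: provenance is recorded.
    "Assessability": lambda m: bool(m.get("provenance")),
    # Understandability: a description of the variables is supplied.
    "Understandability": lambda m: bool(m.get("data_dictionary")),
    # Usability: the data are not in a proprietary format.
    "Usability": lambda m: (m.get("format") or "").lower() in OPEN_FORMATS,
}


def assess(metadata: dict) -> dict[str, bool]:
    """Return a pass/fail flag for each dimension."""
    return {dimension: check(metadata) for dimension, check in CHECKS.items()}


if __name__ == "__main__":
    # Hypothetical metadata record.
    record = {
        "identifier": "hypothetical-dataset-001",
        "access_url": "https://repository.example.org/api/datasets/001",
        "license": "CC-BY-4.0",
        "provenance": "",
        "format": "CSV",
    }
    for dimension, passed in assess(record).items():
        print(f"{dimension}: {'pass' if passed else 'missing'}")
```

Keeping the checks in a dictionary makes it easy to add or tighten a criterion without changing `assess`.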

Editor's Notes

  1. Figure X: Resource types and year added to the registry. Research resources are each tagged with one or more resource types; the most common are represented in this graph (for all data see http://neurolex.org/wiki/Resource_Type_Hierarchy). The year that a resource was added to the registry is denoted by color; note that data from 2009 and earlier are grouped into 2010.