How do we know what we don't know?  Exploring the data and knowledge space through the Neuroscience Information Framework
Also includes results from the Resource Identification Initiative

  1. How do we know what we don't know? Exploring the data and knowledge space through the Neuroscience Information Framework
     Maryann E. Martone, Ph.D., University of California, San Diego
     Building Analytics for Integrated Neuroscience Data, Ontario Brain Institute, May 28-29, 2014
  2. We say this to each other all the time, but we set up systems for scholarly advancement and communication that are the antithesis of integration
     • Whole brain data (20 µm microscopic MRI)
     • Mosaic LM images (1 GB+)
     • Conventional LM images
     • Individual cell morphologies
     • EM volumes & reconstructions
     • Solved molecular structures
     No single technology serves these all equally well.
     Multiple data types; multiple scales; multiple databases
     A data integration problem
  3. • NIF is an initiative of the NIH Blueprint consortium of institutes
       – What types of resources (data, tools, materials, services) are available to the neuroscience community?
       – How many are there?
       – What domains do they cover? What domains do they not cover?
       – Where are they?
         • Web sites
         • Databases
         • Literature
         • Supplementary material
         • PDF files
         • Desk drawers
       – Who uses them?
       – Who creates them?
       – How can we find them?
       – How can we make them better in the future?
     http://neuinfo.org
     NIF has been surveying, cataloging and tracking the neuroscience resource landscape since before 2008
  4. Old Model: Single type of content; single mode of distribution
     Diagram labels: Scholar, Library, Scholar, Publisher
     Systems for cataloging, metadata standards, and citation in place
  5. Diagram labels: Scholar, Consumer, Libraries, Data Repositories, Code Repositories, Community databases/platforms, OA, Curators, Social Networks, Peer Reviewers, Narrative, Workflows, Data, Models, Multimedia, Nanopublications, Code
  6. The duality of modern scholarship
     Observation:
     • Those who build information systems from the machine side don't understand the requirements of humans very well
     • Those who build information systems from the human side don't understand the requirements of machines very well
     Scholarship requires the ability to cite and track usage of scholarly artifacts. In our current mode of working, there is no way to track artifacts as they move through the ecosystem and no way to incrementally add human expertise
  7. NIF: A New Type of Entity for New Modes of Scientific Dissemination
     • NIF's mission is to maximize the awareness of, access to and utility of research resources produced worldwide to enable better science and promote efficient use
       – NIF unites neuroscience information without respect to domain, funding agency, institute or community
       – NIF is like a "PubMed" for all biomedical resources and a "PubMed Central" for databases
       – Makes them searchable from a single interface
       – Practical and cost-effective; tries to be sensible
       – Learned a lot about effective data sharing
     The Neuroscience Information Framework provides a rich data source for understanding the current resource landscape
  8. But we have Google!
     • Current web is designed to share documents
       – Documents are unstructured data
     • Much of the content of digital resources is part of the "hidden web"
     • Wikipedia: The Deep Web (also called Deepnet, the invisible Web, DarkNet, Undernet or the hidden Web) refers to World Wide Web content that is not part of the Surface Web, which is indexed by standard search engines.
  9. Surveying the resource landscape: ~3000 databases and datasets
  10. Populate broadly and quickly with minimum overhead to resource providers
      • NIF curators
      • Nomination by the community
      • Semi-automated text mining pipelines
      NIF Registry: requires no special skills; site map available for local hosting
      • NIF Data Federation
      • DISCO interop (Yale)
      • Requires some programming skill
      • But designed for quick ingestion
      Bandrowski et al., Database, 2012
  11. Data Federation: Deep search
      http://neuinfo.org
      With the thousands of databases and other information sources available, simple descriptive metadata will not suffice
      Subthalamus
  12. Data about the subthalamus
      http://neuinfo.org
  13. NIF unifies look, feel and access
  14. What do you mean by data? Databases come in many shapes and sizes
      • Primary data:
        – Data available for reanalysis, e.g., microarray data sets from GEO; brain images from XNAT; microscopic images (CCDB/CIL)
      • Secondary data:
        – Data features extracted through data processing and sometimes normalization, e.g., brain structure volumes (IBVD), gene expression levels (Allen Brain Atlas); brain connectivity statements (BAMS)
      • Tertiary data:
        – Claims and assertions about the meaning of data
          • E.g., gene upregulation/downregulation, brain activation as a function of task
      • Registries:
        – Metadata
        – Pointers to data sets or materials stored elsewhere
      • Data aggregators:
        – Aggregate data of the same type from multiple sources, e.g., Cell Image Library, SUMSdb, Brede
      • Single source:
        – Data acquired within a single context, e.g., Allen Brain Atlas
      Researchers are producing a variety of information artifacts using a multitude of technologies; many duplicate effort and content
  15. Data Federation Growth
      [Chart: Number of Federated Databases and Number of Federated Records (Millions), Jun-08 to May-13; DISCO]
      NIF searches the largest collation of neuroscience-relevant data on the web
  16. Bringing knowledge to data: Ontologies as framework
      [Figure labels: Purkinje Cell, Axon Terminal, Axon, Dendritic Tree, Dendritic Spine, Dendrite, Cell body, Cerebellar cortex]
      There is little obvious connection between data sets taken at different scales using different microscopies without an explicit representation of the biological objects that the data represent
  17. NIF Semantic Framework: NIFSTD ontology
      • NIF uses ontologies to help navigate across and unify neuroscience resources
      • Ontologies are built from community ontologies → cross integration with other domains
      [Diagram labels: NIFSTD; Organism; NS Function; Molecule; Investigation; Subcellular structure; Macromolecule; Gene; Molecule Descriptors; Techniques; Reagent; Protocols; Cell; Resource; Instrument; Dysfunction; Quality; Anatomical Structure]
      NIF Ontologies provide standards for integration of diverse data; available through NIF vocabulary services
  18. NIF links neuroscience to other domains via community ontologies
      • NIF Subcellular = Gene Ontology Cell Component
      • NIF Anatomy = UBERON cross-species ontology (includes FMA and Neuronames)
      • NIF Disease = Disease Ontology
      • NIF Organism = NCBI Taxonomy
      • NIF Molecule = Chemicals of Biological Interest (CHEBI); Protein Ontology
      • NIF Cell/Investigation/Function = Developed largely by neuroscience community
      Use of ontology identifiers within data sources creates linkage across databases and across domains; the more they are used, the better they become
  19. This is your brain on computers
      Neurolex: > 1 million triples
      NIF Cell Graph
      Dr. Yi Zeng: Chinese neural knowledge base
  20. Concept-based search: Query by meaning
      NIF provides formal definitions of many neuroscience terms
      = brain region without a blood-brain barrier
  21. Ontologies as a data integration framework
      • NIF Connectivity: 7 databases containing connectivity primary data or claims from literature on connectivity between brain regions
        • Brain Architecture Management System (rodent)
        • Temporal lobe.com (rodent)
        • Connectome Wiki (human)
        • Brain Maps (various)
        • CoCoMac (primate cortex)
        • UCLA Multimodal database (human fMRI)
        • Avian Brain Connectivity Database (bird)
      • Total: 1800 unique brain terms (excluding Avian)
      • Number of exact terms used in > 1 database: 42
      • Number of synonym matches: 99
      • Number of 1st order partonomy matches: 385
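The three kinds of match counted on the slide above (exact term, synonym, first-order partonomy) can be sketched as a small classifier. This is only an illustration under stated assumptions: the `SYNONYMS` and `PART_OF` tables are toy, hypothetical stand-ins for an ontology, and the function names are invented here; none of it is taken from NIFSTD or from NIF's own matching code.

```python
# Toy ontology fragments. All entries are illustrative assumptions,
# not content copied from NIFSTD.
SYNONYMS = {
    "ammon's horn": "hippocampus",  # synonym -> preferred label
}
PART_OF = {
    "dentate gyrus": "hippocampal formation",  # part -> direct parent
    "hippocampus": "hippocampal formation",
}


def normalize(term):
    """Lower-case a term and collapse runs of whitespace."""
    return " ".join(term.lower().split())


def preferred(term):
    """Map a term to its preferred label if it is a known synonym."""
    label = normalize(term)
    return SYNONYMS.get(label, label)


def match_type(term_a, term_b):
    """Classify how two database terms relate: exact, synonym, partonomy or none."""
    a, b = normalize(term_a), normalize(term_b)
    if a == b:
        return "exact"
    pref_a, pref_b = preferred(a), preferred(b)
    if pref_a == pref_b:
        return "synonym"
    # First-order partonomy: one term is the direct parent of the other.
    if PART_OF.get(pref_a) == pref_b or PART_OF.get(pref_b) == pref_a:
        return "partonomy"
    return "none"


print(match_type("Hippocampus", "hippocampus"))
print(match_type("Ammon's horn", "Hippocampus"))
print(match_type("Dentate gyrus", "Hippocampal formation"))
```

Checking exact matches first and partonomy last mirrors the order in which the slide reports its counts; a real pipeline would draw synonyms and part-of relations from the ontology itself rather than from hand-written tables.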
  22. Building a knowledge space for neuroscience: Neurolex.org
      http://neurolex.org
      • Semantic MediaWiki
      • Provide a simple interface for defining the concepts required
      • Lightweight semantics
      • Community based:
        • Anyone can contribute their terms, concepts, things
        • Anyone can edit
        • Anyone can link
      • Accessible: searched by Google
      • Growing into a significant knowledge base for neuroscience
      • 33,000 concepts; 200,000 edits; 150 contributors
      Larson and Martone, Frontiers in Neuroinformatics, 2013
  23. "When I use a word...it means what I choose it to mean"
      Formalization lets us develop metrics for the precision of the terms we use
  24. Mapping the known unknowns
      Comprehensive ontologies provide an accounting of what we think we know
      Where are the data relative to what we think we know?
      [Figure labels: Brain; Striatum; Hypothalamus; Olfactory bulb; Cerebral cortex. Axes: Brain region, Data source]
  25. Open World-Closed World: Mapping the knowledge-data space
      [Legend, Data Sources: 0; 1-10; 11-100; >101]
      NIF lets us ask: where isn't there data? What isn't studied? Why?
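A minimal sketch of the "where isn't there data?" question: start from the closed list of regions an ontology names, count the distinct data sources annotated to each, and report the regions with none. The region names are borrowed from the deck; the source names, the annotation pairs, and both function names are hypothetical and exist only for illustration.

```python
def count_sources(ontology_regions, annotations):
    """Count distinct data sources per ontology region, keeping zero counts."""
    sources_by_region = {region: set() for region in ontology_regions}
    for source, region in annotations:
        if region in sources_by_region:  # ignore terms outside the ontology
            sources_by_region[region].add(source)
    return {region: len(found) for region, found in sources_by_region.items()}


def regions_without_data(counts):
    """Return the regions the ontology knows about that no source covers."""
    return [region for region, n in counts.items() if n == 0]


# Hypothetical (source, region) annotations.
regions = ["Striatum", "Hypothalamus", "Olfactory bulb", "Cerebral cortex"]
annotations = [
    ("source_1", "Striatum"),
    ("source_2", "Striatum"),
    ("source_1", "Cerebral cortex"),
    ("source_1", "Cerebral cortex"),  # repeated record, counted once
]

counts = count_sources(regions, annotations)
print(counts)
print(regions_without_data(counts))
```

The key design point is that the ontology, not the data, supplies the list of keys: a region with no annotations still appears with a count of zero, which is what makes the negative space visible.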
  26. Open World-Closed World: Mapping the knowledge-data space
      [Figure labels: Forebrain; Midbrain; Hindbrain. Legend, Data Sources: 0; 1-10; 11-100; >101]
      Junk brain regions?
  27. Adult mouse brain connectivity matrix: revenge of the midbrain
      SW Oh et al. Nature 000, 1-8 (2014) doi:10.1038/nature13186
  28. The tale of the tail
      "Human neuroimaging typically is performed on a whole brain basis. However, for several reasons tail of the caudate activity can easily be missed.
      • One reason is limitations in the normalization algorithms, that typically are optimized to maximize accuracy for cortical rather than subcortical structures. ...
      • A second reason is that standard neuroimaging atlases such as the Harvard-Oxford structural atlas used with neuroimaging analysis programs such as FreeSurfer truncate the caudate at the body, and completely exclude the tail...
      • A final reason is that the tail of the caudate is close to the hippocampus, and could be misidentified as such especially in tasks involving learning and memory.
      Therefore, the tail of the caudate may be recruited in additional cognitive tasks, but yet not have been properly identified and reported in the neuroimaging literature"
      Seger CA. The visual corticostriatal loop through the tail of the caudate: circuitry and function. Front Syst Neurosci. 2013 Dec 6;7:104. doi: 10.3389/fnsys.2013.00104. eCollection 2013.
  29. fMRI; Cerebellum
      When results contradict a current theory, they may be ignored
  30. "The Data Homunculus"
      Funding drives representation in the data space
  31. Gender bias
      NIF Reports: Male vs Female circa 2012
      When data is not made available, the data space is an incomplete record of what is available
  32. How much information makes it into the data space?
      • What is potentially knowable: ∞
      • What is known: literature, images, human knowledge
      • What is easily machine processable and accessible
      Unstructured; natural language processing, entity recognition, image processing and analysis; paywalls; file drawers
      Abstracts vs full text vs tables etc.
      Estimates that > 50% of scientific output is not recovered (Chan et al. Lancet, 383, 2014)
  33. Data sharing in the long tail of neurosciences
  34. A place for my data
      NIF lists over 350 data repositories, i.e., resources that accept data contributions from the community
  35. "Empty Archives"
      Repository | Type of Data | Date started | Host | Public data | Comments
      CARMEN | neuroscience / electrophysiology | 2008 | Newcastle University; United Kingdom | 100 | Requires account
      INCF Dataspace | various | 2012 | International Neuroinformatics Coordinating Facility | ? |
      Open Source Brain | models | 2014 | University College London | 47 | Cells and Networks; 23 (Technology showcases)
      XNAT Central | Neuroimaging | 2010 | Washington University School of Medicine in St. Louis; Missouri; USA | 34 | States 370 projects, 3804 subjects, and 5172 imaging sessions. 123 were visible but do not all appear to be public. 34 public data were listed under "Recent"
      Open Connectome | Serial electron Microscopy and Magnetic Resonance (graphs) | 2011 | Johns Hopkins University; Maryland; USA | 9 | 9, 7 - image projects; 19 - graphs
      UCSF DataShare | biomedical including neuroimaging, MRI, cognitive impairment, dementia, aging | 2011 | University of California at San Francisco; California; USA | 15 |
      BrainLiner | various functional data | 2011 | ATR; Kyoto; Japan | 10 |
      ModelDB | neuron models | 1996 | Yale University; Connecticut; USA | 875 |
      NeuroMorpho | digitally reconstructed neurons | 2006 | George Mason University; Virginia; USA | 10004 |
      Cell Image Library/Cell Centered Database | images, videos, and animations of cell | 2002 CCDB; 2010 CIL | American Society for Cell Biology / University of California at San Diego; California; USA | 10,360 | The CCDB had 450 data sets when it merged with CIL. CIL also contains large imaging data sets that are not counted as separate images
      CRCNS | computational neuroscience datasets | 2008 | University of California at Berkeley; California; USA | 38 |
      OpenfMRI | fMRI | 2012 | University of Texas at Austin; Texas; USA | 22 |
      NeuroMorpho.org = 10,000 neuronal reconstructions from ~200 labs
      Cell Image Library = 10,000 image sets from 1500 individuals
      "I finally gave NeuroMorpho my data so they would stop
  36. Attitudes towards data sharing
      Scale from Always to Never:
      • "Done"
      • "You can have it if you really want"
      • "Pry it from my cold, dead fingers"
      Reasons for not making data available:
      • Lack of time and resources
      • Lack of incentives
      • Fear of being scooped
      • Fear of being criticized
      • Fear that data will be misused
      • Data sharing is a waste of time
      Many make data available via web sites or via supplementary material
      Tenopir, C. et al. Data sharing by scientists: practices and perceptions. PLoS One 6, e21101, doi:10.1371/journal.pone.0021101 (2011)
  37. Multivariate analysis of the SCI syndrome using data from two research sites.
      Ferguson AR, Irvine K-A, Gensel JC, Nielson JL, et al. (2013) Derivation of Multivariate Syndromic Outcome Metrics for Consistent Testing across Multiple Models of Cervical Spinal Cord Injury in Rats. PLoS ONE 8(3): e59712. doi:10.1371/journal.pone.0059712
      http://www.plosone.org/article/info:doi/10.1371/journal.pone.0059712
  38. Incentives: New solutions
      • New journals for data, where focus is on data not results
      • Data must be deposited in a recognized repository
        – Persistent identifier assigned
      • Standards for metadata and data types
      Nature Scientific Data
  39. Incentives: Data citations
      • Many groups are developing guidelines for creating a system of citation for data used in a study
      • First step for providing an incentive system for data sharing
      • Currently, very difficult to track use of data in articles
      http://www.force11.org/datacitation
      "Sound, reproducible scholarship rests upon a foundation of robust, accessible data. Data should be considered legitimate, citable products of research. Data citation, like the citation of other evidence and sources, is good research practice."
      Joint Declaration of Data Citation Principles; Future of Research Communications and e-Scholarship (FORCE11)
      Principles: 1. Importance; 2. Credit and attribution; 3. Evidence; 4. Unique Identification; 5. Access; 6. Persistence; 7. Specificity and verifiability; 8. Interoperability and flexibility
  40. Unique IDs for all! Resource Identification Initiative
      • It is currently impossible to query the biomedical literature to find out what research resources have been used to produce the results of a study
        – Authors don't provide enough information to unambiguously identify key research resources
      • Impossible to find all studies that used a resource
      • Critical for reproducibility and data mining
      • Critical for troubleshooting
      http://www.force11.org/resource_identification_initiative
      "Faulty Antibodies Continue to Enter US and European Markets, Warns Top Clinical Chemistry Researcher" (Genome Web Daily, October 11, 2013)
  41. Resource Identification Initiative
      • Have authors supply appropriate identifiers for key resources used within a study such that they are:
        – Machine processable (i.e., unique identifier that resolves to a single resource)
        – Outside of the paywall
        – Uniform across journals and publishers
      Launched February 2014: > 30 journals participating
      Anita Bandrowski, Nicole Vasilevsky, Matthew Brush, Melissa Haendel and the RINL group
  42. Pilot Project
      • Have authors identify 3 different types of research resources:
        – Software tools and databases
        – Antibodies
        – Genetically modified animals
      • Include RRID in methods section
      • RRID = RRID:Accession number
        – Just a string at this point
      • Voluntary for authors
      • Journals did not have to modify their submission system
      • Journals have flexibility in implementation. Send request to author at:
        – Submission
        – During review
        – After acceptance
      Sources: NIF Registry, NIF Antibody Registry, Model Organism Databases
      Resource Identification Portal: Aggregates accession numbers from >10 different databases that are the authorities for registering research resources
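Because an RRID is "just a string" of the form `RRID:` plus an accession number, pulling identifiers out of a methods section can be sketched with a regular expression. The character set allowed in the accession part is an assumption made for this sketch (letters, digits, underscores, hyphens); the slides do not define it. The sample text, the `example_0001` identifier, and the function name are hypothetical; `RRID:nif-0000-30467` is the identifier shown elsewhere in the deck.

```python
import re

# Assumed shape: the literal prefix "RRID:" followed by an accession string of
# letters, digits, underscores or hyphens. The allowed characters are a guess.
RRID_PATTERN = re.compile(r"RRID:([A-Za-z0-9_-]+)")


def extract_rrids(methods_text):
    """Return the accession part of every RRID in the text, in order of appearance."""
    return RRID_PATTERN.findall(methods_text)


# Hypothetical methods-section text.
methods = (
    "Resources were located through a registry entry (RRID:nif-0000-30467). "
    "Images were processed with a hypothetical tool (RRID:example_0001)."
)
print(extract_rrids(methods))
```

A plain pattern match like this is what makes the identifiers usable by general search engines and text-mining tools without a specialized citation service, provided they sit outside the paywall.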
  43. First results are in the literature
      Google Scholar: Search RRID; select since 2014
  44. What studies used X?
      To date:
      • 30 articles have appeared
      • 2 articles have disappeared, i.e., the RRIDs were removed at copyediting
      • 195 RRIDs were reported
      • 14 were in error (about 7%)
      • > 200 antibodies were added
      • > 75 software tools/databases were added
      • A resolver service has been created
      • 3rd party tools are being created to provide linkage between resources and papers
      RRID:nif-0000-30467
      Authors did not deliberately leave out identifying information; they just hadn't thought about it
  45. What have we learned?
      • Authors are willing to adopt new types of citations and citation styles; you just have to ask
      • RRID = usage of research resource
      • Ideal: resolved by search engines without requiring specialized citation services
      • Citation drives registration
      • Clear role for repositories as authorities
      Utopia plug-in: Steve Pettifer
  46. Digital objects are a new beast
      RRID: Provides foundation for establishing an alerting service for research resources
      Trust: Not just who produced it but what produced it
  47. NIF provides the "glue" for a functioning ecosystem of data and tools
      "How do I share my data/tool?" "There is no database for my data"
      Register your resource to NIF!
      [Diagram labels: Community database: beginning; Community database: end; Institutional repositories; Cloud; INCF: Global infrastructure; Government; Education; Industry; Tool repositories; Standards; Brokering; Archiving]
  48. An ecosystem for research objects: the social network of research resources
      Unique and persistent identifiers and a system for referencing them allow an ecosystem to function
      [Diagram labels: Article; Code; Blogs; Workflows; Data; Portals; Persistent Identifiers; Search engines]
  49. Musings from the NIF
      • Analytics let us take a global view of data
        – By bringing in a knowledge framework, we can look at positive and negative space
      • Well-populated data resources are critical to moving analytics forward
        – Comprehensive, i.e., they have most of the data that are available
        – Much can be learned even from messy data, but reasonable standards help
        – Active outreach is required
      • Technological barriers to widespread data sharing are diminishing
        – Best practices are emerging
        – General and focused repositories are available, although sustainability of these is a problem
      • There is a lot of neuroscience data available, but a culture of routine data sharing does not yet exist in neuroscience
        – But encouraging signs that it is largely due to lack of time and means, not lack of desire
        – It is up to us to change the incentive system to support the best science possible
      • Most scientists are not adept at managing or curating their own data
        – Role for repositories and data curators
      • Pieces of a functioning ecosystem are in place
        – Think about how you fit into the ecosystem
  50. NIF team (past and present)
      Jeff Grethe, UCSD, Co Investigator, Co-PI; Amarnath Gupta, UCSD, Co Investigator; Anita Bandrowski, NIF Project Leader
      Gordon Shepherd, Yale University; Perry Miller; Luis Marenco; Rixin Wang
      David Van Essen, Washington University; Erin Reid
      Paul Sternberg, Cal Tech; Arun Rangarajan; Hans Michael Muller; Yuling Li
      Giorgio Ascoli, George Mason University; Sridevi Polavarum
      Yueling Li, UCSD; Trish Whetzel, UCSD; Fahim Imam; Larry Lui; Andrea Arnaud Stagg; Jonathan Cachat; Svetlana Sulima; Burak Ozyrt; Davis Banks; Vadim Astakhov; Xufei Qian; Chris Condit; Mark Ellisman; Stephen Larson; Willie Wong
      Tim Clark, Harvard University; Paolo Ciccarese
      Karen Skinner, NIH, Program Officer (retired); Jonathan Pollock, NIH, Program Officer
      And my colleagues in Monarch, dkNet, 3DVC, Force 11: Melissa Haendel, OHSU**; Nicole Vasilevsky; Matthew Brush
      **Monarch and Resource Identification Initiative
  51. Creating an on-line knowledge space for neuroscience
  52. Pages are related through properties
      Red Links: Information is missing (or misspelled)
  53. Neurolex Neuron
      • Led by Dr. Gordon Shepherd
      • > 30 worldwide experts
      • Simple set of properties
      • Consistent naming scheme
      • Integrated with Structural Lexicon
      • Used for annotation in other resources, e.g., NeuroElectro
  54. Location of cell soma; location of dendrites; location of local axon arbor
  55. Analysis of Red Links in the Neuron Registry
      • INCF Project
        – Neuron Registry
        – > 30 experts worldwide
        – Fill out neuron pages in Neurolex Wiki
        – Led by Dr. Gordon Shepherd
      [Charts: Number of red links (total, easy fixes, hard fixes) for Soma location, Dendrite location and Axon location; axis 0 to 300]
      Social networks and community sites let us learn things from the collective behavior of contributors → INCF/HBP Knowledge Space
  56. Structural Lexicon in Neurolex
      Brain Region:
      • Trans-species
      • "Stateless", i.e., no universal defining criteria
      • General structures and partonomies based on Neuroanatomy 101
      • e.g., Hippocampus, Dentate gyrus
      Partially overlaps
      Brain Parcel:
      • Species specific
      • Specific reference
      • Defining criteria
      • Sometimes partonomy; sometimes not
      • e.g., Hippocampus of ABA2009
  57. Standards support diversity
  58. Is there a framework for neuroscience?
      • Of the ~4000 columns that NIF queries, ~1300 map to one of our core categories:
        – Organism
        – Anatomical structure
        – Cell
        – Molecule
        – Function
        – Dysfunction
        – Technique
      • 30-50% of NIF's queries autocomplete
      • When NIF combines multiple sources, a set of common fields emerges
        – Basic information models/semantic models exist for certain types of entities
      Biomedical science does have a conceptual framework
  59. What would a 21st century platform for scholarship look like?
      [Diagram labels: D; K; Macroinformatics]
      NIF: Sensors and monitors for the resource ecosystem
  60. Exposing knowledge to the web
      Because wiki pages have static URLs, wikis are searchable by Google
  61. NIF provides a rich source of information on digital resources
      • Analytics let us take a global view of data
        – By bringing in a knowledge framework, we can look at positive and negative space
      • Well-populated data resources are critical to moving analytics forward
        – Comprehensive, i.e., they have most of the data that are available
        – Much can be learned even from messy data, but reasonable standards help
        – Active outreach is required
      • Technological barriers to widespread data sharing are diminishing
        – Best practices are emerging
        – General and focused repositories are available, although sustainability of these is a problem
      • There is a lot of neuroscience data available, but a culture of routine data sharing does not yet exist in neuroscience
        – But encouraging signs that it is largely due to lack of time and means, not lack of agreement
      • Most scientists are not adept at managing or curating their own data
        – Role for repositories and data curators
      • Pieces of a functioning ecosystem are in place; think globally
      Not just science, but data policy should be data driven
  62. Same data: different analysis
      Chronic vs acute morphine in striatum
      • Gemma: Gene ID + Gene Symbol
      • DRG: Gene name + Probe ID
      • Gemma presented results relative to baseline chronic morphine; DRG with respect to saline, so direction of change is opposite in the 2 databases
      • Analysis:
        • 1370 statements from Gemma regarding gene expression as a function of chronic morphine
        • 617 were consistent with DRG → over half of the claims of the paper were not confirmed in this analysis
        • Results for 1 gene were opposite in DRG and Gemma
        • 45 did not have enough information provided in the paper to make a judgment
      Relatively simple standards would make it easier to perform comparisons across the ecosystem
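The baseline problem on the slide above (one source reports change relative to chronic morphine, the other relative to saline) can be sketched as a normalization step before comparison. This is a hedged illustration, not the analysis actually run: the record layout, the gene names, the assumption that only two conditions are being contrasted, and the function names are all hypothetical.

```python
from collections import Counter

FLIP = {"up": "down", "down": "up"}


def to_reference(direction, baseline, reference="saline"):
    """Express a direction of change relative to one shared baseline.

    Assumes exactly two conditions are contrasted, so a record reported
    against the other baseline simply has its direction reversed.
    """
    return direction if baseline == reference else FLIP[direction]


def compare_sources(source_a, source_b):
    """Tally, per gene, whether two sources agree once baselines are aligned."""
    tally = Counter()
    for gene, (direction, baseline) in source_a.items():
        if gene not in source_b:
            tally["not found"] += 1
            continue
        other_direction, other_baseline = source_b[gene]
        agree = to_reference(direction, baseline) == to_reference(
            other_direction, other_baseline
        )
        tally["consistent" if agree else "opposite"] += 1
    return tally


# Hypothetical records: gene -> (direction of change, baseline condition).
source_a = {
    "GENE_A": ("down", "chronic morphine"),
    "GENE_B": ("up", "chronic morphine"),
    "GENE_C": ("up", "chronic morphine"),
}
source_b = {
    "GENE_A": ("up", "saline"),
    "GENE_B": ("up", "saline"),
}
print(compare_sources(source_a, source_b))
```

Recording the baseline alongside every direction of change is the kind of relatively simple standard the slide argues for: with it, the flip is mechanical; without it, each comparison needs a curator to read the paper.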
  63. Musings from the NIF
      • Every resource is resource limited: few have enough time, money, staff or expertise required to do everything they would like
        – If the market can support 11 MRI databases, fine
        – Some consolidation, coordination is warranted
        – How can industry help support the data space? How can they take them even further?
        – Don't let the data space become fractured
      • Big, broad and messy beats small, narrow and neat
        – Without trying to integrate a lot of data, we will not know what needs to be done
        – Progressive refinement; addition of complexity through layers
      • Be flexible and opportunistic: assume all will change
        – A single optimal technology/container for all types of scientific data and information does not exist; technology is changing
      • Think globally; act locally:
        – No source, not even NIF, is THE source; we are all a source
        – System and culture to be able to learn from everything
        – Cooperative model for biomedicine