The Neuroscience Information Framework: The present and future of neuroscience data sharing


Maryann Martone

Neuroinformatics Graduate Course, Yale University, CT

March 3, 2011

Published in: Technology, Education

    1. 1. The Neuroscience Information Framework: The present and future of neuroscience data sharing. Maryann Martone, Ph.D., University of California, San Diego
    2. 2. The Encyclopedia of Life A… Access to data has changed over the years. Tim Berners-Lee: Web of data. Wikipedia defines Linked Data as "a term used to describe a recommended best practice for exposing, sharing, and connecting pieces of data, information, and knowledge on the Semantic Web using URIs and RDF."
    3. 3. The mountain of data problem. Would like to be able to find: What is known (by combining data from different sources and different groups): What is the average diameter of a Purkinje neuron? Is GRM1 expressed in cerebral cortex? What are the projections of hippocampus? What genes have been found to be upregulated in chronic drug abuse in adults? What studies used my monoclonal mouse antibody against GAD in humans? Find all instances of spines that contain membrane-bound organelles. What is not known: Connections among data; Gaps in knowledge. Required components: Query interface; Search strategies; Data sources; Infrastructure; Results display; Trust; Context; Analysis tools; Tools for translating existing content into linkable form; Tools for creating new data ready to be linked
    4. 4. Where would you rather look? Unstructured vs structured data: publishing data in the literature/web pages vs databases and tables
    5. 5. Scale: Whole brain data (20 µm microscopic MRI); Mosaic LM images (1 GB+); Conventional LM images; Individual cell morphologies; EM volumes & reconstructions; Solved molecular structures. No single technology serves these all equally well. Multiple data types; multiple scales; multiple databases. A multi-scale data problem. A data federation problem
    6. 6. Two organizing frameworks for knowledge: Knowledge in space and spatial relationships (the "where"); Knowledge in words, terminologies and logical relationships (the "what")
    7. 7. Assembling data into coherent models. Snavely et al., Scene Reconstruction and Visualization from Community Photo Collections
    8. 8. What if... The Matterhorn could be 15 different things? There were 6 billion Matterhorns, all more or less different from one another? The Roman Coliseum was called by 45 different names? The photo represented 1/1,000,000 of the whole with no context? Photos weren't annotated at all or were tagged "1" or "mm45"? The Statue of Liberty was represented as a mathematical equation? Or a scatter plot?
    9. 9. Cerebral peduncle; Internal capsule; Corticospinal tract. Every brain is different; terminology is used inconsistently; there are many names for the same structure
    10. 10. Curators vs researchers. Example of segmented object names from CCDB for a Node of Ranvier: Mitochondria1; Shwannlowermerge; U.L.Cisternae; Crop; Loop7_lower; Blue; Alex; Lysosomme_3; Alex left. Program used to create annotations is obsolete
    11. 11. The Neuroscience Information Framework: Discovery and utilization of web-based resources for neuroscience. A portal for finding and using neuroscience resources. A consistent framework for describing resources. Provides simultaneous search of multiple types of information, organized by category. Supported by an expansive ontology for neuroscience. Utilizes advanced technologies to search the "hidden web". http://neuinfo.org. UCSD, Yale, CalTech, George Mason, Washington Univ. Supported by NIH Blueprint
    12. 12. Scale: Whole brain data (20 µm microscopic MRI); Mosaic LM images (1 GB+); Conventional LM images; Individual cell morphologies; EM volumes & reconstructions; Solved molecular structures. No single technology serves these all equally well. Multiple data types; multiple scales; multiple databases. A data federation problem
    13. 13. How many resources are there? NIF Registry: a catalog of neuroscience-relevant resources; > 3500 currently described; > 1500 databases; another 4000 awaiting curation; and we are finding more every day
    14. 14. NIF Data Federation. Too many databases to visit. Capturing content in a few keywords is difficult if not impossible. Each is organized differently; different UIs, data models and tools. NIF provides tools for databases to register their content to NIF. Access to deep content; currently searches over 35 million records from > 65 different databases. Web services, schema registration, XML-based description, RDF. Organized according to level of nervous system and data type, e.g., brain activation foci. Enhanced keyword query interface. Link to host resource. Accompanied by a tutorial. Defines common data models for similar data
    15. 15. Hippocampus OR "Cornu Ammonis" OR "Ammon's horn". Query expansion: synonyms and related concepts. Boolean queries. Data sources categorized by "data type" and level of nervous system. Simplified views of complex data sources. Tutorials for using full resource when getting there from NIF. Link back to record in original source
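The query expansion on slide 15 can be sketched in a few lines of Python. This is only an illustration: the `SYNONYMS` table and the `expand_query` helper are hypothetical names, and NIF draws its synonyms from the NIFSTD ontology rather than from a hard-coded dictionary.

```python
# Hypothetical synonym table for illustration; the two synonyms are the ones
# shown on the slide. NIF itself takes synonyms from its ontology.
SYNONYMS = {
    "hippocampus": ["Cornu Ammonis", "Ammon's horn"],
}


def quote(term: str) -> str:
    """Wrap multi-word terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def expand_query(term: str) -> str:
    """Return the term OR-ed with every synonym known for it."""
    variants = [term] + SYNONYMS.get(term.lower(), [])
    return " OR ".join(quote(v) for v in variants)


print(expand_query("Hippocampus"))
```

A term with no known synonyms passes through unchanged, so the expansion step can sit in front of any keyword search without altering queries it knows nothing about.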
    16. 16. NIF data federation... Simultaneous access to multiple sources of information through a concept-based interface. Unique resource for asking certain types of questions, e.g., what rat strains have been most commonly used in research. Indexes content in the hidden web not currently well served by search engines. A set of tools for making resources available through the NIF. A platform for data integration. Simplified and neuroscience-centered views of very complicated resources. An ontology for enhanced query and integration. A wealth of real information on the practical issues of search across and integration of data in the neurosciences. Share experiences through publications, presentations, blogs and with other projects. Developing annotation standards that help with search. Provide best practices for resource creators
    17. 17. What are the connections of the hippocampus? Connects to; Synapsed with; Synapsed by; Input region; innervates; Axon innervates; Projects to; Cellular contact; Subcellular contact; Source site; Target site. Each resource implements a different, though related model; systems are complex and difficult to learn, in many cases
    18. 18. Is GRM1 in cerebral cortex? NIF system allows easy search over multiple sources of information. But, we have difficulty finding data. Well known difficulties in search: Inconsistent and sparse annotation of scientific data; Many different names for the same thing; The same name means many things; "Hidden semantics": 1 = male; 1 = present; 1 = mouse. Allen Brain Atlas; MGD; Gensat
    19. 19. Cerebral Cortex (columns: Atlas | Children | Parent)
        Genepaint | Neocortex, Olfactory cortex (Olfactory bulb; piriform cortex), hippocampus | Telencephalon
        Allen Brain Atlas | Cortical plate, Olfactory areas, Hippocampal Formation | Cerebrum
        MBAT (cortex) | Hippocampus, Olfactory, Frontal, Perirhinal cortex, entorhinal cortex | Forebrain
        GENSAT | Not defined | Telencephalon
        BrainInfo | frontal lobe, insula, temporal lobe, limbic lobe, occipital lobe | Telencephalon
        Brainmaps | Entorhinal, insular, 6, 8, 4, A SII 17, Prp, SI | Telencephalon
    20. 20. Result: We are not publishing data in a form that is easy to integrate. What we mean isn't clear to a search engine (or even to a human). We use many different data structures to say the same thing. We don't provide crucial information. Searching and navigating across individual resources takes an inordinate amount of human effort. Tempus Pecunia Est, painting by Richard Harpum
    21. 21. NIF: Minimum requirements to use shared data. You have to be able to find it: Accessible through the web; Structured or semi-structured; Annotations. You have to be able to use it: Data type specified and in a usable form. You have to know what the data mean: Semantics; Identity; 1 = integer, time scale, male, left hemisphere; Context: experimental metadata. Reporting neuroscience data within a consistent framework helps enormously
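The "hidden semantics" problem (a bare 1 that may mean male, present, or mouse) can be made concrete with a small Python sketch. Everything here is an assumption for illustration: the `CodedColumn` class, the column names, and the codes other than the three meanings of 1 quoted on the slides are hypothetical and not taken from any NIF resource.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodedColumn:
    """A data column whose integer codes carry an explicit, published meaning."""
    name: str
    meaning: dict  # raw code -> human-readable label


def decode(column: CodedColumn, raw_value: int) -> str:
    """Turn a bare code into an unambiguous statement, or fail loudly."""
    if raw_value not in column.meaning:
        raise ValueError(f"code {raw_value!r} is undefined for column {column.name!r}")
    return f"{column.name}={column.meaning[raw_value]}"


# Hypothetical code books; only the meanings of 1 come from the slides.
SEX = CodedColumn("sex", {1: "male", 2: "female"})
EXPRESSION = CodedColumn("expression", {0: "absent", 1: "present"})
SPECIES = CodedColumn("species", {1: "mouse"})

for column in (SEX, EXPRESSION, SPECIES):
    print(decode(column, 1))
```

The same raw value decodes to three different statements, which is the point: without the code book travelling alongside the data, a consumer cannot tell them apart.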
    22. 22. Whole Brain Catalog. Stephen Larson, Mark Ellisman. http://wholebraincatalog.org. Uses 3D game engine to bring together multiple data types within a common framework
    23. 23. Purkinje Cell; Axon Terminal; Axon; Dendritic Tree; Dendritic Spine; Dendrite; Cell body; Cerebellar cortex. Multiscale integration is not obvious. There is little obvious connection between data sets taken at different scales using different microscopies without an explicit representation of the biological objects that the data represent
    24. 24. What is an ontology? Diagram: Brain (has a) Cerebellum (has a) Purkinje Cell Layer (has a) Purkinje cell (is a) neuron. Ontology: an explicit, formal representation of concepts and relationships among them within a particular domain that expresses human knowledge in a machine readable form. Branch of philosophy: a theory of what is. E.g., Gene ontologies
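As a rough illustration of what "machine readable" means for the diagram on slide 24, the relationships can be written as subject/predicate/object triples and traversed by a program. The sketch below assumes the diagram reads Brain has-a Cerebellum has-a Purkinje cell layer has-a Purkinje cell is-a Neuron; the helper names `parts_of` and `kinds_of` are hypothetical, and a real ontology would be expressed in OWL or RDF rather than a Python list.

```python
# Triples assumed from the slide's diagram: (subject, predicate, object).
TRIPLES = [
    ("Brain", "has_a", "Cerebellum"),
    ("Cerebellum", "has_a", "Purkinje cell layer"),
    ("Purkinje cell layer", "has_a", "Purkinje cell"),
    ("Purkinje cell", "is_a", "Neuron"),
]


def parts_of(whole, triples=TRIPLES):
    """Everything reachable from `whole` by following has_a edges transitively."""
    found, stack = [], [whole]
    while stack:
        current = stack.pop()
        for subject, predicate, obj in triples:
            if subject == current and predicate == "has_a" and obj not in found:
                found.append(obj)
                stack.append(obj)
    return found


def kinds_of(thing, triples=TRIPLES):
    """Direct is_a parents of `thing`."""
    return [o for s, p, o in triples if s == thing and p == "is_a"]


print(parts_of("Brain"))
print(kinds_of("Purkinje cell"))
```

Nothing in the triples states directly that the brain contains Purkinje cells; the program derives it by chaining relationships, which is the kind of reasoning the later slides build on.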
    25. 25. What ontology isn't (or shouldn't be): A rigid top-down fixed hierarchy for limiting expression in the neurosciences (not about restricting expression but how to express meaning clearly and in a machine readable form). A bottomless resource-eating pit that consumes dollars and returns nothing. A cure-all for all our problems. A completely solved area (applied vs theoretical). Easy to understand. Mike Bergman
    26. 26. What can ontology do for us? Express neuroscience concepts in a way that is machine readable: Synonyms, lexical variants; Definitions. Provide means of disambiguation of strings: Nucleus part of cell; nucleus part of brain; nucleus part of atom. Rules by which a class is defined, e.g., a GABAergic neuron is a neuron that releases GABA as a neurotransmitter. Properties. Quantities. Provide universals for navigating across different data sources: Semantic "index". Perform reasoning. Link data through relationships, not just one-to-one mappings. Provide the basis for concept-based queries to probe and mine data. As a branch of philosophy, make us think about the nature of the things we are trying to describe, e.g., synapse is a site
    27. 27. Linking datatypes to semantics: What is the average diameter of a Purkinje neuron dendrite? Branch structure: not a tree, not a set of blood vessels, not a road map, but a DENDRITE. Because anyone who uses Neurolucida uses the same concepts (axon, dendrite, cell body, dendritic spine), information systems can combine the data together in meaningful ways. Neurolucida doesn't, however, tell you that dendrite belongs to a neuron of a particular type or whether this dendrite is a neural dendrite at all. Output of Neurolucida neuron trace:
        ( (Color Yellow) ; [10,1]
        (Dendrite)
        ( 5.04 -44.40 -89.00 1.32) ; Root
        ( 3.39 -44.40 -89.00 1.32) ; R, 1
        (
        ( 2.81 -45.10 -90.00 0.91) ; R-1, 1
        ( 2.81 -45.18 -90.00 0.91) ; R-1, 2
        ( 1.90 -46.01 -90.00 0.91) ; R-1, 3
        ( 1.82 -46.09 -90.00 0.91) ; R-1, 4
        ( 0.91 -46.59 -90.00 0.91) ; R-1, 5
        ( 0.41 -46.83 -92.50 0.91) ; R-1, 6
        (
        ( -0.66 -46.92 -88.50 0.74) ; R-1-1, 1
        ( -0.74 -46.92 -88.50 0.74) ; R-1-1, 2
        ( -2.15 -47.25 -88.00 0.74) ; R-1-1, 3
        ( -2.15 -47.33 -88.00 0.74) ; R-1-1, 4
        ( -3.06 -47.00 -87.00 0.74) ; R-1-1, 5
        ( -4.05 -46.92 -86.00 0.74) ; R-1-1, 6
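To show how the question on slide 27 becomes answerable once the data type is known, here is a minimal Python sketch that reads the trace fragment from the slide and averages the dendrite diameter. It assumes each point is written as `( x y z d )` with the fourth number being the diameter; the regular expression and the `diameters` helper are illustrative and not part of Neurolucida. An unweighted mean over sample points is the simplest choice; a length-weighted mean would probably be more appropriate for real morphometry.

```python
import re
from statistics import mean

# The trace fragment from the slide, copied as-is (it is truncated in the source).
TRACE = """
( (Color Yellow) ; [10,1]
(Dendrite)
( 5.04 -44.40 -89.00 1.32) ; Root
( 3.39 -44.40 -89.00 1.32) ; R, 1
(
( 2.81 -45.10 -90.00 0.91) ; R-1, 1
( 2.81 -45.18 -90.00 0.91) ; R-1, 2
( 1.90 -46.01 -90.00 0.91) ; R-1, 3
( 1.82 -46.09 -90.00 0.91) ; R-1, 4
( 0.91 -46.59 -90.00 0.91) ; R-1, 5
( 0.41 -46.83 -92.50 0.91) ; R-1, 6
(
( -0.66 -46.92 -88.50 0.74) ; R-1-1, 1
( -0.74 -46.92 -88.50 0.74) ; R-1-1, 2
( -2.15 -47.25 -88.00 0.74) ; R-1-1, 3
( -2.15 -47.33 -88.00 0.74) ; R-1-1, 4
( -3.06 -47.00 -87.00 0.74) ; R-1-1, 5
( -4.05 -46.92 -86.00 0.74) ; R-1-1, 6
"""

# Matches "( x y z d )" where all four fields are signed decimals.
NUMBER = r"(-?\d+\.\d+)"
POINT = re.compile(r"\(\s*" + r"\s+".join([NUMBER] * 4) + r"\s*\)")


def diameters(trace: str) -> list:
    """Extract the fourth field (assumed to be the diameter) of every point."""
    return [float(match.group(4)) for match in POINT.finditer(trace)]


values = diameters(TRACE)
print(len(values), "points, mean diameter", round(mean(values), 3))
```

The code can compute a number, but only the label `(Dendrite)` tells it what the number describes, and nothing in the file says which neuron type the dendrite belongs to. That missing context is what the ontology annotations are meant to supply.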
    28. 28. "A rose by any other name...": Identity: Entities are uniquely identifiable. Name is a meaningless numerical identifier (URI: Uniform Resource Identifier). Any number of human readable labels can be assigned to it. Definition: Genera: is a type of (cell, anatomical structure, cell part). Differentia: "has a"; a set of properties that distinguish among members of that class; can include necessary and sufficient conditions. Implementation: How is this definition expressed? Depending on the nature of the concept or entity and the needs of the information system, we can say more or fewer things. Different languages can express different things about the concept that can be computed upon. OWL (W3C standard), RDF
    29. 29. Entity recognition: Are you the M Martone who... The Gene Wiki: community intelligence applied to human gene annotation. Huss JW 3rd, Lindenbaum P, Martone M, Roberts D, Pizarro A, Valafar F, Hogenesch JB, Su AI. Nucleic Acids Res. 2010 Jan;38(Database issue):D633-9. Ontologies for Neuroscience: What are they and What are they Good for? Larson SD, Martone ME. Front Neurosci. 2009 May;3(1):60-7. Epub 2009 May 1. Three-dimensional electron microscopy reveals new details of membrane systems for Ca2+ signaling in the heart. Hayashi T, Martone ME, Yu Z, Thor A, Doi M, Holst MJ, Ellisman MH, Hoshijima M. J Cell Sci. 2009 Apr 1;122(Pt 7):1005-13. Traumatic brain injury and the goals of care. Martone M. Hastings Cent Rep. 2006 Mar-Apr;36(2):3. Three-dimensional pattern of enkephalin-like immunoreactivity in the caudate nucleus of the cat. Groves PM, Martone M, Young SJ, Armstrong DM. J Neurosci. 1988 Mar;8(3):892-900. Some analyses of forgetting of pictorial material in amnesic and demented patients. Martone M, Butters N, Trauner D. J Clin Exp Neuropsychol. 1986 Jun;8(3):161-78.
    30. 30. ID: 555 55 5555. Full URI: http://usagov/ss#555555555. Label: Maryann Elizabeth Martone. Synonym: ME Martone, M Martone, Maryann. Abbreviation: MEM. Is a; Has a; Is that entity which has these properties. M Martone; Dept of Psychiatry, UCSD; MH Ellisman; Publications; Boston VA Hospital. Text mining algorithms can discover a lot of things about me
    31. 31. NIFSTD: Comprehensive Ontology. NIF covers multiple structural scales and domains of relevance to neuroscience. Aggregate of community ontologies with some extensions for neuroscience, e.g., Gene Ontology, ChEBI, Protein Ontology. Simple, basic "is a" hierarchies that can be used "as is" or to form the building blocks for more complex representations. NIFSTD: Organism; NS Function; Molecule; Investigation; Subcellular structure; Macromolecule; Gene; Molecule Descriptors; Techniques; Reagent; Protocols; Cell; Resource; Instrument; Dysfunction; Quality; Anatomical Structure
    32. 32. Query across resources: Snca and striatum. NIF uses the NIFSTD ontologies to query across sources that use very different terminologies, symbolic notations and levels of granularity
    33. 33. Entity mapping: BIRNLex_435; Brodmann.3. Explicit mapping of database content helps disambiguate non-unique and custom terminology
    34. 34. Concept-based search: search by meaning. Search Google: GABAergic neuron. Search NIF: GABAergic neuron. NIF automatically searches for types of GABAergic neurons. Figure label: Types of GABAergic neurons
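The difference between a string search and the concept-based search on slide 34 can be sketched as follows. The subclass table and the records are toy data invented for illustration (the record ids are hypothetical), and `concept_search` is not a NIF function; the sketch only shows the idea of expanding a concept to its subtypes before matching.

```python
# Toy subclass table: child -> parent. Illustrative only, not the NIFSTD hierarchy.
SUBCLASS_OF = {
    "GABAergic neuron": "Neuron",
    "Glutamatergic neuron": "Neuron",
    "Purkinje cell": "GABAergic neuron",
    "Medium spiny neuron": "GABAergic neuron",
    "Cerebellar granule cell": "Glutamatergic neuron",
}

# Hypothetical records as they might be annotated in different databases.
RECORDS = [
    {"id": "rec1", "cell_type": "Purkinje cell"},
    {"id": "rec2", "cell_type": "Cerebellar granule cell"},
    {"id": "rec3", "cell_type": "GABAergic neuron"},
    {"id": "rec4", "cell_type": "Medium spiny neuron"},
]


def descendants(concept):
    """All direct and indirect subclasses of `concept`."""
    result, frontier = [], [concept]
    while frontier:
        parent = frontier.pop()
        for child, child_parent in SUBCLASS_OF.items():
            if child_parent == parent:
                result.append(child)
                frontier.append(child)
    return result


def string_search(term, records=RECORDS):
    """Plain matching: only records annotated with exactly this term."""
    return [r["id"] for r in records if r["cell_type"] == term]


def concept_search(concept, records=RECORDS):
    """Concept matching: the term plus every subtype the ontology knows about."""
    terms = {concept, *descendants(concept)}
    return [r["id"] for r in records if r["cell_type"] in terms]


print(string_search("GABAergic neuron"))
print(concept_search("GABAergic neuron"))
```

In this toy data the string search finds one record while the concept search finds three, because records annotated only as "Purkinje cell" or "Medium spiny neuron" never mention the word GABAergic.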
    35. 35. NIF #1: You have to be able to find it... What genes are upregulated by drugs of abuse in the adult mouse? Morphine; Increased expression; Adult Mouse
    36. 36. Integration of knowledge based on relationships. Looking for commonalities and distinctions among animal models and human conditions based on phenotypes. Sarah Maynard, Chris Mungall, Suzie Lewis. NINDS. Thalamus; Cellular inclusion; Midline nuclear group; Lewy Body; Paracentral nucleus; Cellular inclusion
    37. 37. Building ontologies: modified OBO Foundry principles. NIF has adopted certain practices which we have found make it easier to build and work with ontologies in neuroscience: Unique numerical identifiers for class names; Single asserted hierarchies; Avoid multiple inheritance; Use community ontologies; One ontology per domain. Open Bio Ontologies
    38. 38. Asserted vs defined classes: the power of explicit semantics. Asserted class: Purkinje cell is a type of neuron. Why? Because I said so! Defined class: Purkinje cell is a GABAergic neuron. Why? Because it is a member of the class Neuron that releases neurotransmitter GABA. Logical definition based on properties. Membership in the class is computed by reasoners based on the satisfaction of a set of conditions. Makes building ontologies tractable because you don't have to create multiple hierarchies; you can infer them
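A toy version of the asserted-versus-defined distinction on slide 38 is sketched below. Each cell type carries one asserted parent and a set of properties, and membership in the defined classes is computed from those properties. This is a deliberately simplified stand-in for what an OWL reasoner does; the dictionaries and the `classify` function are hypothetical, and the granule cell entry is added only for contrast.

```python
# Asserted facts: one parent per class, plus properties.
ENTITIES = {
    "Purkinje cell": {"is_a": "Neuron", "releases": "GABA"},
    "Cerebellar granule cell": {"is_a": "Neuron", "releases": "Glutamate"},
}

# Defined classes: membership conditions stated as required property values.
DEFINED_CLASSES = {
    "GABAergic neuron": {"is_a": "Neuron", "releases": "GABA"},
    "Glutamatergic neuron": {"is_a": "Neuron", "releases": "Glutamate"},
}


def classify(entity_name):
    """Return every defined class whose conditions the entity satisfies."""
    properties = ENTITIES[entity_name]
    return sorted(
        class_name
        for class_name, conditions in DEFINED_CLASSES.items()
        if all(properties.get(key) == value for key, value in conditions.items())
    )


for name in ENTITIES:
    print(name, "->", classify(name))
```

Nobody asserted that a Purkinje cell is a GABAergic neuron; the classification falls out of the stated properties. Adding a new defined class later reclassifies every existing entity without editing the asserted hierarchy.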
    39. 39. Reclassification of a flat hierarchy based on logical definitions. The principle of single inheritance: Each class belongs to only a single asserted hierarchy that is generally fairly uninteresting. Through the assignments of properties and restrictions, each class may belong to many defined hierarchies. The criteria for membership in that class are explicit. Easier bookkeeping
    40. 40. The case for shared ontologies. Brain; Cerebellum; Cerebellar Cortex; Cerebellar Purkinje cell; Purkinje neuron; Purkinje cell soma; Purkinje cell layer; Cerebellar cortex; IP3; Cerebellum. To create the linkages requires mapping. Mapping is usually incomplete and not always possible. Can't take advantage of others' work. Top down anatomy ontology; Cell centered anatomy ontology
    41. 41. Cerebellum Purkinje cell soma; Cerebellum Purkinje cell dendrite; Cerebellum Purkinje cell axon (Cell part ontology); Cerebellum granule cell layer (Anatomy ontology); Cerebellum Purkinje cell layer; Cerebellum molecular layer; Has part; Has part; Has part; Is part of; Is part of; Is part of. Shared building blocks: Knowledge base is enriched. Calbindin; IP3 (CHEBI:16595); Cerebellum Purkinje neuron (Cell Ontology); Cerebellar cortex; Has part; Has part; Has part
    42. 42. Access to shared ontologies. Neuroscience Information Framework: Ontologies available as OWL file, RDF and through Web Services. NCBO Bioportal: Repository of ontologies for biomedical research; 199 ontologies (including NIFSTD); Contains many mappings; Provides annotation services. INCF Program on Ontologies for Neural Structures: Neuronal Registry Task Force: Description of neural properties; Structural Lexicon: Description of properties across scales
    43. 43. Building or expanding ontologies. Michael Bergman
    44. 44. NeuroLex Wiki. Stephen Larson. Semantic Wiki: provides community interface for viewing, enhancing and modifying NIFSTD ontologies. Provide a simple framework for defining the concepts required: Cell, Part of brain, subcellular structure, molecule. On demand. Assign permanent URI. Ontologists/knowledge engineers build in complexity. Tries to teach and adhere to basic best practices
    45. 45. Define by rules: Generate multiple classifications programmatically
    46. 46. Enriching the knowledge base. Members of this class automatically generated according to a rule expressed in a standard query language
    47. 47. Inferring the Mesoscale. The NIFSTD is expressed in OWL (Web Ontology Language). Supports reasoning and inference. Through integration with other ontologies covering gross anatomy and molecular entities, we are working to create inferences across scales. Analyze locally; infer globally. Larson and Martone, 2007. Stephen Larson
    48. 48. Inferencing across scales: Compare statements. 1. Look brain region up in NeuroLex. 2. Look up cells contained in the brain region. 3. Find those cells that are known to project out of that brain region. 4. Look up the neurotransmitters for those cells. 5. Determine whether those neurotransmitters are known to be excitatory or inhibitory. 6. Report the projection as excitatory or inhibitory, and report the entire chain of logic with links back to the wiki pages where they were made. 7. Make sure user can get back to each statement in the logic chain to edit it if they think it is wrong. Stephen Larson. CHEBI:18243
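The seven steps on slide 48 can be sketched as a small lookup chain. The sketch below uses hard-coded toy tables in place of NeuroLex pages, so the table names, their contents, and `infer_projections` are assumptions made for illustration; the real workflow would query the wiki and link each statement back to its page.

```python
# Toy stand-ins for NeuroLex lookups (steps 1-5). Hypothetical data structures.
CELLS_IN_REGION = {"Cerebellar cortex": ["Purkinje cell", "Cerebellar granule cell"]}
PROJECTS_OUT_OF_REGION = {"Purkinje cell": True, "Cerebellar granule cell": False}
NEUROTRANSMITTER = {"Purkinje cell": "GABA", "Cerebellar granule cell": "Glutamate"}
EFFECT = {"GABA": "inhibitory", "Glutamate": "excitatory"}


def infer_projections(region):
    """Classify each outgoing projection of `region` and keep the chain of logic."""
    results = []
    for cell in CELLS_IN_REGION.get(region, []):           # steps 1-2
        if not PROJECTS_OUT_OF_REGION.get(cell, False):    # step 3
            continue
        transmitter = NEUROTRANSMITTER[cell]               # step 4
        effect = EFFECT[transmitter]                       # step 5
        chain = [                                          # step 6: report the logic
            f"{region} contains {cell}",
            f"{cell} projects out of {region}",
            f"{cell} releases {transmitter}",
            f"{transmitter} is {effect}",
        ]
        results.append({"cell": cell, "projection": effect, "chain": chain})
    return results


for result in infer_projections("Cerebellar cortex"):
    print(result["projection"], "projection via", result["cell"])
    for statement in result["chain"]:
        print("  because:", statement)
```

Keeping the chain of statements alongside the conclusion is what makes step 7 possible: a user who disagrees with the result can see which individual statement to correct.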
    49. 49. A semantic web for neuroscience? Good idea. So all I have to do is... Express your data in RDF? Well... Which RDF? Bio2RDF, BioRDF, Linked Data, Open Data, Semantic Web. Use URIs for all data elements? Well... What exactly does that mean? Shared Names, BioRDF, my own? Use shared ontologies? Well... Which ones? I don't have one. They're not stable. They take too long. I'd rather share your toothbrush. Wait for Watson 3.0. Effective data sharing is still an act of will
    50. 50. We do know some things. NIF Blog. 1. Register your resource with NIF!!!! 2. Mindfulness. Resource providers: Mindfulness that your resource is contributing data to a global federation; Link to shared ontology identifiers where possible; Stable and unique identifiers for data; Explicit semantics: Database, model, atlas. Researchers: Mindfulness when publishing data that it is to be consumed by machines and not just your colleagues; Accession numbers for genes and species; Catalog numbers for reagents; Provide supplemental data in a form where it is easy to re-use
    51. 51. Many thanks to... Amarnath Gupta, UCSD, Co-Investigator; Jeff Grethe, UCSD, Co-Investigator; Anita Bandrowski, NIF Curator; Gordon Shepherd, Yale University; Perry Miller; Luis Marenco; David Van Essen, Washington University; Erin Reid; Paul Sternberg, CalTech; Arun Rangarajan; Hans Michael Muller; Giorgio Ascoli, George Mason University; Sridevi Polavarum; Fahim Imam, NIF Ontology Engineer; Karen Skinner, NIH, Program Officer; Mark Ellisman; Lee Hornbrook; Kara Lu; Vadim Astakhov; Xufei Qian; Chris Condit; Stephen Larson; Sarah Maynard; Bill Bug