The Neuroscience Information Framework: Establishing a practical semantic framework for neuroscience

Maryann Martone
Seminar for Johnson and Johnson Pharmaceutical Research and Development, Ambler, PA
April 30, 2010

  1. The Neuroscience Information Framework: Establishing a practical semantic framework for neuroscience
     Maryann Martone, Ph.D., University of California, San Diego
  2. NIF Team
     • Amarnath Gupta, UCSD, Co-Investigator
     • Jeff Grethe, UCSD, Co-Investigator
     • Gordon Shepherd, Yale University
     • Perry Miller
     • Luis Marenco
     • David Van Essen, Washington University
     • Erin Reid
     • Paul Sternberg, Cal Tech
     • Arun Rangarajan
     • Hans Michael Muller
     • Giorgio Ascoli, George Mason University
     • Sridevi Polavarum
     • Anita Bandrowski, NIF Curator
     • Fahim Imam, NIF Ontology Engineer
     • Karen Skinner, NIH, Program Officer
     • Lee Hornbrook
     • Kara Lu
     • Vadim Astakhov
     • Xufei Qian
     • Chris Condit
     • Stephen Larson
     • Sarah Maynard
     • Bill Bug
  3. What does this mean?
     • 3D Volumes
     • 2D Images
     • Surface meshes
     • Tree structure
     • Ball and stick models
     • Little squiggly lines
     Diagram labels: Data, People, Information systems
  4. The Neuroscience Information Framework: Discovery and utilization of web-based resources for neuroscience
     http://neuinfo.org
     UCSD, Yale, Cal Tech, George Mason, Washington Univ
     Supported by NIH Blueprint
     • A portal for finding and using neuroscience resources
     • A consistent framework for describing resources
     • Provides simultaneous search of multiple types of information, organized by category
     • Supported by an expansive ontology for neuroscience
     • Utilizes advanced technologies to search the “hidden web”
  5. Where do I find…
     • Data
     • Software tools
     • Materials
     • Services
     • Training
     • Jobs
     • Funding opportunities
     • Websites
     • Databases
     • Catalogs
     • Literature
     • Supplementary material
     • Information portals
     ...And how many are there?
  6. NIF in action
  7. Screenshot callouts:
     • Query expansion: synonyms and related concepts
     • Boolean queries
     • Data sources categorized by “data type” and level of nervous system
     • Simplified views of complex data sources
     • Tutorials for using full resource when getting there from NIF
     Example query: Hippocampus OR “Cornu Ammonis” OR “Ammon’s horn”
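The expanded query on this slide can be sketched in a few lines of Python. This is only an illustration of synonym-based expansion, not NIF's actual code: the `SYNONYMS` table and the `expand_query` helper are hypothetical names, and in NIF the synonyms would come from the NIF vocabularies rather than a hard-coded dictionary.

```python
# Hypothetical synonym table; a stand-in for lookups against the NIF vocabularies.
SYNONYMS = {
    "hippocampus": ["Cornu Ammonis", "Ammon's horn"],
}


def quote(term):
    """Wrap multi-word terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def expand_query(term, synonyms=SYNONYMS):
    """Return a Boolean OR query over the term and its known synonyms."""
    alternatives = [term] + synonyms.get(term.lower(), [])
    return " OR ".join(quote(t) for t in alternatives)


print(expand_query("Hippocampus"))
```

A term with no entry in the table is returned unchanged, so the expansion step never blocks a plain keyword search.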
  8. NIF searches across multiple sources of information
     • NIF data federation: independent databases registered with NIF or through web services
     • NIF registry: catalog
     • NIF Web: custom web index (work in progress)
     • NIF literature: neuroscience-centered literature corpus (Textpresso)
  9. Guiding principles of NIF
     • Builds heavily on existing technologies (open source tools)
     • Information resources come in many sizes and flavors
     • Framework has to work with resources as they are, not as we wish them to be
       – Federated system; resources will be independently maintained
       – Very few use standard terminology or map to ontologies
     • No single strategy will work for the current diversity of neuroscience resources
     • Trying to design the framework so it will be as broadly applicable as possible to those who are trying to develop technologies
     • Interface neuroscience to the broader life science community
     • Take advantage of emerging conventions in search and in building web communities
  10. Registering a Resource to NIF
     • Level 1: NIF Registry: high level descriptions from NIF vocabularies supplied by human curators
     • Level 2: Access to deeper content; mechanisms for query and discovery; DISCO protocol
     • Level 3: Direct query of web accessible database
       – Automated registration
       – Mapping of database content to NIF vocabulary by human
  11. The NIF Registry
     • Very, very simple model
     • Annotated with NIF vocabularies
       – Resource type
       – Organism
     • Reviewed by NIF curators
  12. Level 2: Updates and deeper integration
     • DISCO involves a collection of files that reside on each participating resource. These files store information describing:
       – attributes of the resource, e.g., description, contact person, content of the resource, etc. -> updates NIF registry
       – how to implement DISCO capabilities for the resource
     • These files are maintained locally by the resource developers and are “harvested” by the central DISCO server.
     • In this way, central NIF capabilities can be updated automatically as resources evolve over time.
     • The developers of each resource choose which DISCO capabilities their resource will utilize.
     Luis Marenco, MD, Rixin Wang, PhD, Perry L. Miller, MD, PhD, Gordon Shepherd, MD, DPhil, Yale University School of Medicine
  13. DISCO Level 2 Interoperation
     • Level 2 interoperation is designed for resources that have only Web interfaces (no database API).
     • Different resources require different approaches to achieve Level 2 interoperation. Examples are:
       1. CRCNS - requires metadata tagging of Web pages
       2. DrugBank - requires directed traversal of Web pages to extract data into a NIF data repository
       3. GeneNetwork - requires Web-based queries to achieve “relational-like” views using “wrappers”
  14. DrugBank Example
     The DrugBank Web interface showing data about a specific drug (Phenytoin).
  15. DrugBank Example (continued)
     This DISCO Interoperation file specifies how to extract data from the DrugBank Web interface automatically.
  16. DrugBank Example (continued)
     A NIF user views data retrieved from DrugBank in response to a query in a transparent, integrated fashion.
  17. Level 3
     • Deep query of federated databases with programmatic interface
     • Register schema with NIF
       – Expose views of database
       – Map vocabulary to NIFSTD
     • Currently works with relational and XML databases
       – RDF capability planned for NIF 2.5 (April 2010)
     • Works with NIF registry: databases also annotated according to data type and biological area
  18. Integrated views and gene search
  19. Is GRM1 in cerebral cortex?
     • NIF system allows easy search over multiple sources of information
     • Well known difficulties in search
       – Inconsistent and sparse annotation of scientific data
       – Many different names for the same thing
       – No standards for data exchange or annotation at the semantic level
       – Lack of standards in data annotation requires a lot of human investment in reconciling information from different sources
     Sources shown: Allen Brain Atlas, MGD, Gensat
  20. Cerebral Cortex
     Atlas | Children | Parent
     Genepaint | Neocortex, Olfactory cortex (Olfactory bulb; piriform cortex), hippocampus | Telencephalon
     ABA | Cortical plate, Olfactory areas, Hippocampal Formation | Cerebrum
     MBAT (cortex) | Hippocampus, Olfactory, Frontal, Perirhinal cortex, entorhinal cortex | Forebrain
     MBL | Doesn’t appear |
     GENSAT | Not defined | Telencephalon
     BrainInfo | frontal lobe, insula, temporal lobe, limbic lobe, occipital lobe | Telencephalon
     Brainmaps | Entorhinal, insular, 6, 8, 4, A SII 17, Prp, SI | Telencephalon
  21. Modular ontologies for neuroscience
     • NIF covers multiple structural scales and domains of relevance to neuroscience
     • Incorporated existing ontologies where possible; extending them for neuroscience where necessary
     • Normalized under the Basic Formal Ontology: an upper ontology used by the OBO Foundry
     • Based on BIRNLex: Neuroscientists didn’t like too many choices
     • Cross-domain relationships are being built in separate files
     • Encoded in OWL-DL, but also maintained in a Wiki form, a relational database form and any other way it is needed
     NIFSTD modules (diagram labels): NS Function, Molecule, Investigation, Subcellular Anatomy, Macromolecule, Gene, Molecule Descriptors, Techniques, Reagent, Protocols, Cell, Instruments, NS Dysfunction, Quality, Macroscopic Anatomy, Organism, Resource
     Bill Bug
  22. How are ontologies used?
     • Search: query expansion
       – Synonyms
       – Related classes
       – “concept based queries”
     • Annotation:
       – Resource categorization
       – Entity mapping
     • Ranking of results
       – NIF Registry; NIF Web
  23. Concept-based search: Entity mapping
     Example: “Brodmann area 3” and “Brodmann.3”
     Synonyms and explicit mapping of database content help smooth over terminology differences and custom terminologies
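A minimal sketch of entity mapping, assuming a hand-curated table that maps normalized local labels to one canonical label. The `CANONICAL` table, `normalize`, and `map_entity` are hypothetical names for illustration and do not reflect how NIF stores its mappings.

```python
import re

# Hypothetical curated mapping: normalized local label -> canonical vocabulary label.
CANONICAL = {
    "brodmann area 3": "Brodmann area 3",
    "brodmann 3": "Brodmann area 3",
}


def normalize(label):
    """Lower-case the label and collapse punctuation and whitespace into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", label.lower()).strip()


def map_entity(label, table=CANONICAL):
    """Return the canonical label for a database term, or None if it is unmapped."""
    return table.get(normalize(label))


print(map_entity("Brodmann.3"))       # custom database terminology
print(map_entity("Brodmann area 3"))  # label already in canonical form
```

Unmapped terms return `None` rather than a guess, which leaves room for a human curator to add the mapping explicitly.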
  24. Concept-based query: GABAergic neuron
     • Simple search will not return examples of GABAergic neurons unless the data are explicitly tagged as such
     • Too many possible classifications to get them all by keywords
     • Classes are logically defined in NIF ontology
       – i.e., a GABAergic neuron is any neuron that uses GABA as a neurotransmitter
  25. Building NIF ontologies: Balancing act
     • Different schools of thought as to how to build vocabularies and ontologies
     • NIF is trying to navigate these waters, keeping in mind:
       – NIF is for both humans and machines
       – Our primary concern is data
       – We have to meet the needs of the community
       – We have a budget and deadlines
     • Building ontologies is difficult even for limited domains, never mind all of neuroscience, but we’ve learned a few things
       – Reuse what’s there: trying to re-use URIs rather than map when possible
       – Make what you do reusable: adopt best practices where feasible
         • Numerical identifiers, unique labels, single asserted simple hierarchies
       – Engage the community
       – Avoid “religious” wars: separate the science from the informatics
       – Start simple and add more complexity
         • Create modular building blocks from which other things can be built
  26. Ontologies, etc.
     Mike Bergman
  27. What we’ve learned
     • Strategy: Create modular building blocks that can be knit into many things
       – Step 1: Build core lexicon (NeuroLex)
         • Classes and their definitions
         • Simple single inheritance and non-controversial hierarchies
         • Each module covers only a single domain
         • Understandable by an average human
       – Step 2: NIFSTD: standardize modules under same upper ontology
         • OBO compliant in OWL
       – Step 3: Create intra-domain and more useful hierarchies using properties and restrictions
         • Brain partonomy
       – Step 4: Bridge two or more domains using a standard set of relations
         • Neuron to brain region
         • Neuron to molecule, e.g., GABAergic neuron
  28. Neurolex
     • More human centric
     • More stable class structure
       – Ontologies take time; many versions are retired, which is not good for information systems that are using identifiers
     • Synonyms and abbreviations were essential for users
     • Can’t annotate if they can’t find it
     • Can’t use it for search if they can’t find it
     • Facilitates semi-automated mapping
     • Contains subsets of ontologies that are useful to neuroscientists
       – e.g., only classes in ChEBI that neuroscientists use
     • Wanted the community to be able to see it and use it
       – Simple understandable hierarchies
         • Removed the independent continuants, entity etc.
         • Used labels that humans could understand
           – Even if they were plural (meninges vs. meninx)
  29. Reclassification based on logical definitions: GABA neuron is any member of class neuron that has neurotransmitter GABA
     NIF Bridge File
     • NIF Cell
     • NIF Molecule
     • Define a set of properties that relate neuron classes to molecule classes, e.g.,
       – Neuron
       – Has neurotransmitter
     • Purkinje cell is a Neuron
     • Purkinje cell has neurotransmitter GABA
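The reclassification on this slide can be sketched as a lookup over two small tables: asserted "is a" links and a bridge table of "has neurotransmitter" facts. This is a toy illustration under stated assumptions, not the NIF bridge file or an OWL reasoner. The table names and `defined_class_members` are hypothetical, and the granule cell and astrocyte rows are added here only to give the filter something to exclude.

```python
# Asserted single-inheritance hierarchy (child -> parent). Only the Purkinje row is from the slide.
IS_A = {
    "Purkinje cell": "Neuron",
    "Cerebellar granule cell": "Neuron",  # illustrative addition
    "Astrocyte": "Glial cell",            # illustrative addition
}

# Bridge facts relating cell classes to molecule classes.
HAS_NEUROTRANSMITTER = {
    "Purkinje cell": {"GABA"},
    "Cerebellar granule cell": {"Glutamate"},  # illustrative addition
}


def is_a(cls, ancestor):
    """Walk up the asserted hierarchy and report whether cls is the ancestor or falls under it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = IS_A.get(cls)
    return False


def defined_class_members(transmitter):
    """Members of the defined class: any neuron that has the given neurotransmitter."""
    return sorted(
        cls
        for cls, molecules in HAS_NEUROTRANSMITTER.items()
        if is_a(cls, "Neuron") and transmitter in molecules
    )


gaba_neurons = defined_class_members("GABA")
print(gaba_neurons)
```

Purkinje cell is never tagged "GABAergic neuron" in the tables; it is placed in that class only because it satisfies the definition.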
  30. Maintaining multiple versions
     • NIF maintains the NIF vocabularies in different forms for different purposes
       – Neurolex Wiki: Lexicon for community review and comment
       – NIFSTD: set of modular OWL files normalized under BFO and available for download
       – NCBO Bioportal for visibility and mapping services
       – Ontoquest: NIF’s ontology server
         • Relational store customized for OWL ontologies
         • Materialized inferred hierarchies for more efficient queries
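To illustrate why a materialized hierarchy makes queries cheaper, the sketch below precomputes every class's full ancestor set once, so that a later "is X a kind of Y?" check is a single set lookup instead of a walk up the tree. This is a generic transitive-closure example and makes no claim about how Ontoquest is implemented; `SUBCLASS_OF` and `materialize` are hypothetical names, and the hierarchy is assumed to be acyclic.

```python
# Hypothetical asserted subclass links (child -> list of direct parents).
SUBCLASS_OF = {
    "Purkinje cell": ["Neuron"],
    "Neuron": ["Cell"],
}


def materialize(subclass_of):
    """Precompute the set of all ancestors for every class (assumes no cycles)."""
    closure = {}

    def ancestors(cls):
        if cls not in closure:
            found = set()
            for parent in subclass_of.get(cls, []):
                found.add(parent)
                found |= ancestors(parent)
            closure[cls] = found
        return closure[cls]

    for cls in subclass_of:
        ancestors(cls)
    return closure


closure = materialize(SUBCLASS_OF)
# After materialization, a subsumption query is one membership test.
print("Cell" in closure["Purkinje cell"])
```

The trade-off is the usual one for materialized views: faster reads in exchange for recomputing the closure whenever the ontology changes.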
  31. NeuroLex Wiki
     http://neurolex.org
     Stephen Larson
     “The human interface”
     Semantic MediaWiki
  32. NIF Architecture
     Gupta et al., Neuroinformatics, 2008 Sep;6(3):205-17
  33. Summary
     • NIF has tried to adopt a flexible, practical approach to assembling, extending and using community ontologies
       – We use a combination of search strategies: string based, lexicon-based, ontology-based
       – We believe in modularity
       – We believe in starting simple and adding complexity
       – We believe in single asserted hierarchies and multiple inferred hierarchies
       – We believe in balancing practicality and rigor
     • NIF is working through the International Neuroinformatics Coordinating Facility (INCF) to engage the community to help build out the Neurolex and start adding additional relations
       – Neuron registry task force
       – Structural lexicon
  34. Where do we go from here?
     • NIF 2.5
       – Automatic expansion of logically defined terms
     • More services
       – NIF vocabulary services are available now
     • More content
     • More, more, more
     NIF Blog: NIF is learning lessons in practical data integration
  35. Musings from the NIF…
     • No single approach, technology, philosophy, tool, platform will solve everything
     • The value of making resources discoverable is not appreciated
     • Developing resources (tools, databases, data) that are interoperable at this point is an act of will
     • Decisions can be made at the outset that will make it easier or harder to integrate
     • We build resources for ourselves and our constituents, not automated agents
     • We get mad when commercial providers don’t make their products interoperable
     • Many times the choice of terminology is based on expediency or who taught you biology rather than deep philosophical differences
     • Ontologies, terminologies, lexicons, thesauri
     • Developing a semantic framework on top of unstable resources is difficult
     • Need to adopt a flexible approach
  36. NIF Evolution
     Diagram labels: V1.0: NIF; NIF 1.5; Then, Now, Later
