The Neuroscience Information Framework: Establishing a practical semantic framework for neuroscience
The Neuroscience InformationFrameworkEstablishing a practical semantic framework forneuroscienceMaryann Martone, Ph. D.University of California, San Diego
NIF TeamAmarnath Gupta, UCSD, Co InvestigatorJeff Grethe, UCSD, Co InvestigatorGordon Shepherd, Yale UniversityPerry MillerLuis MarencoDavid Van Essen, Washington UniversityErin ReidPaul Sternberg, Cal TechArun RangarajanHans Michael MullerGiorgio Ascoli, George Mason UniversitySridevi PolavarumAnita Bandrowski, NIF CuratorFahim Imam, NIF Ontology EngineerKaren Skinner, NIH, Program OfficerLee HornbrookKara LuVadim AstakhovXufei QianChris ConditStephen LarsonSarah MaynardBill BugKaren Skinner, NIH
What does thismean?•3D Volumes•2D Images•Surface meshes•Tree structure•Ball and stick models•Little squiggly linesData PeopleInformation systems
The Neuroscience Information Framework: Discovery andutilization of web-based resources for neurosciencehttp://neuinfo.orgUCSD, Yale, Cal Tech, George Mason, Washington UnivSupported by NIH Blueprint A portal for findingand usingneuroscienceresources A consistentframework fordescribing resources Provides simultaneoussearch of multipletypes of information,organized by category Supported by anexpansive ontology forneuroscience Utilizes advancedtechnologies to searchthe “hidden web”
Where do I find…• Data• Software tools• Materials• Services• Training• Jobs• Funding opportunities• Websites• Databases• Catalogs• Literature• Supplementarymaterial• Information portals...And how many are there?
Query expansion: Synonymsand related conceptsBoolean queriesData sourcescategorized by“data type” andlevel of nervoussystemSimplified views ofcomplex datasourcesTutorials for usingfull resource whengetting there fromNIFHippocampus OR “CornuAmmonis” OR“Ammon’s horn”
NIF searches across multiple sources ofinformation•NIF data federation•Independentdatabasesregistered with NIFor through webservices•NIF registry: catalog•NIF Web: custom webindex (work inprogress)•NIF literature:neuroscience-centeredliterature corpus(Textpresso)
Guiding principles of NIF• Builds heavily on existing technologies (open source tools)• Information resources come in many sizes and flavors• Framework has to work with resources as they are, not as wewish them to be– Federated system; resources will be independently maintained– Very few use standard terminology or map to ontologies• No single strategy will work for the current diversity ofneuroscience resources• Trying to design the framework so it will be as broadlyapplicable as possible to those who are trying to developtechnologies• Interface neuroscience to the broader life science community• Take advantage of emerging conventions in search and inbuilding web communities
Registering a Resource toNIFLevel 1NIF Registry: high level descriptions fromNIF vocabularies supplied by humancuratorsLevel 2Access to deeper content; mechanisms forquery and discovery; DISCO protocolLevel 3Direct query of web accessible databaseAutomated registrationMapping of database content to NIFvocabulary by human
The NIF Registry•Very, very simplemodel•Annotated with NIFvocabularies•Resource type•Organism•Reviewed by NIFcurators
Level 2: Updates and deeper integration• DISCO involves a collection of files that reside on each participatingresource. These files store information describing:- attributes of the resource, e.g., description, contact person,content of the resource, etc. -> updates NIF registry- how to implement DISCO capabilities for the resource• These files are maintained locally by the resource developers and are“harvested” by the central DISCO server.• In this way, central NIF capabilities can be updated automatically asresources evolve over time.• The developers of each resource choose which DISCO capabilities theirresource will utilizeLuis Marenco, MD, Rixin Wang, PhD, Perry L. Miller, MD, PhD, GordonShepherd, MD, DPhilYale University School of Medicine
DISCO Level 2 Interoperation• Level 2 interoperation is designed for resources thathave only Web interfaces (no database API).• Different resources require different approaches toachieve Level 2 interoperation. Examples are:1. CRCNS - requires metadata tagging of Web pages2. DrugBank - requires directed traversal of Webpages to extract data into a NIF data repository3. GeneNetwork - requires Web-based queries toachieve “relational-like” views using “wrappers”
DrugBank ExampleThe DrugBank Web interface showing dataabout a specific drug (Phentoin).
DrugBank Example (continued)This DISCO Interoperation file specifies how to extract data from the DrugBankWeb interface automatically.
DrugBank Example (continued)A NIF user views data retrieved from DrugBank in response to a query in atransparent, integrated fashion.
Level 3• Deep query of federated databases withprogrammatic interface• Register schema with NIF– Expose views of database– Map vocabulary to NIFSTD• Currently works with relational and XMLdatabases– RDF capability planned for NIF 2.5 (April 2010)• Works with NIF registry: databases alsoannotated according to data type and biologicalarea
Is GRM1 in cerebral cortex?• NIF system allows easy search over multiple sources of information• Well known difficulties in search• Inconsistent and sparse annotation of scientific data• Many different names for the same thing• No standards for data exchange or annotation at the semantic level– Lack of standards in data annotation require a lot of human investment inreconciling information from different sourcesAllen Brain AtlasMGDGensat
Modular ontologies for neuroscience NIF covers multiple structural scales and domains of relevance to neuroscience Incorporated existing ontologies where possible; extending them for neuroscience where necessary Normalized under the Basic Formal Ontology: an upper ontology used by the OBO Foundry Based on BIRNLex: Neuroscientists didn’t like too many choices Cross-domain relationships are being built in separate files Encoded in OWL-DL, but also maintained in a Wiki form, a relational database form and any other way it is neededNIFSTDNS FunctionMolecule InvestigationSubcellularAnatomyMacromolecule GeneMolecule DescriptorsTechniquesReagent ProtocolsCellInstrumentsBill BugNS Dysfunction QualityMacroscopicAnatomyOrganismResource
How are ontologies used?• Search: query expansion– Synonyms– Related classes– “concept based queries”• Annotation:– Resource categorization– Entity mapping• Ranking of results– NIF Registry; NIF Web
Concept-based search: Entity mappingBrodmann area 3 Brodmann.3Synonyms and explicit mapping of database content help smooth overterminology differences and custom terminologies
Concept-based query: GABAergic neuron•Simple search will not return examplesof GABAergic neurons unless the dataare explicitly tagged as such•Too many possible classifications to getthem all by keywords•Classes are logically defined in NIFontology•i.e., a GABAergic neuron is anyneuron that uses GABA as aneurotransmitter
Building NIF ontologies: Balancing act• Different schools of thought as to how to build vocabularies andontologies• NIF is trying to navigate these waters, keeping in mind:– NIF is for both humans and machines– Our primary concern is data– We have to meet the needs of the community– We have a budget and deadlines• Building ontologies is difficult even for limited domains, nevermind all of neuroscience, but we’ve learned a few things– Reuse what’s there: trying to re-use URI’s rather than map when possible– Make what you do reusable: adopt best practices where feasible• Numerical identifiers, unique labels, single asserted simple hierarchies– Engage the community– Avoid “religious” wars: separate the science from the informatics– Start simple and add more complexity• Create modular building blocks from which other things can be built
What we’ve learned• Strategy: Create modular building blocks that can be knitinto many things– Step 1: Build core lexicon (NeuroLex)• Classes and their definitions• Simple single inheritance and non-controversial hierarchies• Each module covers only a single domain• Understandable by an average human– Step 2: NIFSTD: standardize modules under same upper ontology• OBO compliant in OWL– Step 3: Create intra-domain and more useful hierarchies using properties andrestrictions– Brain partonomy– Step 4: Bridge two or more domains using a standard set of relations– Neuron to brain region– Neuron to molecule, e.g., GABAergic neuron
Neurolex• More human centric• More stable class structure– Ontologies take time; many versions are retired-not good for informationsystems that are using identifiers• Synonyms and abbreviations were essential for users• Can’t annotate if they can’t find it• Can’t use it for search if they can’t find it• Facilitates semi-automated mapping• Contains subsets of ontologies that are useful to neuroscientists– e.g., only classes in Chebi that neuroscientists use• Wanted the community to be able to see it and use it– Simple understandable hierarchies• Removed the independent continuants, entity etc• Used labels that humans could understand– Even if they were plural (meningesvsmeninx)
Reclassification based on logical definitions: GABA neuron is any member ofclass neuron that has neurotransmitter GABANIF Bridge File•NIF Cell•NIF Molecule•Define a set ofproperties that relateneuron classes tomoleculeclasses, e.g.,•Neuron•Hasneurotransmitter•Purkinje cell is aNeuron•Purkinje cell hasneurotransmitterGABA
Maintaining multiple versions• NIF maintains the NIF vocabularies in differentforms for different purposes– Neurolex Wiki: Lexicon for community review andcomment– NIFSTD: set of modular OWL files normalizedunder BFO and available for download– NCBO Bioportal for visibility and mapping services– Ontoquest: NIF’s ontology server• Relational store customized for OWL ontologies• Materialized inferred hierarchies for more efficientqueries
NeuroLex Wikihttp://neurolex.org Stephen Larson“The human interface”Semantic media wiki
NIF ArchitectureGupta et al., Neuroinformatics, 2008 Sep;6(3):205-17
Summary• NIF has tried to adopt a flexible, practical approach to assembling,extending and using community ontologies– We use a combination of search strategies: string based, lexicon-based,ontology-based– We believe in modularity– We believe in starting simple and adding complexity– We believe in single asserted hierarchies and multiple inferred hierarchies– We believe in balancing practicality and rigor• NIF is working through the International NeuroinformaticsCoordinating Facility (INCF) to engage the community to help buildout the Neurolex and start adding additional relations– Neuron registry task force– Structural lexicon
Where do we go from here?• NIF 2.5– Automatic expansionof logically definedterms• More services– NIF vocabularyservices are availablenow• More content• More, more, more NIF BlogNIF is learning lessons in practical data integration
Musings from the NIF…•No single approach, technology, philosophy, tool, platformwill solve everything•The value of making resources discoverable is notappreciated•Developing resources (tools, databases, data) that areinteroperable at this point is an act of will•Decisions can be made at the outset that will make it easier or harder tointegrate•We build resources for ourselves and our constituents, notautomated agents•We get mad when commercial providers don’t make their productsinteroperable•Many times the choice of terminology is based on expediency orwho taught you biology rather than deep philosophical differences•Ontologies, terminologies, lexicons, thesauri•Developing a semantic framework on top of unstable resources isdifficult•Need to adopt a flexible approach
NIF EvolutionV1.0: NIFNIF 1.5+++++Then Now Later