Navigating the Neuroscience Data Landscape

Maryann Martone
Canadian INCF Workshop, Vancouver, CA

Published on May 24, 2012

Published in: Education, Technology

  1. Navigating the Neuroscience Data Landscape. Maryann Martone, Ph.D., University of California, San Diego
  2. “Neural Choreography”
     “A grand challenge in neuroscience is to elucidate brain function in relation to its multiple layers of organization that operate at different spatial and temporal scales. Central to this effort is tackling “neural choreography” -- the integrated functioning of neurons into brain circuits -- their spatial organization, local and long-distance connections, their temporal orchestration, and their dynamic features. Neural choreography cannot be understood via a purely reductionist approach. Rather, it entails the convergent use of analytical and synthetic tools to gather, analyze and mine information from each level of analysis, and capture the emergence of new layers of function (or dysfunction) as we move from studying genes and proteins, to cells, circuits, thought, and behavior.... However, the neuroscience community is not yet fully engaged in exploiting the rich array of data currently available, nor is it adequately poised to capitalize on the forthcoming data explosion.” Akil et al., Science, Feb 11, 2011
  3. NIF is an initiative of the NIH Blueprint consortium of institutes. http://neuinfo.org
     • What types of resources (data, tools, materials, services) are available to the neuroscience community?
     • How many are there?
     • What domains do they cover? What domains do they not cover?
     • Where are they? Web sites, databases, literature, supplementary material, PDF files, desk drawers
     • Who uses them?
     • Who creates them?
     • How can we find them?
     • How can we make them better in the future?
  4. How many resources are there?
     • NIF Registry: a catalog of neuroscience-relevant resources
     • > 4800 currently listed
     • > 2000 databases
     • And we are finding more every day
  5. The Neuroscience Information Framework: Discovery and utilization of web-based resources for neuroscience. http://neuinfo.org
     • A portal for finding and using neuroscience resources
     • A consistent framework for describing resources
     • Provides simultaneous search of multiple types of information, organized by category
     • Supported by an expansive ontology for neuroscience
     • Utilizes advanced technologies to search the “hidden web”
     UCSD, Yale, CalTech, George Mason, Washington Univ. Supported by NIH Blueprint.
     Diagram labels: Literature, Database Federation, Registry
  6. What are the connections of the hippocampus?
     Query: Hippocampus OR “Cornu Ammonis” OR “Ammon’s horn”
     • Query expansion: synonyms and related concepts
     • Boolean queries
     • Data sources categorized by “data type” and level of nervous system
     • Common views across multiple sources
     • Tutorials for using full resource when getting there from NIF
     • Link back to record in original source
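The query expansion shown on slide 6 can be illustrated with a minimal Python sketch. It assumes a small hand-written synonym table (`SYNONYMS`) and a helper named `expand_query`; both are hypothetical and are not NIF's actual implementation, which draws synonyms and related concepts from its ontology.

```python
# Hypothetical synonym table. The hippocampus entry mirrors the slide's
# example; a real system would look these up in an ontology.
SYNONYMS = {
    "hippocampus": ["Cornu Ammonis", "Ammon's horn"],
}


def expand_query(term: str) -> str:
    """Return a Boolean OR query over a term and its known synonyms."""
    variants = [term] + SYNONYMS.get(term.lower(), [])
    # Quote multi-word variants so a search engine treats them as phrases.
    quoted = [f'"{v}"' if " " in v else v for v in variants]
    return " OR ".join(quoted)


if __name__ == "__main__":
    print(expand_query("Hippocampus"))
```

A term with no entry in the table is returned unchanged, so the expansion never narrows the original query.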
  7. Results are organized within a common framework
     Diagram labels: Connects to; Synapsed with; Synapsed by; Input region; innervates; Axon innervates; Projects to; Cellular contact; Subcellular contact; Source site; Target site
     Each resource implements a different, though related model; systems are complex and difficult to learn, in many cases
  8. The scourge of neuroanatomical nomenclature
     • NIF Connectivity: 6 databases containing connectivity primary data or claims
       • Brain Architecture Management System (rodent)
       • ConnectomeWiki (human)
       • Brain Maps (various)
       • CoCoMac (primate cortex)
       • UCLA Multimodal database (human fMRI)
       • Avian Brain Connectivity Database (bird)
     • Total: 1800 unique brain terms (excluding Avian)
     • Number of exact terms used in > 1 database: 42
     • Number of synonym matches: 99
     • Number of partonomy matches: 385
     The INCF is working with NIF to develop semantic and spatial strategies for translating anatomy across information systems
  9. What is an ontology?
     Diagram: Brain has a Cerebellum, which has a Purkinje Cell Layer, which has a Purkinje cell, which is a neuron
     • Ontology: an explicit, formal representation of concepts and the relationships among them within a particular domain that expresses human knowledge in a machine-readable form
     • Branch of philosophy: a theory of what is
     • e.g., Gene ontologies
     • Provide universals for navigating across different data sources
     • Semantic “index”
     • Provide the basis for concept-based queries to probe and mine data
     • Perform reasoning
     • Link data through relationships, not just one-to-one mappings
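The "has a" / "is a" chain in slide 9's diagram can be written down as subject-relation-object triples, which makes the point about machine-readable knowledge and reasoning concrete. The sketch below is illustrative only: the relation names `has_a` and `is_a` and the functions `parts_of` and `contains_type` are assumptions made for this example, not part of NIFSTD or any ontology toolkit.

```python
from typing import List

# Edges taken from the slide's diagram: (subject, relation, object).
TRIPLES = [
    ("Brain", "has_a", "Cerebellum"),
    ("Cerebellum", "has_a", "Purkinje cell layer"),
    ("Purkinje cell layer", "has_a", "Purkinje cell"),
    ("Purkinje cell", "is_a", "neuron"),
]


def parts_of(structure: str) -> List[str]:
    """Follow has_a edges transitively to list every part of a structure."""
    found: List[str] = []
    frontier = [structure]
    while frontier:
        current = frontier.pop()
        for subj, rel, obj in TRIPLES:
            if subj == current and rel == "has_a" and obj not in found:
                found.append(obj)
                frontier.append(obj)
    return found


def contains_type(structure: str, kind: str) -> bool:
    """Infer whether a structure, or any of its parts, is_a `kind`."""
    candidates = [structure] + parts_of(structure)
    return any(
        subj in candidates and rel == "is_a" and obj == kind
        for subj, rel, obj in TRIPLES
    )


if __name__ == "__main__":
    print(parts_of("Brain"))
    print(contains_type("Brain", "neuron"))
```

Nothing in the triples states directly that the brain contains neurons; that conclusion comes from chaining relationships, which is the kind of link that one-to-one term mappings cannot express.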
  10. PONS program
      • Structural Lexicon Taskforce
        • Concentrate on human, non-human primate, rat and mouse
        • Define structural concepts from level of organ to macromolecular complexes
        • Provide a set of criteria by which structures can be identified
      • Neuronal Registry Taskforce
        • Establish conventions for naming new types of neurons
        • Establish a standard set of properties to define neurons
        • Create a Neuron Registry for registering new types of neurons
      • Deployment and representation (Alan Ruttenberg)
        • Brought together ontologists working across scales
      Courtesy of Chris Mungall, Lawrence Berkeley Labs
      *** Not about imposing a single view of anatomy; about making concepts computable and being able to translate among views
  11. NeuroLex Wiki. http://neurolex.org (Stephen Larson)
      • Provide a simple framework for defining the concepts required
        • Cell, part of brain, subcellular structure, molecule
      • Community based:
        • Avian neuroanatomy
        • Fly neurons (England)
        • Neuroimaging terms
        • Brain regions identified by text mining
      • Creating a computable index for neuroscience data
      • INCF working to coordinate Wiki efforts underway at Allen Institute, Blue Brain and Neurolex
      Demo D03
  12. Comparison of traffic to NIF Portal vs Neurolex: 5000 hits vs 15000 hits. Wiki is readily indexed by search engines
  13. Neurons in Neurolex (Stephen Larson)
      • INCF building a knowledge base of neurons and their properties via the Neurolex Wiki
      • Led by Dr. Gordon Shepherd
      • Consistent and parseable naming scheme
      • Knowledge is readily accessible, editable and computable
  14. NIF data federation
      Chart: Percentage of data records per data type (labels: Images, Drugs, Antibodies, Grants, Pathways, Animals, Connectivity, Brain activation foci, Microarray; 98%)
      Primary data, secondary data, claims, repositories
      Recently added: BioNOT literature mining tool; Retraction Watch blog
  15. What do you mean by data? Databases come in many shapes and sizes
      • Primary data: data available for reanalysis, e.g., microarray data sets from GEO; brain images from XNAT; microscopic images (CCDB/CIL)
      • Secondary data: data features extracted through data processing and sometimes normalization, e.g., brain structure volumes (IBVD), gene expression levels (Allen Brain Atlas); brain connectivity statements (BAMS)
      • Tertiary data: claims and assertions about the meaning of data, e.g., gene upregulation/downregulation
      • Registries: metadata; pointers to data sets or materials stored elsewhere
      • Data aggregators: aggregate data of the same type from multiple sources, e.g., Cell Image Library, SUMSdb, Brede
      • Single source: data acquired within a single context, e.g., Allen Brain Atlas
  16. NIF landscape analysis (Vadim Astakhov, Kepler Workflow Engine)
      Figure labels: Striatum, Hypothalamus, Olfactory bulb, Cerebral cortex, Brain; axes: Brain region, Data source
  17. How much of the landscape do we have? Query for “reference” brain structures and their parts in NIF Connectivity database
  18. NIF Reports: Male vs Female. Gender bias. NIF can start to answer interesting questions about neuroscience research, not just about neuroscience
  19. Embracing duplication: Data Mash ups
      • ~300 PMIDs were common between Brede and SUMSdb
      • Same information; value added
      Same data; different aspects
  20. Same data: different analysis. Chronic vs acute morphine in striatum
      • Drug Related Gene database: extracted statements from figures, tables and supplementary data from published article
      • Gemma: reanalyzed microarray results from GEO using different algorithms (http://www.chibi.ubc.ca/Gemma/home.html)
      • Both provide results of increased or decreased expression as a function of experimental paradigm
        • 4 strains of mice
        • 3 conditions: chronic morphine, acute morphine, saline
      Mined NIF for all references to GEO IDs: found a small number where the same data set was represented in two or more databases
  21. How easy was it to compare?
      • Gemma: Gene ID + Gene Symbol
      • DRG: Gene name + Probe ID
      • Gemma: Increased expression/decreased expression
      • DRG: Increased expression/decreased expression
      • But... Gemma presented results relative to baseline chronic morphine; DRG with respect to saline, so direction of change is opposite in the 2 databases
      • Analysis:
        • 1370 statements from Gemma regarding gene expression as a function of chronic morphine
        • 617 were consistent with DRG; over half of the claims of the paper were not confirmed in this analysis
        • Results for 1 gene were opposite in DRG and Gemma
        • 45 did not have enough information provided in the paper to make a judgment
      NIF annotation standard
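The baseline mismatch described on slide 21 is the kind of step that has to be made explicit before two databases can be compared. The following Python sketch shows one way to do it, under the assumption that each statement carries a direction and a named baseline; the function names and the two baseline labels are illustrative and do not reflect how the NIF analysis was actually implemented.

```python
# Reversing the reference condition reverses the direction of change.
FLIP = {"increased": "decreased", "decreased": "increased"}


def to_saline_baseline(direction: str, baseline: str) -> str:
    """Re-express a direction of change relative to saline."""
    if baseline == "saline":
        return direction
    if baseline == "chronic morphine":
        # Reported relative to chronic morphine, so invert it.
        return FLIP[direction]
    raise ValueError(f"unknown baseline: {baseline}")


def consistent(gemma_direction: str, drg_direction: str) -> bool:
    """Compare one Gemma statement with one DRG statement on a common baseline.

    Assumes, as on the slide, that Gemma reports relative to chronic
    morphine and DRG relative to saline.
    """
    gemma = to_saline_baseline(gemma_direction, "chronic morphine")
    drg = to_saline_baseline(drg_direction, "saline")
    return gemma == drg


if __name__ == "__main__":
    # Opposite raw labels agree once both are expressed relative to saline.
    print(consistent("increased", "decreased"))
```

Recording the baseline alongside each claim is what lets this normalization happen automatically instead of by rereading the paper.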
  22. Grabbing the long tail of small data
      • Analysis of NIF shows multiple databases with similar scope and content
      • Many contain partially overlapping data
      • Data “flows” from one resource to the next
      • Data is reinterpreted, reanalyzed or added to
      • When does it become something else?
  23. Phases of NIF
      • 2006-2008: A survey of what was out there
      • 2008-2009: Strategy for resource discovery
        • NIF Registry vs NIF data federation
        • Ingestion of data contained within different technology platforms, e.g., XML vs relational vs RDF
        • Effective search across semantically diverse sources
        • NIFSTD ontologies
      • 2009-2011: Strategy for data integration
        • Unified views across common sources
        • Mapping of content to NIF vocabularies
      • 2011-present: Data analytics
        • Uniform external data references
  24. Data, not just stories about them!
      • 47/50 major preclinical published cancer studies could not be replicated
      • “The scientific community assumes that the claims in a preclinical study can be taken at face value -- that although there might be some errors in detail, the main message of the paper can be relied on and the data will, for the most part, stand the test of time. Unfortunately, this is not always the case.”
      • “There are no guidelines that require all data sets to be reported in a paper; often, original data are removed during the peer review and publication process.”
      • Getting data out sooner in a form where they can be exposed to many eyes and many analyses, and easily compared, may allow us to expose errors and develop better metrics to evaluate the validity of data
      Begley and Ellis, 29 March 2012, Nature, vol. 483, p. 531
  25. A global view of data
      • You (and the machine) have to be able to find it
        • Accessible through the web
        • Annotations
      • You have to be able to use it
        • Data type specified and in a usable form
      • You have to know what the data mean
        • Some semantics
        • Context: experimental metadata
        • Provenance: where did the data come from?
      Reporting neuroscience data within a consistent framework helps enormously
  26. NIF team (past and present): Jeff Grethe, UCSD, Co Investigator, Interim PI; Amarnath Gupta, UCSD, Co Investigator; Anita Bandrowski, NIF Project Leader; Gordon Shepherd, Yale University; Perry Miller; Luis Marenco; Rixin Wang; David Van Essen, Washington University; Erin Reid; Paul Sternberg, CalTech; Arun Rangarajan; Hans Michael Muller; Yuling Li; Giorgio Ascoli, George Mason University; Sridevi Polavarum; Fahim Imam, NIF Ontology Engineer; Larry Lui; Andrea Arnaud Stagg; Jonathan Cachat; Jennifer Lawrence; Lee Hornbrook; Binh Ngo; Vadim Astakhov; Xufei Qian; Chris Condit; Mark Ellisman; Stephen Larson; Willie Wong; Tim Clark, Harvard University; Paolo Ciccarese; Karen Skinner, NIH, Program Officer
  27. Concept-based search: search by meaning
      • Search Google: GABAergic neuron
      • Search NIF: GABAergic neuron
      • NIF automatically searches for types of GABAergic neurons
      Figure label: Types of GABAergic neurons
