Big data from small data:  A survey of the neuroscience landscape through the Neuroscience Information Framework
Presentation on the NIF project to Sandia Labs, with an in depth look into NIF's data federation and strategies for creating on-line knowledge spaces

Published in: Technology, Education

Transcript of "Big data from small data:  A survey of the neuroscience landscape through the Neuroscience Information Framework"

  1. Maryann E. Martone, Ph.D. University of California, San Diego
  2. Neuroscience is unlikely to be served by a few large databases like the genomics and proteomics community. Whole brain data (20 µm microscopic MRI); mosaic LM images (1 GB+); conventional LM images; individual cell morphologies; EM volumes & reconstructions; solved molecular structures. No single technology serves these all equally well. Multiple data types; multiple scales; multiple databases
  3. http://neuinfo.org
  4. • NIF’s mission is to maximize the awareness of, access to and utility of research resources produced worldwide to enable better science and promote efficient use – NIF unites neuroscience information without respect to domain, funding agency, institute or community – NIF is like a “PubMed” for all biomedical resources and a “PubMed Central” for databases – Makes them searchable from a single interface – Practical and cost-effective; tries to be sensible – Learned a lot about current data practices. The Neuroscience Information Framework is an initiative of the NIH Blueprint consortium of institutes. http://neuinfo.org
  5. http://neuinfo.org (June 10, 2013, dkCOIN Investigator's Retreat) • A portal for finding and using neuroscience resources • A consistent framework for describing resources • Provides simultaneous search of multiple types of information, organized by category • Supported by an expansive ontology for neuroscience • Utilizes advanced technologies to search the “hidden web”. UCSD, Yale, Cal Tech, George Mason, Washington Univ. Literature; Database Federation; Registry
  6. We’d like to be able to find: • What is known: – What are the projections of hippocampus? – Is GRM1 expressed in cerebral cortex? – What genes have been found to be upregulated in chronic drug abuse in adults? – What animal models have similar phenotypes to Parkinson’s disease? – What studies used my polyclonal antibody against GABA in humans? • What is not known: – Connections among data – Gaps in knowledge. A framework makes it easier to address these questions
  7. With the thousands of databases and other information sources available, simple descriptive metadata will not suffice
  8. • NIF curators • Nomination by the community • Semi-automated text mining pipelines • NIF Registry • Requires no special skills • Site map available for local hosting • NIF Data Federation • DISCO interop • Requires some programming skill • Open Source Brain < 2 hr. Two-tiered system: low barrier to entry
  9. Current; Planned. DISCO Dashboard Functions • Ingest Script Manager • Public Script Repository • Data & Event Tracker • Versioning System • Curator Tool • Data Transformer Manager. Luis Marenco, Rixin Wang, Perry Miller, Gordon Shepherd, Yale University
  10. NIF was designed to be populated rapidly with progressive refinement
  11. Databases come in many shapes and sizes • Primary data: – Data available for reanalysis, e.g., microarray data sets from GEO; brain images from XNAT; microscopic images (CCDB/CIL) • Secondary data – Data features extracted through data processing and sometimes normalization, e.g., brain structure volumes (IBVD), gene expression levels (Allen Brain Atlas); brain connectivity statements (BAMS) • Tertiary data – Claims and assertions about the meaning of data • E.g., gene upregulation/downregulation, brain activation as a function of task • Registries: – Metadata – Pointers to data sets or materials stored elsewhere • Data aggregators – Aggregate data of the same type from multiple sources, e.g., Cell Image Library, SUMSdb, Brede • Single source – Data acquired within a single context, e.g., Allen Brain Atlas. Researchers are producing a variety of information artifacts using a multitude of technologies
  12. Hippocampus OR “Cornu Ammonis” OR “Ammon’s horn”. Query expansion: synonyms and related concepts. Boolean queries. Data sources categorized by “data type” and level of nervous system. Common views across multiple sources. Tutorials for using full resource when getting there from NIF. Link back to record in original source
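The query expansion on slide 12 can be illustrated with a minimal sketch. It assumes a hypothetical in-memory synonym table (`SYNONYMS`) and hypothetical helpers (`quote`, `expand_query`); the slides do not describe how NIF implements expansion, so this only shows the idea of turning one term into a Boolean OR query over its synonyms.

```python
# Hypothetical synonym table; the hippocampus entries are taken from slide 12.
SYNONYMS = {
    "hippocampus": ["Cornu Ammonis", "Ammon's horn"],
}


def quote(term: str) -> str:
    """Wrap multi-word terms in double quotes so they are searched as phrases."""
    return f'"{term}"' if " " in term else term


def expand_query(term: str) -> str:
    """Return a Boolean OR query made of the term and its known synonyms."""
    variants = [term] + SYNONYMS.get(term.lower(), [])
    return " OR ".join(quote(v) for v in variants)


print(expand_query("Hippocampus"))
```

A term with no entry in the table is returned unchanged, so the expansion never narrows a search.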
  13. Connects to; Synapsed with; Synapsed by; Input region; innervates; Axon innervates; Projects to; Cellular contact; Subcellular contact; Source site; Target site. Each resource implements a different, though related model; systems are complex and difficult to learn, in many cases
  14. • You (and the machine) have to be able to find it – Accessible through the web – Structured or semi-structured – Annotations • You (and the machine) have to be able to use it – Data type specified and in an actionable form • You (and the machine) have to know what the data mean • Semantics • Context: Experimental metadata • Provenance: where did they come from
  15. Knowledge in space and spatial relationships (the “where”). Knowledge in words, terminologies and logical relationships (the “what”)
  16. Purkinje Cell; Axon Terminal; Axon; Dendritic Tree; Dendritic Spine; Dendrite; Cell body; Cerebellar cortex. There is little obvious connection between data sets taken at different scales using different microscopies without an explicit representation of the biological objects that the data represent
  17. • NIF covers multiple structural scales and domains of relevance to neuroscience • Aggregate of community ontologies with some extensions for neuroscience, e.g., Gene Ontology, Chebi, Protein Ontology. NIFSTD: Organism; NS Function; Molecule; Investigation; Subcellular structure; Macromolecule; Gene; Molecule Descriptors; Techniques; Reagent; Protocols; Cell; Resource; Instrument; Dysfunction; Quality; Anatomical Structure
  18. Brain has a Cerebellum; Cerebellum has a Purkinje Cell Layer; Purkinje Cell Layer has a Purkinje cell; Purkinje cell is a neuron • Ontology: an explicit, formal representation of concepts and the relationships among them within a particular domain that expresses human knowledge in a machine readable form • Branch of philosophy: a theory of what is • e.g., Gene ontologies
  19. • Express neuroscience concepts in a way that is machine readable – Synonyms, lexical variants – Definitions • Provide means of disambiguation of strings – Nucleus part of cell; nucleus part of brain; nucleus part of atom • Rules by which a class is defined, e.g., a GABAergic neuron is a neuron that releases GABA as a neurotransmitter • Properties – Support reasoning • Provide universals for navigating across different data sources – Semantic “index” – Link data through relationships, not just one-to-one mappings • Provide the basis for concept-based queries to probe and mine data • Establish a semantic framework for landscape analysis. Mathematics, Computer code or Esperanto
  20. birnlex_1732; Brodmann.1. Explicit mapping of database content helps disambiguate non-unique and custom terminology
  21. Aligns sources to the NIF semantic framework
  22. • Search Google: GABAergic neuron • Search NIF: GABAergic neuron – NIF automatically searches for types of GABAergic neurons. Types of GABAergic neurons. Search by meaning, not by string
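"Search by meaning, not by string" on slide 22 amounts to expanding a concept into its subclasses before searching. The sketch below assumes a hypothetical is-a table (`IS_A`) with illustrative class names that are not copied from NIFSTD, and hypothetical helpers `descendants` and `search_terms`; it is not NIF's implementation.

```python
# Hypothetical is-a hierarchy (child class -> parent class).
# The class names are illustrative and are not an export of NIFSTD.
IS_A = {
    "GABAergic neuron": "Neuron",
    "Glutamatergic neuron": "Neuron",
    "Purkinje cell": "GABAergic neuron",
    "Cerebellar basket cell": "GABAergic neuron",
    "Pyramidal cell": "Glutamatergic neuron",
}


def descendants(concept: str) -> list[str]:
    """Return every class that is, directly or transitively, a kind of `concept`."""
    found = []
    frontier = [concept]
    while frontier:
        parent = frontier.pop()
        children = [child for child, p in IS_A.items() if p == parent]
        found.extend(children)
        frontier.extend(children)
    return sorted(found)


def search_terms(concept: str) -> list[str]:
    """Terms to search for: the concept itself plus all of its subclasses."""
    return [concept] + descendants(concept)


print(search_terms("GABAergic neuron"))
```

A plain string search would only match records that literally contain "GABAergic neuron"; the expanded list also reaches records annotated with a specific cell type.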
  23. Equivalence classes; restrictions. Arbitrary but defensible • Neurons classified by • Circuit role: principal neuron vs interneuron • Molecular constituent: Parvalbumin-neurons, calbindin-neurons • Brain region: Cerebellar neuron • Morphology: Spiny neuron • Molecule Roles: Drug of abuse, anterograde tracer, retrograde tracer • Brain parts: Circumventricular organ • Organisms: Non-human primate, non-human vertebrate • Qualities: Expression level • Techniques: Neuroimaging
  24. What genes are upregulated by drugs of abuse in the adult mouse? (show me the data!) Morphine; Increased expression; Adult Mouse
  25. • NIF Connectivity: 7 databases containing connectivity primary data or claims from literature on connectivity between brain regions • Brain Architecture Management System (rodent) • Temporal lobe.com (rodent) • Connectome Wiki (human) • Brain Maps (various) • CoCoMac (primate cortex) • UCLA Multimodal database (Human fMRI) • Avian Brain Connectivity Database (Bird) • Total: 1800 unique brain terms (excluding Avian) • Number of exact terms used in > 1 database: 42 • Number of synonym matches: 99 • Number of 1st order partonomy matches: 385
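The three kinds of match counted on slide 25 (exact term, synonym, first-order partonomy) can be sketched as a tiered comparison. The lookup tables `SYNONYM_OF` and `PART_OF` and the functions `canonical` and `match_type` are hypothetical, with a few illustrative entries; the slides do not say how the actual comparison was run.

```python
# Illustrative lookup tables (assumptions, not NIF data).
SYNONYM_OF = {"cornu ammonis": "hippocampus", "ammon's horn": "hippocampus"}
PART_OF = {"ca1": "hippocampus"}  # child term -> direct (first-order) parent


def canonical(term: str) -> str:
    """Lower-case a term and map a known synonym to its preferred label."""
    key = term.strip().lower()
    return SYNONYM_OF.get(key, key)


def match_type(a: str, b: str) -> str:
    """Classify how two brain-region terms from different databases relate."""
    if a.strip().lower() == b.strip().lower():
        return "exact"
    ca, cb = canonical(a), canonical(b)
    if ca == cb:
        return "synonym"
    if PART_OF.get(ca) == cb or PART_OF.get(cb) == ca:
        return "partonomy"
    return "none"


for pair in [("Hippocampus", "hippocampus"),
             ("Cornu Ammonis", "Hippocampus"),
             ("CA1", "Ammon's horn"),
             ("CA1", "Striatum")]:
    print(pair, "->", match_type(*pair))
```

Checking the tiers in this order means each pair is counted once, under the strongest relationship that holds.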
  26. http://neurolex.org • Semantic MediaWiki • Provide a simple interface for defining the concepts required • Light weight semantics • Good teaching tool for learning about semantic integration and the benefits of a consistent semantic framework • Community based: • Anyone can contribute their terms, concepts, things • Anyone can edit • Anyone can link • Accessible: searched by Google • Growing into a significant knowledge base for neuroscience • International Neuroinformatics Coordinating Facility. Demo D03. Larson et al, Frontiers in Neuroinformatics, in press
  27. • Neurolex provides an on-line computable index for expressing models in semantic terms, and linking to other knowledge and data • Implemented forms for certain types of entities • Neuroscience knowledge in the web. Pages are linked through properties; knowledge-base built through cross-modular relations and links to data; red links
  28. • > 1000 DICOM terms – Karl Helmer – Data Sharing Task Force • Tasks and Cognitive Concepts from Cognitive Atlas – Russ Poldrack • >280 Neurons – Gordon Shepherd and 30 worldwide experts • ~500 fly neurons from Fly Anatomy Ontology – David Osumi-Sutherland • >1200 Brain parcellations. ~20,000 concepts: Spreadsheet downloads, through NIF Web Services, SPARQL endpoint. 200,000 edits; 150 contributors
  29. Because they are static URLs, wikis are searchable by Google
  30. Neurolex: > 1 million triples. Dr. Yi Zeng: Chinese neural knowledge base. NIF Cell Graph
  31. 1. Look brain region up in NeuroLex 2. Look up cells contained in the brain region 3. Find those cells that are known to project out of that brain region 4. Look up the neurotransmitters for those cells 5. Determine whether those neurotransmitters are known to be excitatory or inhibitory 6. Report the projection as excitatory or inhibitory, and report the entire chain of logic with links back to the wiki pages where they were made 7. Make sure user can get back to each statement in the logic chain to edit it if they think it is wrong. Stephen Larson. CHEBI:18243. Are projections from the VTA excitatory or inhibitory?
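Steps 1 to 6 on slide 31 can be sketched as a chain of lookups that keeps every intermediate statement. All tables below (`REGION_CELLS`, `PROJECTS_OUT`, `CELL_TRANSMITTER`, `TRANSMITTER_EFFECT`) and the cell names in them are hypothetical stand-ins for NeuroLex pages, and `classify_projections` is an illustrative name. Transmitters missing from the effect table, such as dopamine here, are reported as "unknown" instead of being guessed. Step 7 (letting the user edit a statement) is not modeled.

```python
# Hypothetical mini knowledge base standing in for NeuroLex wiki pages.
REGION_CELLS = {
    "Ventral tegmental area": [
        "VTA dopaminergic cell",
        "VTA GABAergic cell",
        "VTA local interneuron",
    ],
}
PROJECTS_OUT = {"VTA dopaminergic cell", "VTA GABAergic cell"}
CELL_TRANSMITTER = {
    "VTA dopaminergic cell": "Dopamine",
    "VTA GABAergic cell": "GABA",
    "VTA local interneuron": "GABA",
}
TRANSMITTER_EFFECT = {"GABA": "inhibitory", "Glutamate": "excitatory"}


def classify_projections(region: str) -> list[dict]:
    """Follow steps 1-6: region -> cells -> projecting cells -> transmitter -> effect."""
    reports = []
    for cell in REGION_CELLS.get(region, []):                    # steps 1-2
        if cell not in PROJECTS_OUT:                             # step 3
            continue
        transmitter = CELL_TRANSMITTER.get(cell, "unknown")      # step 4
        effect = TRANSMITTER_EFFECT.get(transmitter, "unknown")  # step 5
        chain = [                                                # step 6
            f"{region} contains {cell}",
            f"{cell} projects out of {region}",
            f"{cell} releases {transmitter}",
            f"{transmitter} is classified as {effect}",
        ]
        reports.append({"cell": cell, "effect": effect, "chain": chain})
    return reports


for report in classify_projections("Ventral tegmental area"):
    print(report["cell"], "->", report["effect"])
    for statement in report["chain"]:
        print("   ", statement)
```

Returning the chain alongside the answer is the point of the design on the slide: each statement can be traced back to the page where it was asserted.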
  32. • INCF Project – Neuron Registry – > 30 experts worldwide – Fill out neuron pages in Neurolex Wiki – Led by Dr. Gordon Shepherd. [Chart: number of total redlinks, easy fixes and hard fixes for soma location, dendrite location and axon location] Social networks and community sites let us learn things from the collective behavior of contributors
  33. neurolex.org: Semantic Wiki • INCF Community encyclopedia • Define all vocabulary, terms, protocols, brain structures, diseases, etc • Living review articles • Links to data, models and literature • Semantic organization, search, analysis and integration • Searchable via the web • Global directory of all shared vocabularies, CDEs, etc. Slide courtesy of Sean Hill: International Neuroinformatics Coordinating Facility
  34. Martin Telefont, HBP: Lab Space connecting to Knowledge Space
  35. • NIF can be used to survey the data landscape • Analysis of NIF shows multiple databases with similar scope and content • Many contain partially overlapping data • Data “flows” from one resource to the next – Data is reinterpreted, reanalyzed or added to • Is duplication good or bad? NIF is trying to make it easier to work with diverse data
  36. NIF is in a unique position to answer questions about the neuroscience landscape: Kepler Workflow engine + NIF semantics. [Figure: “Where are the data?” Data source by brain region: Striatum, Hypothalamus, Olfactory bulb, Cerebral cortex, Brain]
  37. ∞. What is easily machine processable and accessible. What is potentially knowable. What is known: literature, images, human knowledge. Unstructured; natural language processing, entity recognition, image processing and analysis; paywalls; communication. Abstracts vs full text vs tables etc
  38. Closed world vs open world. We know a lot about some things and less about others; some of NIF’s sources are comprehensive; others are highly biased. But... NIF has > 2M antibodies, 338,000 model organisms, and 3 million microarray records
  39. Neocortex; Olfactory bulb; Neostriatum; Cochlear nucleus. All neurons with cell bodies in the same brain region are grouped together. Properties in Neurolex
  40. Exposing knowledge gaps and biases. [Figure: “Where are the data?” Data source and funding by brain region: Striatum, Hypothalamus, Olfactory bulb, Cerebral cortex, Brain]
  41. • Gemma: Gene ID + Gene Symbol • DRG: Gene name + Probe ID • Gemma presented results relative to baseline chronic morphine; DRG with respect to saline, so direction of change is opposite in the 2 databases • Analysis: • 1370 statements from Gemma regarding gene expression as a function of chronic morphine • 617 were consistent with DRG; over half of the claims of the paper were not confirmed in this analysis • Results for 1 gene were opposite in DRG and Gemma • 45 did not have enough information provided in the paper to make a judgment. Relatively simple standards would make life easier
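The baseline problem on slide 41 can be made concrete with a small sketch: a change reported relative to chronic morphine has to be flipped before it can be compared with a change reported relative to saline. The functions `normalize` and `consistent` and the string labels are hypothetical, chosen only to illustrate the reasoning; they do not reflect either database's actual schema.

```python
FLIP = {"up": "down", "down": "up"}


def normalize(direction: str, baseline: str) -> str:
    """Express a direction of change relative to saline.

    A result reported against a chronic-morphine baseline is flipped,
    because its reference condition is the treatment rather than the control.
    """
    if baseline == "chronic morphine":
        return FLIP[direction]
    return direction


def consistent(gemma_direction: str, drg_direction: str) -> bool:
    """Compare one Gemma statement with one DRG statement on a common baseline."""
    gemma = normalize(gemma_direction, "chronic morphine")
    drg = normalize(drg_direction, "saline")
    return gemma == drg


# Opposite labels in the two sources describe the same biological change.
print(consistent("up", "down"))
# Identical labels actually disagree once the baselines are aligned.
print(consistent("up", "up"))
```

Recording the baseline as an explicit field next to each statement is the kind of simple standard the slide argues for, since it removes the need to recover this information from the paper.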
  42. NIF favors a hybrid, tiered, federated system • Domain knowledge – Ontologies • Claims, models and observations – Virtuoso RDF triples – Model repositories • Data – Data federation – Spatial data – Workflows • Narrative – Full text access. Neuron; Brain part; Disease; Organism; Gene; Technique; People. Caudate projects to Snpc; Grm1 is upregulated in chronic cocaine; Betz cells degenerate in ALS. NIF provides the tentacles that connect the pieces: a new type of entity for 21st century science
  43. Scholar; Library; Scholar; Publisher. FORCE11.org: Future of research communications and e-scholarship
  44. Scholar; Consumer; Libraries; Data Repositories; Code Repositories; Community databases/platforms; OA; Curators; Social Networks; Peer Reviewers; Narrative; Workflows; Data; Models; Multimedia; Nanopublications; Code
  45. • Of the ~4000 columns that NIF queries, ~1300 map to one of our core categories: – Organism – Anatomical structure – Cell – Molecule – Function – Dysfunction – Technique • 30-50% of NIF’s queries autocomplete • When NIF combines multiple sources, a set of common fields emerges – Basic information models/semantic models exist for certain types of entities. Semantic frameworks create spaces in which to compare the current state of data and knowledge
  46. • Several powerful trends should change the way we think about our data: One → Many – Many data • Generation of data is getting easier → shared data • Data space is getting richer: more –omes every day • But... compared to the biological space, still sparse – Many resources: everyone wants to be “the” one but e pluribus unum – Many eyes • Wisdom of crowds • More than one way to interpret data – Many algorithms • Not a single way to analyze data – Many analytics • “Signatures” in data may not be directly related to the question for which they were acquired but tell us something really interesting. New works need to be created with an eye towards the web and interoperability
  47. Jeff Grethe, UCSD, Co Investigator, Interim PI; Amarnath Gupta, UCSD, Co Investigator; Anita Bandrowski, NIF Project Leader; Gordon Shepherd, Yale University; Perry Miller; Luis Marenco; Rixin Wang; David Van Essen, Washington University; Erin Reid; Paul Sternberg, Cal Tech; Arun Rangarajan; Hans Michael Muller; Yuling Li; Giorgio Ascoli, George Mason University; Sridevi Polavarum; Fahim Imam; Larry Lui; Andrea Arnaud Stagg; Jonathan Cachat; Jennifer Lawrence; Svetlana Sulima; Davis Banks; Vadim Astakhov; Xufei Qian; Chris Condit; Mark Ellisman; Stephen Larson; Willie Wong; Tim Clark, Harvard University; Paolo Ciccarese; Karen Skinner, NIH, Program Officer (retired); Jonathan Pollock, NIH, Program Officer. And my colleagues in Monarch, dkNet, 3DVC, Force 11
  48. Data Space; Laboratory Space; Knowledge Space; BAMS; Lexicon; Encyclopedia
  49. 47/50 major preclinical published cancer studies could not be replicated • “The scientific community assumes that the claims in a preclinical study can be taken at face value - that although there might be some errors in detail, the main message of the paper can be relied on and the data will, for the most part, stand the test of time. Unfortunately, this is not always the case.” • Getting data out sooner in a form where they can be exposed to many eyes and many analyses may allow us to expose errors and develop better metrics to evaluate the validity of data. Begley and Ellis, Nature, vol. 483, p. 531, 29 March 2012
  50. • Every resource is resource limited: few have enough time, money, staff or expertise required to do everything they would like – If the market can support 11 MRI databases, fine – Some consolidation, coordination is usually warranted • Big, broad and messy beats small, narrow and neat – Without trying to integrate a lot of data, we will not know what needs to be done – Progressive refinement; addition of complexity through layers • Be flexible and opportunistic – A single optimal technology/container for all types of scientific data and information does not exist; technology is changing • Think globally; act locally: – No source, not even NIF, is THE source; we are all a source – Think about interoperation from the inception
  51. Regional part of nervous system; Parcellation scheme parcel; Parcellation scheme parcel; Single species or strain; Parcellation scheme; Precise definition; Technique; Functional part of nervous system; Partially overlaps; Taxon rank; General hierarchy. INCF Task Force: Alan Rutenberg, Seth Ruffins
  52. • 1200 parts of nervous system characterized (mostly) according to CUMBO terms • 1200 “parcels” from individual atlases/papers • 700 neurons • 280 via Neuron Registry • Available via NIF vocabulary services (REST) • Hosted in a Virtuoso triple store via SPARQL