
Where are the Data? Perspectives from the Neuroscience Information Framework.

Presented during the 34th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC'12). Part of the workshop 'New Models and Modes for Data Sharing: Experiences from Neuroscience'. Presented by Jeffrey S. Grethe, Ph.D. from the Center for Research in Biological Systems at the University of California, San Diego.

This workshop featured several large-scale efforts to establish data sharing platforms, standards and tools to promote data-intensive analysis in the neurosciences. As we head into the second decade of the 21st century, many scientists realize that current methods for publishing and accessing data are outmoded and inefficient. Neuroscience, with its large, diverse and highly competitive community, has been slow to adopt more open sharing of data and has lacked effective tools to do so. There has been a significant investment in databases and tools for biological science, and frequent calls for more of them, but few calls to the biological community to adopt practices and frameworks for making their resources more easily discoverable and data more accessible. Data are contained within diverse sources, from web pages, databases and literature to personal lab systems, making for a haphazard mechanism for data and tool discovery. Although these mechanisms are effective for small communities, they are parochial for the totality of resources available, leading to fragmentation in the resource ecosystem. Neuroscience, with its diverse subdisciplines, complex data types and broad domain, presents the perfect exemplar of the current practices, bottlenecks and issues surrounding open access to data. This situation is changing, however, as groups have started to work together to define new models and tools for sharing and analyzing neuroscience data on an international scale. In this workshop, we bring together experts from national and international projects to discuss issues of data access and progress towards establishing platforms and best practices for effective sharing of neuroscience data in support of basic and clinical neuroscience.

  1. Where are the Data? Perspectives from the Neuroscience Information Framework. Jeffrey S. Grethe, Ph.D., Center for Research in Biological Systems, University of California, San Diego
  2. Introduction
  3. “Neural Choreography”: “A grand challenge in neuroscience is to elucidate brain function in relation to its multiple layers of organization that operate at different spatial and temporal scales. Central to this effort is tackling “neural choreography” -- the integrated functioning of neurons into brain circuits -- their spatial organization, local and long-distance connections, their temporal orchestration, and their dynamic features. Neural choreography cannot be understood via a purely reductionist approach. Rather, it entails the convergent use of analytical and synthetic tools to gather, analyze and mine information from each level of analysis, and capture the emergence of new layers of function (or dysfunction) as we move from studying genes and proteins, to cells, circuits, thought, and behavior.... However, the neuroscience community is not yet fully engaged in exploiting the rich array of data currently available, nor is it adequately poised to capitalize on the forthcoming data explosion.” Akil et al., Science, Feb 11, 2011
  4. “We speak piously of taking measurements and making small studies that will add another brick to the temple of science. Most such bricks just lie around the brickyard.” Platt, J.R. (1964) Strong Inference. Science. 146: 347-353. • “We now have unprecedented ability to collect data about nature…but there is now a crisis developing in biology, in that completely unstructured information does not enhance understanding” Sidney Brenner
  5. The Data Federation Problem → Multiple data types; multiple scales; multiple databases: whole brain data (20 µm microscopic MRI); mosaic LM images (1 GB+); conventional LM images; individual cell morphologies; EM volumes & reconstructions; solved molecular structures. No single technology serves these all equally well. Neuroscience is unlikely to be served by a few large databases like the genomics and proteomics community
  6. Where are the data?
  7. What do you mean by data? Databases come in many shapes and sizes • Primary data: – Data available for reanalysis, e.g., microarray data sets from GEO; brain images from XNAT; microscopic images (CCDB/CIL) • Secondary data – Data features extracted through data processing and sometimes normalization, e.g., brain structure volumes (IBVD), gene expression levels (Allen Brain Atlas); brain connectivity statements (BAMS) • Tertiary data – Claims and assertions about the meaning of data, e.g., gene upregulation/downregulation, brain activation as a function of task • Registries: – Metadata – Pointers to data sets or materials stored elsewhere • Data aggregators – Aggregate data of the same type from multiple sources, e.g., Cell Image Library, SUMSdb, Brede • Single source – Data acquired within a single context, e.g., Allen Brain Atlas
  8. Data, not just stories about them! 47/50 major preclinical published cancer studies could not be replicated • “The scientific community assumes that the claims in a preclinical study can be taken at face value - that although there might be some errors in detail, the main message of the paper can be relied on and the data will, for the most part, stand the test of time. Unfortunately, this is not always the case.” • “There are no guidelines that require all data sets to be reported in a paper; often, original data are removed during the peer review and publication process.” Begley and Ellis, 29 MARCH 2012 | VOL 483 | NATURE | 531 • Getting data out sooner in a form where they can be exposed to many eyes and many analyses, and easily compared, may allow us to expose errors and develop better metrics to evaluate the validity of data
  9. In an ideal world... We’d like to be able to find • What is known: – What is the average diameter of a Purkinje neuron? – Is GRM1 expressed in cerebral cortex? – What are the projections of hippocampus? – What genes have been found to be upregulated in chronic drug abuse in adults? – Find images showing dendritic spines containing membrane bound organelles – What animal models have similar phenotypes to Parkinson’s disease? – What studies used my polyclonal antibody against GABA in humans? • What is not known: – Connections among data – Gaps in knowledge. Without some sort of framework, very difficult to do
  10. The Problems Researchers Face • We are not publishing data in a form that is easy to find or integrate • What we mean isn’t clear to a search engine (or even to a human) • NIF Registry: A catalog of neuroscience-relevant resources – > 4700 currently described – > 2000 databases • Searching and navigating across individual resources takes an inordinate amount of human effort
  11. But we have Google! • Current web is designed to share documents – Documents are unstructured data • Much of the content of digital resources is part of the “hidden web” • Wikipedia: The Deep Web (also called Deepnet, the invisible Web, DarkNet, Undernet or the hidden Web) refers to World Wide Web content that is not part of the Surface Web, which is indexed by standard search engines.
  12. But we have Pub Med! • Bulk of neuroscience data is published as part of papers – > 20,000,000 • Structured vs. unstructured information – Author, year, journal, keywords • “...it is a growing challenge to ensure that data produced during the course of reported research are appropriately described, standardized, archived, and available to all.” Lead Science editorial (Science 11 February 2011: Vol. 331 no. 6018 p. 649)
  13. NIF: A New Type of Entity for New Modes of Scientific Dissemination • NIF’s mission is to maximize the awareness of, access to and utility of digital resources produced worldwide to enable better science and promote efficient use – NIF is the only neuroscience information entity that views resources globally without respect to domain, funding agency, institute or community – NIF is like a “Pub Med” for all neuroscience resources – Aggregates all the different databases, tools and resources now produced by the scientific community – Makes them searchable from a single interface – A practical approach to the data deluge – The “authority” on resources for neuroscience – Educate neuroscientists and students about effective data sharing
  14. People use NIF to... • Find resources – “Where can I find a translation of Talairach to MNI coordinates?” - NIF Forum – “What biospecimen banks are available with tissues from opiate addicts?” - NIH • Find answers – “What is the amount of data published on males vs females?” - NIH – “What projects to the ventral lateral geniculate nucleus?” - UCSD researcher – “What is known about the choroid plexus?” - Small business owner • NIF is listed in the library guides of > 85 research universities worldwide (up 70% from last year) • NIF receives hits from > 350 colleges and universities every month • NIF receives hits from pharmaceutical companies • Listed as link on 4 societies: Society for Neuroscience, American Association of Anatomists, Society of Immune Pharmacology, American Academy of Neurology • Track resource utilization – What projects are using my antibody/mouse/database? • Serve as a springboard – NIF ontologies, tools and data resources are used by many groups (> 80,000 hits/month on NIF services) – NIF technologies and expertise jumpstart related efforts, e.g., One Mind for Research
  15. An Overview of NIF • Assembled the largest searchable collation of neuroscience data on the web • The largest catalog of biomedical resources (data, tools, materials, services) available • The largest ontology for neuroscience • NIF search portal: simultaneous search over data, NIF catalog and biomedical literature • NeuroLex Wiki: a community wiki serving neuroscience concepts • A unique technology platform • Cross-neuroscience analytics • A reservoir of cross-disciplinary biomedical data expertise
  16. NIF services for data providers • NIF ensures that all data are discoverable, accessible and understandable – If data are already in a database, NIF federates them • Aligns data to common framework • Makes them collectively searchable • Provides uniform data access services for linking resources – If data are not in a database: • NIF locates a suitable database within its federation and facilitates ingestion • If no database is available, NIF creates a reasonable structure using its database tools; stores data in available data repositories (currently UCSD CRBS/SDSC) and makes it available through the NIF portal – Assigns a URI for data identification. NIF uses manual, semi-automated and automated tools for ingestion and curation
  17. Registering a resource in NIF. NIF provides a set of tools and services for easy sharing of data and linking of data to articles, web sites etc. What users are searching for: – NIF makes it easy to add and manage resources through NIF • Need to respect resource and time constraints of resource providers – Different levels of access • NIF Registry (basic) • NIF Site Map • NIF level 2 – create web access and basic structure for resources without API – Utilizes DISCO tools developed at Yale • NIF level 3: Web service access, schema registration
  18. NIF Registry • NIF Registry: each resource gets its own URI and own Wiki page – Insert maps, Twitter feeds • NIF site map: manage updates to your resource page – Utilizes DISCO protocol (Luis Marenco, Rixin Wang, Yale U) – NIF also consumes other sitemaps for bioscience, e.g., Biositemaps
  19. The NeuroLex Wiki: A lexicon for neuroscience • Semantic wiki tracking > 18,000 neuroscience concepts • Built from and for NIF ontologies • Supports integration of tools and widgets
  20. A dynamic index for neuroscience: Parts of rodent brain; Parts of white matter; Parts of human brain
  21. A Semantically Enabled Search Engine • NIF has developed a production technology platform for researchers to discover, share, access, analyze, and integrate neuroscience-relevant information – Semantically-enabled search engine and interface that customizes results for neuroscience – System that searches the “hidden web”, i.e., content not well served by search engines – Automated data harvesting technologies that produce dynamic indices of data content including databases, web pages, text, xml etc. – Easy to use tools to make products and data available • NIF has developed a wealth of knowledge about data resources and data integration in the life sciences
  22. NIF Data Federation. NIF provides access to the largest collection of neuroscience relevant data on the web, all from a single interface; already have surpassed year 4 cumulative targets. [Chart: Number of Federated Records (Millions) and Number of Federated Databases, Jun-08 to Apr-12; series labels: RDP, DISCO] Resource Registry: 4700 ... Antibodies: 935,000; Brain connectivity: 66,000; Animal models: 270,000; Brain activation foci: 56,000
  23. NIF Search Interface
  24. NIF Search Interface
  25. Making common neuroscience concepts computable: concept-based queries • Search Google: GABAergic neuron • Search NIF: GABAergic neuron – NIF automatically searches for types of GABAergic neurons
  26. “Search computing”: What genes are upregulated by drugs of abuse in the adult mouse? Query concepts: Morphine; Increased expression; Adult Mouse. Some concepts, e.g., age category, are quantitative but still must be interpreted in a global query system
  27. NIF STANDARD ONTOLOGIES (NIFSTD), Bill Bug et al. • Set of modular ontologies – Covering neuroscience relevant terminologies – Comprehensive: 50,000+ distinct concepts + synonyms • Expressed in OWL-DL language • Closely follows OBO community best practices – As long as they seem practical • Avoids duplication of efforts – Standardized to the same upper level ontologies, e.g., Basic Formal Ontology (BFO), OBO Relations Ontology (OBO-RO), Phenotypic Qualities Ontology (PATO) – Relies on existing community ontologies, e.g., CHEBI, GO, PRO, OBI etc. • Modules cover orthogonal domains, e.g., Brain Regions, Cells, Molecules, Subcellular parts, Diseases, Nervous system functions, etc.
  28. Data Services for Users • Vocabulary: NITRC (autocomplete); Neuroscience.com (annotate); INCF Atlasing tools • Data Summary (NIF Navigator): NIDA, Blueprint; NeuroLex • Individual Data Sources: DOMEO; OneMind; Eagle I • DISCO Services (LinkOut): PubMed • Slide legend: Current; Planned
  29. NIF Link Out Broker: Connecting Resources. NIF inserted > 800,000 references to Pub Med IDs. NIF inserts links between data and articles on behalf of data providers using NCBI’s Link Out feature
  30. Grabbing the long tail of small data • Analysis of NIF shows multiple databases with similar scope and content • Many contain partially overlapping data • Data “flows” from one resource to the next – Data is reinterpreted, reanalyzed or added to – When does it become something else? • Is duplication good or bad?
  31. NIF Analytics: The Neuroscience Ecosystem. Where are the data? [Figure: data source vs. brain region; labelled regions: Striatum, Brain, Hypothalamus, Olfactory bulb, Cerebral cortex] NIF is in a unique position to answer questions about the neuroscience ecosystem
  32. How much of the landscape do we have? Query for “reference” brain structures and their parts in NIF Connectivity database
  33. Embracing duplication: Data Mash ups • ~300 PMIDs were common between Brede and SUMSdb • Same information; value added. Same data - different aspects
  34. Same data: different analysis (chronic vs acute morphine in striatum) • Drug Related Gene database: extracted statements from figures, tables and supplementary data from published article • Gemma: Reanalyzed microarray results from GEO using different algorithms (http://www.chibi.ubc.ca/Gemma/home.html) • Both provide results of increased or decreased expression as a function of experimental paradigm – 4 strains of mice – 3 conditions: chronic morphine, acute morphine, saline. Mined NIF for all references to GEO IDs: found small number where the same dataset was represented in two or more databases
  35. How easy was it to compare? • Gemma: Gene ID + Gene Symbol • DRG: Gene name + Probe ID • Gemma: Increased expression/decreased expression • DRG: Increased expression/decreased expression (NIF annotation standard) – But...Gemma presented results relative to baseline chronic morphine; DRG with respect to saline, so direction of change is opposite in the 2 databases • Analysis: – 1370 statements from Gemma regarding gene expression as a function of chronic morphine – 617 were consistent with DRG; → over half of the claims of the paper were not confirmed in this analysis – Results for 1 gene were opposite in DRG and Gemma – 45 did not have enough information provided in the paper to make a judgment
  36. A global view of data. Informatics should not be an afterthought – You (and the machine) have to be able to find it • Accessible through the web • Annotations – You have to be able to use it • Data type specified and in a usable form – You have to know what the data mean • Some semantics • Context: Experimental metadata • Provenance: Where did the data come from? Reporting neuroscience data within a consistent framework helps enormously
  37. Competition, Cooperation, Coordination, Collaboration • We live in a linked world: “Too Big to Know” • Multiple efforts are underway simultaneously – Launched without knowledge of others – Mine is better / Not Invented Here • Cooperation and coordination will allow us to move forward faster – NIF has tried to be a good citizen by sharing expertise, data, knowledge, tools
  38. NIF team (past and present) • Maryann Martone, UCSD, Principal Investigator; Jeffrey Grethe, UCSD, Co Investigator; Amarnath Gupta, UCSD, Co Investigator; Anita Bandrowski, NIF Project Leader; Gordon Shepherd, Yale University; Perry Miller; Luis Marenco; Rixin Wang; David Van Essen, Washington University; Erin Reid; Paul Sternberg, Cal Tech; Arun Rangarajan; Hans Michael Muller; Yuling Li; Giorgio Ascoli, George Mason University; Sridevi Polavarum; Tim Clark, Harvard University; Paolo Ciccarese • Vadim Astakhov; Davis Banks; Bill Bug; Jonathan Cachat; Chris Condit; Mark Ellisman; Lee Hornbrook; Fahim Imam; Stephen Larson; Jennifer Lawrence; Cliff Lee; Larry Lui; Sarah Maynard; Binh Ngo; Andrea Arnaud Stagg; Xufei Qian; Willie Wong • Jonathan Pollock, NIH, Program Officer; Karen Skinner, NIH, Program Officer
  39. Thank You…
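
Note on slide 35: the Gemma/DRG comparison depends on putting both databases on the same baseline before counting agreements. The Python below is only a rough sketch of that idea, not NIF's actual procedure. The record shape `(gene, direction)`, the function names and the placeholder gene labels are all assumptions made for illustration; only the totals 1370 and 617 come from the slide.

```python
from collections import Counter

# Hypothetical record shape: (gene, direction), direction in {"increased", "decreased"}.
FLIP = {"increased": "decreased", "decreased": "increased"}


def flip_baseline(statements):
    """Re-express statements against the opposite baseline by inverting direction."""
    return {gene: FLIP[direction] for gene, direction in statements}


def compare(gemma_vs_morphine, drg_vs_saline):
    """Tally agreement after moving Gemma statements onto the saline baseline."""
    gemma = flip_baseline(gemma_vs_morphine)
    drg = dict(drg_vs_saline)
    tally = Counter()
    for gene, direction in gemma.items():
        if gene not in drg:
            tally["not_in_drg"] += 1
        elif drg[gene] == direction:
            tally["consistent"] += 1
        else:
            tally["opposite"] += 1
    return tally


# Placeholder records, not real results.
gemma = [("GENE_A", "decreased"), ("GENE_B", "increased"), ("GENE_C", "increased")]
drg = [("GENE_A", "increased"), ("GENE_B", "increased")]
result = compare(gemma, drg)

# Arithmetic behind "over half ... not confirmed", using the slide's totals.
unconfirmed_fraction = (1370 - 617) / 1370  # about 0.55
```

Without the baseline flip, every genuinely consistent pair would be counted as contradictory, which is why the slide calls out the differing reference condition before reporting the tallies.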
