Maryann E. Martone, Ph.D.
University of California, San Diego
INCF Neuroinformatics Short Course, Stockholm, August 2013
  
• Introduction
• Introduction to the Neuroscience Information Framework
• Structured information: data, databases
• Federating neuroscience-relevant databases
• Information frameworks
• Ontologies
• What can we do with information in the NIF?
• Conclusions
  
FORCE11.org: Future of research communications and e-scholarship
[Diagram: scholars, libraries, publishers and consumers linked through data repositories, code repositories, community databases/platforms, OA, curators, social networks and peer reviewers. Research objects shown: narrative, workflows, data, models, multimedia, nanopublications, code]
  
http://neuinfo.org
  
• NIF’s mission is to maximize the awareness of, access to and utility of research resources produced worldwide to enable better science and promote efficient use
– NIF unites neuroscience information without respect to domain, funding agency, institute or community
– NIF is like a “PubMed” for all biomedical resources and a “PubMed Central” for databases
– Makes them searchable from a single interface
– Practical and cost-effective; tries to be sensible
– Learned a lot about effective data sharing
  	
  
The Neuroscience Information Framework is an initiative of the NIH Blueprint consortium of institutes
http://neuinfo.org
  
We’d like to be able to find:
• What is known:
– What are the projections of hippocampus?
– Is GRM1 expressed in cerebral cortex?
– What genes have been found to be upregulated in chronic drug abuse in adults?
– What animal models have similar phenotypes to Parkinson’s disease?
– What studies used my polyclonal antibody against GABA in humans?
• What is not known:
– Connections among data
– Gaps in knowledge
A framework makes it easier to address these questions
  
Databases and Ontologies:  Where do we go from here?
Neuroscience is unlikely to be served by a few large databases in the way the genomics and proteomics communities are
• Whole brain data (20 um microscopic MRI)
• Mosaic LM images (1 GB+)
• Conventional LM images
• Individual cell morphologies
• EM volumes & reconstructions
• Solved molecular structures
No single technology serves these all equally well.
Multiple data types; multiple scales; multiple databases
  
• Data warehouse: May contain data from diverse sources; schemas are integrated. Data are “cleaned” to fit unified data model. One database to rule them all...
• Data federation: a virtual database that stores data definitions and not the data itself. The virtual database will have information about the location of the data. When a single call is made to a virtual database, the technology ensures multiple calls to underlying databases and is also responsible for meaningfully aggregating the returned result sets.
From Wikipedia and http://www.infosysblogs.com/oracle/2010/01/data_federation_a_potent_subst_1.html
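The federation idea above can be sketched in a few lines. This is a minimal illustration, not NIF's implementation: the two source functions, their records and the class name `VirtualDatabase` are all hypothetical stand-ins for remote databases.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for remote databases: each source is a function
# that accepts a search term and returns matching records.
def search_source_a(term: str) -> List[dict]:
    records = [{"region": "hippocampus", "gene": "GRM1"}]
    return [r for r in records if term in r.values()]

def search_source_b(term: str) -> List[dict]:
    records = [{"structure": "hippocampus", "cell": "pyramidal cell"}]
    return [r for r in records if term in r.values()]

class VirtualDatabase:
    """Stores only where the data live (the registered sources), not the data."""

    def __init__(self) -> None:
        self.sources: Dict[str, Callable[[str], List[dict]]] = {}

    def register(self, name: str, search: Callable[[str], List[dict]]) -> None:
        self.sources[name] = search

    def query(self, term: str) -> List[dict]:
        # One call in, one call out per source; results are tagged with origin.
        results = []
        for name, search in self.sources.items():
            for record in search(term):
                results.append({"source": name, **record})
        return results

federation = VirtualDatabase()
federation.register("source_a", search_source_a)
federation.register("source_b", search_source_b)
hits = federation.query("hippocampus")
```

Note that each source keeps its own field names (`region` vs `structure`); aggregating "meaningfully" is the hard part that this sketch leaves out.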
  
Relational Database
Subject 473
• Species: mouse (string)
• Age: 50 days (integer)
• Age category: adult
• Protocol: 2
Data model; data types, formal query language

Free text
“Mice (aged 50 days) were perfused with 4% paraformaldehyde and brains were sectioned at a thickness of 50 um. Sections were labeled using antibodies against calbindin and imaged on a Zeiss confocal microscope.”
Entity recognition; natural language processing
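To make the contrast concrete, here is a small sketch under stated assumptions: the table layout and column names are invented for illustration, and a regular expression stands in for real entity recognition. It shows the same fact (age 50 days) being queried from a typed row and being dug out of a methods sentence.

```python
import re
import sqlite3

# Structured side: the subject record as a typed row in a relational table.
# Table and column names are assumptions made for this example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE subject (id INTEGER PRIMARY KEY, species TEXT, "
    "age_days INTEGER, age_category TEXT, protocol INTEGER)"
)
conn.execute("INSERT INTO subject VALUES (473, 'mouse', 50, 'adult', 2)")

# A formal query language can use the data types directly.
adults = conn.execute(
    "SELECT id FROM subject WHERE species = 'mouse' AND age_days >= 50"
).fetchall()

# Unstructured side: the same facts buried in a methods sentence.
text = ("Mice (aged 50 days) were perfused with 4% paraformaldehyde and "
        "brains were sectioned at a thickness of 50 um.")

# Naive pattern matching as a stand-in for entity recognition.
age_match = re.search(r"aged (\d+) days", text)
age_from_text = int(age_match.group(1)) if age_match else None
```

The pattern only works for this one phrasing; that fragility is the reason free text needs natural language processing rather than a simple query.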
  
[Diagram: nested regions labeled ∞; “What is potentially knowable”; “What is known: literature, images, human knowledge”; “What is easily machine processable and accessible”]
Annotations: unstructured; natural language processing, entity recognition, image processing and analysis; paywalls; communication; abstracts vs full text vs tables, etc.
  
http://neuinfo.org
June 10, 2013, dkCOIN Investigator's Retreat
  
• A portal for finding and using neuroscience resources
– A consistent framework for describing resources
– Provides simultaneous search of multiple types of information, organized by category
– Supported by an expansive ontology for neuroscience
– Utilizes advanced technologies to search the “hidden web”
UCSD, Yale, Cal Tech, George Mason, Washington Univ
  
Literature; Database Federation; Registry
  
With the thousands of databases and other information sources available, simple descriptive metadata will not suffice
  
• NIF curators
• Nomination by the community
• Semi-automated text mining pipelines
– NIF Registry
– Requires no special skills
– Site map available for local hosting
• NIF Data Federation
• DISCO interop
• Requires some programming skill
• Open Source Brain < 2 hr
Low barrier to entry; incremental refinement
NIF was designed to be populated rapidly with progressive refinement
  
Databases come in many shapes and sizes
• Primary data:
– Data available for reanalysis, e.g., microarray data sets from GEO; brain images from XNAT; microscopic images (CCDB/CIL)
• Secondary data
– Data features extracted through data processing and sometimes normalization, e.g., brain structure volumes (IBVD), gene expression levels (Allen Brain Atlas); brain connectivity statements (BAMS)
• Tertiary data
– Claims and assertions about the meaning of data
– E.g., gene upregulation/downregulation, brain activation as a function of task
• Registries:
– Metadata
– Pointers to data sets or materials stored elsewhere
• Data aggregators
– Aggregate data of the same type from multiple sources, e.g., Cell Image Library, SUMSdb, Brede
• Single source
– Data acquired within a single context, e.g., Allen Brain Atlas
  
Researchers are producing a variety of information artifacts using a multitude of technologies
  
• Data: values of qualitative or quantitative variables, belonging to a set of items... often the results of measurements (Wikipedia)
• Metadata: “Data about data”
– Structural metadata: the design and specification of data structures; more properly called “data about the containers of data” (Wikipedia)
– e.g., image size, bit depth, integer vs string
– Descriptive metadata: individual instances of application data, the data content; “data about data content”
– e.g., creator, subject
• Data type: the form of the data for the purposes of data operations
• Data integration: combining data residing in different sources and providing users with a unified view of these data
“Metadata are data” - Wikipedia
  
NIF searches the largest collation of neuroscience-relevant data on the web
[Chart: Number of Federated Databases (axis 0 to 250) and Number of Federated Records in millions (log axis 0.01 to 1000) over time, tick labels 6-12 through 5-17]
DISCO
  
  
• Long tail data: large numbers of small data sets
http://en.wikipedia.org/wiki/Long_tail
  
Hippocampus OR “Cornu Ammonis” OR “Ammon’s horn”
Query expansion: synonyms and related concepts
Boolean queries
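A minimal sketch of this kind of query expansion follows. The synonym table and the function names `quote` and `expand_query` are hypothetical; a real system would presumably draw synonyms from an ontology rather than a hard-coded dictionary.

```python
# Hypothetical synonym table; a real system would read these from an ontology.
SYNONYMS = {
    "hippocampus": ["Cornu Ammonis", "Ammon's horn"],
}

def quote(term: str) -> str:
    """Wrap multi-word terms in double quotes so they are matched as phrases."""
    return f'"{term}"' if " " in term else term

def expand_query(term: str) -> str:
    """Return a Boolean OR query covering the term and its known synonyms."""
    variants = [term] + SYNONYMS.get(term.lower(), [])
    return " OR ".join(quote(v) for v in variants)

expanded = expand_query("Hippocampus")
```

A term with no entry in the table passes through unchanged, so the expansion never narrows a search.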
  
Data sources categorized by “data type” and level of nervous system
Common views across multiple sources
Tutorials for using full resource when getting there from NIF
Link back to record in original source
  
[Figure: relation names used for connectivity in different resources: Connects to; Synapsed with; Synapsed by; Input region; innervates; Axon innervates; Projects to; Cellular contact; Subcellular contact; Source site; Target site]
Each resource implements a different, though related, model; in many cases the systems are complex and difficult to learn
  
• Current web is designed to share documents
– Documents are unstructured data
• Much of the content of digital resources is part of the “hidden web”
• Wikipedia: The Deep Web (also called Deepnet, the invisible Web, DarkNet, Undernet or the hidden Web) refers to World Wide Web content that is not part of the Surface Web, which is indexed by standard search engines.
  
Even Google needs a knowledge framework
  
Knowledge in space and spatial relationships (the “where”)
Knowledge in words, terminologies and logical relationships (the “what”)
[Figure labels: Purkinje Cell; Axon Terminal; Axon; Dendritic Tree; Dendritic Spine; Dendrite; Cell body; Cerebellar cortex]
There is little obvious connection between data sets taken at different scales using different microscopies without an explicit representation of the biological objects that the data represent
  
• NIF covers multiple structural scales and domains of relevance to neuroscience
• Aggregate of community ontologies with some extensions for neuroscience, e.g., Gene Ontology, ChEBI, Protein Ontology
NIFSTD modules: Organism; NS Function; Molecule; Investigation; Subcellular structure; Macromolecule; Gene; Molecule Descriptors; Techniques; Reagent; Protocols; Cell; Resource; Instrument; Dysfunction; Quality; Anatomical Structure
[Diagram: Brain has a Cerebellum; Cerebellum has a Purkinje Cell Layer; Purkinje Cell Layer has a Purkinje cell; Purkinje cell is a neuron]
  
• Ontology: an explicit, formal representation of concepts and the relationships among them within a particular domain that expresses human knowledge in a machine readable form
• Branch of philosophy: a theory of what is
• e.g., Gene ontologies
  
• Express neuroscience concepts in a way that is machine readable
– Synonyms, lexical variants
– Definitions
• Provide means of disambiguation of strings
– Nucleus part of cell; nucleus part of brain; nucleus part of atom
• Rules by which a class is defined, e.g., a GABAergic neuron is a neuron that releases GABA as a neurotransmitter
• Properties
– Support reasoning
• Provide universals for navigating across different data sources
– Semantic “index”
– Link data through relationships, not just one-to-one mappings
• Provide the basis for concept-based queries to probe and mine data
• Establish a semantic framework for landscape analysis
Mathematics, computer code or Esperanto
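The idea of a class defined by a rule can be sketched as a predicate over records. This is only an illustration: the record layout and the property name `neurotransmitter` are assumptions, and a real ontology would express the rule in a language such as OWL rather than in Python.

```python
# Hypothetical neuron records; the property name "neurotransmitter" is assumed.
neurons = {
    "Purkinje cell": {"is_a": "neuron", "neurotransmitter": "GABA"},
    "Cerebellar granule cell": {"is_a": "neuron", "neurotransmitter": "glutamate"},
}

def is_gabaergic_neuron(record: dict) -> bool:
    """Class rule: a GABAergic neuron is a neuron that releases GABA."""
    return record.get("is_a") == "neuron" and record.get("neurotransmitter") == "GABA"

# Membership is computed from properties instead of being asserted by hand.
gabaergic = sorted(name for name, rec in neurons.items() if is_gabaergic_neuron(rec))
```

Because membership follows from the properties, adding a new neuron with the right neurotransmitter classifies it automatically.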
  
  
Aligns sources to the NIF semantic framework
  
birnlex_1741: Brodmann.10
birnlex_1204: CA3
Explicit mapping of database content helps disambiguate non-unique and custom terminology
  
• Search Google: GABAergic neuron
• Search NIF: GABAergic neuron
– NIF automatically searches for types of GABAergic neurons
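One way such an automatic expansion could work is to walk the ontology's is-a hierarchy and add every subclass to the search. The hierarchy below is a hypothetical toy; the subclass names are examples only and the function name `descendants` is made up for this sketch.

```python
# Hypothetical is-a hierarchy (child -> parent); subclass names are examples only.
IS_A = {
    "GABAergic neuron": "neuron",
    "Purkinje cell": "GABAergic neuron",
    "Basket cell": "GABAergic neuron",
    "Cerebellar basket cell": "Basket cell",
}

def descendants(term: str) -> list:
    """Collect every class that is, directly or indirectly, a kind of `term`."""
    found = []
    frontier = [term]
    while frontier:
        parent = frontier.pop()
        children = sorted(c for c, p in IS_A.items() if p == parent)
        found.extend(children)
        frontier.extend(children)
    return sorted(found)

# The user's term plus everything the ontology says is a kind of it.
search_terms = ["GABAergic neuron"] + descendants("GABAergic neuron")
```

A plain string search for "GABAergic neuron" would miss records that only mention "Purkinje cell"; the expanded term list does not.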
  
Neuroscience Information Framework – http://neuinfo.org
Equivalence classes; restrictions
Arbitrary but defensible
• Neurons classified by
– Circuit role: principal neuron vs interneuron
– Molecular constituent: Parvalbumin-neurons, calbindin-neurons
– Brain region: Cerebellar neuron
– Morphology: Spiny neuron
• Molecule roles: Drug of abuse, anterograde tracer, retrograde tracer
• Brain parts: Circumventricular organ
• Organisms: Non-human primate, non-human vertebrate
• Qualities: Expression level
• Techniques: Neuroimaging
  
What genes are upregulated by drugs of abuse in the adult mouse? (show me the data!)
[Screenshot annotations: Morphine; Increased expression; Adult Mouse]
  
• NIF Connectivity: 7 databases containing connectivity primary data or claims from literature on connectivity between brain regions
– Brain Architecture Management System (rodent)
– Temporal lobe.com (rodent)
– Connectome Wiki (human)
– Brain Maps (various)
– CoCoMac (primate cortex)
– UCLA Multimodal database (Human fMRI)
– Avian Brain Connectivity Database (Bird)
• Total: 1800 unique brain terms (excluding Avian)
• Number of exact terms used in > 1 database: 42
• Number of synonym matches: 99
• Number of 1st order partonomy matches: 385
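The three kinds of match counted above can be illustrated with a toy comparison of two term lists. Everything here is assumed for the example: the term lists, the synonym table and the part-of table are invented, and this is not the procedure used to produce the counts on the slide.

```python
# Hypothetical term lists from two connectivity databases, plus assumed
# synonym and part-of tables standing in for an ontology.
db_a = {"Hippocampus", "CA3", "Thalamus"}
db_b = {"Hippocampus", "Ammon's horn", "Cerebellum"}

SYNONYMS = {"Ammon's horn": "Hippocampus"}   # variant -> preferred label
PART_OF = {"CA3": "Hippocampus"}             # part -> direct parent

def canonical(term: str) -> str:
    """Map a term to its preferred label if a synonym is known."""
    return SYNONYMS.get(term, term)

# Exact match: the same string appears in both databases.
exact = db_a & db_b

# Synonym match: different strings that resolve to the same preferred label.
synonym = {(a, b) for a in db_a for b in db_b
           if a != b and canonical(a) == canonical(b)}

# First-order partonomy match: one term is a direct part of the other.
partonomy = {(a, b) for a in db_a for b in db_b
             if PART_OF.get(a) == b or PART_OF.get(b) == a}
```

Each step beyond exact string matching needs knowledge that only an ontology supplies, which is consistent with the slide's counts rising from 42 to 99 to 385.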
  
• Realism vs conceptualism
• Controlled vocabularies vs taxonomies vs ontology?
• How do I name classes?
• Shared vs custom ontologies
• Single vs multiple inheritance
• RDF vs OWL?
• Top down vs bottom up: heavy weight vs light weight ontologies
• Should I encode everything in my ontology?
Many schools of thought about ontologies, their construction and use
  
• Controlled vocabularies: prescribed list of terms or headings, each one having an assigned meaning
• Lexicon/Thesaurus: vocabularies + their lexical properties, e.g., synonyms, lexical variants
• Taxonomy: monohierarchical classification of concepts, as used, for example, in the classification of biological organisms, built on the “is a” relationship
• Ontology: specification of the concepts of a domain and their relationships, structured to allow computer processing and reasoning
http://www.willpowerinfo.co.uk/glossary.htm; Mike Bergman
  
• Identity:
– Entities are uniquely identifiable
– Name is a meaningless numerical identifier (URI: Uniform Resource Identifier)
– Any number of human readable labels can be assigned to it
• Definition:
– Genus: is a type of (cell, anatomical structure, cell part)
– Differentia: “has a”; a set of properties that distinguish among members of that class
– Can include necessary and sufficient conditions
• Implementation: How is this definition expressed?
– Depending on the nature of the concept or entity and the needs of the information system, we can say more or fewer things
– Different languages can express different things about the concept that can be computed upon
• OWL (W3C standard), RDF
birnlex_1362: CA2
CHEBI_29108: CA2
NIF follows OBO Foundry best practices for naming and defining classes
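The two identifiers above share the label “CA2”, which is the case that opaque identifiers are meant to handle. A small sketch, assuming a plain dictionary as the lookup structure (the comments on what each identifier denotes are assumptions, not taken from the slide):

```python
# Identifiers are taken from the slide; what each one denotes is assumed
# here for illustration only.
LABELS = {
    "birnlex_1362": ["CA2"],    # assumed: a hippocampal subfield
    "CHEBI_29108": ["CA2"],     # assumed: a chemical entity
}

def ids_for_label(label: str) -> list:
    """Return every identifier that carries the given human-readable label."""
    return sorted(i for i, labels in LABELS.items() if label in labels)

# The label alone is ambiguous; the identifiers are not.
ambiguous = ids_for_label("CA2")
```

Annotating data with the identifier rather than the string removes the ambiguity, and more labels can be attached to an identifier later without changing any annotation.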
  
• XML: Extensible Markup Language: markup language for data. XML itself is not very much concerned with meaning. XML nodes don't need to be associated with particular concepts, and the XML standard doesn't indicate how to derive a fact from a document.
• RDF: Resource Description Framework: a general method to decompose knowledge into small pieces, with some rules about the semantics, or meaning, of those pieces. What sets RDF apart from XML is that RDF is designed to represent knowledge in a distributed world. That RDF is designed for knowledge, and not data, means RDF is particularly concerned with meaning.
– Small pieces are called “triples”: subject, predicate, object
– Purkinje neuron (S) has neurotransmitter (P) GABA (O)
• RDFS: a method of specifying metadata about properties/characteristics of things and classes of things such that inference can be carried out (conceptualized in RDF)
• OWL (Web Ontology Language): a more complex (and more powerful) extension of RDFS
• SPARQL: a query language designed for RDF (similar to how SQL was designed for relational databases)
http://answers.semanticweb.com/questions/15215/whats-the-difference-between-using-rdfsowl-versus-xml
http://www.rdfabout.com/intro/#Introducing%20RDF
  
Relational model
• Mouse has age 50 days
• Protocol uses instrument confocal microscope
• A confocal imaging protocol is a protocol that uses instrument confocal microscope
RDF: “The computer doesn't need to know what has actually means in English for this to be useful. It is left up to the application writer to choose appropriate names for things (confocal microscope) and to use the right predicates (uses, has). RDF tools are ignorant of what these names mean, but they can still usefully process the information.” - http://www.rdfabout.com/intro/#Introducing%20RDF
May link to other information, e.g., mouse is a rodent
  
The thalamus projects to the cortex in mammals
• Universal: allValuesFrom: If a mammal has a cortex and a thalamus, then the thalamus must project to the cortex
• Existential: someValuesFrom: The thalamus projects to the cortex in at least one member of the class mammal
• Disjointness: owl:disjointWith: a member of one class cannot simultaneously be an instance of a specified other class: Reptiles are disjoint from mammals
W3C OWL guide: www.w3.org/TR/2004/REC-owl-guide-20040210/
Restrictions placed on classes allow us to reason over the ontology and check for consistency
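A consistency check of the simplest kind, for disjointness only, might look like the sketch below. It is a toy in plain Python rather than an OWL reasoner; the individuals `rex` and `tom` and the function name are hypothetical.

```python
# Assumed toy knowledge base: declared disjoint class pairs and the classes
# asserted for each individual. "rex" and "tom" are hypothetical individuals.
DISJOINT = {frozenset({"Reptile", "Mammal"})}

assertions = {
    "rex": {"Reptile"},
    "tom": {"Mammal", "Reptile"},   # violates the disjointness axiom
}

def inconsistent_individuals(assertions: dict) -> list:
    """List individuals asserted to belong to both classes of a disjoint pair."""
    bad = []
    for name, classes in assertions.items():
        if any(pair <= classes for pair in DISJOINT):
            bad.append(name)
    return sorted(bad)

violations = inconsistent_individuals(assertions)
```

Without the disjointness statement, nothing would mark `tom` as an error; the restriction is what makes the inconsistency detectable.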
  
  
1. Look up the brain region in NeuroLex
2. Look up cells contained in the brain region
3. Find those cells that are known to project out of that brain region
4. Look up the neurotransmitters for those cells
5. Determine whether those neurotransmitters are known to be excitatory or inhibitory
6. Report the projection as excitatory or inhibitory, and report the entire chain of logic with links back to the wiki pages where the statements were made
7. Make sure the user can get back to each statement in the logic chain to edit it if they think it is wrong
Stephen Larson
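Steps 1 through 6 can be sketched as a chain of lookups. The four tables below are a hypothetical miniature knowledge base standing in for wiki queries; the function name and record layout are made up, and the links back to wiki pages (steps 6 and 7) are represented only by the recorded chain of statements.

```python
# Hypothetical miniature knowledge base standing in for wiki lookups.
CELLS_IN_REGION = {"Cerebellar cortex": ["Purkinje cell", "Cerebellar granule cell"]}
PROJECTS_OUT = {"Purkinje cell": True, "Cerebellar granule cell": False}
TRANSMITTER = {"Purkinje cell": "GABA", "Cerebellar granule cell": "Glutamate"}
EFFECT = {"GABA": "inhibitory", "Glutamate": "excitatory"}

def classify_projections(region: str) -> list:
    """Follow steps 1-6: region -> cells -> projecting cells -> transmitter -> effect."""
    reports = []
    for cell in CELLS_IN_REGION.get(region, []):      # steps 1-2
        if not PROJECTS_OUT.get(cell, False):         # step 3
            continue
        transmitter = TRANSMITTER[cell]               # step 4
        effect = EFFECT[transmitter]                  # step 5
        chain = [                                     # step 6: keep the logic chain
            (region, "contains", cell),
            (cell, "projects out of", region),
            (cell, "has neurotransmitter", transmitter),
            (transmitter, "is", effect),
        ]
        reports.append({"cell": cell, "effect": effect, "chain": chain})
    return reports

reports = classify_projections("Cerebellar cortex")
```

Keeping the chain alongside the conclusion is what lets a user trace a wrong answer back to the single statement that needs editing.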
  
[Diagram labels: CHEBI:18243; Brain; Cerebellum; Cortex; Cerebellar Purkinje cell; Purkinje neuron; Purkinje cell soma; Purkinje cell layer; Cerebellar cortex; IP3; Gross anatomy ontology; Cell centered anatomy ontology]
• Creating the linkages requires mapping
• Mapping is usually incomplete and not always possible
• Can’t take advantage of others’ work
Reuse identifiers rather than recreate them
  
• “The trouble is that if I make up all of my own URIs, my RDF document has no meaning to anyone else unless I explain what each URI is intended to denote or mean. Two RDF documents with no URIs in common have no information that can be interrelated.”
• NIF favors reuse of identifiers rather than mapping
• Creating ontologies to be used as common building blocks is important: modularity, low semantic overhead
http://www.rdfabout.com/intro/#Introducing%20RDF
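The point about shared URIs can be shown with three tiny documents. All URIs below are `example.org` placeholders invented for this sketch, and the documents are plain Python sets of triples rather than real RDF.

```python
# Three hypothetical RDF-like documents as sets of triples. The example.org
# URIs are placeholders, not real identifiers.
doc_a = {
    ("http://example.org/shared/PurkinjeCell", "hasPart", "http://example.org/a/Soma"),
}
doc_b = {
    ("http://example.org/shared/PurkinjeCell", "locatedIn", "http://example.org/b/Cerebellum"),
}
doc_c = {
    ("http://example.org/c/PC", "locatedIn", "http://example.org/c/Cb"),
}

def uris(doc: set) -> set:
    """All subject and object URIs mentioned in a document."""
    return {t[0] for t in doc} | {t[2] for t in doc}

def shared_uris(d1: set, d2: set) -> set:
    """URIs through which two documents can be interrelated."""
    return uris(d1) & uris(d2)

linked = shared_uris(doc_a, doc_b)     # reuse of one identifier joins the two
unlinked = shared_uris(doc_a, doc_c)   # private identifiers: nothing to join on
```

Documents `doc_a` and `doc_c` may well describe the same cell, but without a shared identifier that can only be established by a separate mapping step.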
  
[Diagram: ontology modules linked by “has part” and “is part of” relations. Cell part ontology: Cerebellum Purkinje cell soma, Cerebellum Purkinje cell dendrite, Cerebellum Purkinje cell axon. Anatomy ontology: Cerebellum granule cell layer, Cerebellum Purkinje cell layer, Cerebellum molecular layer, Cerebellar cortex. Cell Ontology: Cerebellum Purkinje neuron. Molecules: Calbindin, IP3 (CHEBI:16595)]
  
• Neuroscience Information Framework
– NIFSTD available for download
– Ontoquest web services
– NIF annotation services and mapping tools available
– NeuroLex available via SPARQL endpoint
• BioPortal: collection of > 300 ontologies covering many domains
– Automated mapping between ontologies
– Annotation services
– Web services for access
• OBO Foundry: http://www.obofoundry.org/
– Collection of community ontologies designed according to OBO Foundry principles
• Protégé ontology editor: editing tool for constructing ontologies. Excellent short course available for Protégé/OWL.
• Program on Ontologies of Neural Structures (INCF): CUMBO, NeuroLex Wiki, Scalable Brain Atlas
You can enhance your tools and annotation with community ontologies
  
http://neurolex.org
Larson et al., Frontiers in Neuroinformatics, in press
  
• Semantic MediaWiki
• Provides a simple interface for defining the concepts required
• Light weight semantics
• Good teaching tool for learning about semantic integration and the benefits of a consistent semantic framework
• Community based:
– Anyone can contribute their terms, concepts, things
– Anyone can edit
– Anyone can link
• Accessible: searched by Google
• Growing into a significant knowledge base for neuroscience
Demo D03
200,000 edits; 150 contributors
Red links: information is missing (or misspelled)
  
• NeuroLex provides an on-line computable index for expressing models in semantic terms, and linking to other knowledge and data
• INCF task forces are contributing knowledge
• Neuroscience knowledge in the web
Builds a knowledge base by cross-modular relations and links to data
  
Once terms have been proposed and vetted by the neuroscience community, NIF feeds them back to general ontologies to enrich coverage of neuroscience
Because they have static URLs, wikis are searchable by Google
  
• INCF Project
– Neuron Registry
– > 30 experts worldwide
– Fill out neuron pages in NeuroLex Wiki
– Led by Dr. Gordon Shepherd
Soma location; Dendrite location; Axon location
  
[Chart: number of red links (total, easy fixes, hard fixes) for soma location, dendrite location and axon location; vertical axis 0 to 300]
Social networks and community sites let us learn things from the collective behavior of contributors
INCF Knowledge Space
  
• Of the ~4000 columns that NIF queries, ~1300 map to one of our core categories:
– Organism
– Anatomical structure
– Cell
– Molecule
– Function
– Dysfunction
– Technique
• 30-50% of NIF’s queries autocomplete
• When NIF combines multiple sources, a set of common fields emerges
– Basic information models/semantic models exist for certain types of entities
Biomedical science does have a conceptual framework, but we don’t place undue importance on it; it must tie to data
  
• NIF can be used to survey the data landscape
• Analysis of NIF shows multiple databases with similar scope and content
• Many contain partially overlapping data
• Data “flows” from one resource to the next
– Data is reinterpreted, reanalyzed or added to
• Is duplication good or bad?
NIF is trying to make it easier to work with diverse data
  
NIF is in a unique position to answer questions about the neuroscience landscape
Where are the data?
[Figure: data sources plotted against brain regions; labeled regions include Striatum, Hypothalamus, Olfactory bulb, Cerebral cortex and Brain]
  
[Diagram, revisited: nested regions labeled ∞; “What is potentially knowable”; “What is known: literature, images, human knowledge”; “What is easily machine processable and accessible”. Annotations: unstructured; natural language processing, entity recognition, image processing and analysis; communication]
“Known unknowns vs unknown unknowns”
Open world meets closed world
  
Comprehensive and unbiased?
We know a lot about some things and less about others; some of NIF’s sources are comprehensive; others are highly biased
But... NIF has > 2M antibodies, 338,000 model organisms, and 3 million microarray records
  
[Figure labels: Neocortex; Olfactory bulb; Neostriatum; Cochlear nucleus]
All neurons with cell bodies in the same brain region are grouped together
Properties in NeuroLex
  
[Figure repeated: “Where are the data?”, brain region by data source, with an additional label: Funding]
  
• Requires account in MyNIF
• Still a work in progress, i.e., it breaks a lot
• If you are interested, contact us!
Vadim Astakhov, Kepler Workflow Engine
  
•  Gemma:	
  	
  Gene	
  ID	
  	
  +	
  Gene	
  Symbol	
  
•  DRG:	
  	
  Gene	
  name	
  +	
  Probe	
  ID	
  
•  Gemma	
  presented	
  results	
  rela>ve	
  to	
  baseline	
  chronic	
  
morphine;	
  	
  DRG	
  with	
  respect	
  to	
  saline,	
  so	
  direc>on	
  of	
  change	
  is	
  
opposite	
  in	
  the	
  2	
  databases	
  
•  Analysis:
•  1370 statements from Gemma regarding gene expression as a function of chronic morphine
•  617 were consistent with DRG; over half of the claims of the paper were not confirmed in this analysis
•  Results for 1 gene were opposite in DRG and Gemma
•  45 did not have enough information provided in the paper to make a judgment
Relatively simple standards would make life easier
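The baseline mismatch described above (one source reporting change relative to chronic morphine, the other relative to saline) can be reconciled mechanically once every record states its reference condition. The following is a minimal Python sketch under stated assumptions: the field names (`gene`, `direction`, `baseline`), the gene labels, and the rows are all hypothetical and are not the actual Gemma or DRG schemas or data. It also assumes only two conditions exist, so switching baseline simply reverses the sign.

```python
from collections import Counter

# Hypothetical records: field names and values are illustrative only,
# not the real Gemma or DRG schemas or results.
gemma = [
    {"gene": "GeneA", "direction": "up", "baseline": "chronic morphine"},
    {"gene": "GeneB", "direction": "down", "baseline": "chronic morphine"},
    {"gene": "GeneC", "direction": "up", "baseline": "chronic morphine"},
]
drg = [
    {"gene": "GeneA", "direction": "down", "baseline": "saline"},
    {"gene": "GeneB", "direction": "down", "baseline": "saline"},
]

FLIP = {"up": "down", "down": "up"}


def normalize(record, reference="saline"):
    """Express the direction of change relative to one shared baseline."""
    if record["baseline"] == reference:
        return record["direction"]
    # Reported against the other condition, so the sign is reversed
    # (assumes a two-condition comparison).
    return FLIP[record["direction"]]


def compare(source_a, source_b):
    """Classify each statement in source_a against source_b, keyed by gene."""
    b_by_gene = {r["gene"]: normalize(r) for r in source_b}
    outcome = {}
    for record in source_a:
        gene = record["gene"]
        if gene not in b_by_gene:
            outcome[gene] = "no match"
        elif normalize(record) == b_by_gene[gene]:
            outcome[gene] = "consistent"
        else:
            outcome[gene] = "opposite"
    return outcome


result = compare(gemma, drg)
summary = Counter(result.values())
print(result)
print(summary)
```

The point of the sketch is the `baseline` field: without it, a script cannot tell whether two "up" statements agree or contradict each other, which is the kind of simple standard the slide argues for.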
  
Databases and Ontologies:  Where do we go from here?
47/50 major preclinical published cancer studies could not be replicated
•  “The scientific community assumes that the claims in a preclinical study can be taken at face value - that although there might be some errors in detail, the main message of the paper can be relied on and the data will, for the most part, stand the test of time. Unfortunately, this is not always the case.”
•  Getting data out sooner in a form where they can be exposed to many eyes and many analyses may allow us to expose errors and develop better metrics to evaluate the validity of data
Begley and Ellis, 29 MARCH 2012 | VOL 483 | NATURE | 531
  
NIF favors a hybrid, tiered, federated system
•  Domain knowledge
–  Ontologies
•  Claims, models and observations
–  Virtuoso RDF triples
–  Model repositories
•  Data
–  Data federation
–  Spatial data
–  Workflows
•  Narrative
–  Full text access
  
Neuron, Brain part, Disease, Organism, Gene
Caudate projects to Snpc
Grm1 is upregulated in chronic cocaine
Betz cells degenerate in ALS
NIF provides the tentacles that connect the pieces: a new type of entity for 21st century science
Technique
People
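The three example claims on this slide have the subject, predicate, object shape that a triple store holds. As a rough illustration only, the sketch below writes them as plain Python tuples and queries them with a wildcard pattern. The predicate names (`projects_to` and so on) and the `match` and `about` helpers are hypothetical; they are not NIF's vocabulary or the Virtuoso API.

```python
# Claims from the slide as (subject, predicate, object) triples.
# Predicate names are illustrative, not NIF's actual vocabulary.
triples = [
    ("Caudate", "projects_to", "Snpc"),
    ("Grm1", "is_upregulated_in", "chronic cocaine"),
    ("Betz cells", "degenerate_in", "ALS"),
]

# Entity types, following the categories named on the slide.
entity_type = {
    "Caudate": "Brain part",
    "Snpc": "Brain part",
    "Grm1": "Gene",
    "Betz cells": "Neuron",
    "ALS": "Disease",
}


def match(pattern, store):
    """Return triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in store
        if all(p is None or p == v for p, v in zip(pattern, t))
    ]


def about(entity, store):
    """Return every triple in which the entity is subject or object."""
    return [t for t in store if entity in (t[0], t[2])]


# Which claims mention a Disease as their object?
disease_claims = [
    t for t in triples if entity_type.get(t[2]) == "Disease"
]

print(match((None, "projects_to", None), triples))
print(about("Snpc", triples))
print(disease_claims)
```

Typing the entities separately from the claims is what lets a query cross categories, for example from a neuron type to a disease, without knowing in advance which source asserted the link.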
  
•  Several powerful trends should change the way we think about our data: One → Many
–  Many data
•  Generation of data is getting easier → shared data
•  Data space is getting richer: more –omes every day
•  But... compared to the biological space, still sparse
–  Many eyes
•  Wisdom of crowds
•  More than one way to interpret data
–  Many algorithms
•  Not a single way to analyze data
–  Many analytics
•  “Signatures” in data may not be directly related to the question for which they were acquired but tell us something really interesting
  
Are you exposing or burying your work?
•  You (and the machine) have to be able to find it
–  Accessible through the web
–  Structured or semi-structured
–  Annotations
•  You (and the machine) have to be able to use it
–  Data type specified and in an actionable form
•  You (and the machine) have to know what the data mean
–  Semantics
–  Context: Experimental metadata
–  Provenance: where did they come from
Reporting neuroscience data within a consistent framework helps enormously, but the frameworks need not be onerous
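The find / use / understand checklist above can be turned into a lightweight self-check. The sketch below is one possible reading, not a NIF specification: the `DatasetDescription` class, its field names, and the example URL are all hypothetical, chosen to mirror the bullet points one for one.

```python
from dataclasses import dataclass, field, fields


@dataclass
class DatasetDescription:
    """Hypothetical minimal descriptor; one field per checklist item."""
    url: str = ""                                    # find: accessible through the web
    structure: str = ""                              # find: structured or semi-structured
    annotations: list = field(default_factory=list)  # find: annotations
    data_type: str = ""                              # use: data type in an actionable form
    semantics: dict = field(default_factory=dict)    # meaning: terms mapped to definitions
    context: dict = field(default_factory=dict)      # meaning: experimental metadata
    provenance: str = ""                             # meaning: where the data came from


def missing(desc):
    """List checklist items that are still empty, in declaration order."""
    return [f.name for f in fields(desc) if not getattr(desc, f.name)]


# Illustrative record: findable and typed, but with no stated meaning.
desc = DatasetDescription(
    url="https://example.org/dataset/1",
    structure="CSV",
    data_type="gene expression table",
)

print(missing(desc))
```

A descriptor this small reflects the slide's closing remark: a consistent framework helps even when it asks for only a handful of fields.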
  
A data sharing snafu in 3 acts
http://force11.org
  
Jeff Grethe, UCSD, Co Investigator, Interim PI
Amarnath Gupta, UCSD, Co Investigator
Anita Bandrowski, NIF Project Leader
Gordon Shepherd, Yale University
Perry Miller
Luis Marenco
Rixin Wang
David Van Essen, Washington University
Erin Reid
Paul Sternberg, Cal Tech
Arun Rangarajan
Hans Michael Muller
Yuling Li
Giorgio Ascoli, George Mason University
Sridevi Polavarum
Fahim Imam
Larry Lui
Andrea Arnaud Stagg
Jonathan Cachat
Jennifer Lawrence
Svetlana Sulima
Davis Banks
Vadim Astakhov
Xufei Qian
Chris Condit
Mark Ellisman
Stephen Larson
Willie Wong
Tim Clark, Harvard University
Paolo Ciccarese
Karen Skinner, NIH, Program Officer (retired)
Jonathan Pollock, NIH, Program Officer
And my colleagues in Monarch, dkNet, 3DVC, Force 11
  

 

Databases and Ontologies: Where do we go from here?

  • 1. Maryann E. Martone, Ph.D. University of California, San Diego. INCF Neuroinformatics Short Course, Stockholm, August 2013
  • 2. • Introduction • Introduction to the Neuroscience Information Framework • Structured information: data, databases • Federating neuroscience-relevant databases • Information frameworks • Ontologies • What can we do with information in the NIF? • Conclusions
  • 3. [Diagram: Scholar, Library, Scholar, Publisher] FORCE11.org: Future of research communications and e-scholarship
  • 4. [Diagram labels: Scholar; Consumer; Libraries; Data repositories; Code repositories; Community databases/platforms; OA; Curators; Social networks; Peer reviewers; Narrative; Workflows; Data; Models; Multimedia; Nanopublications; Code]
  • 6. • NIF’s mission is to maximize the awareness of, access to and utility of research resources produced worldwide to enable better science and promote efficient use – NIF unites neuroscience information without respect to domain, funding agency, institute or community – NIF is like a “PubMed” for all biomedical resources and a “PubMed Central” for databases – Makes them searchable from a single interface – Practical and cost-effective; tries to be sensible – Learned a lot about effective data sharing. The Neuroscience Information Framework is an initiative of the NIH Blueprint consortium of institutes. http://neuinfo.org
  • 7. We’d like to be able to find: • What is known: – What are the projections of hippocampus? – Is GRM1 expressed in cerebral cortex? – What genes have been found to be upregulated in chronic drug abuse in adults? – What animal models have similar phenotypes to Parkinson’s disease? – What studies used my polyclonal antibody against GABA in humans? • What is not known: – Connections among data – Gaps in knowledge. A framework makes it easier to address these questions
  • 9. Neuroscience is unlikely to be served by a few large databases, as the genomics and proteomics communities are. Whole brain data (20 µm microscopic MRI); mosaic LM images (1 GB+); conventional LM images; individual cell morphologies; EM volumes & reconstructions; solved molecular structures. No single technology serves these all equally well. Multiple data types; multiple scales; multiple databases
  • 10. • Data warehouse: May contain data from diverse sources; schemas are integrated. Data are “cleaned” to fit unified data model. One database to rule them all... • Data federation: a virtual database that stores data definitions and not the data itself. The virtual database will have information about the location of the data. When a single call is made to a virtual database, the technology ensures multiple calls to underlying databases and is also responsible for meaningfully aggregating the returned result sets. From Wikipedia and http://www.infosysblogs.com/oracle/2010/01/data_federation_a_potent_subst_1.html
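The federation idea on slide 10 can be sketched in a few lines of Python. This is only an illustration, not how NIF implements it: the two sources, their records and the `Federation` class are hypothetical stand-ins, and real sources would be remote databases reached over a network.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]

# Hypothetical stand-ins for remote databases: each source is a function
# that answers a keyword query with its own records.
def source_a(term: str) -> List[Record]:
    rows = [{"id": "a1", "text": "hippocampus volume"},
            {"id": "a2", "text": "cerebellum volume"}]
    return [r for r in rows if term in r["text"]]

def source_b(term: str) -> List[Record]:
    rows = [{"id": "b1", "text": "gene expression in hippocampus"}]
    return [r for r in rows if term in r["text"]]

class Federation:
    """Stores where the data live (the registered sources), not the data."""

    def __init__(self) -> None:
        self.sources: Dict[str, Callable[[str], List[Record]]] = {}

    def register(self, name: str, query_fn: Callable[[str], List[Record]]) -> None:
        self.sources[name] = query_fn

    def query(self, term: str) -> List[Record]:
        # One call in, one call per underlying source out, results merged
        # and tagged with the source they came from.
        results: List[Record] = []
        for name, query_fn in self.sources.items():
            for record in query_fn(term):
                results.append({"source": name, **record})
        return results

federation = Federation()
federation.register("A", source_a)
federation.register("B", source_b)
hits = federation.query("hippocampus")
```

A warehouse, by contrast, would copy both sources' rows into one cleaned table up front and query that table directly.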
  • 11. Relational database: Subject 473 • Species: mouse (string) • Age: 50 days (integer) • Age category: adult • Protocol: 2. Data model; data types; formal query language. Free text: “Mice (aged 50 days) were perfused with 4% paraformaldehyde and brains were sectioned at a thickness of 50 µm. Sections were labeled using antibodies against calbindin and imaged on a Zeiss confocal microscope.” Entity recognition; natural language processing
  • 12. ∞ What is easily machine processable and accessible. What is potentially knowable. What is known: literature, images, human knowledge. Unstructured; natural language processing, entity recognition, image processing and analysis; paywalls, communication. Abstracts vs full text vs tables etc.
  • 13. http://neuinfo.org June 10, 2013, dkCOIN Investigator's Retreat • A portal for finding and using neuroscience resources • A consistent framework for describing resources • Provides simultaneous search of multiple types of information, organized by category • Supported by an expansive ontology for neuroscience • Utilizes advanced technologies to search the “hidden web”. UCSD, Yale, Cal Tech, George Mason, Washington Univ. Literature; Database Federation; Registry
  • 15. With the thousands of databases and other information sources available, simple descriptive metadata will not suffice
  • 16. • NIF curators • Nomination by the community • Semi-automated text mining pipelines. NIF Registry: requires no special skills; site map available for local hosting • NIF Data Federation • DISCO interop • Requires some programming skill • Open Source Brain < 2 hr. Low barrier to entry; incremental refinement
  • 17. NIF  was  designed  to  be  populated  rapidly   with  progressive  refinement  
  • 18. Databases come in many shapes and sizes • Primary data: – Data available for reanalysis, e.g., microarray data sets from GEO; brain images from XNAT; microscopic images (CCDB/CIL) • Secondary data: – Data features extracted through data processing and sometimes normalization, e.g., brain structure volumes (IBVD), gene expression levels (Allen Brain Atlas); brain connectivity statements (BAMS) • Tertiary data: – Claims and assertions about the meaning of data, e.g., gene upregulation/downregulation, brain activation as a function of task • Registries: – Metadata – Pointers to data sets or materials stored elsewhere • Data aggregators: – Aggregate data of the same type from multiple sources, e.g., Cell Image Library, SUMSdb, Brede • Single source: – Data acquired within a single context, e.g., Allen Brain Atlas. Researchers are producing a variety of information artifacts using a multitude of technologies
  • 19. • Data: values of qualitative or quantitative variables, belonging to a set of items... often the results of measurements (Wikipedia) • Metadata: “Data about data” • Structural metadata: • the design and specification of data structures; more properly called “data about the containers of data” (Wikipedia) • e.g., image size, bit depth, integer vs string • Descriptive metadata: • individual instances of application data, the data content; “data about data content” • e.g., creator, subject • Data type: the form of the data for the purposes of data operations • Data integration: combining data residing in different sources and providing users with a unified view of these data. “Metadata are data” (Wikipedia)
  • 20. [Chart: number of federated databases and number of federated records (millions) over time] NIF searches the largest collation of neuroscience-relevant data on the web. DISCO
  • 21. • Long tail data: large numbers of small data sets. http://en.wikipedia.org/wiki/Long_tail
  • 22. Hippocampus OR “Cornu Ammonis” OR “Ammon’s horn”. Query expansion: synonyms and related concepts; Boolean queries; data sources categorized by “data type” and level of nervous system; common views across multiple sources; tutorials for using full resource when getting there from NIF; link back to record in original source
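The synonym-based query expansion shown on slide 22 can be sketched as follows. This is a minimal illustration: the `SYNONYMS` table holds only the example from the slide, whereas NIF draws synonyms from its ontology, and `expand_query` is a hypothetical helper name.

```python
# Synonym table with the single example from the slide (illustrative only).
SYNONYMS = {"hippocampus": ["Cornu Ammonis", "Ammon's horn"]}

def expand_query(term: str, synonyms=SYNONYMS) -> str:
    """Return a Boolean OR query over the term and its known synonyms."""
    variants = [term] + synonyms.get(term.lower(), [])
    # Quote multi-word variants so they are searched as phrases.
    quoted = [f'"{v}"' if " " in v else v for v in variants]
    return " OR ".join(quoted)

expanded = expand_query("Hippocampus")
```

With this table, `expanded` reproduces the query string at the top of the slide; a term with no entry is returned unchanged.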
  • 23. Connects  to   Synapsed  with   Synapsed  by   Input  region   innervates   Axon  innervates   Projects  to  Cellular  contact   Subcellular  contact   Source  site   Target    site   Each  resource  implements  a  different,  though  related  model;     systems  are  complex  and  difficult  to  learn,  in  many  cases  
  • 25. •  Current  web  is   designed  to  share   documents   –  Documents  are   unstructured  data   •  Much  of  the  content   of  digital  resources  is   part  of  the  “hidden   web”   •  Wikipedia:    The  Deep  Web   (also  called  Deepnet,  the   invisible  Web,  DarkNet,   Undernet  or  the  hidden   Web)  refers  to   World  Wide  Web  content   that  is  not  part  of  the   Surface  Web,  which  is   indexed  by  standard   search  engines.  
  • 26. Even  Google  needs  a  knowledge  framework  
  • 27. Knowledge in space and spatial relationships (the “where”). Knowledge in words, terminologies and logical relationships (the “what”)
  • 28. [Diagram labels: Purkinje cell; Axon terminal; Axon; Dendritic tree; Dendritic spine; Dendrite; Cell body; Cerebellar cortex] There is little obvious connection between data sets taken at different scales using different microscopies without an explicit representation of the biological objects that the data represent
  • 29. • NIF covers multiple structural scales and domains of relevance to neuroscience • Aggregate of community ontologies with some extensions for neuroscience, e.g., Gene Ontology, ChEBI, Protein Ontology. NIFSTD modules: Organism; NS Function; Molecule; Investigation; Subcellular structure; Macromolecule; Gene; Molecule Descriptors; Techniques; Reagent; Protocols; Cell; Resource; Instrument; Dysfunction; Quality; Anatomical Structure
  • 30. [Diagram: Brain has a Cerebellum; Cerebellum has a Purkinje Cell Layer; Purkinje Cell Layer has a Purkinje cell; Purkinje cell is a neuron] • Ontology: an explicit, formal representation of concepts and the relationships among them within a particular domain that expresses human knowledge in a machine readable form • Branch of philosophy: a theory of what is • e.g., Gene ontologies
  • 31. • Express neuroscience concepts in a way that is machine readable – Synonyms, lexical variants – Definitions • Provide means of disambiguation of strings – Nucleus part of cell; nucleus part of brain; nucleus part of atom • Rules by which a class is defined, e.g., a GABAergic neuron is a neuron that releases GABA as a neurotransmitter • Properties – Support reasoning • Provide universals for navigating across different data sources – Semantic “index” – Link data through relationships, not just one-to-one mappings • Provide the basis for concept-based queries to probe and mine data • Establish a semantic framework for landscape analysis. Mathematics, computer code or Esperanto
  • 32. Aligns sources to the NIF semantic framework
  • 34. birnlex_1741 Brodmann.10 Explicit mapping of database content helps disambiguate non-unique and custom terminology
  • 36. • Search Google: GABAergic neuron • Search NIF: GABAergic neuron – NIF automatically searches for types of GABAergic neurons. [Figure: types of GABAergic neurons] Neuroscience Information Framework – http://neuinfo.org
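The difference between a string search and an ontology-backed search can be sketched with a toy "is a" hierarchy. The hierarchy below is a hand-written assumption for illustration, not NIFSTD content, and `descendants` is a hypothetical helper.

```python
# Toy "is a" hierarchy: child class -> parent class (illustrative assumption).
IS_A = {
    "GABAergic neuron": "Neuron",
    "Purkinje cell": "GABAergic neuron",
    "Cerebellar basket cell": "GABAergic neuron",
    "Granule cell": "Neuron",
}

def descendants(cls: str, is_a=IS_A) -> list:
    """Collect every class that is directly or indirectly a kind of cls."""
    found = []
    frontier = [cls]
    while frontier:
        parent = frontier.pop()
        for child, child_parent in is_a.items():
            if child_parent == parent:
                found.append(child)
                frontier.append(child)
    return sorted(found)

# A concept-based search looks for the class and all of its subclasses.
search_terms = ["GABAergic neuron"] + descendants("GABAergic neuron")
```

A plain keyword search would match only the literal string "GABAergic neuron"; the expanded list also retrieves records that mention a Purkinje cell without ever using the word "GABAergic".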
  • 37. Equivalence classes; restrictions. Arbitrary but defensible • Neurons classified by • Circuit role: principal neuron vs interneuron • Molecular constituent: parvalbumin neurons, calbindin neurons • Brain region: cerebellar neuron • Morphology: spiny neuron • Molecule roles: drug of abuse, anterograde tracer, retrograde tracer • Brain parts: circumventricular organ • Organisms: non-human primate, non-human vertebrate • Qualities: expression level • Techniques: neuroimaging
  • 38. What  genes  are  upregulated  by  drugs  of  abuse  in  the   adult  mouse?  (show  me  the  data!)   Morphine   Increased   expression   Adult  Mouse  
  • 39. • NIF Connectivity: 7 databases containing connectivity primary data or claims from literature on connectivity between brain regions • Brain Architecture Management System (rodent) • Temporal lobe.com (rodent) • Connectome Wiki (human) • Brain Maps (various) • CoCoMac (primate cortex) • UCLA Multimodal database (human fMRI) • Avian Brain Connectivity Database (bird) • Total: 1800 unique brain terms (excluding Avian) • Number of exact terms used in > 1 database: 42 • Number of synonym matches: 99 • Number of 1st order partonomy matches: 385
  • 40. • Realism vs conceptualism • Controlled vocabularies vs taxonomies vs ontology? • How do I name classes? • Shared vs custom ontologies • Single vs multiple inheritance • RDF vs OWL? • Top down vs bottom up: heavyweight vs lightweight ontologies • Should I encode everything in my ontology? Many schools of thought about ontologies, their construction and use
  • 41. • Controlled vocabularies: prescribed list of terms or headings, each one having an assigned meaning • Lexicon/Thesaurus: vocabularies + their lexical properties, e.g., synonyms, lexical variants • Taxonomy: monohierarchical classification of concepts, as used, for example, in the classification of biological organisms, built on the “is a” relationship • Ontology: specification of the concepts of a domain and their relationships, structured to allow computer processing and reasoning. http://www.willpowerinfo.co.uk/glossary.htm Mike Bergman
  • 42. • Identity: – Entities are uniquely identifiable – Name is a meaningless numerical identifier (URI: Uniform Resource Identifier) – Any number of human readable labels can be assigned to it • Definition: – Genera: is a type of (cell, anatomical structure, cell part) – Differentia: “has a”; a set of properties that distinguish among members of that class – Can include necessary and sufficient conditions • Implementation: how is this definition expressed? – Depending on the nature of the concept or entity and the needs of the information system, we can say more or fewer things – Different languages can express different things about the concept that can be computed upon • OWL (W3C standard), RDF. Examples: birnlex_1362 CA2; CHEBI_29108 CA2. NIF follows OBO Foundry best practices for naming and defining classes
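The slide's two "CA2" examples show why identity rests on the identifier and not on the label. A minimal sketch, using only the two identifier/label pairs from the slide; the index structure and the `ids_for_label` helper are assumptions for illustration.

```python
from collections import defaultdict

# Identifier -> human readable labels. The two entries are the pairs shown
# on the slide; an entity may carry any number of labels.
LABELS = {
    "birnlex_1362": ["CA2"],
    "CHEBI_29108": ["CA2"],
}

def ids_for_label(label: str, labels=LABELS) -> list:
    """Return every identifier that carries the given label."""
    index = defaultdict(list)
    for identifier, names in labels.items():
        for name in names:
            index[name].append(identifier)
    return sorted(index[label])

ambiguous = ids_for_label("CA2")
```

Because the label resolves to two identifiers from two different ontologies, a search or annotation tool has to ask which entity is meant, or use context to decide, before it can link the string to data.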
  • 43. • XML: Extensible Markup Language: markup language for data. XML itself is not very much concerned with meaning. XML nodes don't need to be associated with particular concepts, and the XML standard doesn't indicate how to derive a fact from a document. • RDF: Resource Description Framework: a general method to decompose knowledge into small pieces, with some rules about the semantics, or meaning, of those pieces. What sets RDF apart from XML is that RDF is designed to represent knowledge in a distributed world. That RDF is designed for knowledge, and not data, means RDF is particularly concerned with meaning. – Small pieces are called “triples”: subject, predicate, object – Purkinje neuron (S) has neurotransmitter (P) GABA (O) • RDFS: a method of specifying metadata about properties/characteristics of things and classes of things such that inference can be carried out (conceptualized in RDF) • OWL (Web Ontology Language): a more complex (and more powerful) extension of RDFS • SPARQL: a query language designed for RDF (similar to how SQL was designed for relational databases). http://answers.semanticweb.com/questions/15215/whats-the-difference-between-using-rdfsowl-versus-xml http://www.rdfabout.com/intro/#Introducing%20RDF
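The triple model can be illustrated without any RDF library. In the sketch below the first triple is the slide's example; the other two are added for illustration, and `match` is a hypothetical helper that mimics the simplest kind of SPARQL pattern, where `None` plays the role of a variable. Real RDF would use URIs rather than plain strings.

```python
# A tiny in-memory triple store: (subject, predicate, object).
triples = {
    ("Purkinje neuron", "has neurotransmitter", "GABA"),  # from the slide
    ("Purkinje neuron", "is a", "neuron"),                # illustrative
    ("GABA", "is a", "neurotransmitter"),                 # illustrative
}

def match(s=None, p=None, o=None):
    """Return all triples matching the pattern; None matches anything."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

# "Which things release which neurotransmitter?"
releases = match(p="has neurotransmitter")
# "What do we know about the Purkinje neuron?"
about_purkinje = match(s="Purkinje neuron")
```

The store never needs to know what "has neurotransmitter" means; it only matches positions, which is the point the slide makes about RDF tools.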
  • 44. Relational model • Mouse has age 50 days • Protocol uses instrument confocal microscope • A confocal imaging protocol is a protocol that uses instrument confocal microscope. RDF: The computer doesn't need to know what has actually means in English for this to be useful. It is left up to the application writer to choose appropriate names for things (confocal microscope) and to use the right predicates (uses, has). RDF tools are ignorant of what these names mean, but they can still usefully process the information. (http://www.rdfabout.com/intro/#Introducing%20RDF) May link to other information, e.g., mouse is a rodent
  • 45. The thalamus projects to the cortex in mammals • Universal: allValuesFrom: If a mammal has a cortex and a thalamus, then the thalamus must project to the cortex • Existential: someValuesFrom: The thalamus projects to the cortex in at least one member of the class mammal • Disjointness: owl:disjointWith: a member of one class cannot simultaneously be an instance of a specified other class: Reptiles are disjoint from mammals. W3C OWL guide: www.w3.org/TR/2004/REC-owl-guide-20040210/ Restrictions placed on classes allow us to reason over the ontology and check for consistency
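A consistency check based on disjointness can be sketched in plain Python. This is a toy, not an OWL reasoner: the `DISJOINT` set encodes only the slide's reptile/mammal example, and the individuals and the helper name are hypothetical.

```python
# Pairs of classes declared disjoint (the slide's example).
DISJOINT = {frozenset({"Reptile", "Mammal"})}

def inconsistent_individuals(asserted_types: dict) -> list:
    """Return individuals asserted to belong to two disjoint classes."""
    bad = []
    for individual, classes in asserted_types.items():
        for pair in DISJOINT:
            if pair <= classes:  # both disjoint classes are asserted
                bad.append(individual)
    return sorted(bad)

# Hypothetical class assertions, e.g. merged from two data sources.
asserted = {
    "animal_1": {"Mammal"},
    "animal_2": {"Reptile", "Mammal"},
}
conflicts = inconsistent_individuals(asserted)
```

A real reasoner would also follow subclass links, so that an individual typed as both "Mouse" and "Lizard" would be flagged once those classes are placed under Mammal and Reptile.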
  • 47. 1. Look up the brain region in NeuroLex 2. Look up cells contained in the brain region 3. Find those cells that are known to project out of that brain region 4. Look up the neurotransmitters for those cells 5. Determine whether those neurotransmitters are known to be excitatory or inhibitory 6. Report the projection as excitatory or inhibitory, and report the entire chain of logic with links back to the wiki pages where they were made 7. Make sure user can get back to each statement in the logic chain to edit it if they think it is wrong. Stephen Larson. CHEBI:18243
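The lookup chain in steps 1 to 6 can be sketched over a handful of hand-entered facts. The dictionaries below are illustrative stand-ins for NeuroLex wiki pages, not an export of them, and the sketch omits the links back to editable pages described in steps 6 and 7.

```python
# Toy knowledge base standing in for NeuroLex pages (illustrative facts).
CELLS_IN_REGION = {"Cerebellar cortex": ["Purkinje cell", "Cerebellar granule cell"]}
PROJECTS_OUT = {"Purkinje cell": True, "Cerebellar granule cell": False}
TRANSMITTER = {"Purkinje cell": "GABA", "Cerebellar granule cell": "Glutamate"}
EFFECT = {"GABA": "inhibitory", "Glutamate": "excitatory"}

def classify_projections(region: str) -> list:
    """Follow steps 1-6: region -> cells -> projecting cells -> transmitter -> sign."""
    reports = []
    for cell in CELLS_IN_REGION.get(region, []):        # steps 1-2
        if not PROJECTS_OUT.get(cell, False):           # step 3
            continue
        transmitter = TRANSMITTER[cell]                 # step 4
        effect = EFFECT[transmitter]                    # step 5
        chain = [                                       # step 6: keep the logic
            f"{region} contains {cell}",
            f"{cell} projects out of {region}",
            f"{cell} releases {transmitter}",
            f"{transmitter} is {effect}",
        ]
        reports.append({"cell": cell, "effect": effect, "chain": chain})
    return reports

reports = classify_projections("Cerebellar cortex")
```

Keeping the `chain` alongside the conclusion is what makes step 7 possible: each statement can be traced to the page where it was asserted and corrected there.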
  • 48. [Diagram labels: Brain; Cerebellum; Cortex; Cerebellar Purkinje cell; Purkinje neuron; Purkinje cell soma; Purkinje cell layer; Cerebellar cortex; IP3; Cerebellum] • To create the linkages requires mapping • Mapping is usually incomplete and not always possible • Can’t take advantage of others’ work. Gross anatomy ontology; cell centered anatomy ontology. Reuse identifiers rather than recreate them
  • 49. • “The trouble is that if I make up all of my own URIs, my RDF document has no meaning to anyone else unless I explain what each URI is intended to denote or mean. Two RDF documents with no URIs in common have no information that can be interrelated.” • NIF favors reuse of identifiers rather than mapping • Creating ontologies to be used as common building blocks: modularity, low semantic overhead, is important. http://www.rdfabout.com/intro/#Introducing%20RDF
  • 50. Cerebellum   Purkinje  cell  soma   Cerebellum   Purkinje  cell   dendrite   Cerebellum   Purkinje  cell  axon   (Cell  part   ontology)   Cerebellum  granule  cell   layer    (Anatomy  ontology)   Cerebellum  Purkinje   cell  layer   Cerebellum   molecular  layer   Has   part   Has   part   Has   part   Is  part  of   Is  part  of   Is  part  of   Calbindin   IP3   (CHEBI:16595)   Cerebellum   Purkinje  neuron   (Cell  Ontology)   Cerebellar  cortex   Has  part   Has  part   Has  part  
  • 51. • Neuroscience Information Framework – NIFSTD available for download – Ontoquest web services – NIF annotation services and mapping tools available – Neurolex available via SPARQL endpoint • Bioportal: collection of > 300 ontologies covering many domains – Automated mapping between ontologies – Annotation services – Web services for access • OBO Foundry: http://www.obofoundry.org/ – Collection of community ontologies designed according to OBO Foundry principles • Protégé Ontology editor: editing tool for constructing ontologies. Excellent short course available for Protégé/OWL. • Program on Ontologies of Neural Structures (INCF): CUMBO, Neurolex Wiki, Scalable Brain Atlas. You can enhance your tools and annotation with community ontologies
  • 52. http://neurolex.org Larson et al, Frontiers in Neuroinformatics, in press • Semantic MediaWiki • Provide a simple interface for defining the concepts required • Lightweight semantics • Good teaching tool for learning about semantic integration and the benefits of a consistent semantic framework • Community based: • Anyone can contribute their terms, concepts, things • Anyone can edit • Anyone can link • Accessible: searched by Google • Growing into a significant knowledge base for neuroscience. Demo D03. 200,000 edits; 150 contributors
  • 53. Red links: information is missing (or misspelled)
  • 54. • Neurolex provides an online computable index for expressing models in semantic terms, and linking to other knowledge and data • INCF task forces are contributing knowledge • Neuroscience knowledge in the web. Builds a knowledge base by cross-modular relations and links to data
  • 55. Once terms have been proposed and vetted by the neuroscience community, NIF feeds them back to general ontologies to enrich coverage of neuroscience
  • 56. Because they are static URLs, wikis are searchable by Google
  • 57. • INCF Project – Neuron Registry – > 30 experts worldwide – Fill out neuron pages in Neurolex Wiki – Led by Dr. Gordon Shepherd. [Chart: number of total redlinks, easy fixes and hard fixes for soma location, dendrite location and axon location] Social networks and community sites let us learn things from the collective behavior of contributors. INCF Knowledge Space
  • 58. • Of the ~4000 columns that NIF queries, ~1300 map to one of our core categories: – Organism – Anatomical structure – Cell – Molecule – Function – Dysfunction – Technique • 30-50% of NIF’s queries autocomplete • When NIF combines multiple sources, a set of common fields emerges – Basic information models/semantic models exist for certain types of entities. Biomedical science does have a conceptual framework, but we don’t place undue importance on it; it must tie to data
  • 60. • NIF can be used to survey the data landscape • Analysis of NIF shows multiple databases with similar scope and content • Many contain partially overlapping data • Data “flows” from one resource to the next – Data is reinterpreted, reanalyzed or added to • Is duplication good or bad? NIF is trying to make it easier to work with diverse data
  • 61. NIF is in a unique position to answer questions about the neuroscience landscape. Where are the data? [Chart labels: Striatum; Hypothalamus; Olfactory bulb; Cerebral cortex; Brain; Brain region; Data source]
  • 62. ∞ What is easily machine processable and accessible. What is potentially knowable. What is known: literature, images, human knowledge. Unstructured; natural language processing, entity recognition, image processing and analysis; communication. “Known unknowns vs unknown unknowns”. Open world meets closed world
  • 63. Comprehensive and unbiased? We know a lot about some things and less about others; some of NIF’s sources are comprehensive; others are highly biased. But... NIF has > 2M antibodies, 338,000 model organisms, and 3 million microarray records
  • 64. [Chart labels: Neocortex; Olfactory bulb; Neostriatum; Cochlear nucleus] All neurons with cell bodies in the same brain region are grouped together. Properties in Neurolex
  • 65. NIF is in a unique position to answer questions about the neuroscience landscape. Where are the data? [Chart labels: Striatum; Hypothalamus; Olfactory bulb; Cerebral cortex; Brain; Brain region; Data source; Funding]
  • 66. • Requires account in MyNIF • Still a work in progress, i.e., it breaks a lot • If you are interested, contact us! Vadim Astakhov, Kepler Workflow Engine
  • 67. • Gemma: Gene ID + Gene Symbol • DRG: Gene name + Probe ID • Gemma presented results relative to baseline chronic morphine; DRG with respect to saline, so direction of change is opposite in the 2 databases • Analysis: • 1370 statements from Gemma regarding gene expression as a function of chronic morphine • 617 were consistent with DRG; over half of the claims of the paper were not confirmed in this analysis • Results for 1 gene were opposite in DRG and Gemma • 45 did not have enough information provided in the paper to make a judgment. Relatively simple standards would make life easier
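The baseline mismatch described on slide 67 is the kind of thing a small normalization step can handle before two sources are compared. The sketch below is an assumption about how that could look, not the analysis actually run: the `FLIP` table, the function names and the example directions are hypothetical.

```python
# Direction of change depends on which condition is treated as baseline.
FLIP = {"up": "down", "down": "up"}

def to_saline_baseline(direction: str, baseline: str) -> str:
    """Re-express a direction of change relative to a saline baseline."""
    if baseline == "saline":
        return direction
    # Reported relative to chronic morphine: invert to compare with saline.
    return FLIP[direction]

def agree(gemma_direction: str, drg_direction: str) -> bool:
    """Compare a Gemma claim (morphine baseline) with a DRG claim (saline baseline)."""
    return (to_saline_baseline(gemma_direction, "chronic morphine")
            == to_saline_baseline(drg_direction, "saline"))

# "down" in Gemma and "up" in DRG describe the same biological change.
same_change = agree("down", "up")
# "up" in both databases is actually a disagreement once baselines are aligned.
naive_match = agree("up", "up")
```

If both sources recorded the baseline condition as explicit metadata, this alignment could be automated instead of being worked out by reading each paper.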
  • 69. 47/50 major preclinical published cancer studies could not be replicated • “The scientific community assumes that the claims in a preclinical study can be taken at face value - that although there might be some errors in detail, the main message of the paper can be relied on and the data will, for the most part, stand the test of time. Unfortunately, this is not always the case.” • Getting data out sooner in a form where they can be exposed to many eyes and many analyses may allow us to expose errors and develop better metrics to evaluate the validity of data. Begley and Ellis, Nature, vol. 483, p. 531, 29 March 2012
  • 70. NIF favors a hybrid, tiered, federated system • Domain knowledge – Ontologies • Claims, models and observations – Virtuoso RDF triples – Model repositories • Data – Data federation – Spatial data – Workflows • Narrative – Full text access. [Diagram labels: Neuron; Brain part; Disease; Organism; Gene; Technique; People; “Caudate projects to SNpc”; “Grm1 is upregulated in chronic cocaine”; “Betz cells degenerate in ALS”] NIF provides the tentacles that connect the pieces: a new type of entity for 21st century science
  • 71. • Several powerful trends should change the way we think about our data: One → Many – Many data • Generation of data is getting easier → shared data • Data space is getting richer: more -omes every day • But... compared to the biological space, still sparse – Many eyes • Wisdom of crowds • More than one way to interpret data – Many algorithms • Not a single way to analyze data – Many analytics • “Signatures” in data may not be directly related to the question for which they were acquired but tell us something really interesting. Are you exposing or burying your work?
  • 72. • You (and the machine) have to be able to find it – Accessible through the web – Structured or semi-structured – Annotations • You (and the machine) have to be able to use it – Data type specified and in an actionable form • You (and the machine) have to know what the data mean • Semantics • Context: experimental metadata • Provenance: where did they come from. Reporting neuroscience data within a consistent framework helps enormously, but the frameworks need not be onerous
  • 73. A  data  sharing  snafu  in  3  acts  
  • 75. Jeff Grethe, UCSD, Co-Investigator, Interim PI; Amarnath Gupta, UCSD, Co-Investigator; Anita Bandrowski, NIF Project Leader; Gordon Shepherd, Yale University; Perry Miller; Luis Marenco; Rixin Wang; David Van Essen, Washington University; Erin Reid; Paul Sternberg, Cal Tech; Arun Rangarajan; Hans Michael Muller; Yuling Li; Giorgio Ascoli, George Mason University; Sridevi Polavarum; Fahim Imam; Larry Lui; Andrea Arnaud Stagg; Jonathan Cachat; Jennifer Lawrence; Svetlana Sulima; Davis Banks; Vadim Astakhov; Xufei Qian; Chris Condit; Mark Ellisman; Stephen Larson; Willie Wong; Tim Clark, Harvard University; Paolo Ciccarese; Karen Skinner, NIH, Program Officer (retired); Jonathan Pollock, NIH, Program Officer. And my colleagues in Monarch, dkNet, 3DVC, Force 11