Linking data to publications: Towards the execution of papers
Talk for day 2 of the workshop on Developing Data Attribution and Citation Practices and Standards, Berkeley, CA August 220

Presentation Transcript

  • Linking data to publications: Towards the execution of papers. Anita de Waard, Elsevier Labs/UUtrecht, http://elsatglabs.com/labs/anita
  • Cycle of Scientific Investigation (CoSI model by Gully Burns, ISI/USC)
    [Figure elements: Experimental Design Model, Observations, Domain-specific Reasoning Model, Interpretations.]
    Steps shown: design experiments; perform experiments; gather data; make observational assertions; aggregate assertions; make interpretational assertions; formulate hypotheses; make predictions.
    Research artefacts overlaid on the cycle: Experimental Design, Experimental Objects, Observed Results, Processed Data/Statistics, Hypotheses, Background, Conclusions.
    Publication sections overlaid on the cycle: Background, Hypotheses, Methods, Results, Figures, Conclusions.
  • 1. Current practice: store data in repository, link from document, and vice versa
    Publication: Background, Hypotheses, Methods, Results, Figures, Conclusions
    Workflow Repository: Experimental Design
    Data Repository: Observed Results
    Statistics storage system: Processed Data/Statistics
  • Current Practice: linking to documents
    Least favorite: raw research data delivered as supplementary data
    Much better: linking into/from data centres, e.g. Pangea
  • Linking data and papers: ‘the publisher’s’ position:
    STM’s “Brussels Declaration”, June 2006: “... believe that, as a general principle, data sets, raw data outputs of research, and sets or subsets of that data should wherever possible be made freely accessible ...”
    • Publishers are (in general) not interested in owning or charging for research data repositories
    • Publishers are very interested in linking to and from data, and want to work with data repositories to do this effectively
    • Publishers believe in (and know) the concept of Digital Object Identifiers:
      – Where possible: one repository for identifiers
      – Persistent and unique (don’t keep same ID if content changes)
      – Where possible, link back to the publication
    Complete agreement with MacKenzie Smith’s “Requirements for Data Citation!”
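
The identifier requirements on this slide (one registry, persistent and unique IDs, a new ID when content changes, a link back to the publication) can be illustrated with a small sketch. This is not how the DOI system is implemented; the `IdentifierRegistry` class, the `data:` prefix, and the use of a content hash to detect change are all assumptions made for illustration only.

```python
import hashlib


class IdentifierRegistry:
    """Hypothetical single registry that mints one ID per distinct content."""

    def __init__(self, prefix="data"):
        self.prefix = prefix
        self._id_by_digest = {}   # content digest -> identifier
        self._records = {}        # identifier -> metadata record

    def register(self, content: bytes, publication=None) -> str:
        """Return the ID for this exact content, minting a new one if needed."""
        digest = hashlib.sha256(content).hexdigest()
        if digest in self._id_by_digest:
            # Same content: the identifier is persistent, so reuse it.
            return self._id_by_digest[digest]
        # New or changed content: never recycle an old identifier.
        identifier = f"{self.prefix}:{len(self._records) + 1}"
        self._id_by_digest[digest] = identifier
        self._records[identifier] = {"digest": digest, "publication": publication}
        return identifier

    def publication_for(self, identifier: str):
        """Follow the link from a data identifier back to its publication."""
        return self._records[identifier]["publication"]


registry = IdentifierRegistry()
first = registry.register(b"temperature,depth\n4.1,100\n", publication="paper-A")
again = registry.register(b"temperature,depth\n4.1,100\n", publication="paper-A")
changed = registry.register(b"temperature,depth\n4.2,100\n", publication="paper-A")
```

Here `first` and `again` are the same identifier, while `changed` receives a new one because the bytes differ.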
  • 2. Store data in repository, link within document.
    Publication: Background, Hypotheses, Methods, Results, Figures, Conclusions
    Workflow Repository: Experimental Design
    Data Repository: Observed Results
    Software Repository: Code/Statistics
  • Enabler at Elsevier - Linked Data: access any level of granularity of content
    Vocabularies shown: Dublin Core and SKOS; SWAN’s PAV (Provenance, Authoring and Versioning) ontology
    1. Where the document region is completely described by an existing ID, use that ID to define the region.
       Example: http://api.elsevier.com/content/article/DOI:10.1016/S0030-3992(02)00069-5#p0100 specifies a document region as the element with ID "p0100".
    2. Where the document region can be completely described by an element within an IDd element, navigate outwards to an ID that encloses the region, and use a relative XPath.
       Example: #xpath-e(id(s0050)/ce:para[4]) specifies a document region as the fourth ce:para element within an element with ID "s0050".
    3. Where the document region cannot be completely described by an element within the content, use the above locators combined with substrings.
       Example: #xpath-e(substring(id(p0100),10,20)) specifies a document region as being characters 10–20 in the element with ID "p0100".
    4. Where the source content does not contain IDs, use absolute XPaths to navigate to the appropriate element, and use substrings as required.
       Example: #xpath-e(article/body/ce:sections/ce:section[4]/ce:para[4]) points to a particular ce:para as defined by the given XPath. An example of an absolute XPath with substrings is left as an exercise for the reader.
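
The four locator rules above can be sketched as small string builders. The function names below are hypothetical and are not part of any Elsevier API; the sketch only reproduces the fragment syntax exactly as it appears in the slide's examples and does not resolve the locators against a document.

```python
def region_by_id(article_url: str, element_id: str) -> str:
    """Rule 1: the region is exactly the element carrying an existing ID."""
    return f"{article_url}#{element_id}"


def region_by_relative_xpath(enclosing_id: str, relative_path: str) -> str:
    """Rule 2: start at an enclosing ID, then walk a relative XPath."""
    return f"#xpath-e(id({enclosing_id})/{relative_path})"


def region_by_substring(element_id: str, first: int, second: int) -> str:
    """Rule 3: a character range inside an identified element.

    The two numbers are passed through unchanged, as in the slide's example.
    """
    return f"#xpath-e(substring(id({element_id}),{first},{second}))"


def region_by_absolute_xpath(path: str) -> str:
    """Rule 4: no IDs in the source, so use an absolute XPath."""
    return f"#xpath-e({path})"


article = "http://api.elsevier.com/content/article/DOI:10.1016/S0030-3992(02)00069-5"
locators = [
    region_by_id(article, "p0100"),
    region_by_relative_xpath("s0050", "ce:para[4]"),
    region_by_substring("p0100", 10, 20),
    region_by_absolute_xpath("article/body/ce:sections/ce:section[4]/ce:para[4]"),
]
```

Keeping one builder per rule makes it easy to prefer the most stable locator available (an ID) and fall back to XPaths only when the content has no IDs.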
  • Few (modest) examples of linking within document
    Authors manually identify (and tag) entities for which associated data is in databases, like GenBank, Uniprot, PDB, etc.
    Or: automatic entity identification and linking to relevant databases.
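
A minimal sketch of the automatic variant, assuming a simple dictionary lookup rather than whatever entity recognition the publisher actually uses. The `find_entity_links` function, the lexicon contents, and the `example.org` resolver URLs are hypothetical; real databases publish their own URL schemes.

```python
import re

# Hypothetical resolver templates, one per database.
RESOLVERS = {
    "PDB": "https://example.org/pdb/{accession}",
    "UniProt": "https://example.org/uniprot/{accession}",
}


def find_entity_links(text, lexicon):
    """Return (start, end, mention, url) for each known entity in the text.

    lexicon maps a surface form to a (database, accession) pair.
    """
    if not lexicon:
        return []
    # Try longer names first so a long name is not shadowed by a shorter one.
    names = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    links = []
    for match in pattern.finditer(text):
        database, accession = lexicon[match.group(1)]
        url = RESOLVERS[database].format(accession=accession)
        links.append((match.start(), match.end(), match.group(1), url))
    return links


sentence = "We crystallised crambin and compared it."
links = find_entity_links(sentence, {"crambin": ("PDB", "1CRN")})
```

Returning character offsets instead of rewriting the text keeps the annotations separate from the document, so they can be attached to whatever region locator the document format supports.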
  • 3. The future being made today: let’s execute the paper!
    Research Report: Background, Hypotheses, Experimental Design, Observed Results, Code/Statistics, Conclusions
    Research Process:
      Workflow Repository: Experimental Design
      Data Repository: Observed Results
      Software Repository: Code/Statistics
    Maintain context: Experimental, Narrative, Domain
  • 3. Even better: why move anything anywhere??
    Research Report: Background, Hypotheses, Experimental Design, Observed Results, Code/Statistics, Conclusions
    Research Process:
      Workflow Repository: Experimental Design
      Data Repository: Observed Results
      Software Repository: Code/Statistics
  • 3. Science in the cloud
    Proposal: Store research plan, results, thoughts, observations, etc. locally/in the cloud in a system that adds metadata.
    Advantages to the scientist: Always keep track of your own data! Maintain copyright and access privileges.
    Proposal: Allow access to the data, workflow etc. to the data repository, who
      1. validates quality (content and form)
      2. assigns a UID
      3. advertises its existence
    Advantages to the scientist: Data is vetted, identified, and advertised. If scientist/funding body wants: data repository controls access rights; data repository maintains archive.
    Proposal: Allow access to the collected thoughts (with links to data) to the publisher, who
      1. validates quality (content and form)
      2. assigns a UID
      3. advertises its existence
    Advantages to the scientist: Content vetted, identified, and advertised. If scientist/funding body wants: publisher/library controls access rights; publisher/library maintains archive.
    Proposal: Others - perhaps publishers, perhaps data repositories, perhaps (egad!) software developers - build tools, to place thoughts and data into context.
    Advantages to the scientist: Better software! Better links to everything else we do.
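
The three steps a data repository or publisher performs in this proposal (validate, assign a UID, advertise) can be sketched as follows. Everything here is an assumption for illustration: the `Repository` class, the validation rule (non-empty location plus a named creator), and the in-memory catalogue standing in for "advertising". Only a pointer to the scientist's own storage is recorded, in the spirit of "why move anything anywhere".

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Repository:
    """Hypothetical data repository or publisher that vets items in place."""

    name: str
    catalogue: dict = field(default_factory=dict)

    def validate(self, item: dict) -> bool:
        # Step 1: check content and form (an assumed, deliberately simple rule).
        has_location = bool(item.get("location"))
        has_creator = bool(item.get("metadata", {}).get("creator"))
        return has_location and has_creator

    def accept(self, item: dict) -> str:
        if not self.validate(item):
            raise ValueError("item failed quality validation")
        # Step 2: assign a unique identifier.
        uid = str(uuid.uuid4())
        # Step 3: advertise the item by listing it in the public catalogue.
        # Only the location is stored; the data stays with the scientist.
        self.catalogue[uid] = {
            "location": item["location"],
            "metadata": item["metadata"],
        }
        return uid


data_repository = Repository("data repository")
uid = data_repository.accept({
    "location": "cloud://my-lab/run-42/results.csv",  # hypothetical path
    "metadata": {"creator": "A. Scientist"},
})
```

The same class could stand in for the publisher, with the item being the collected thoughts and their links to already-identified data.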
  • Technology 1: Workflow tools
    http://VisTrails.org
    http://MyExperiment.org
    http://wings.isi.edu/
  • Technology 2: Executable Papers
  • Technology 3: Application Platforms
  • In summary:
    • Publishers are in general not interested in owning or charging for research data repositories (Brussels Declaration)
    • Publishers are very interested in linking to and from data, and want to work with data repositories to do this effectively
    • Publishers believe in Digital Object Identifiers
    • Publishers embrace open standards and interoperability, and are adapting their infrastructure to be future-compliant:
      – In particular, we think scientists should keep (track of) their work
      – We also think novel information architectures work for science, including Linked Data, the concept of app servers, and the cloud
    • Publishers believe in a future that stores and shares science in a better and more productive way, and in inventing it together: FoRCE11: The Future of Research Communications and eScience