IASSIST 2011 presentation: Tracking Data Reuse: Motivations, Methods, and Obstacles
Published in Education , Technology , Sports
  • 1. Tracking Data Reuse: Motivations, Methods, and Obstacles. Heather Piwowar, DataONE postdoc with NESCent and Dryad. @researchremix. IASSIST2011 #iassist
  • 2. http://www.metmuseum.org/toah/ho/09/euwf/ho_24.45.1.htm
  • 3. http://www.flickr.com/photos/jsmjr/62443357/
  • 4. http://www.flickr.com/photos/camilleharrington/3587294608/
  • 5. http://www.flickr.com/photos/rkuhnau/3318245976/
  • 6. http://www.flickr.com/photos/conformpdx/1796399674/
  • 7. http://www.flickr.com/photos/rkuhnau/3317418699/
  • 8. http://www.flickr.com/photos/zemlinki/261617721/
  • 9. http://www.flickr.com/photos/tracenmatt/3020786491/
  • 10. http://www.flickr.com/photos/the-o/2078239333/
  • 11. ? http://www.flickr.com/photos/ryanr/142455033/
  • 12. http://upload.wikimedia.org/wikipedia/commons/thumb/e/e6/Gamma_distribution_pdf.svg/500px-Gamma_distribution_pdf.svg.png
  • 13. http://www.flickr.com/photos/archeon/2941655917/
  • 14. In 2009, 116 articles cited ORNL DAAC data. Finding these articles took 70-80 hours across at least 12 resources, all chosen from a deep understanding of this specific research domain; then the full text of all the hits was manually reviewed. Valerie Enriquez interview with James Kidder: http://openwetware.org/wiki/DataONE:Notebook/Reuse_of_repository_data
  • 15. How to identify Dataset Reuse in the published literature
      • Start: publicly archived dataset
      • dataset has an identifier? (DOI, url, accession #)
          • search in reference sections of all papers, with dataset unique ID
              • This citation pattern (dataset DOI/ID in references section) is used almost exclusively for dataset reuse.
              • Manual disambiguation not required: can be automated pending API support.
              • Does not require access to full text.
              • This citation pattern is currently rare.
              • This citation pattern is difficult to track with existing tool limitations.
              • DOI/ID reference search possible in full-text portals like PubMed Central and HighWire Press, however portal coverage is limited and search is not restricted to references section.
              • DOI/ID search works in Google Scholar, but scope is poorly defined, results are messy.
              • DOI/ID search not supported by ISI Web of Science or Scopus.
          • search in full text of all papers, with dataset unique ID
              • IDs are difficult to unambiguously identify in full text unless they have a unique pattern (DOI) or unusual prefix or suffix.
      • dataset submission record has submitter name or dataset title?
          • search in full text of all papers, with (submitter surname AND repository name), and also (dataset title AND repository name), with (first author surname AND repository name)
          • sort hits to disambiguate reuse from submission
              • This citation pattern (accession numbers in full text) is very common in some subdisciplines, so probably finds most reuses.
              • Names and titles are messy identifiers.
              • Disambiguation is time consuming.
              • Requires ability to query full text across all literature that may contain reuse.
              • Requires access to full text of search hits for sorting.
      • dataset submission record mentions data collection article publication?
          • gather papers that cite the data collection paper, with data collection article's journal, volume, page, etc.
          • sort hits to disambiguate reuse from other citation contexts
              • This citation pattern (citation to data creation paper) is very common in some subdisciplines, so probably finds most reuses.
              • Disambiguation is time consuming: most citations are not in the context of reuse.
              • Citation history export is time consuming: automation not supported.
              • Link to data collection paper often missing from dataset submission record, especially when dataset submission predates article publication.
              • Only finds citations indexed by citation databases.
              • Requires access to full text of search hits for sorting.
      • This flow still misses attributions embedded in supplementary information, reuses attributed through a query description, etc.
      • Heather Piwowar, v1.0, CC-BY
  • 16. 10 * 100 = 1000
  • 17. publication-based datasets
  • 18. deposited in 2005
  • 19. 1. following citations to the paper that describes the data collection, then filtering.
  • 20. 2. searching for accession numbers, urls, and DOIs in full text
  • 21. http://api.plos.org/2011/05/31/announcing_the_plos_search_api/
  • 22. 2005 long time ago; biomedicine familiar, also very dominant; search interfaces not well designed for this task; helpdesks are very helpful
  • 23. stay tuned for results; poster at ASIS&T, SIGUSE
  • 24. I post my data, code, and statistical scripts: http://researchremix.org Share yours too! -> Open Notebook Science http://www.flickr.com/photos/myklroventine/892446624/
  • 25. https://notebooks.dataone.org/tracking1000datasets/
  • 26. thank you: Todd Vision, Estephanie Sta Maria, Jonathan Carlson, Dryad and DataONE teams. The open science online community and those who release their articles, datasets and photos openly
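
Slides 15 and 20 describe searching full text for dataset identifiers, and note that IDs are hard to pick out unless they have a unique pattern (DOI) or an unusual prefix or suffix. The following is a minimal sketch of that idea in Python, not the method used in the talk. The function names, the regular expressions, the accession prefix "ABC", and the sample sentence are all hypothetical, chosen only for illustration.

```python
import re

# Assumption: a DOI starts with "10.", a numeric registrant code, then "/"
# and a suffix. This shape is distinctive enough to search for directly.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def find_dois(text):
    """Return DOI-like strings in text, with trailing punctuation stripped."""
    return [match.rstrip(".,;)") for match in DOI_PATTERN.findall(text)]


def find_accessions(text, prefix):
    """Return accession-style IDs: the given prefix followed by digits.

    The prefix is what makes the ID searchable; bare digit runs would
    also match page numbers, years, and so on.
    """
    pattern = re.compile(r"\b" + re.escape(prefix) + r"\d+\b")
    return pattern.findall(text)


if __name__ == "__main__":
    # Hypothetical sentence; the DOI and the "ABC" prefix are made up.
    sample = (
        "We reanalysed dataset ABC12345 (doi:10.1234/example.5678). "
        "See page 12345 for details."
    )
    print(find_dois(sample))
    print(find_accessions(sample, "ABC"))
```

A hit from either function only shows that an identifier is mentioned. As slide 15 notes, hits in full text still have to be sorted to tell reuse apart from the original submission.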