Data Citation from the perspective of tracking data reuse

Presentation by Heather Piwowar at DataCite 2011 Summer Meeting, Aug 24 2011.


  1. Data Citation: Challenges and Opportunities from the perspective of Tracking Data Reuse. Heather Piwowar, DataONE postdoc with NESCent and Dryad, @researchremix. DataCite Summer Meeting, August 2011
  2. http://www.metmuseum.org/toah/ho/09/euwf/ho_24.45.1.htm
  3. http://www.flickr.com/photos/jsmjr/62443357/
  4. http://www.flickr.com/photos/camilleharrington/3587294608/
  5. http://www.flickr.com/photos/rkuhnau/3318245976/
  6. http://www.flickr.com/photos/conformpdx/1796399674/
  7. http://www.flickr.com/photos/rkuhnau/3317418699/
  8. http://www.flickr.com/photos/zemlinki/261617721/
  9. http://www.flickr.com/photos/tracenmatt/3020786491/
  10. http://www.flickr.com/photos/the-o/2078239333/
  11. ? http://www.flickr.com/photos/ryanr/142455033/
  12. http://www.flickr.com/photos/archeon/2941655917/
  13. http://upload.wikimedia.org/wikipedia/commons/thumb/e/e6/Gamma_distribution_pdf.svg/500px-Gamma_distribution_pdf.svg.png
  14. http://www.flickr.com/photos/jima/606588905/
  15. http://www.flickr.com/photos/lofaesofa/248546821/
  16. We have observed reuse of at 35% of GEO datasets submitted in 2005.
  17. Piwowar, Vision, Whitlock (2011) Data archiving is a good investment. Nature 473, 285. http://researchremix.wordpress.com/2011/05/19/nature-letter/
  18. Tracking 1k
  19. 10 * 100 = 1000
  20. [Chart; labels not recoverable from the extracted text]
  21. [Chart; labels not recoverable from the extracted text]
  22. Piwowar, Carlson, Vision (2011) Beginning to track 1000 datasets from public repositories into the published literature. ASIS&T poster. https://notebooks.dataone.org/tracking1000datasets/
  23. My research blog: ResearchRemix.wordpress.com http://www.flickr.com/photos/myklroventine/892446624/
  24. Data citation in the wild IDCC 2010 poster.
  25. A best-practice solution!
  26. http://www.flickr.com/photos/nilsrinaldi/5157809483/
  27. #1 Lack of tool support for our best practice
  28. Research Remix blog: http://bit.ly/aOwLoJ “Tracking Dataset Citations Using Common Citation Tracking Tools Doesn’t Work”
  29. Research Remix blog: http://bit.ly/aOwLoJ “Tracking Dataset Citations Using Common Citation Tracking Tools Doesn’t Work”
  30. We need more diversity. We need more players. We need start-ups.
  31. Abstracts are open. Ref lists should be too.
  32. #2 Our best practice doesn’t scale to mega-reuse
  33. [Chart; labels not recoverable from the extracted text]
  34. [Chart; labels not recoverable from the extracted text]
  35. [Chart; labels not recoverable from the extracted text]
  36. http://www.nature.com/nature/authors/gta/
  37. But wait!
  38. #2 Our best practice can (not “doesn’t”) scale to mega-reuse (if we work at it)
  39. Another place where having a few big players is a bottleneck. Open reference lists.
  40. #3a Adoption of best practices erodes incentives in the short term
  41. ~70% in multivariate analysis
  42. #3b Data citations only matter if they are valued
  43. Please don’t tweet or publicize this next bit... Early results from an ongoing survey. n = 538
  44. [Survey result charts; labels not recoverable from the extracted text] Do not publicize
  45. [Survey result charts; labels not recoverable from the extracted text] Do not publicize
  46. Top-down
  47. http://www.nsf.gov/pubs/policydocs/pappguide/nsf08_1/gpg_2.jsp
  48. Bottom-up
  49. Text DataCite!
  50. http://www.flickr.com/photos/ginable/325235488/
  51. http://www.flickr.com/photos/ryanr/142455033/
  52. http://www.flickr.com/photos/supersam5/216868485/
  53. Thank you: Todd Vision, Jonathan Carlson, Estephanie Sta Maria, Nicholas Weber, Sarah Judson, Valerie Enriquez; Jason Priem and Beyond Impact; Dryad and DataONE teams; the open science online community and those who release their articles, datasets and photos openly. Blog: ResearchRemix.wordpress.com
  54. No consistent practice
  55. We reviewed 500 articles in six major evolution and ecology journals for evidence of data citation: Sarah Judson, Data citation in the wild IDCC 2010 poster.
  56. We reviewed 500 articles in six major evolution and ecology journals for evidence of data citation: Sarah Judson, Data citation in the wild IDCC 2010 poster.
  57. In 2009, 116 articles cited ORNL DAAC data. Finding these articles took 70-80 hours across at least 12 resources, all chosen from a deep understanding of this specific research domain; then the full text of all the hits was manually reviewed. Valerie Enriquez interview with James Kidder. http://openwetware.org/wiki/DataONE:Notebook/Reuse_of_repository_data
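
The manual review described in the slides (searching many resources, then reading the full text of every hit) could in principle be preceded by an automated first pass over article text. The following is only a minimal sketch of that idea, not the method used in the studies cited above. It assumes article full text is already available as plain strings. The function names, the sample article texts, and the loose DOI pattern are all hypothetical and chosen for illustration. The only convention relied on is that GEO series accessions are written as "GSE" followed by digits.

```python
import re
from collections import Counter

# GEO series accessions are written as "GSE" followed by digits.
GEO_ACCESSION = re.compile(r"\bGSE\d+\b")

# A deliberately loose DOI pattern (an assumption for illustration only;
# real DOIs can contain characters this pattern handles poorly).
DOI = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def find_dataset_mentions(text):
    """Return the sorted, de-duplicated dataset identifiers found in text."""
    accessions = set(GEO_ACCESSION.findall(text))
    # Strip trailing punctuation that often follows an identifier in prose.
    dois = {match.rstrip(".,;)") for match in DOI.findall(text)}
    return sorted(accessions | dois)


def tally_mentions(articles):
    """Count how many articles mention each identifier.

    `articles` maps a hypothetical article id to its plain full text.
    Each article contributes at most one count per identifier.
    """
    counts = Counter()
    for text in articles.values():
        counts.update(find_dataset_mentions(text))
    return counts


if __name__ == "__main__":
    # Made-up sample texts, not real articles.
    sample = {
        "article-1": "We reanalysed GSE2034 and GSE1456.",
        "article-2": "Data from GSE2034 (doi:10.5061/dryad.1234).",
    }
    for identifier, n in sorted(tally_mentions(sample).items()):
        print(identifier, n)
```

A pass like this only finds mentions that include an identifier. Attributions that name a dataset or its authors without an accession number or DOI would still need the kind of manual review described above.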
