Data Stewardship for Researchers, SPATIAL course

Presentation on data management and related topics for the SPATIAL short course (https://itce.utah.edu/spatial.html) at the University of Utah, 20 June 2013.

Published in: Technology, Business

  1. 1. Data Stewardship for Researchers: Tips, Tools, & Guidance. Carly Strasser, PhD, California Digital Library. @carlystrasser, carly.strasser@ucop.edu. SPATIAL 2013. Images from Calisphere, courtesy of UC Riverside, California Museum of Photography, and courtesy of Thousand Oaks Library.
  2. 2. Roadmap: 1. Background 2. Why you should care 3. Best practices 4. Toolbox
  3. 3. Is  data  management  being  taught?  Do  attitudes  about  sharing  differ  among  disciplines?  What  role  can  libraries  play  in  data  education?  How  can  we  promote  storing  data  in  repositories?  What  barriers  to  sharing  can  we  eliminate?  Why  don’t  people  share  data?  
  4. 4. Why  is  data  management      a  hot  topic?  From  Flickr  by  Velo  Steve  
  5. 5. Back in the day… Da Vinci, Curie, Newton, Darwin. From classicalschool.blogspot.com
  6. 6. Digital data. From Flickr by Flickmor; from Flickr by US Army Environmental Command; from Flickr by DW0825; C. Strasser; courtesy of WHOI; from Flickr by deltaMike
  7. 7. Digital  data  +    Complex  workflows  
  8. 8. From  Flickr  by  ~Minnea~  Data  management  Documentation  Reproducibility  
  9. 9. From  Flickr  by  iowa_spirit_walker  •  Cost  •  Confusion  about  standards  •  Lack  of  training  •  Fear  of  lost  rights  or  benefits  •  No  incentives  
  10. 10. THE TRUTH YOU NEED TO KNOW ABOUT: Data management, Metadata, Data repositories, Data sharing. From sandierpastures.com
  11. 11. From  Flickr  by  johntrainor  Why  you  should  care  
  12. 12. Because they care: From Flickr by hyperion327. From Flickr by Redden-McAllister
  13. 13. Because  they  care:  All  data  must  be  in  a  public  archive.  You  can’t  hoard  it.  If  it’s  not  available  you  can’t  cite  it.  Include  a  data  section  with  how  to  find  datasets.  
  14. 14. Four months ago… “Federal agencies investing in research and development (more than $100 million in annual expenditures) must have clear and coordinated policies for increasing public access to research products.”
  15. 15. 1.  Maximize  free  public  access  2.  Ensure  researchers  create  data  management  plans  3.  Allow  costs  for  data  preservation  and  access  in  proposal  budgets  4.  Ensure  evaluation  of  data  management  plan  merits  5.  Ensure  researchers  comply  with  their  data  management  plans  6.  Promote  data  deposition  into  public  repositories  7.  Develop  approaches  for  identification  and  attribution  of  datasets  8.  Educate  folks  about  data  stewardship  From  Flickr  by  Joe  Crimmings  Photography  
  16. 16. From  Flickr  by  twm1340  Culture  Shift  Ahead  
  17. 17. science  source  notebook  content  access  data  government  knowledge  From  Flickr  by  cdsessums  
  18. 18. Map of Scientific Collaborations. From flowingdata.com
  19. 19. From  Flickr  by  ~shorts  and  longs  Publications  &    Their  Citation     &  data  availability  
  20. 20. Data  are  being  recognized  as  first  class  products  of  research  From  Flickr  by  Richard  Moross  
  21. 21. Data  management  plans  Data  sharing  mandates  Data  publications  Data  citation  From  Flickr  by  torkildr  
  22. 22. Data  publications  Data  citation  Data  management  plans  Data  sharing  mandates  
  23. 23. What  should  you  be  doing?  From  Flickr  by  whatthefeed  
  24. 24. [Spreadsheet screenshot: stable isotope data sheet from the file “Wash Cres Lake Dec 15 Dont_Use.xls”] Annotations: 2 tables; random notes. From Stephanie Hampton (2010), ESA Workshop on Best Practices
  25. 25. [Same spreadsheet screenshot] Annotation: the file name, Wash Cres Lake Dec 15 Dont_Use.xls. From Stephanie Hampton (2010), ESA Workshop on Best Practices
  26. 26. [Same spreadsheet screenshot, with regression summary output pasted beside the data] Annotation: random stats output. From Stephanie Hampton
  27. 27. [Same spreadsheet screenshot, with a transposed copy of the data and an embedded scatter plot] From Stephanie Hampton
  28. 28. What  should  you  be  doing?  From  Flickr  by  whatthefeed  
  29. 29. Data management best practices: 1. Planning 2. Data collection & organization 3. Quality control & assurance 4. Metadata 5. Workflows 6. Data stewardship & reuse. From Flickr by Big Swede Guy
  30. 30. Create  unique  identifiers  •  Decide  on  naming  scheme  early  •  Create  a  key  •  Different  for  each  sample  2.  Data  collection  &  organization  From  Flickr  by  sjbresnahan  From  Flickr  by  zebbie  
  31. 31. Standardize  •  Consistent  within  columns  – only  numbers,  dates,  or  text  •  Consistent  names,  codes,  formats  Modified  from  K.  Vanderbilt    From  Pink  Floyd,  The  Wall      themurkyfringe.com  2.  Data  collection  &  organization  
  32. 32. 2. Data collection & organization: Standardize • Reduce possibility of manual error by constraining entry choices (Google Docs Forms; Excel lists; data validation). Modified from K. Vanderbilt
  33. 33. 2.  Data  collection  &  organization      Create  parameter  table  Create  a  site  table  From  doi:10.3334/ORNLDAAC/777  From  doi:10.3334/ORNLDAAC/777  From  R  Cook,  ESA  Best  Practices  Workshop  2010  
  34. 34. 2. Data collection & organization: Use descriptive file names • Unique • Reflect contents. Bad: Mydata.xls, 2001_data.csv, best version.txt. Better: Eaffinis_nanaimo_2010_counts.xls (study organism, site name, year, what was measured). *Not for everyone. From R Cook, ESA Best Practices Workshop 2010
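The naming pattern in the “Better” example on slide 34 could be sketched as a small helper. This is only an illustration in Python: the function name `build_filename` and its parameters are hypothetical, and the part order (organism, site, year, measurement) is simply copied from the slide's example.

```python
import re

def build_filename(organism, site, year, measurement, ext="csv"):
    """Join descriptive parts into one name, e.g. Eaffinis_nanaimo_2010_counts.csv."""
    parts = [organism, site, str(year), measurement]
    # Drop anything other than letters and digits so the name has no spaces or punctuation.
    cleaned = [re.sub(r"[^A-Za-z0-9]+", "", part) for part in parts]
    if not all(cleaned):
        raise ValueError("every part of the file name must be non-empty")
    return "_".join(cleaned) + "." + ext

print(build_filename("Eaffinis", "nanaimo", 2010, "counts", ext="xls"))
```

Generating names from one function, rather than typing them by hand, is one way to keep them unique and consistent across a project.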
  35. 35. Organize  files    logically  Biodiversity  Lake  Experiments  Field  work  Grassland  Biodiv_H20_heatExp_2005to2008.csv  Biodiv_H20_predatorExp_2001to2003.csv  …  Biodiv_H20_PlanktonCount_2001toActive.csv  Biodiv_H20_ChlAprofiles_2003.csv  …    From  S.  Hampton  2.  Data  collection  &  organization  
  36. 36.  Preserve  information  •  Keep  raw  data  raw  •  Use  scripts  to  process  data      &  save  them  with  data  Raw  data  as  .csv  R  script  for  processing  &  analysis  2.  Data  collection  &  organization  
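A minimal sketch of “keep raw data raw” on slide 36, written in Python rather than the R named on the slide. The function name `process_raw`, the column name `sample_id`, and the cleaning step itself are assumptions made for illustration; the point is only that the script reads the raw file and writes its result somewhere else.

```python
import csv
import tempfile
from pathlib import Path

def process_raw(raw_path, out_path):
    """Read the raw file and write a cleaned copy elsewhere; the raw file is never modified."""
    raw_path, out_path = Path(raw_path), Path(out_path)
    if raw_path.resolve() == out_path.resolve():
        raise ValueError("refusing to overwrite the raw data file")
    kept = 0
    with raw_path.open(newline="") as src, out_path.open("w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Hypothetical processing step: trim whitespace and skip rows with no sample ID.
            row = {key: (value or "").strip() for key, value in row.items()}
            if row["sample_id"]:
                writer.writerow(row)
                kept += 1
    return kept

# Small self-contained demonstration using a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "raw.csv"
    raw.write_text("sample_id,weight_mg\nALG01, 3.05\n,2.91\nALG03,2.91 \n")
    print(process_raw(raw, Path(tmp) / "clean.csv"), "rows kept")
```

Saving a script like this alongside the data records exactly how the processed file was derived from the raw one.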
  37. 37. Data management best practices: 1. Planning 2. Data collection & organization 3. Quality control & assurance 4. Metadata 5. Workflows 6. Data stewardship & reuse. From Flickr by Big Swede Guy
  38. 38. Before  data  collection  •  Define  &  enforce  standards  •  Assign  responsibility  for  data  quality  3.  Quality  control  and  quality  assurance  From  Flickr  by  StacieBee  
  39. 39. 3. Quality control and quality assurance: After data entry • Check for missing, impossible, anomalous values • Perform statistical summaries • Look for outliers [Plot]
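The three checks on slide 39 might look like the following in Python. This is a sketch under stated assumptions: the function name `qc_report` is hypothetical, the possible range must be supplied by the researcher, and flagging outliers by z-score is just one common choice, not something the slides prescribe. The sample values are made up for the demonstration.

```python
import statistics

def qc_report(values, lower, upper, z_cutoff=3.0):
    """Flag missing, impossible, and outlying entries in one numeric column.

    values: list of floats, with None marking a missing entry.
    lower, upper: the physically possible range, supplied by the user.
    """
    missing = [i for i, v in enumerate(values) if v is None]
    impossible = [i for i, v in enumerate(values)
                  if v is not None and not (lower <= v <= upper)]
    valid = [(i, v) for i, v in enumerate(values)
             if v is not None and lower <= v <= upper]
    numbers = [v for _, v in valid]
    # Statistical summary of the values that passed the range check.
    mean = statistics.mean(numbers) if numbers else None
    sd = statistics.stdev(numbers) if len(numbers) > 1 else 0.0
    # Outliers: in-range values more than z_cutoff standard deviations from the mean.
    outliers = [i for i, v in valid if sd > 0 and abs(v - mean) / sd > z_cutoff]
    return {"missing": missing, "impossible": impossible, "outliers": outliers,
            "n": len(numbers), "mean": mean, "sd": sd}

weights_mg = [3.05, 3.06, None, 2.91, -1.0, 3.04, 2.95, 3.01, 300.0, 2.99]  # made-up values
report = qc_report(weights_mg, lower=0.0, upper=10.0)
print(report["missing"], report["impossible"], report["outliers"])
```

The function returns row positions rather than deleting anything, so a person can inspect each flagged entry before deciding what to do with it.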
  40. 40. Data management best practices: 1. Planning 2. Data collection & organization 3. Quality control & assurance 4. Metadata 5. Workflows 6. Data stewardship & reuse. From Flickr by Big Swede Guy
  41. 41. 4.  Metadata  basics   Why  are  you  promoting  Excel?  What  is  metadata?  
  42. 42. 4. Metadata basics • Digital context: name of the data set; the name(s) of the data file(s) in the data set; date the data set was last modified; example data file records for each data type file; pertinent companion files; list of related or ancillary data sets; software (including version number) used to prepare/read the data set; data processing that was performed • Personnel & stakeholders: who collected; who to contact with questions; funders • Scientific context: scientific reason why the data were collected; what data were collected; what instruments (including model & serial number) were used; environmental conditions during collection; where collected & spatial resolution; when collected & temporal resolution; standards or calibrations used • Information about parameters: how each was measured or produced; units of measure; format used in the data set; precision & accuracy if known • Information about data: definitions of codes used; quality assurance & control measures; known problems that limit data use (e.g. uncertainty, sampling problems); how to cite the data set
  43. 43. 4. Metadata basics: What is metadata? • Provides structure to describe data: common terms | definitions | language | structure • Lots of different standards: EML, FGDC, ISO19115, DarwinCore, … • Tools for creating metadata files: Morpho (EML), Metavist (FGDC), NOAA MERMaid (CSDGM) • Select the appropriate standard
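As a rough illustration of the checklist on slide 42, a metadata record can be kept as a small machine-readable file next to the data and checked for gaps. The Python sketch below uses plain JSON rather than any of the standards named above; the field names, the `REQUIRED_FIELDS` list, and the record's values are all hypothetical and not taken from EML, FGDC, or ISO 19115.

```python
import json

# Hypothetical minimal set of fields, loosely following the checklist on slide 42.
REQUIRED_FIELDS = ["dataset_name", "files", "last_modified", "collected_by",
                   "contact", "units", "how_to_cite"]

def missing_fields(record):
    """Return the required fields that are absent or empty in a metadata record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]

record = {
    "dataset_name": "Example plankton counts",        # illustrative values only
    "files": ["Eaffinis_nanaimo_2010_counts.csv"],
    "last_modified": "2013-06-20",
    "collected_by": "A. Researcher",
    "contact": "a.researcher@example.org",
    "units": {"count": "individuals per litre"},
}

print("Still to fill in:", missing_fields(record))
print(json.dumps(record, indent=2))
```

For deposit in a repository, a record like this would still need to be expressed in whichever standard that repository expects.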
  44. 44. Data management best practices: 1. Planning 2. Data collection & organization 3. Quality control & assurance 4. Metadata 5. Workflows 6. Data stewardship & reuse. From Flickr by Big Swede Guy
  45. 45. 5. Workflows. Workflow: how you get from the raw data to the final products of your research. Simple workflows: flow charts. [Flow chart boxes: Temperature data; Salinity data; Data import into R; Data in R format; Quality control & data cleaning; “Clean” T & S data; Analysis: mean, SD; Summary statistics; Graph production]
  46. 46. 5. Workflows. Workflow: how you get from the raw data to the final products of your research. Simple workflows: commented scripts • R, SAS, MATLAB • Well-documented code is… easier to review, easier to share, easier to repeat analysis. # % $ &
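A commented script for the temperature and salinity workflow on slide 45 might look like the sketch below. The slides use R; Python is used here only for illustration. The column names, the acceptable ranges, and the embedded sample data are assumptions, and the graph production step is left out to keep the example short.

```python
import csv
import io
import statistics

# Made-up sample data standing in for the raw temperature and salinity files.
RAW = "temperature_c,salinity_psu\n12.1,33.5\n12.4,33.7\n,33.6\n99.9,33.4\n12.3,bad\n"

def import_data(text):
    """Step 1, data import: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows, column, lower, upper):
    """Step 2, quality control: keep numeric values of one column inside [lower, upper]."""
    kept = []
    for row in rows:
        try:
            value = float(row[column])
        except (TypeError, ValueError):
            continue  # blank or non-numeric entry
        if lower <= value <= upper:
            kept.append(value)
    return kept

def summarize(values):
    """Step 3, analysis: count, mean, and sample standard deviation."""
    return {"n": len(values),
            "mean": statistics.mean(values),
            "sd": statistics.stdev(values)}

rows = import_data(RAW)
temperature = clean(rows, "temperature_c", lower=-2.0, upper=40.0)
salinity = clean(rows, "salinity_psu", lower=0.0, upper=45.0)
print("temperature:", summarize(temperature))
print("salinity:", summarize(salinity))
```

Each function corresponds to one box of the flow chart, so the script doubles as documentation of the workflow.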
  47. 47. 5. Workflows. Fancy Schmancy workflows: Kepler. Resulting output. https://kepler-project.org
  48. 48. 5. Workflows: Workflows enable… Reproducibility: can someone independently validate findings? Transparency: others can understand how you arrived at your results. Executability: others can re-run or re-use your analysis. Coming soon: workflow sharing requirements! From Flickr by merlinprincesse
  49. 49. Data management best practices: 1. Planning 2. Data collection & organization 3. Quality control & assurance 4. Metadata 5. Workflows 6. Data stewardship & reuse. From Flickr by Big Swede Guy
  50. 50. 6. Data stewardship & reuse • Use stable formats: csv, txt, tiff • Create back-up copies: original, near, far • Periodically test ability to restore information. Modified from R. Cook
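One way to “test ability to restore information” (slide 50) is to compare checksums of the original file and its back-up copy. The Python sketch below assumes a simple file copy into a local folder; the function names and the folder name `near` are hypothetical, and a real “far” copy would live on another machine or service.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def backup_and_verify(source, backup_dir):
    """Copy a file into backup_dir and confirm the copy matches the original."""
    source, backup_dir = Path(source), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    copy_path = backup_dir / source.name
    shutil.copy2(source, copy_path)
    if sha256_of(source) != sha256_of(copy_path):
        raise OSError(f"back-up of {source.name} does not match the original")
    return copy_path

# Small self-contained demonstration using a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "raw.csv"
    original.write_text("sample_id,weight_mg\nALG01,3.05\n")
    copy = backup_and_verify(original, Path(tmp) / "near")
    print("verified copy at", copy.name)
```

Re-running the checksum comparison on a schedule is a cheap way to notice a damaged copy before it is needed.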
  51. 51. Store  your  data  in  a  repository  Institutional  archive  Discipline/specialty  archive        6.  Data  stewardship  &  reuse  From  Flickr  by  torkildr  Ask  a  librarian  Repos  of  repos:  databib.org  re3data.org  
  52. 52. 6. Data stewardship & reuse: Practice data citation • Allows readers to find data products • Get credit for data and publications • Promotes reproducibility • Better measure of research impact. Example: Sidlauskas, B. 2007. Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20 (persistent unique identifier)
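The citation on slide 52 follows an author, year, title, repository, identifier pattern, which can be sketched as a short Python helper. The function name `format_data_citation` is hypothetical, and the layout simply mirrors the slide's example; repositories and journals may ask for a different style.

```python
def format_data_citation(author, year, title, repository, doi):
    """Assemble a data citation in the order used by the example on slide 52."""
    return f"{author} {year}. {title}. {repository}. doi:{doi}"

citation = format_data_citation(
    author="Sidlauskas, B.",
    year=2007,
    title=("Data from: Testing for unequal rates of morphological diversification "
           "in the absence of a detailed phylogeny: a case study from characiform fishes"),
    repository="Dryad Digital Repository",
    doi="10.5061/dryad.20",
)
print(citation)
```

Keeping the identifier as its own field makes it harder to drop the persistent unique identifier when the citation is reformatted.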
  53. 53. Data management best practices: 1. Planning 2. Data collection & organization 3. Quality control & assurance 4. Metadata 5. Workflows 6. Data stewardship & reuse. From Flickr by Big Swede Guy
  54. 54. What is a data management plan? A document that describes what you will do with your data throughout the research project. From Flickr by Barbies Land
  55. 55. DMP for funders: A short plan submitted alongside grant applications. An outline of: what will be collected; methods; standards; metadata; sharing/access; long-term storage. Includes how and why. But they all have different requirements and express them in different ways. From Flickr by 401(K) 2013
  56. 56. NSF DMP Requirements. From Grant Proposal Guidelines: DMP supplement may include: 1. the types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project; 2. the standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies); 3. policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements; 4. policies and provisions for re-use, re-distribution, and the production of derivatives; 5. plans for archiving data, samples, and other research products, and for preservation of access to them
  57. 57. •  Types  of  data  •  Existing  data  •  How/when/where  created?  •  How  processed?  •  Quality  control    •  Security  •  Who  is  responsible    1.  Types  of  data  &  other  information  biology.kenyon.edu  C.  Strasser  From  Flickr  by  Lazurite  
  58. 58. Wired.com  •  Metadata  needed  •  How  captured    •  Standards  2.  Data  &  metadata  standards  
  59. 59. 3. Policies for access & sharing; 4. Policies for re-use & re-distribution • Obligation to share • How/when/where available • Getting access • Copyright / IP • Permission restrictions • Embargo periods • Ethics/privacy • How cited. From Flickr by maryfrancesmain
  60. 60. •  What  &  where    •  Metadata  •  Who’s  responsible  5.  Plans  for  archiving  &  preservation  From  Flickr  by  theManWhoSurfedTooMuch  
  61. 61. Don’t  forget  the  budget  dorrvs.com  
  62. 62. NSF’s Vision* • DMPs and their evaluation will grow & change over time • Peer review will determine next steps • Community-driven guidelines • Evaluation will vary with directorate, division, & program officer. *Unofficially
  63. 63. From  Flickr  by  celikins  Where  to  start?  
  64. 64. From  Flickr  by  Andy  Graulund  Make  a  resolution  • Triage  on  current  projects  • Get    advisor,  lab  mates,  collaborators  on  board  • Do  better  next  time  
  65. 65. Start  working  online  From  Flickr  by  karindalziel  
  66. 66. Online science: E-notebooks, Reproducibility. http://datapub.cdlib.org/software-for-reproducibility-part-2-the-tools/ From Flickr by karindalziel
  67. 67. From  Flickr  by  dipster1  Toolbox  
  68. 68. Write a DMP: dmptool.org. Step-by-step wizard for generating DMP. Create | edit | re-use | share. Free & open to community
  69. 69. databib.org  Where  should  I  put  my  data?  Find  a  repository  
  70. 70. Get help. From Flickr by thewmatt
  71. 71. Get  help  from  your  library  From  Flickr  by  North  Carolina  Digital  Heritage  Center  From  Flickr  by  Madison  Guy  
  72. 72. NSF  funded  DataNet  Project  Office  of  Cyberinfrastructure  www.dataone.org  Get  help  
  73. 73. [Figure with panels A, B, and C]
  74. 74. •  Data  Education  Tutorials  •  Database  of  best  practices    &  software  tools  •  Primer  on  data  management  •  Investigator  Toolkit  www.dataone.org  
  75. 75. DCXL  blog:  dcxl.cdlib.org  Toolbox:    
  76. 76. Data  Pub  Blog:  datapub.cdlib.org  
  77. 77. From  Flickr  by  Skakerman  A  word  about  Metrics…  
  78. 78. Articles  are  the  butterfly  pinned  on  the  wall.  Pretty  but  not  very  useful.  They  are  only  the  advertisements  for  scholarship.      –  A.  Levi,  U.  Maryland  College  of  Information  Studies    From  Flickr  by  LisaW123  
  79. 79. How to incentivize good data stewardship? Data Citation; Altmetrics (Alternative Metrics). From Flickr by chriscook04
  80. 80. From  Flickr  by  dotpolka  Doing  science  is  a  privilege  –  not  a  right  
  81. 81. There is a social contract of science: we have an obligation to ensure dissemination, validation, & advancement. To not do so is science malpractice. Who’s responsible? Researchers, publishers, libraries, repositories… – Brian Hole, Ubiquity Press at UCL. From Flickr by mikerosebery
  82. 82. From  Flickr  by  Michael  Tinkler  
  83. 83. My  website  Email  me  Tweet  me  My  slides  carlystrasser.net  carlystrasser@gmail.com  @carlystrasser    slideshare.net/carlystrasser  
