Data Stewardship for Scientists, for CLIR Postdoc Workshop

Presentation for CLIR/DLF Postdoctoral Fellows on data management for scientists; Bryn Mawr College 31 July 2013.

Usage Rights

CC Attribution License

Presentation Transcript

  • Data Stewardship for Researchers: Tips, Tools, & Guidance. Carly Strasser, PhD, California Digital Library. @carlystrasser, carly.strasser@ucop.edu. 31 July 2013, CLIR Symposium. Images: From Calisphere, Courtesy of UC Riverside, California Museum of Photography; From Calisphere, Courtesy of Thousand Oaks Library
  • Roadmap: 1. Background 2. Why you should care 3. Best practices 4. Toolbox
  • NSF funded DataNet Project, Office of Cyberinfrastructure. Two main goals: 1. Build a network for data repositories 2. Build community around data. Focus on Earth | environmental | ecological | oceanographic data
  • Why don’t people share data? Is data management being taught? Do attitudes about sharing differ among disciplines? How can we promote storing data in repositories? What barriers to sharing can we eliminate? What role can libraries play in data education?
  • Why  is  data   management       a  hot  topic?   From  Flickr  by  Velo  Steve  
  • Back in the day… Da  Vinci   Curie   Newton   classicalschool.blogspot.com   Darwin  
  • Digital data. From Flickr by Flickmor; From Flickr by US Army Environmental Command; From Flickr by DW0825; C. Strasser; Courtesy of WHOI; From Flickr by deltaMike
  • Digital  data   +     Complex   workflows  
  • From  Flickr  by  ~Minnea~   Data  management   Documentation   Reproducibility  
  • From  Flickr  by  iowa_spirit_walker   •  Cost   •  Confusion  about   standards   •  Lack  of  training   •  Fear  of  lost  rights  or   benefits   •  No  incentives  
  • THE TRUTH: RESEARCHERS NEED TO KNOW ABOUT Data management • Metadata • Data repositories • Data sharing. From sandierpastures.com
  • From  Flickr  by  johntrainor   Who  cares?  
  • From  Flickr  by  hyperion327   From  Flickr  by  Redden-­‐McAllister  
  • Back in February: … “Federal agencies investing in research and development (more than $100 million in annual expenditures) must have clear and coordinated policies for increasing public access to research products.”
  • 1.  Maximize  free  public  access   2.  Ensure  researchers  create  data   management  plans   3.  Allow  costs  for  data  preservation  and  access   in  proposal  budgets   4.  Ensure  evaluation  of  data  management   plan  merits   5.  Ensure  researchers  comply  with  their  data   management  plans   6.  Promote  data  deposition  into  public   repositories   7.  Develop  approaches  for  identification  and   attribution  of  datasets   8.  Educate  folks  about  data  stewardship   From  Flickr  by  Joe  Crimmings  Photography  
  • From  Flickr  by  twm1340   Culture   Shift  Ahead  
  • science   source   notebook   content   access   data   government   knowledge   From  Flickr  by  cdsessums  
  • flowingdata.com Map  of  Scientific  Collaborations  
  • From  Flickr  by  ~shorts  and  longs   Publications  &     Their  Citation     &  data   availability  
  • Data  are  being  recognized   as  first  class  products  of   research   From  Flickr  by  Richard  Moross  
  • Data  management  plans   Data  sharing  mandates   Data  publications   Data  citation   From  Flickr  by  torkildr  
  • Data  publications   Data  citation   Data  management  plans   Data  sharing  mandates  
  • What should researchers NOT be doing? From Flickr by whatthefeed
  • Example spreadsheet. From Stephanie Hampton (2010), ESA Workshop on Best Practices. Slide callouts: 2 tables; Random notes
    File: C:\Documents and Settings\hampton\My Documents\NCEAS Distributed Graduate Seminars\[Wash Cres Lake Dec 15 Dont_Use.xls]Sheet1
    Sheet header: Stable Isotope Data Sheet; Wash Cresc Lake; Peter's lab; Don't use - old data; Algal Washed Rocks; Dec. 16; Tray 004; SD for delta 13 C = 0.07; SD for delta 15 N = 0.15
    Position | SampleID | Weight (mg) | %C | delta 13C | delta 13C_ca | %N | delta 15N | delta 15N_ca | Spec. No. | stray cells to the right
    A1 | ref | 0.98 | 38.27 | -25.05 | -24.59 | 1.96 | 4.12 | 3.47 | 25354 |
    A2 | ref | 0.98 | 39.78 | -25.00 | -24.54 | 2.03 | 4.01 | 3.36 | 25356 |
    A3 | ref | 0.98 | 40.37 | -24.99 | -24.53 | 2.04 | 4.09 | 3.44 | 25358 |
    A4 | ref | 1.01 | 42.23 | -25.06 | -24.60 | 2.17 | 4.20 | 3.55 | 25360 | Shore Avg Con
    A5 | ALG01 | 3.05 | 1.88 | -24.34 | -23.88 | 0.17 | -1.65 | -2.30 | 25362 | c -1.26 -27.22
    A6 | Lk Outlet Alg | 3.06 | 31.55 | -30.17 | -29.71 | 0.92 | 0.87 | 0.22 | 25364 | 1.26 0.32
    A7 | ALG03 | 2.91 | 6.85 | -21.11 | -20.65 | 0.48 | -0.97 | -1.62 | 25366 | c
    A8 | ALG05 | 2.91 | 35.56 | -28.05 | -27.59 | 2.30 | 0.59 | -0.06 | 25368 |
    A9 | ALG07 | 3.04 | 33.49 | -29.56 | -29.10 | 1.68 | 0.79 | 0.14 | 25370 |
    A10 | ALG06 | 2.95 | 41.17 | -27.32 | -26.86 | 1.97 | 2.71 | 2.06 | 25372 |
    B1 | ALG04 | 3.01 | 43.74 | -27.50 | -27.04 | 1.36 | 0.99 | 0.34 | 25374 | c
    B2 | ALG02 | 3 | 4.51 | -22.68 | -22.22 | 0.34 | 4.31 | 3.66 | 25376 |
    B3 | ALG01 | 2.99 | 1.59 | -24.58 | -24.12 | 0.15 | -1.69 | -2.34 | 25378 | c
    B4 | ALG03 | 2.92 | 4.37 | -21.06 | -20.60 | 0.34 | -1.52 | -2.17 | 25380 | c
    B5 | ALG07 | 2.9 | 33.58 | -29.44 | -28.98 | 1.74 | 0.62 | -0.03 | 25382 |
    B6 | ref | 1.01 | 44.94 | -25.00 | -24.54 | 2.59 | 3.96 | 3.31 | 25384 |
    B7 | ref | 0.99 | 42.28 | -24.87 | -24.41 | 2.37 | 4.33 | 3.68 | 25386 |
    B8 | Lk Outlet Alg | 3.04 | 31.43 | -29.69 | -29.23 | 1.07 | 0.95 | 0.30 | 25388 |
    B9 | ALG06 | 3.09 | 35.57 | -27.26 | -26.80 | 1.96 | 2.79 | 2.14 | 25390 |
    B10 | ALG02 | 3.05 | 5.52 | -22.31 | -21.85 | 0.45 | 4.72 | 4.07 | 25392 |
    C1 | ALG04 | 2.98 | 37.90 | -27.42 | -26.96 | 1.36 | 1.21 | 0.56 | 25394 | c
    C2 | ALG05 | 3.04 | 31.74 | -27.93 | -27.47 | 2.40 | 0.73 | 0.08 | 25396 |
    C3 | ref | 0.99 | 38.46 | -25.09 | -24.63 | 2.40 | 4.37 | 3.72 | 25398 |
    Unlabeled cells below the table: 23.78 | 1.17
    Empty form labels: Reference statistics: | Sampling Site / Identifier: | Sample Type: | Date: | Tray ID and Sequence:
  • [Same example spreadsheet as on the previous slide.] Slide callout: the file name, Wash Cres Lake Dec 15 Dont_Use.xls. From Stephanie Hampton (2010), ESA Workshop on Best Practices
  • [Same example spreadsheet as on the previous slides, now with regression output pasted beside the data.] Slide callout: Random stats output. From Stephanie Hampton
    SUMMARY OUTPUT
    Regression Statistics: Multiple R 0.283158; R Square 0.080178; Adjusted R Square -0.022024; Standard Error 1.906378; Observations 11
    ANOVA (df, SS, MS, F, Significance F): Regression 1, 2.851116, 2.851116, 0.784507, 0.398813; Residual 9, 32.7085, 3.634278; Total 10, 35.55962
    Coefficients (Coefficients, Standard Error, t Stat, P-value, Lower 95%, Upper 95%, Lower 95.0%, Upper 95.0%): Intercept -4.297428, 4.671099, -0.920003, 0.381568, -14.8642, 6.269341, -14.8642, 6.269341; X Variable 1: -0.158022, 0.17841, -0.885724, 0.398813, -0.561612, 0.245569, -0.561612, 0.245569
  • [Same example spreadsheet and regression output as on the previous slides, plus a transposed copy of nine sample rows (ALG03, ALG05, ALG07, ALG06, ALG04, ALG02, ALG01, ALG03, ALG07) and an embedded chart (Series1; axis ticks from -3.00 to 4.00 and from -35.00 to 0.00).] From Stephanie Hampton
  • From  Flickr  by  whatthefeed   What  should   researchers  be  doing?  
  • data management From  Flickr  by  Big  Swede  Guy   1.  Planning   2.  Data  collection  &   organization   3.  Quality  control  &  assurance   4.  Metadata   5.  Workflows   6.  Data  stewardship  &  reuse   Best  Practices  
  • Create  unique  identifiers   •  Decide  on  naming  scheme  early   •  Create  a  key   •  Different  for  each  sample   2.  Data  collection  &  organization   From  Flickr  by  sjbresnahan   From  Flickr  by  zebbie  
  • Standardize   •  Consistent  within  columns   – only  numbers,  dates,  or  text   •  Consistent  names,  codes,  formats   Modified  from  K.  Vanderbilt     From  Pink  Floyd,  The  Wall      themurkyfringe.com   2.  Data  collection  &  organization  
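The "consistent within columns" rule can be checked mechanically. The following is a minimal sketch in Python, not something taken from the slides: the function names `classify` and `mixed_columns` are hypothetical, and the ISO date format (`YYYY-MM-DD`) is an assumption about how dates are entered.

```python
from datetime import datetime


def classify(value: str) -> str:
    """Label one cell as 'number', 'date', or 'text'."""
    try:
        float(value)
        return "number"
    except ValueError:
        pass
    try:
        # Assumption: dates are entered as YYYY-MM-DD.
        datetime.strptime(value, "%Y-%m-%d")
        return "date"
    except ValueError:
        return "text"


def mixed_columns(rows: list[dict[str, str]]) -> dict[str, set[str]]:
    """Return the columns whose non-empty cells hold more than one kind of value."""
    kinds: dict[str, set[str]] = {}
    for row in rows:
        for column, value in row.items():
            if value.strip():
                kinds.setdefault(column, set()).add(classify(value.strip()))
    return {column: found for column, found in kinds.items() if len(found) > 1}
```

Rows in the shape produced by `csv.DictReader` can be passed in directly; any column the function reports mixes numbers, dates, or text and probably needs cleaning.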
  • 2. Data collection & organization: Standardize • Reduce possibility of manual error by constraining entry choices (Google Docs Forms; Excel lists; Data validation). Modified from K. Vanderbilt
  • 2.  Data  collection  &  organization       Create  parameter  table   Create  a  site  table   From  doi:10.3334/ORNLDAAC/777   From  doi:10.3334/ORNLDAAC/777   From  R  Cook,  ESA  Best  Practices  Workshop  2010  
  • 2. Data collection & organization: Use descriptive file names • Unique • Reflect contents. Bad: Mydata.xls, 2001_data.csv, best version.txt. Better: Eaffinis_nanaimo_2010_counts.xls (study organism, site name, year, what was measured). * Not for everyone. From R Cook, ESA Best Practices Workshop 2010
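One way to keep names in the organism_site_year_measurement pattern consistent is to build them with a small helper. This is an illustrative Python sketch only; the function name `descriptive_name` and the rule of stripping every character that is not a letter or digit are assumptions, not part of the slides.

```python
import re


def descriptive_name(organism: str, site: str, year: int, measured: str,
                     extension: str = "csv") -> str:
    """Build a file name of the form organism_site_year_measured.extension."""
    parts = [organism, site, str(year), measured]
    # Assumption: drop spaces and punctuation so underscores only separate components.
    cleaned = [re.sub(r"[^A-Za-z0-9]+", "", part) for part in parts]
    if not all(cleaned):
        raise ValueError("every name component must contain letters or digits")
    return "_".join(cleaned) + "." + extension
```

Calling `descriptive_name("Eaffinis", "nanaimo", 2010, "counts", "xls")` reproduces the "Better" file name from the slide.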
  • Organize  files    logically   Biodiversity   Lake   Experiments   Field  work   Grassland   Biodiv_H20_heatExp_2005to2008.csv   Biodiv_H20_predatorExp_2001to2003.csv   …   Biodiv_H20_PlanktonCount_2001toActive.csv   Biodiv_H20_ChlAprofiles_2003.csv   …     From  S.  Hampton   2.  Data  collection  &  organization  
  •  Preserve  information   •  Keep  raw  data  raw   •  Use  scripts  to  process  data      &  save  them  with  data   Raw  data  as  .csv   R  script  for  processing  &   analysis   2.  Data  collection  &  organization  
  • data management From  Flickr  by  Big  Swede  Guy   1.  Planning   2.  Data  collection  &   organization   3.  Quality  control  &  assurance   4.  Metadata   5.  Workflows   6.  Data  stewardship  &  reuse   Best  Practices  
  • Before  data  collection   •  Define  &  enforce  standards   •  Assign  responsibility  for  data  quality   3.  Quality  control  and  quality  assurance   From  Flickr  by  StacieBee  
  • 3. Quality control and quality assurance: After data entry • Check for missing, impossible, anomalous values • Perform statistical summaries • Look for outliers [Plot; only its axis tick labels survive in this transcript]
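The three checks on this slide (missing, impossible, outlying values) might be scripted roughly as follows. This is a sketch under stated assumptions: `qc_report` is a hypothetical name, "impossible" is taken to mean outside a range the researcher supplies, and outliers are flagged with a simple z-score rule, which is only one of many possible choices.

```python
import statistics
from typing import Optional


def qc_report(values: list[Optional[float]], lower: float, upper: float,
              z_cutoff: float = 3.0) -> dict[str, list[int]]:
    """Return the row indexes of missing, impossible, and outlying values."""
    missing = [i for i, v in enumerate(values) if v is None]
    impossible = [i for i, v in enumerate(values)
                  if v is not None and not lower <= v <= upper]
    plausible = [(i, v) for i, v in enumerate(values)
                 if v is not None and lower <= v <= upper]
    outliers: list[int] = []
    if len(plausible) >= 2:
        numbers = [v for _, v in plausible]
        mean = statistics.mean(numbers)
        sd = statistics.stdev(numbers)
        if sd > 0:
            # Assumed rule: flag values more than z_cutoff sample SDs from the mean.
            outliers = [i for i, v in plausible if abs(v - mean) / sd > z_cutoff]
    return {"missing": missing, "impossible": impossible, "outliers": outliers}
```

With only a handful of values a z-score cannot grow very large, so a lower `z_cutoff` may be needed for small data sets. Flagged rows are candidates for a closer look, not automatic deletions.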
  • data management From  Flickr  by  Big  Swede  Guy   1.  Planning   2.  Data  collection  &   organization   3.  Quality  control  &  assurance   4.  Metadata   5.  Workflows   6.  Data  stewardship  &  reuse   Best  Practices  
  • 4.  Metadata  basics   Why  are  you   promoting   Excel?   What  is   metadata?  
  • 4. Metadata basics
    • Digital context
      • Name of the data set
      • The name(s) of the data file(s) in the data set
      • Date the data set was last modified
      • Example data file records for each data type file
      • Pertinent companion files
      • List of related or ancillary data sets
      • Software (including version number) used to prepare/read the data set
      • Data processing that was performed
    • Personnel & stakeholders
      • Who collected
      • Who to contact with questions
      • Funders
    • Scientific context
      • Scientific reason why the data were collected
      • What data were collected
      • What instruments (including model & serial number) were used
      • Environmental conditions during collection
      • Where collected & spatial resolution
      • When collected & temporal resolution
      • Standards or calibrations used
    • Information about parameters
      • How each was measured or produced
      • Units of measure
      • Format used in the data set
      • Precision & accuracy if known
    • Information about data
      • Definitions of codes used
      • Quality assurance & control measures
      • Known problems that limit data use (e.g. uncertainty, sampling problems)
      • How to cite the data set
  • 4. Metadata basics: What is metadata? • Provides structure to describe data: common terms | definitions | language | structure • Lots of different standards: EML, FGDC, ISO19115, DarwinCore,… Select the appropriate standard • Tools for creating metadata files: Morpho (EML), Metavist (FGDC), NOAA MERMaid (CSDGM)
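Before moving to a formal standard, it can help to see what a bare-bones metadata record looks like. The Python sketch below captures a few of the items from the metadata checklist in a plain JSON sidecar file. It is an illustration only: plain JSON is not one of the standards named on the slide (EML, FGDC, ISO19115, DarwinCore), and the class names and field names are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Parameter:
    name: str
    units: str
    description: str


@dataclass
class DatasetMetadata:
    title: str
    data_files: list[str]
    last_modified: str
    collected_by: str
    contact: str
    purpose: str
    parameters: list[Parameter] = field(default_factory=list)
    codes: dict[str, str] = field(default_factory=dict)
    known_problems: str = ""

    def missing_fields(self) -> list[str]:
        """Names of fields that are still empty, in declaration order."""
        return [name for name, value in asdict(self).items()
                if value in ("", [], {})]

    def to_json(self) -> str:
        """Serialize the record, e.g. to save next to the data files."""
        return json.dumps(asdict(self), indent=2)
```

`missing_fields()` gives a quick reminder of what still needs to be written down. A repository will usually expect the same information re-expressed in whichever standard it supports.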
  • data management From  Flickr  by  Big  Swede  Guy   1.  Planning   2.  Data  collection  &   organization   3.  Quality  control  &  assurance   4.  Metadata   5.  Workflows   6.  Data  stewardship  &  reuse   Best  Practices  
  • 5. Workflows. Workflow: how you get from the raw data to the final products of your research. Simple workflows: flow charts. [Flow chart boxes: Temperature data; Salinity data; Data import into R; Data in R format; Quality control & data cleaning; “Clean” T & S data; Analysis: mean, SD; Summary statistics; Graph production]
  • 5. Workflows. Workflow: how you get from the raw data to the final products of your research. Simple workflows: commented scripts • R, SAS, MATLAB • Well-documented code is… easier to review, easier to share, easier to repeat analysis
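A commented script for the temperature and salinity flow chart might look like the sketch below. It is written in Python purely for illustration (the slide names R, SAS, and MATLAB); the function names, the column names, and the plausible-range limits passed to `clean` are all assumptions.

```python
import csv
import statistics
from typing import Optional, TextIO


def import_column(stream: TextIO, column: str) -> list[Optional[float]]:
    """Step 1, data import: read one named column; blank cells become None."""
    reader = csv.DictReader(stream)
    return [float(row[column]) if (row[column] or "").strip() else None
            for row in reader]


def clean(values: list[Optional[float]], lower: float, upper: float) -> list[float]:
    """Step 2, quality control: drop missing and out-of-range values."""
    return [v for v in values if v is not None and lower <= v <= upper]


def summarize(values: list[float]) -> dict[str, float]:
    """Step 3, analysis: count, mean, and sample SD (needs at least two values)."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
    }
```

The raw file is only ever read, never overwritten, so the script itself documents every step between the raw data and the summary statistics. A typical call opens the file with `open(path, newline="")` and passes the handle to `import_column`.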
  • 5. Workflows. Fancy Schmancy workflows: Kepler. Resulting output. https://kepler-project.org
  • 5. Workflows. Workflows enable… Reproducibility: can someone independently validate findings? Transparency: others can understand how you arrived at your results. Executability: others can re-run or re-use your analysis. Coming Soon: workflow sharing requirements! From Flickr by merlinprincesse
  • data management From  Flickr  by  Big  Swede  Guy   1.  Planning   2.  Data  collection  &   organization   3.  Quality  control  &  assurance   4.  Metadata   5.  Workflows   6. Data  stewardship  &  reuse   Best  Practices  
  • 6. Data stewardship & reuse • Use stable formats: csv, txt, tiff • Create back-up copies: original, near, far • Periodically test ability to restore information. Modified from R. Cook
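Part of testing a backup is confirming that each copy is still byte-for-byte identical to the original. The Python sketch below compares SHA-256 checksums; the function names are hypothetical, and a matching checksum is only a partial proxy for a full restore test, since it says nothing about whether the file can still be opened by the software that needs it.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Checksum a file in chunks so large data files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copies(original: Path, copies: list[Path]) -> dict[str, bool]:
    """Map each backup path to True if it exists and matches the original."""
    expected = sha256_of(original)
    return {str(candidate): candidate.exists() and sha256_of(candidate) == expected
            for candidate in copies}
```

Running `verify_copies` on a schedule against the "near" and "far" copies flags missing or corrupted backups before they are needed.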
  • Store  your  data  in  a  repository   Institutional  archive   Discipline/specialty  archive         6.  Data  stewardship  &  reuse   From  Flickr  by  torkildr   Ask  a  librarian   Repos  of  repos:   databib.org   re3data.org  
  • 6. Data stewardship & reuse: Practice Data Citation • Allows readers to find data products • Get credit for data and publications • Promotes reproducibility • Better measure of research impact. Example: Sidlauskas, B. 2007. Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20 (the DOI is the Persistent Unique Identifier)
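The example citation follows an author, year, title, repository, identifier pattern, which is easy to generate consistently. The Python sketch below is illustrative: the class and method names are hypothetical, and the pattern is simply the one visible in the slide's example rather than a formal citation standard. The `https://doi.org/` prefix is the standard DOI resolver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCitation:
    authors: str
    year: int
    title: str
    repository: str
    doi: str

    def as_text(self) -> str:
        """Format the citation in the pattern used by the slide's example."""
        return (f"{self.authors} {self.year}. {self.title}. "
                f"{self.repository}. doi:{self.doi}")

    def resolver_url(self) -> str:
        """Link that resolves the persistent identifier to the data set."""
        return f"https://doi.org/{self.doi}"
```

Storing the pieces separately means the same record can feed a reference list, a README, or the "how to cite" field of a metadata file.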
  • data management From  Flickr  by  Big  Swede  Guy   1.  Planning   2.  Data  collection  &   organization   3.  Quality  control  &  assurance   4.  Metadata   5.  Workflows   6.  Data  stewardship  &  reuse   Best  Practices  
  • A  document  that   describes  what  you  will   do  with  your  data   throughout     the  research  project   From Flickr by Barbies Land What  is  a  data   management  plan?  
  • DMP for funders: A short plan submitted alongside grant applications. But they all have different requirements and express them in different ways. An outline of: what will be collected; methods; standards; metadata; sharing/access; long-term storage. Includes how and why. From Flickr by 401(K) 2013
  • NSF DMP Requirements. From Grant Proposal Guidelines: DMP supplement may include: 1. the types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project 2. the standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies) 3. policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements 4. policies and provisions for re-use, re-distribution, and the production of derivatives 5. plans for archiving data, samples, and other research products, and for preservation of access to them
  • •  Types  of  data   •  Existing  data   •  How/when/where  created?   •  How  processed?   •  Quality  control     •  Security   •  Who  is  responsible     1.  Types  of  data  &  other  information   biology.kenyon.edu   C.  Strasser   From  Flickr  by  Lazurite  
  • Wired.com   •  Metadata  needed   •  How  captured     •  Standards   2.  Data  &  metadata  standards  
  • 3. Policies for access & sharing; 4. Policies for re-use & re-distribution • Obligation to share • How/when/where available • Getting access • Copyright / IP • Permission restrictions • Embargo periods • Ethics/privacy • How cited. From Flickr by maryfrancesmain
  • •  What  &  where     •  Metadata   •  Who’s  responsible   5.  Plans  for  archiving  &  preservation   From  Flickr  by  theManWhoSurfedTooMuch  
  • Don’t  forget  the  budget   dorrvs.com  
  • NSF’s Vision* • DMPs and their evaluation will grow & change over time • Peer review will determine next steps • Community-driven guidelines • Evaluation will vary with directorate, division, & program officer. *Unofficially
  • From  Flickr  by  celikins   Where  to  start?  
  • From  Flickr  by  Andy  Graulund   Make  a   resolution   • Triage  on  current   projects   • Get    advisor,  lab  mates,   collaborators  on  board   • Do  better  next  time  
  • Start  working   online   From  Flickr  by  karindalziel  
  • E-notebooks • Online science • Reproducibility. http://datapub.cdlib.org/software-for-reproducibility-part-2-the-tools/ From Flickr by karindalziel
  • From  Flickr  by  dipster1   Toolbox  
  • Write a DMP: dmptool.org. Step-by-step wizard for generating DMP. create | edit | re-use | share. Free & open to community
  • Find a repository: Where should I put my data? databib.org
  • Get help. From Flickr by thewmatt
  • Get  help  from  your  library   From  Flickr  by  North  Carolina  Digital   Heritage  Center   From  Flickr  by  Madison  Guy  
  • NSF  funded  DataNet  Project   Office  of  Cyberinfrastructure   www.dataone.org   Get  help  
  • •  Data  Education  Tutorials   •  Database  of  best  practices    &   software  tools   •  Primer  on  data  management   •  Investigator  Toolkit   www.dataone.org  
  • From  Flickr  by  Skakerman   A  word  about   Metrics…  
  • Articles  are  the  butterfly  pinned  on   the  wall.  Pretty  but  not  very   useful.  They  are  only  the   advertisements  for  scholarship.       –  A.  Levi,  U.  Maryland  College  of  Information   Studies     From  Flickr  by  LisaW123  
  • How to incentivize good data stewardship? Data  Citation   Altmetrics  (Alternative  Metrics)   From  Flickr  by  chriscook04  
  • From  Flickr  by  dotpolka   Doing  science  is  a   privilege  –  not  a  right  
  •  There  is  a  social  contract  of  science:  we   have  an  obligation  to  ensure  dissemination,   validation,  &  advancement.   To  not  do  so  is  science  malpractice.     Who's  responsible?  Researchers,   publishers,  libraries,  repositories…     –  Brian  Hole,  Ubiquity  Press  at  UCL   From  Flickr  by  mikerosebery  
  • From  Flickr  by  Michael  Tinkler  
  • Data  Pub  Blog:  datapub.cdlib.org  
  • My website: carlystrasser.net • Email me: carlystrasser@gmail.com • Tweet me: @carlystrasser • My slides: slideshare.net/carlystrasser