UC Santa Cruz: Data Management for Scientists

Presentation at UC Santa Cruz for Open Access Week 2011. Published 26 Nov 2011.

  1. Data Management for Scientists: Reduce your workload • Reuse your ideas • Recycle your data
     Carly Strasser, PhD, California Digital Library, UC Office of the President
     carly.strasser@ucop.edu • www.carlystrasser.net
     Image: www.oddee.com
  2-3. Roadmap: 1. Who are you? • 2. Chaos • 3. Control • 4. Toolbox
  4. NSF-funded DataNet project, Office of Cyberinfrastructure
     • Community • Cyberinfrastructure • Engagement & Outreach
     Images: from Flickr by ThomasThomas; from Flickr by Langwitches
  5. What role can libraries play in data education?
     Why don't people share data?
     What barriers to sharing can we eliminate?
     Is data management being taught?
     Do attitudes about sharing differ among disciplines?
     How can we promote storing data in repositories?
  6. Roadmap: 1. Who are you? • 2. Chaos • 3. Control • 4. Toolbox
  7. Digital data + complex workflows
  8-9. [Diagram labels: Data, Paper, Tables, Images, Models, Matrix models, Maximum likelihood estimation]
  10. Ugly truth: many Earth, environmental, and ecological scientists...
      • are not taught data management
      • don't know what metadata are
      • can't name data centers or repositories
      • don't share data publicly or store it in an archive
      • aren't convinced they should share data
      Image: 5shortessays.blogspot.com
  11. [Screenshot of Sheet1 of "Wash Cres Lake Dec 15 Dont_Use.xls": a stable isotope data sheet for algal samples (washed rocks) from Wash Cresc Lake, Peters lab, headed "Dont use - old data", with reference statistics, a sample table (Position, SampleID, Weight (mg), %C, delta 13C, delta 13C_ca, %N, delta 15N, delta 15N_ca, Spec. No.), and stray summary numbers beside it]
      Callouts: 2 tables; random notes
      Modified from Stephanie Hampton (2010), ESA Workshop on Best Practices
  12. [Same spreadsheet screenshot, calling out the file name "Wash Cres Lake Dec 15 Dont_Use.xls"]
      Modified from Stephanie Hampton (2010), ESA Workshop on Best Practices
  13. [Same spreadsheet screenshot, now with data columns copied out, an Excel regression SUMMARY OUTPUT (Regression Statistics, ANOVA, Coefficients; 11 observations), and a scatter chart (Series1) pasted into the same sheet]
      Modified from Stephanie Hampton
  14. What is this?
      [Same spreadsheet screenshot with the regression SUMMARY OUTPUT embedded among the data; its only predictor label is "X Variable 1"]
      Modified from Stephanie Hampton
  15. The path of research products [diagram: Data, Metadata, www]. Recreated from Klump et al. 2006
  16. The path of research products [diagram build: Data, Metadata, www]. Recreated from Klump et al. 2006
  17. Data reuse • Data sharing • Data management
  18-19. Roadmap: 1. Who are you? • 2. Chaos • 3. Control • 4. Toolbox
  20. • Unrestricted access to articles* via the internet: digital, online, free of charge, free of most copyright/licensing restrictions
      • Compatible with conventional scholarly literature
      • Bills not paid by readers: no barriers to access
      *Open access easily extends to data
  21. Roadmap: 1. Who are you? • 2. Chaos • 3. Control • 4. Toolbox
  22. Best Practices for Data Management
      1. Planning
      2. Data collection & organization
      3. Quality control & assurance
      4. Metadata
      5. Workflows
      6. Data stewardship & reuse
  23. 1. Planning: What is a data management plan?
      A document that describes what you will do with your data during and after you complete your research.
      Image: from Flickr by Ikelee
  24. 1. Planning: Why should I prepare a DMP?
      • Saves time • Increases efficiency • Easier to use data
      • Others can understand & use data • Credit for data products • Funders protect their investment
  25. 1. Planning: Components of a DMP
      1. Information about data & data format
      2. Metadata content and format
      3. Policies for access, sharing and re-use
      4. Long-term storage and data management
      5. Budget
  26. 1. Planning: dmp.cdlib.org
      • Step-by-step wizard for generating a DMP
      • Create | edit | re-use | share | save | generate
      • Open to community
      • Links to institutional resources
      • Directorate information & updates
  27. 2. Data collection & organization: Personal data management problems build up over time and in collaboration.
      Image: plumbinghelptoday.com
  28. 2. Data collection & organization: Standardize
      • Consistent within columns
        – only numbers, dates, or text
      • Consistent names, codes, formats
      Modified from K. Vanderbilt. Images: Pink Floyd, The Wall; themurkyfringe.com
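The "consistent within columns" rule can be checked on an existing file with a short script. A minimal Python sketch, assuming a CSV export; the column names, the example rows, and the choice of ISO `YYYY-MM-DD` as the only accepted date format are hypothetical. It classifies each cell as a number, a date, or text and reports columns that mix kinds.

```python
import csv
import io
from datetime import date


def cell_kind(value: str) -> str:
    """Classify one cell as 'empty', 'number', 'date' (ISO YYYY-MM-DD) or 'text'."""
    value = value.strip()
    if not value:
        return "empty"
    try:
        float(value)
        return "number"
    except ValueError:
        pass
    try:
        date.fromisoformat(value)
        return "date"
    except ValueError:
        return "text"


def mixed_columns(csv_text: str) -> dict:
    """Return {column: kinds} for every column holding more than one kind of value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kinds = {name: set() for name in reader.fieldnames or []}
    for row in reader:
        for name, value in row.items():
            if name is None:  # extra cells beyond the header row are ignored here
                continue
            kind = cell_kind(value or "")
            if kind != "empty":
                kinds[name].add(kind)
    return {name: found for name, found in kinds.items() if len(found) > 1}


# Hypothetical rows: the date and weight columns each mix in free text.
example_csv = """site,sample_date,weight_mg
lake_outlet,2010-12-15,3.06
shore,Dec. 16,2.91
shore,2010-12-16,approx 3
"""
print(mixed_columns(example_csv))
```

On these rows the sketch flags `sample_date` (an ISO date mixed with "Dec. 16") and `weight_mg` (numbers mixed with "approx 3"); set contents print in arbitrary order.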
  29. 2. Data collection & organization: Standardize
      • Reduce possibility of manual error by constraining entry choices: Excel lists and data validation; Google Docs Forms
      Modified from K. Vanderbilt
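Constraining entry choices, as an Excel list or data-validation rule does, can also be done in a script that accepts only codes from a controlled vocabulary. A minimal sketch; the site codes and the normalization (trimming spaces, upper-casing) are hypothetical choices:

```python
# Hypothetical controlled vocabulary of site codes, playing the role of an Excel list.
ALLOWED_SITES = frozenset({"LKOUT", "SHORE", "INLET"})


def validate_site(entry: str, allowed: frozenset = ALLOWED_SITES) -> str:
    """Normalize an entry (trim spaces, upper-case) and reject codes outside the list."""
    code = entry.strip().upper()
    if code not in allowed:
        raise ValueError(f"{entry!r} is not one of {sorted(allowed)}")
    return code


accepted, rejected = [], []
for entry in ["shore", " LKOUT", "Lk Outlet"]:
    try:
        accepted.append(validate_site(entry))
    except ValueError:
        rejected.append(entry)  # send back to whoever entered it
print(accepted, rejected)  # ['SHORE', 'LKOUT'] ['Lk Outlet']
```

Rejecting at entry time keeps variants like "Lk Outlet" from ever reaching the data table, which is the same goal as a drop-down list.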
  30. 2. Data collection & organization
      • Create a parameter table
      • Create a site table
      Example tables from doi:10.3334/ORNLDAAC/777. From R. Cook, ESA Best Practices Workshop 2010
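A site table and a parameter table let the main data table refer to sites and variables by short codes, with coordinates, units, and definitions recorded once. A minimal Python sketch of that split and of joining the tables back together; every table, code, and value here is hypothetical and is not taken from the ORNL DAAC data set cited on the slide:

```python
import csv
import io

# Hypothetical site table: each site is described once, in one place.
SITES_CSV = """site_id,site_name,latitude,longitude
S1,Lake outlet,47.60,-122.30
S2,North shore,47.61,-122.29
"""

# Hypothetical parameter table: name, units and definition of each measured variable.
PARAMS_CSV = """parameter,units,description
temp,degC,Water temperature
chla,ug/L,Chlorophyll a concentration
"""

# Observations refer to sites and parameters only by code.
OBS_CSV = """site_id,date,parameter,value
S1,2010-12-15,temp,3.2
S2,2010-12-15,chla,1.7
"""


def index_by(text: str, key: str) -> dict:
    """Load a CSV table into a dict keyed by one column."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def describe(obs_text: str, sites: dict, params: dict) -> list:
    """Join each observation to its site and parameter rows."""
    lines = []
    for obs in csv.DictReader(io.StringIO(obs_text)):
        site = sites[obs["site_id"]]  # KeyError flags a code missing from the site table
        units = params[obs["parameter"]]["units"]
        lines.append(f'{site["site_name"]} {obs["date"]} {obs["parameter"]}={obs["value"]} {units}')
    return lines


sites = index_by(SITES_CSV, "site_id")
params = index_by(PARAMS_CSV, "parameter")
for line in describe(OBS_CSV, sites, params):
    print(line)
```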
  31. 2. Data collection & organization: Use descriptive file names. Image: PhDcomics.com
  32. 2. Data collection & organization: Use descriptive file names
      • Unique
      • Reflect contents
      Bad: Mydata.xls, 2001_data.csv, best version.txt
      Better: Eaffinis_nanaimo_2010_counts.xls (study organism, site name, year, what was measured)
      From R. Cook, ESA Best Practices Workshop 2010
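The naming pattern in the "Better" example (organism, site, year, what was measured) can be generated rather than typed by hand. A minimal sketch; the helper name and the rule of stripping spaces and punctuation from each part are assumptions:

```python
import re


def data_file_name(organism: str, site: str, year: int, measured: str, ext: str = "csv") -> str:
    """Build organism_site_year_measured.ext, stripping spaces and punctuation from each part."""
    parts = [organism, site, str(year), measured]
    cleaned = [re.sub(r"[^A-Za-z0-9-]+", "", part) for part in parts]
    return "_".join(cleaned) + "." + ext


print(data_file_name("Eaffinis", "nanaimo", 2010, "counts", "xls"))  # Eaffinis_nanaimo_2010_counts.xls
print(data_file_name("E. affinis", "Nanaimo River", 2010, "counts"))  # Eaffinis_NanaimoRiver_2010_counts.csv
```

The helper makes names consistent but does not by itself guarantee they are unique; checking against existing files is still needed.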
  33. 2. Data collection & organization: Organize files logically
      Biodiversity
        Lake
          Experiments
            Biodiv_H20_heatExp_2005to2008.csv
            Biodiv_H20_predatorExp_2001to2003.csv
            …
          Field work
            Biodiv_H20_PlanktonCount_2001toActive.csv
            Biodiv_H20_ChlAprofiles_2003.csv
            …
        Grassland
      From S. Hampton
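One reading of the slide's layout (project, then system, then activity) can be scripted so that every new project starts with the same folders. A minimal Python sketch; the helper name is hypothetical, the nesting is inferred from the slide, and the files are created empty as placeholders:

```python
import tempfile
from pathlib import Path

# Folder layout inferred from the slide: project / system / activity / files.
LAYOUT = {
    "Biodiversity/Lake/Experiments": [
        "Biodiv_H20_heatExp_2005to2008.csv",
        "Biodiv_H20_predatorExp_2001to2003.csv",
    ],
    "Biodiversity/Lake/Field work": [
        "Biodiv_H20_PlanktonCount_2001toActive.csv",
        "Biodiv_H20_ChlAprofiles_2003.csv",
    ],
    "Biodiversity/Grassland": [],
}


def build_tree(root: Path, layout: dict) -> list:
    """Create the folders (and empty placeholder files); return CSV paths relative to root."""
    for folder, files in layout.items():
        directory = root / folder
        directory.mkdir(parents=True, exist_ok=True)
        for name in files:
            (directory / name).touch()
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*.csv"))


with tempfile.TemporaryDirectory() as tmp:
    created = build_tree(Path(tmp), LAYOUT)
for path in created:
    print(path)
```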
  34. 2. Data collection & organization: Preserve information
      • Keep raw data raw
      • Use scripts to process data & save them with data
      [Images: raw data as .csv; R script for processing & analysis]
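The slide's pairing of raw .csv data with a processing script can be sketched in Python (the slide uses R). The script reads the raw file, never writes to it, and saves derived values to a separate folder. The folder names, columns, and the milligram-to-gram conversion are hypothetical:

```python
import csv
import tempfile
from pathlib import Path


def process(raw_path: Path, out_dir: Path) -> Path:
    """Read the raw file (never modified) and write a derived copy to out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{raw_path.stem}_processed.csv"
    with raw_path.open(newline="") as src, out_path.open("w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=[*reader.fieldnames, "weight_g"])
        writer.writeheader()
        for row in reader:
            # The derived value goes in a new column; the raw weight_mg value is kept as-is.
            row["weight_g"] = f'{float(row["weight_mg"]) / 1000:.5f}'
            writer.writerow(row)
    return out_path


with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "raw" / "samples.csv"
    raw.parent.mkdir()
    raw.write_text("sample_id,weight_mg\nALG01,3.05\nALG02,3.00\n")
    before = raw.read_text()
    processed = process(raw, Path(tmp) / "processed")
    raw_unchanged = raw.read_text() == before
    processed_lines = processed.read_text().splitlines()
print(raw_unchanged, processed_lines)
```

Because processing is re-run from the untouched raw file, any mistake in the script can be fixed and the output regenerated.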
  35. 3. Quality control and quality assurance
      • Define & enforce standards
      • Double data entry
      • Document changes
      • No missing, impossible, or anomalous values
        – Perform statistical summaries
        – Use illegal data filter
        – Look for outliers
      [Example scatter plot]
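The checks on the slide (an illegal-data filter, statistical summaries, an outlier search) can be scripted. A minimal Python sketch, assuming a plausible range supplied by the user and hypothetical temperature readings; Tukey's rule (points more than 1.5 interquartile ranges beyond the quartiles) is a swapped-in choice, since the slide does not name an outlier method:

```python
import statistics


def quality_check(values, lower, upper):
    """Flag missing, illegal (outside [lower, upper]) and outlying readings by index."""
    missing = [i for i, v in enumerate(values) if v is None]
    legal = [(i, v) for i, v in enumerate(values) if v is not None and lower <= v <= upper]
    illegal = [i for i, v in enumerate(values) if v is not None and not lower <= v <= upper]
    data = [v for _, v in legal]
    # Tukey's rule: outliers lie more than 1.5 IQR below Q1 or above Q3.
    q1, _, q3 = statistics.quantiles(data, n=4)
    low_fence, high_fence = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    outliers = [i for i, v in legal if v < low_fence or v > high_fence]
    summary = {"n": len(data), "mean": statistics.mean(data), "sd": statistics.stdev(data),
               "min": min(data), "max": max(data)}
    return {"missing": missing, "illegal": illegal, "outliers": outliers, "summary": summary}


# Hypothetical water temperatures (degrees C): one gap, one impossible value, one suspect value.
readings = [12.1, 12.4, None, 11.9, 12.6, 99.0, 12.2, 18.5, 12.0, 12.3]
report = quality_check(readings, lower=-2.0, upper=40.0)
print(report["missing"], report["illegal"], report["outliers"])  # [2] [5] [7]
print(report["summary"])
```

Flagged readings are reported by position rather than deleted, so each one can be checked and the decision documented.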
  36. 4. Metadata basics: What is metadata? Data reporting
      • WHO created the data?
      • WHAT is the content of the data set?
      • WHEN was it created?
      • WHERE was it collected?
      • HOW was it developed?
      • WHY was it developed?
  37. 4. Metadata basics
      • Scientific context
        – Scientific reason why the data were collected
        – What data were collected
        – What instruments (including model & serial number) were used
        – Environmental conditions during collection
        – Where collected & spatial resolution
        – When collected & temporal resolution
        – Standards or calibrations used
      • Information about parameters
        – How each was measured or produced
        – Units of measure
        – Format used in the data set
        – Precision & accuracy if known
      • Information about data
        – Definitions of codes used
        – Quality assurance & control measures
        – Known problems that limit data use (e.g. uncertainty, sampling problems)
        – How to cite the data set
      • Digital context
        – Name of the data set
        – The name(s) of the data file(s) in the data set
        – Date the data set was last modified
        – Example data file records for each data type file
        – Pertinent companion files
        – List of related or ancillary data sets
        – Software (including version number) used to prepare/read the data set
        – Data processing that was performed
      • Personnel & stakeholders
        – Who collected
        – Who to contact with questions
        – Funders
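Answers to the who/what/when/where/how/why questions can be kept in a machine-readable file stored next to the data. The Python sketch below writes a plain JSON sidecar and reports unanswered questions; the key names, file naming, and record contents are hypothetical, and the format is not EML or any other metadata standard:

```python
import json
import tempfile
from pathlib import Path

# The six questions from the "What is metadata?" slide; the key names are hypothetical.
REQUIRED = ["who", "what", "when", "where", "how", "why"]


def missing_fields(record: dict) -> list:
    """List required questions that the record leaves blank."""
    return [key for key in REQUIRED if not str(record.get(key, "")).strip()]


def write_sidecar(data_file: Path, record: dict) -> Path:
    """Save metadata next to the data file, e.g. counts.csv -> counts.metadata.json."""
    sidecar = data_file.with_name(data_file.stem + ".metadata.json")
    sidecar.write_text(json.dumps(record, indent=2))
    return sidecar


# Hypothetical record; "why" is deliberately left empty so the check catches it.
record = {
    "who": {"creator": "A. Researcher", "contact": "a.researcher@example.org"},
    "what": "Organism counts per sample; columns: site, date, count",
    "when": "Collected 2010-06 to 2010-08; file last modified 2010-09-01",
    "where": "Two hypothetical lake sites; coordinates in the site table",
    "how": "Net tows, counted under a dissecting microscope",
    "why": "",
}
print(missing_fields(record))  # ['why']

with tempfile.TemporaryDirectory() as tmp:
    sidecar = write_sidecar(Path(tmp) / "counts.csv", record)
    sidecar_name = sidecar.name
    reloaded = json.loads(sidecar.read_text())
```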
  38. 4. Metadata basics: What is a metadata standard?
      • Provides structure to describe data: common terms | definitions | language | structure
      • Lots of different standards: EML, FGDC, ISO 19115, Darwin Core, …
      • Tools for creating metadata files: Morpho (EML), Metavist (FGDC), NOAA MERMaid (CSDGM)
  39. 4. Metadata basics: What does a metadata record look like?
  40. 5. Workflows: Simplest workflows: commented scripts, flow charts
      [Flow chart: Temperature data + Salinity data → Data import into R → Data in R format → Quality control & data cleaning → "Clean" T & S data → Analysis: mean, SD → Summary statistics → Graph production]
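The simplest workflow on the slide, a commented script, might look like the following in Python rather than R. Each function is one box of the flow chart; the file contents, the "NA" and -999 missing-value codes, and the plausible ranges are assumptions:

```python
import csv
import io
import statistics

# Hypothetical raw files; a real workflow would read them from disk.
RAW_TEMPERATURE = "time,temp_c\n1,12.1\n2,12.4\n3,NA\n4,12.0\n"
RAW_SALINITY = "time,salinity_psu\n1,30.1\n2,30.3\n3,30.2\n4,-999\n"


def import_column(text: str, column: str) -> list:
    """Step 1, data import: read one variable, turning 'NA' or blanks into None."""
    values = []
    for row in csv.DictReader(io.StringIO(text)):
        raw = row[column].strip()
        values.append(None if raw in ("", "NA") else float(raw))
    return values


def clean(values: list, lower: float, upper: float) -> list:
    """Step 2, quality control: drop missing values and values outside a plausible range."""
    return [v for v in values if v is not None and lower <= v <= upper]


def summarize(values: list) -> dict:
    """Step 3, analysis: summary statistics (mean and SD)."""
    return {"n": len(values), "mean": statistics.mean(values), "sd": statistics.stdev(values)}


temperature = clean(import_column(RAW_TEMPERATURE, "temp_c"), lower=-2.0, upper=40.0)
salinity = clean(import_column(RAW_SALINITY, "salinity_psu"), lower=0.0, upper=45.0)
print("temperature", summarize(temperature))
print("salinity", summarize(salinity))
# Step 4, graph production, would plot the clean data (not shown).
```

Keeping each step as a named, commented function makes the script double as the flow chart's documentation.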
  41. 5. Workflows: Fancy Schmancy: Kepler [screenshot with resulting output]
      https://kepler-project.org
  42. 5. Workflows: Workflows enable
      • Reproducibility: can someone independently validate findings?
      • Transparency: others can understand how you arrived at your results
      • Executability: others can re-run or re-use your analysis
      Image: from Flickr by merlinprincesse
  43. 6. Data stewardship & reuse: Data reuse • Data sharing • Data management
  44. 6. Data stewardship & reuse: The 20-Year Rule
      The metadata accompanying a data set should be written for a user 20 years into the future (National Research Council 1991).
      Image: from Flickr by greensambaman
  45. 6. Data stewardship & reuse
      • Use stable formats: csv, txt, tiff
      • Create back-up copies: original, near, far
      • Periodically test ability to restore information
      Modified from R. Cook
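Testing the ability to restore can be partly automated by comparing checksums of the original and a backup copy. A minimal Python sketch using SHA-256; the folder names are hypothetical, only a "near" copy is shown, and a full test would also copy files back from the backup medium:

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256(path: Path) -> str:
    """Checksum a file in chunks so large data files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original: Path, backup: Path) -> bool:
    """A backup only counts if it exists and is byte-for-byte identical to the original."""
    return backup.exists() and sha256(original) == sha256(backup)


with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "counts.csv"
    original.write_text("site,count\nA,12\n")
    near = Path(tmp) / "backup_near" / "counts.csv"  # hypothetical "near" copy location
    near.parent.mkdir()
    shutil.copy2(original, near)
    ok_before = backup_matches(original, near)
    near.write_text("site,count\nA,13\n")  # simulate a silently corrupted copy
    ok_after = backup_matches(original, near)
print(ok_before, ok_after)  # True False
```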
  46. 6. Data stewardship & reuse: Where do I put it?
      • Institutional archive
      • Discipline/specialty archive
      • DataCite list of repositories: www.datacite.org/repolist
      Image: from Flickr by torkildr
  47. 6. Data stewardship & reuse: Data citation: why everyone should do it
      • Allow readers to find data products
      • Get credit for data and publications
      • Promote reproducibility
      • Better measure of research impact
      Example: Sidlauskas, B. 2007. Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20
      Modified from R. Cook
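A data citation like the example above has a regular structure, so it can be assembled from fields. A minimal Python sketch; the class and method names are hypothetical, and the field order simply mirrors the slide's example rather than a formal citation style:

```python
from dataclasses import dataclass


@dataclass
class DataCitation:
    """Fields needed to cite a data set, following the pattern of the slide's example."""
    author: str
    year: int
    title: str
    repository: str
    doi: str

    def as_text(self) -> str:
        # rstrip('.') avoids a doubled period after initials such as "B."
        return (f"{self.author.rstrip('.')}. {self.year}. Data from: {self.title.rstrip('.')}. "
                f"{self.repository}. doi:{self.doi}")

    def url(self) -> str:
        """Resolvable link via the doi.org resolver."""
        return f"https://doi.org/{self.doi}"


sidlauskas = DataCitation(
    author="Sidlauskas, B.",
    year=2007,
    title=("Testing for unequal rates of morphological diversification in the absence "
           "of a detailed phylogeny: a case study from characiform fishes"),
    repository="Dryad Digital Repository",
    doi="10.5061/dryad.20",
)
print(sidlauskas.as_text())
print(sidlauskas.url())  # https://doi.org/10.5061/dryad.20
```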
  48. Roadmap: 1. Who are you? • 2. Bad scientists • 3. How to be good • 4. Toolbox
  49. NSF-funded DataNet project, Office of Cyberinfrastructure: Enabling universal access to data about life on earth and the environment that sustains it
  50-52. [Diagram with elements labeled A, B, and C]
  53-55. www.dataone.org
         • Data Education Tutorials
         • Primer on data management
         • Database of best practices & software tools
         • List of repositories & metadata standards
         • Links to DMP Tool
         Investigator Toolkit: ONE-R • ONE-Mercury • ONE-Drive
  56. E-notebooks
      • NoteBook • ORNL eNote • Evernote • Google Docs • Blogs • Wikis • TheLabNotebook.com • iPad ELN • NoteBookMaker
  57. CDL Services for UC Community
      • Precise identification of a dataset
      • Credit to data producers and data publishers
      • A link from the traditional literature to the data
      • Research metrics for datasets
      • Deposit content (i.e. data)
      • Manage (metadata, versions, etc.)
      • Share • Access • Preserve
      www.cdlib.org/services/uc3
  58. • Open source add-in
      • Facilitates data management, sharing, and archiving for scientists
      • Part of DataONE investigator toolkit
      • Collecting requirements for the add-in from scientists, data centers, libraries
      dcxl.cdlib.org
      Funders: Gordon and Betty Moore Foundation, Microsoft Research
  59. Christy Hightower • Katie Forney • Ann Hubble • Cynthia Moriconi
      www.carlystrasser.net • carlystrasser@gmail.com • dcxl.cdlib.org
      @carlystrasser • @dcxlCDL • www.facebook.com/DCXLatCDL
