Data Management from a Scientist's Perspective

Presentation for University of Florida librarians on data management

  1. Data Management: A Scientist’s Perspective. Carly Strasser, California Digital Library, University of California Curation Center. University of Florida Libraries, August 2012. (Image: Calisphere via California State University Libraries, ark:/13030/c818356g)
  2. Photos: C. Strasser; courtesy of WHOI
  3. Photos: C. Strasser; North Atlantic right whale mother and calf, by Gill Braulik under Permit No. 655-1652
  4. Roadmap: 1. A brief history of data collection; 2. The world of data; 3. The Fallout; 4. Barriers; 5. Landscape. (Photo: C. Strasser)
  5. A Brief History of Data Collection. Or… how scientists came to be so bad at data management. (Image: Calisphere via Santa Clara University, ark:/13030/kt696nc7j2)
  6. The lab/field notebook: Curie; Newton; Darwin; Da Vinci. (Image source: classicalschool.blogspot.com)
  7. The lab/field notebook. (Image: Calisphere via Fullerton College, ark:/13030/kt5c60273t)
  8. The lab/field notebook…? (Images: Flickr by DW0825, Flickmor, deltaMike, and US Army Environmental Command; www.woodrow.org; C. Strasser; courtesy of WHOI)
  9. Digital data. (Images: Flickr by DW0825, Flickmor, deltaMike, and US Army Environmental Command; www.woodrow.org; C. Strasser; courtesy of WHOI)
  10. Digital data + Complex workflows
  11. Data; Models; Maximum Likelihood estimation; Matrix Models; Images; Tables; Paper
  12. The Wide World of Data. (Image: Flickr by stevecadman)
  13. Data Types
  14. Dimensions of Data: Datum; Data file; Metadata; Dataset; Data collection; Data repository
  15. Data Diversity: Temporal File size Units File structure organization File type Documentation Spatial extent structure Metadata Collection Codes Project intent practices Analysis
  16. Big Data: OSTP, March 2012, Big Data Effort Launched
  17. The Little Guys? (Image: Flickr by jason tinder)
  18. The Long Tail. Diagram labels: Size of dataset, grant ($); # datasets, # researchers, # grants
  19. The Long Tail: NSF DEB 2005-2010, n = 1234. Chart of Number of Awards (axis 0 to 300) by Award Amount in millions of dollars (0.1, 0.5, 1, 1.5, 2, >2.5). Hampton et al., In press, Frontiers in Ecology and Evolution
  20. The Fallout. (Image: Flickr by Old Shoe Woman)
  21. UGLY TRUTH: Many (most?) researchers… are not taught data management; don’t know what metadata are; can’t name data centers or repositories; don’t share data publicly or store it in an archive; aren’t convinced they should share data. (Image source: 5shortessays.blogspot.com)
  22. Information Entropy. Fig. 1 of Michener et al. 1997
  23. How bad can it be? (Image: Calisphere via San Jose Public Library)
  24. Callouts: "2 tables"; "Random notes". [Screenshot of an Excel sheet, C:\Documents and Settings\hampton\My Documents\NCEAS Distributed Graduate Seminars\[Wash Cres Lake Dec 15 Dont_Use.xls]Sheet1, headed "Stable Isotope Data Sheet" (Sampling Site / Identifier: Wash Cresc Lake; Peters lab; "Dont use - old data"; Sample Type: Algal Washed Rocks; Date: Dec. 16; Tray 004). It holds 23 rows, positions A1 to C3, with columns Position, SampleID, Weight (mg), %C, delta 13C, delta 13C_ca, %N, delta 15N, delta 15N_ca, Spec. No., plus stray cells such as "Shore Avg Con" beside the main table.] From Stephanie Hampton (2010), ESA Workshop on Best Practices
  25. Callout: the file name, "Wash Cres Lake Dec 15 Dont_Use.xls". [Same spreadsheet screenshot.] From Stephanie Hampton (2010), ESA Workshop on Best Practices
  26. Callout: "Random stats output". [Same spreadsheet screenshot, with an Excel regression SUMMARY OUTPUT block beside the data: Multiple R 0.283158; R Square 0.080178; Adjusted R Square -0.022024; Standard Error 1.906378; Observations 11; followed by the ANOVA and coefficient tables.] From Stephanie Hampton
  27. [Same spreadsheet screenshot, now also showing a transposed copy of part of the data and a chart whose only series label is "Series1".] From Stephanie Hampton
  28. The Fallout: Where data end up. Diagram labels: www; Data; Metadata. Recreated from Klump et al. 2006. (Images: Flickr by diylibrarian and csessums; blog.order2disorder.com)
  29. The Fallout: Data Reuse; Data Sharing; Data Management
  30. 100 NSF DEB awards, 2005-2009; one paper from each award. Is data produced or reused? Produced: 57% (37); Reused: 8% (5); Both: 35% (23). Is the data produced shared? Shared all: 28% (17); Shared some: 15% (9); Shared none: 57% (34). Where are data shared? GenBank or TreeBase: 81% (21); Elsewhere: 19% (5). Hampton et al., In press, Frontiers in Ecology and Evolution
  31. Why? Barriers to Data Stewardship. (Image: Flickr by iowa_spirit_walker)
  32. Barriers: Cost. (Images: Flickr by indigoprime and kobiz7; C. Strasser)
  33. Barriers: Sociocultural. Not the norm. (Image: Flickr by freefotouk)
  34. Barriers: Sociocultural. Not the norm; Lack of / too many standards
  35. Barriers: Sociocultural. Not the norm; Lack of / too many standards; Disparate data. (Images: Flickr by toucanradio and Chris Campbell)
  36. Barriers: Sociocultural. Not the norm; Lack of / too many standards; Disparate data; Lack of training. (Image: Flickr by uniinnsbruck)
  37. Barriers: Sociocultural. Missed opportunities; Loss of rights or benefits; Conflict; Misuse. (Images: Flickr by Christina Ann VanMeter, pnh, and tymesynk)
  38. Barriers: Sociocultural. Lack of incentives; Time consuming & expensive; No requirements; Reward structure. (Image: Flickr by bthomso)
  39. But what about the next generation? (Image: Flickr by Marquette University)
  40. Are Undergrads Learning About Data Management? Topics: Metadata generation; Software choice; File naming; QAQC; Backing up; Workflows; Data sharing; Data re-use; Meta-analysis; Reproducibility; Notebook protocols; Databases. Plot axes: Important (0 to 40) and Assessed (0 to 40). If it’s important, why isn’t it taught?
  41. Are Undergrads Learning About Data Management? Barriers: Too advanced; Not a priority; Not appropriate level; Time; Students don’t know software; No Lab; No training; Covered in Lab; Too big
  42. The Current Landscape. (Photo: C. Strasser)
  43. Who cares? (Images: Flickr by Redden-McAllister and AJC1; www.rba.gov.au)
  44. Where data end up. Diagram labels: www; Data; Metadata. Recreated from Klump et al. 2006. (Images: Flickr by diylibrarian and torkildr)
  45. Trends in Data Archiving. Journal publishers: Joint Data Archiving Agreement. Data Papers etc.: Ecological Archives, Beyond the PDF. Funders: Data management requirements
  46. What is a data management plan? A document that describes what you will do with your data during your research and after you complete your research
  47. Why should a scientist prepare a DMP? Saves time; Increases efficiency; Easier to use data; Others can understand & use data; Credit for data products; Funders require it
  48. The Fallout. (Image: Flickr by einalem)
  49. NSF DMP Requirements. From Grant Proposal Guidelines: DMP supplement may include:
     1. the types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project
     2. the standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies)
     3. policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements
     4. policies and provisions for re-use, re-distribution, and the production of derivatives
     5. plans for archiving data, samples, and other research products, and for preservation of access to them
  50. NSF’s Vision* (*Unofficially). DMPs and their evaluation will grow & change over time (similar to broader impacts). Peer review will determine next steps. Community-driven guidelines: different disciplines have different definitions of acceptable data sharing; flexibility at the directorate and division levels; tailor implementation of DMP requirement. Evaluation will vary with directorate, division, & program officer. Help from Jennifer Schopf, NSF
  51. dmp.cdlib.org; dmponline.dcc.ac.uk
  52. Individual Challenges. Diagram labels: Collect; Assure; Describe; Deposit; Preserve; Discover; Integrate; Analyze. Questions: What is a data management plan? Will I get credit for my work? What is metadata? What tools do I use? Are there standards? How much will it cost? Who can help me? Where do I preserve my data? How do I preserve my data?
  53. NSF funded DataNet Project, Office of Cyberinfrastructure. Cyberinfrastructure; Community Engagement & Outreach. Courtesy of DataONE
  54. What role can libraries play in data education? What barriers to sharing can we eliminate? Why don’t people share data? Is data management being taught? Do attitudes about sharing differ among disciplines? How can we promote storing data in repositories?
  55. dataup.cdlib.org; @DataUpCDL; facebook.com/DataUpCDL; carlystrasser.net; carlystrasser@gmail.com; @carlystrasser
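
The percentages on slide 30 can be re-derived from the counts the slide gives in parentheses. The following is a minimal sketch in Python, not part of the original deck: the dictionary keys and the `percentages` helper are illustrative names, and the grouping of counts under three questions follows the slide's headings.

```python
# Counts as printed on slide 30 (Hampton et al.). The key names are
# illustrative assumptions; only the numbers come from the slide.
counts = {
    "produced_or_reused": {"Produced": 37, "Reused": 5, "Both": 23},
    "shared": {"Shared all": 17, "Shared some": 9, "Shared none": 34},
    "where_shared": {"GenBank or TreeBase": 21, "Elsewhere": 5},
}


def percentages(group):
    """Return each count as a whole-number percentage of its group total."""
    total = sum(group.values())
    return {label: round(100 * n / total) for label, n in group.items()}


for question, group in counts.items():
    # Print the group total alongside the rounded percentages.
    print(question, sum(group.values()), percentages(group))
```

The rounded values agree with the percentages printed on the slide. The three groups have different totals (65, 60, and 26), and the last equals 17 + 9, which suggests the "where shared" question covers only the papers that shared all or some of their data.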
