Data	  Management	  and	  the	  Digital	  Curation	  for	  Excel	  (DCXL)	  Project	  	  Carly	  Strasser	  	  University	...
NSF	  funded	  DataNet	  Project	  Of@ice	  of	  Cyberinfrastructure	  Enabling	  universal	  access	  to	  data	  about	 ...
B	  A	             C	  
NSF	  funded	  DataNet	  Project	  Of@ice	  of	  Cyberinfrastructure	                                                     ...
What	  role	  can	                                                         libraries	  play	  in	                         ...
Roadmap	                                          5.  Tools	                              4.  DCXL	  	                    ...
From	  Flickr	  by	  	  DW0825	                                                                                           ...
Digital	  data	         +	  	  Complex	  analyses	  
Data	                               Models	                      Maximum	                      Likelihood	                ...
2	  tables	  C:Documents and SettingshamptonMy DocumentsNCEAS Distributed Graduate Seminars[Wash Cres Lake Dec 15 Dont_Use...
Random	  notes	  C:Documents and SettingshamptonMy DocumentsNCEAS Distributed Graduate Seminars[Wash Cres Lake Dec 15 Dont...
Wash	  Cres	  Lake	  Dec	  15	  Dont_Use.xls	  C:Documents and SettingshamptonMy DocumentsNCEAS Distributed Graduate Semin...
Collaboration	  and	  Data	  Sharing	  C:Documents and SettingshamptonMy DocumentsNCEAS Distributed Graduate Seminars[Wash...
Random	  stats	  C:Documents and SettingshamptonMy DocumentsNCEAS Distributed Graduate Seminars[Wash Cres Lake Dec 15 Dont...
Where	  data	  end	  up	                                                       From	  Flickr	  by	  diylibrarian	         ...
Who	  cares?	         	                                                 From	  Flickr	  by	  Redden-­‐McAllister	   From	 ...
Where	  data	  end	  up	                                   From	  Flickr	  by	  diylibrarian	                             ...
Data	     Reuse	     Data	    Sharing	     Data	  Management	  
UGLY TRUTH                                                  Many	                                                    Earth...
Roadmap	                                          5.  Tools	                              4.  DCXL	  	                    ...
Barriers	  Cost	                     Time	                                cultblender.wordpress.com	                      ...
Barriers	  Cost:	  time,	  personnel,	  software,	  hardware	  Culture	  of	  Science	    •  Not	  the	  norm	    •  Lack	...
Barriers	  Cost:	  time,	  personnel,	  software,	  hardware	  Culture	  of	  Science	  Loss	  of	  rights	  or	  bene:its...
Barriers	  Cost:	  time,	  personnel,	  software,	  hardware	  Culture	  of	  Science	  Loss	  of	  rights	  or	  bene:its...
Are	  Undergrads	  Learning	  About	  Data	  Management?	                                              Importance	  Versus...
Barriers	  to	  Teaching	  Data	  Management	                              Too	                            Not	  a	       ...
Roadmap	                                          5.  Tools	                              4.  DCXL	  	                    ...
Best	  Practices	  for	  Data	  Management	     1.  Planning	     2.  Data	  collection	  &	         organization	     3. ...
2.	  Data	  collection	  &	  organization	  Create	  unique	  identiTiers	      •  Decide	  on	  naming	  scheme	  early	 ...
2.	  Data	  collection	  &	  organization	          Standardize	                       •  Consistent	  within	  columns	  ...
2.	  Data	  collection	  &	  organization	           Standardize	                        •  Reduce	                       ...
2.	  Data	  collection	  &	  organization	  	  	             Create	  parameter	  table	             Create	  a	  site	  t...
2.	  Data	  collection	  &	  organization	     	  Use	  descriptive	  Tile	  names	   *	         •  Unique	         •  Re@...
2.	  Data	  collection	  &	  organization	  Organize	  Tiles	  	  logically	                       Biodiversity	          ...
2.	  Data	  collection	  &	  organization	  	  Preserve	  information	                                   R	  script	  for	...
2.	  Data	  collection	  &	  organization	                                      All	  of	  the	  things	  that	           ...
3.	  Quality	  control	  and	  quality	  assurance	   De@ine	  &	  enforce	  standards	   Double	  data	  entry	   Documen...
•    ScientiTic	  context	   4.	  Metadata	  basics	                                                         •       Scien...
4.	  Metadata	  basics	  •  Provides	  structure	  to	  describe	  data	                Common	  terms	  	  |	  	  deVinit...
5.	  WorkTlows	   Simplest	  workTlows:	  commented	  scripts,	  Vlow	  charts	  Temperature	     data	                   ...
5.	  WorkTlows	  Fancy	  Schmancy:	  Kepler	                                                         Resulting	  output	  ...
5.	  WorkTlows	   WorkTlows	  enable	   	                                                                                 ...
6.	  Data	  stewardship	  &	  reuse	  Use	  stable	  formats	            	                   	  csv,	  txt,	  tiff	  Creat...
6.	  Data	  stewardship	  &	  reuse	                      Where	  do	  I	  put	  it?	                    Insitutional	  ar...
6.	  Data	  stewardship	  &	  reuse	              Data	  Citation:	  Why	  everyone	  should	  do	  it	                  A...
Best	  Practices	  for	  Data	  Management	     1.  Planning	     2.  Data	  collection	  &	         organization	     3. ...
1.	  Planning	  What	  is	  a	  data	  management	  plan?	     A	  document	  that	  describes	  what	  you	  will	  do	  ...
1.	  Planning	         Why	  should	  scientists	  prepare	  a	  DMP?	          	                                	        ...
A	  few	  words	  about	  NSF	  Data	         Management	  Plans	  
NSF	  DMP	  Requirements	   From	  Grant	  Proposal	  Guidelines:	  	  DMP	  supplement	  may	  include:	       1.  the	  ...
Don’t	  forget:	  Budget	  •  Costs	  of	  data	  preparation	  &	  documentation	            Hardware,	  software	       ...
NSF’s	  Vision*	      DMPs	  and	  their	  evaluation	  will	  grow	  &	  change	  over	      time	  (similar	  to	  broad...
NSF’s	  Vision*	    DMPs	  are	  a	  good	  Tirst	  step	  towards	  improving	  data	    stewardship	           –  starti...
dmp.cdlib.org	                   Step-­‐by-­‐step	  wizard	  for	  generating	  DMP	        Create	  	  |	  	  edit	  	  |...
Roadmap	                                          5.  Tools	                              4.  DCXL	  	                    ...
“A	  transformation	  in	  the	  conduct	  of	  a	  segment	  of	  scientiVic	      research	  by	  enabling	  and	  promo...
Open	  Source	  &	  Free	  	             Excel	  Add-­‐in	                       	  Software	  program	  that	  extends	  ...
DCXL	  Project	  Deliverables	  •  Excel	  add-­‐in	  •  Publicly	  available	  source	  code	  •  Technical	  documentati...
Process	  Assess	  needs	  •  Quantitative	     –  Surveys	  
Process	  Assess	  needs	  •  Quantitative	                            ?   –  Surveys	     –  Quick	  poll	  •  Qualitativ...
Process	  Assess	  needs	  Gather	  requirements	  	          Locations	               	  Conferences	               	  UC...
Process	  Assess	  needs	  Gather	  requirements	  	          Stakeholders	  &	  contributors	  	                   	  Lib...
Process	  Assess	  needs	  Gather	  requirements	                        !Build	  requirements	  document	             !  ...
Requirements	  1 	  Ensure	  compatibility	  for	  Excel	  users	  without	  the	  add-­‐in	  2 	  Check	  the	  data	  Ti...
Requirements	   4 	  Generate	  a	  citation	  for	  the	  data	  Tile	   5 	  Deposit	  into	  a	  repository	           ...
Process	  Assess	  needs	  Gather	  requirements	  Build	  requirements	  document	  Build	  community	    Libraries	    S...
Why	  are	  you	                                                         promoting	                                       ...
Get	  Involved	  dcxl.cdlib.org	  	  	        @dcxlCDL	                         	                        www.facebook.com/...
Roadmap	                                          5.  Tools	                              4.  DCXL	  	                    ...
UC3	  Services	     Where	   should	  I	  put	                             Data	  Repository	    my	  data?	              ...
UC3	  Services	   How	  do	  I	  get	     a	  unique	    identiVier?	              Create	  &	  manage	  persistent	  iden...
DataONE	                                             www.dataone.org	   •    Data	  Education	  Tutorials	   •    Database...
Data Management 101"dcxl.cdlib.org	  •    Data	  Education	  Tutorials	  •    Primer	  on	  data	  management	  •    Other...
Toolbox:	  	  DCXL	  blog:	  dcxl.cdlib.org	  
Lisa	  Federer	                                                        	  dcxl.cdlib.org	  @dcxlCDL	  www.facebook.com/DCX...
Upcoming SlideShare
Loading in …5
×

UCLA: Data Management for Librarians

543 views

Published on

Published in: Education, Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
543
On SlideShare
0
From Embeds
0
Number of Embeds
2
Actions
Shares
0
Downloads
0
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

UCLA: Data Management for Librarians

  1. 1. Data  Management  and  the  Digital  Curation  for  Excel  (DCXL)  Project    Carly  Strasser    University  of  California  Curation  Center  at  CDL  
  2. 2. NSF  funded  DataNet  Project  Of@ice  of  Cyberinfrastructure  Enabling  universal  access  to  data  about  life  on  earth   and  the  environment  that  sustains  it  
  3. 3. B  A   C  
  4. 4. NSF  funded  DataNet  Project  Of@ice  of  Cyberinfrastructure   Community   Cyberinfrastructure   Engagement  &   Outreach   From  Flickr  by  wetwebwork   Courtesy  of  DataONE  
  5. 5. What  role  can   libraries  play  in   data  education?   Why  don’t  people   What  barriers  to   share  data?   sharing  can  we   eliminate?   Is  data  management  Do  attitudes  about   being  taught?   sharing  differ   among   disciplines?   How  can  we  promote   storing  data  in   repositories?  
  6. 6. Roadmap   5.  Tools   4.  DCXL       3.  Best  practices  for  scientists   2.  Barriers  to  best  practices  1.  Mistakes  scientists  make    
  7. 7. From  Flickr  by    DW0825   From  Flickr  by  Flickmor   From  Flickr  by    deltaMike   Digital  data   www.woodrow.org   C.  Strasser   Courtesey  of  WHOI   From  Flickr  by  US  Army  Environmental  Command  
  8. 8. Digital  data   +    Complex  analyses  
  9. 9. Data   Models   Maximum   Likelihood   estimation   Matrix   Models   Images   Tables   Paper  
  10. 10. 2  tables  C:Documents and SettingshamptonMy DocumentsNCEAS Distributed Graduate Seminars[Wash Cres Lake Dec 15 Dont_Use.xls]Sheet1 Stable Isotope Data Sheet Sampling Site / Identifier: Wash Cresc Lake Peters lab Dont use - old data Sample Type: Algal Washed Rocks Date: Dec. 16 Tray ID and Sequence: Tray 004 13 15 Reference statistics: SD for delta C = 0.07 SD for delta N = 0.15 Position SampleID Weight (mg) %C delta 13C delta 13C_ca %N delta 15N delta 15N_ca Spec. No. A1 ref 0.98 38.27 -25.05 -24.59 1.96 4.12 3.47 25354 A2 ref 0.98 39.78 -25.00 -24.54 2.03 4.01 3.36 25356 A3 ref 0.98 40.37 -24.99 -24.53 2.04 4.09 3.44 25358 A4 ref 1.01 42.23 -25.06 -24.60 2.17 4.20 3.55 25360 Shore Avg Con A5 ALG01 3.05 1.88 -24.34 -23.88 0.17 -1.65 -2.30 25362 c -1.26 -27.22 A6 Lk Outlet Alg 3.06 31.55 -30.17 -29.71 0.92 0.87 0.22 25364 1.26 0.32 A7 ALG03 2.91 6.85 -21.11 -20.65 0.48 -0.97 -1.62 25366 c A8 ALG05 2.91 35.56 -28.05 -27.59 2.30 0.59 -0.06 25368 A9 ALG07 3.04 33.49 -29.56 -29.10 1.68 0.79 0.14 25370 A10 ALG06 2.95 41.17 -27.32 -26.86 1.97 2.71 2.06 25372 B1 ALG04 3.01 43.74 -27.50 -27.04 1.36 0.99 0.34 25374 c B2 ALG02 3 4.51 -22.68 -22.22 0.34 4.31 3.66 25376 B3 ALG01 2.99 1.59 -24.58 -24.12 0.15 -1.69 -2.34 25378 c B4 ALG03 2.92 4.37 -21.06 -20.60 0.34 -1.52 -2.17 25380 c B5 ALG07 2.9 33.58 -29.44 -28.98 1.74 0.62 -0.03 25382 B6 ref 1.01 44.94 -25.00 -24.54 2.59 3.96 3.31 25384 B7 ref 0.99 42.28 -24.87 -24.41 2.37 4.33 3.68 25386 B8 Lk Outlet Alg 3.04 31.43 -29.69 -29.23 1.07 0.95 0.30 25388 B9 ALG06 3.09 35.57 -27.26 -26.80 1.96 2.79 2.14 25390 B10 ALG02 3.05 5.52 -22.31 -21.85 0.45 4.72 4.07 25392 C1 ALG04 2.98 37.90 -27.42 -26.96 1.36 1.21 0.56 25394 c C2 ALG05 3.04 31.74 -27.93 -27.47 2.40 0.73 0.08 25396 C3 ref 0.99 38.46 -25.09 -24.63 2.40 4.37 3.72 25398 23.78 1.17 From  Stephanie  Hampton  (2010)       ESA  Workshop  on  Best  Practices  
  11. 11. Random  notes  C:Documents and SettingshamptonMy DocumentsNCEAS Distributed Graduate Seminars[Wash Cres Lake Dec 15 Dont_Use.xls]Sheet1 Stable Isotope Data Sheet Sampling Site / Identifier: Wash Cresc Lake Peters lab Dont use - old data Sample Type: Algal Washed Rocks Date: Dec. 16 Tray ID and Sequence: Tray 004 13 15 Reference statistics: SD for delta C = 0.07 SD for delta N = 0.15 Position SampleID Weight (mg) %C delta 13C delta 13C_ca %N delta 15N delta 15N_ca Spec. No. A1 ref 0.98 38.27 -25.05 -24.59 1.96 4.12 3.47 25354 A2 ref 0.98 39.78 -25.00 -24.54 2.03 4.01 3.36 25356 A3 ref 0.98 40.37 -24.99 -24.53 2.04 4.09 3.44 25358 A4 ref 1.01 42.23 -25.06 -24.60 2.17 4.20 3.55 25360 Shore Avg Con A5 ALG01 3.05 1.88 -24.34 -23.88 0.17 -1.65 -2.30 25362 c -1.26 -27.22 A6 Lk Outlet Alg 3.06 31.55 -30.17 -29.71 0.92 0.87 0.22 25364 1.26 0.32 A7 ALG03 2.91 6.85 -21.11 -20.65 0.48 -0.97 -1.62 25366 c A8 ALG05 2.91 35.56 -28.05 -27.59 2.30 0.59 -0.06 25368 A9 ALG07 3.04 33.49 -29.56 -29.10 1.68 0.79 0.14 25370 A10 ALG06 2.95 41.17 -27.32 -26.86 1.97 2.71 2.06 25372 B1 ALG04 3.01 43.74 -27.50 -27.04 1.36 0.99 0.34 25374 c B2 ALG02 3 4.51 -22.68 -22.22 0.34 4.31 3.66 25376 B3 ALG01 2.99 1.59 -24.58 -24.12 0.15 -1.69 -2.34 25378 c B4 ALG03 2.92 4.37 -21.06 -20.60 0.34 -1.52 -2.17 25380 c B5 ALG07 2.9 33.58 -29.44 -28.98 1.74 0.62 -0.03 25382 B6 ref 1.01 44.94 -25.00 -24.54 2.59 3.96 3.31 25384 B7 ref 0.99 42.28 -24.87 -24.41 2.37 4.33 3.68 25386 B8 Lk Outlet Alg 3.04 31.43 -29.69 -29.23 1.07 0.95 0.30 25388 B9 ALG06 3.09 35.57 -27.26 -26.80 1.96 2.79 2.14 25390 B10 ALG02 3.05 5.52 -22.31 -21.85 0.45 4.72 4.07 25392 C1 ALG04 2.98 37.90 -27.42 -26.96 1.36 1.21 0.56 25394 c C2 ALG05 3.04 31.74 -27.93 -27.47 2.40 0.73 0.08 25396 C3 ref 0.99 38.46 -25.09 -24.63 2.40 4.37 3.72 25398 23.78 1.17 From  Stephanie  Hampton  (2010)       ESA  Workshop  on  Best  Practices  
  12. 12. Wash  Cres  Lake  Dec  15  Dont_Use.xls  C:Documents and SettingshamptonMy DocumentsNCEAS Distributed Graduate Seminars[Wash Cres Lake Dec 15 Dont_Use.xls]Sheet1 Stable Isotope Data Sheet Sampling Site / Identifier: Wash Cresc Lake Peters lab Dont use - old data Sample Type: Algal Washed Rocks Date: Dec. 16 Tray ID and Sequence: Tray 004 13 15 Reference statistics: SD for delta C = 0.07 SD for delta N = 0.15 Position SampleID Weight (mg) %C delta 13C delta 13C_ca %N delta 15N delta 15N_ca Spec. No. A1 ref 0.98 38.27 -25.05 -24.59 1.96 4.12 3.47 25354 A2 ref 0.98 39.78 -25.00 -24.54 2.03 4.01 3.36 25356 A3 ref 0.98 40.37 -24.99 -24.53 2.04 4.09 3.44 25358 A4 ref 1.01 42.23 -25.06 -24.60 2.17 4.20 3.55 25360 Shore Avg Con A5 ALG01 3.05 1.88 -24.34 -23.88 0.17 -1.65 -2.30 25362 c -1.26 -27.22 A6 Lk Outlet Alg 3.06 31.55 -30.17 -29.71 0.92 0.87 0.22 25364 1.26 0.32 A7 ALG03 2.91 6.85 -21.11 -20.65 0.48 -0.97 -1.62 25366 c A8 ALG05 2.91 35.56 -28.05 -27.59 2.30 0.59 -0.06 25368 A9 ALG07 3.04 33.49 -29.56 -29.10 1.68 0.79 0.14 25370 A10 ALG06 2.95 41.17 -27.32 -26.86 1.97 2.71 2.06 25372 B1 ALG04 3.01 43.74 -27.50 -27.04 1.36 0.99 0.34 25374 c B2 ALG02 3 4.51 -22.68 -22.22 0.34 4.31 3.66 25376 B3 ALG01 2.99 1.59 -24.58 -24.12 0.15 -1.69 -2.34 25378 c B4 ALG03 2.92 4.37 -21.06 -20.60 0.34 -1.52 -2.17 25380 c B5 ALG07 2.9 33.58 -29.44 -28.98 1.74 0.62 -0.03 25382 B6 ref 1.01 44.94 -25.00 -24.54 2.59 3.96 3.31 25384 B7 ref 0.99 42.28 -24.87 -24.41 2.37 4.33 3.68 25386 B8 Lk Outlet Alg 3.04 31.43 -29.69 -29.23 1.07 0.95 0.30 25388 B9 ALG06 3.09 35.57 -27.26 -26.80 1.96 2.79 2.14 25390 B10 ALG02 3.05 5.52 -22.31 -21.85 0.45 4.72 4.07 25392 C1 ALG04 2.98 37.90 -27.42 -26.96 1.36 1.21 0.56 25394 c C2 ALG05 3.04 31.74 -27.93 -27.47 2.40 0.73 0.08 25396 C3 ref 0.99 38.46 -25.09 -24.63 2.40 4.37 3.72 25398 23.78 1.17 From  Stephanie  Hampton  (2010)       ESA  Workshop  on  Best  Practices  
  13. 13. Collaboration  and  Data  Sharing  C:Documents and SettingshamptonMy DocumentsNCEAS Distributed Graduate Seminars[Wash Cres Lake Dec 15 Dont_Use.xls]Sheet1 Stable Isotope Data Sheet Sampling Site / Identifier: Wash Cresc Lake Peters lab Dont use - old data Sample Type: Algal Washed Rocks Date: Dec. 16 Tray ID and Sequence: Tray 004 13 15 Reference statistics: SD for delta C = 0.07 SD for delta N = 0.15 Position SampleID Weight (mg) %C delta 13C delta 13C_ca %N delta 15N delta 15N_ca Spec. No. A1 ref 0.98 38.27 -25.05 -24.59 1.96 4.12 3.47 25354 A2 ref 0.98 39.78 -25.00 -24.54 2.03 4.01 3.36 25356 A3 ref 0.98 40.37 -24.99 -24.53 2.04 4.09 3.44 25358 A4 ref 1.01 42.23 -25.06 -24.60 2.17 4.20 3.55 25360 Shore Avg Con A5 ALG01 3.05 1.88 -24.34 -23.88 0.17 -1.65 -2.30 25362 c -1.26 -27.22 A6 Lk Outlet Alg 3.06 31.55 -30.17 -29.71 0.92 0.87 0.22 25364 1.26 0.32 A7 ALG03 2.91 6.85 -21.11 -20.65 0.48 -0.97 -1.62 25366 c A8 ALG05 2.91 35.56 -28.05 -27.59 2.30 0.59 -0.06 25368 A9 ALG07 3.04 33.49 -29.56 -29.10 1.68 0.79 0.14 25370 A10 ALG06 2.95 41.17 -27.32 -26.86 1.97 2.71 2.06 25372 B1 ALG04 3.01 43.74 -27.50 -27.04 1.36 0.99 0.34 25374 c SUMMARY OUTPUT B2 ALG02 3 4.51 SampleID -22.68 -22.22 ALG03 0.34 ALG05 4.31 3.66 ALG07 25376 ALG06 ALG04 ALG02 ALG01 ALG03 ALG07 B3 ALG01 2.99 1.59 -24.58 -24.12 0.15 -1.69 -2.34 25378 c Regression Statistics B4 ALG03 2.92 4.37 -21.06 -20.60 0.34 -1.52 -2.17 25380 c Multiple R 0.283158 B5 ALG07 2.9 33.58 Weight (mg) -29.44 -28.98 2.91 1.74 0.62 2.91 -0.03 25382 3.04 2.95 Square 0.080178 R 3.01 3 2.99 2.92 2.9 B6 ref 1.01 44.94 -25.00 -24.54 2.59 3.96 3.31 25384 Adjusted R Square -0.022024 B7 ref 0.99 42.28 -24.87 -24.41 2.37 4.33 3.68 25386 Standard Error 1.906378 B8 Lk Outlet Alg 3.04 31.43 -29.69 %C-29.23 6.85 1.07 0.95 35.560.30 25388 33.49 41.17 Observations43.74 11 4.51 1.59 4.37 33.58 B9 ALG06 3.09 35.57 -27.26 -26.80 1.96 2.79 2.14 25390 B10 ALG02 3.05 5.52 -22.31 delta 13C -21.85 -21.11 0.45 4.72 -28.054.07 25392 -29.56 -27.32 ANOVA -27.50 -22.68 -24.58 -21.06 -29.44 C1 ALG04 2.98 37.90 delta 13C_ca -27.42 -26.96 -20.65 1.36 1.21 -27.590.56 25394 -29.10 c -26.86 -27.04 df SS -22.22 MS F -24.12 Significance F -20.60 -28.98 C2 ALG05 3.04 31.74 -27.93 -27.47 2.40 0.73 0.08 25396 Regression 1 2.851116 2.851116 0.784507 0.398813 C3 ref 0.99 38.46 -25.09 -24.63 2.40 4.37 3.72 25398 Residual 9 32.7085 3.634278 23.78 %N 0.48 1.17 2.30 1.68 1.97 Total 1.3610 35.55962 0.34 0.15 0.34 1.74 delta 15N -0.97 0.59 0.79 2.71 0.99 4.31 -1.69 -1.52 0.62 Coefficients Standard Error t Stat P-value Lower 95%Upper 95%Lower 95.0% Upper 95.0% delta 15N_ca -1.62 -0.06 0.14 2.06 Intercept -4.297428 4.671099 3.66 0.34 -2.34 -2.17 -0.920003 0.381568 -14.8642 6.269341 -14.8642 6.269341 -0.03 X Variable 1-0.158022 0.17841 -0.885724 0.398813 -0.561612 0.245569 -0.561612 0.245569 4.00 3.00 2.00 1.00 Series1 0.00 -35.00 -30.00 -25.00 -20.00 -15.00 -10.00 -5.00 0.00 -1.00 -2.00 -3.00 13  
  14. 14. Random  stats  C:Documents and SettingshamptonMy DocumentsNCEAS Distributed Graduate Seminars[Wash Cres Lake Dec 15 Dont_Use.xls]Sheet1 Stable Isotope Data Sheet Sampling Site / Identifier: Wash Cresc Lake Peters lab Dont use - old data Sample Type: Algal Washed Rocks Date: Dec. 16 Tray ID and Sequence: Tray 004 13 15 Reference statistics: SD for delta C = 0.07 SD for delta N = 0.15 Position SampleID Weight (mg) %C delta 13C delta 13C_ca %N delta 15N delta 15N_ca Spec. No. A1 ref 0.98 38.27 -25.05 -24.59 1.96 4.12 3.47 25354 A2 ref 0.98 39.78 -25.00 -24.54 2.03 4.01 3.36 25356 A3 ref 0.98 40.37 -24.99 -24.53 2.04 4.09 3.44 25358 A4 ref 1.01 42.23 -25.06 -24.60 2.17 4.20 3.55 25360 Shore Avg Con A5 ALG01 3.05 1.88 -24.34 -23.88 0.17 -1.65 -2.30 25362 c -1.26 -27.22 A6 Lk Outlet Alg 3.06 31.55 -30.17 -29.71 0.92 0.87 0.22 25364 1.26 0.32 A7 ALG03 2.91 6.85 -21.11 -20.65 0.48 -0.97 -1.62 25366 c A8 ALG05 2.91 35.56 -28.05 -27.59 2.30 0.59 -0.06 25368 A9 ALG07 3.04 33.49 -29.56 -29.10 1.68 0.79 0.14 25370 A10 ALG06 2.95 41.17 -27.32 -26.86 1.97 2.71 2.06 25372 B1 ALG04 3.01 43.74 -27.50 -27.04 1.36 0.99 0.34 25374 c SUMMARY OUTPUT B2 ALG02 3 4.51 -22.68 -22.22 0.34 4.31 3.66 25376 B3 ALG01 2.99 1.59 -24.58 -24.12 0.15 -1.69 -2.34 25378 c Regression Statistics B4 ALG03 2.92 4.37 -21.06 -20.60 0.34 -1.52 -2.17 25380 c Multiple R 0.283158 B5 ALG07 2.9 33.58 -29.44 -28.98 1.74 0.62 -0.03 25382 R Square 0.080178 B6 ref 1.01 44.94 -25.00 -24.54 2.59 3.96 3.31 25384 Adjusted R Square -0.022024 B7 ref 0.99 42.28 -24.87 -24.41 2.37 4.33 3.68 25386 Standard Error 1.906378 B8 Lk Outlet Alg 3.04 31.43 -29.69 -29.23 1.07 0.95 0.30 25388 Observations 11 B9 ALG06 3.09 35.57 -27.26 -26.80 1.96 2.79 2.14 25390 B10 ALG02 3.05 5.52 -22.31 -21.85 0.45 4.72 4.07 25392 ANOVA C1 ALG04 2.98 37.90 -27.42 -26.96 1.36 1.21 0.56 25394 c df SS MS F Significance F C2 ALG05 3.04 31.74 -27.93 -27.47 2.40 0.73 0.08 25396 Regression 1 2.851116 2.851116 0.784507 0.398813 C3 ref 0.99 38.46 -25.09 -24.63 2.40 4.37 3.72 25398 Residual 9 32.7085 3.634278 23.78 1.17 Total 10 35.55962 Coefficients Standard Error t Stat P-value Lower 95%Upper 95%Lower 95.0% Upper 95.0% Intercept -4.297428 4.671099 -0.920003 0.381568 -14.8642 6.269341 -14.8642 6.269341 X Variable 1-0.158022 0.17841 -0.885724 0.398813 -0.561612 0.245569 -0.561612 0.245569
  15. 15. Where  data  end  up   From  Flickr  by  diylibrarian   www blog.order2disorder.com   From  Flickr  by  csessums   Data  Metadata   From  Flickr  by  csessums   Recreated  from  Klump  et  al.  2006  
  16. 16. Who  cares?     From  Flickr  by  Redden-­‐McAllister   From  Flickr  by  AJC1   www.rba.gov.au  
  17. 17. Where  data  end  up   From  Flickr  by  diylibrarian   www Data   wwwMetadata   Recreated  from  Klump  et  al.  2006  
  18. 18. Data   Reuse   Data   Sharing   Data  Management  
  19. 19. UGLY TRUTH Many   Earth  |  Environmental  |  Ecological   scientists…      5shortessays.blogspot.com     are  not  taught  data  management   don’t  know  what  metadata  are   can’t  name  data  centers  or  repositories   don’t  share  data  publicly  or  store  it  in  an  archive   aren’t  convinced  they  should  share  data    
  20. 20. Roadmap   5.  Tools   4.  DCXL       3.  Best  practices  for  scientists   2.  Barriers  to  best  practices  1.  Mistakes  scientists  make    
  21. 21. Barriers  Cost   Time   cultblender.wordpress.com   Software,   Personnel   hardware  
  22. 22. Barriers  Cost:  time,  personnel,  software,  hardware  Culture  of  Science   •  Not  the  norm   •  Lack  of  training   •  Disparate  data  
  23. 23. Barriers  Cost:  time,  personnel,  software,  hardware  Culture  of  Science  Loss  of  rights  or  bene:its   Misuse  of   data   Missed   opportunities   Con@lict  
  24. 24. Barriers  Cost:  time,  personnel,  software,  hardware  Culture  of  Science  Loss  of  rights  or  bene:its  Lack  of  incentives   Time  consuming   &  expensive   Reward   structure   Few   requirements  
  25. 25. Are  Undergrads  Learning  About  Data  Management?   Importance  Versus  Assessment   •  Metadata  generation   40   •  Software  choice   35   •  File  naming   30   •  QAQC   Important   25   •  Backing  up     20   •  Work@lows   15   •  Data  sharing   10   •  Data  re-­‐use   •  Meta-­‐analysis   5   •  Reproducibility   0  If  it’s  important,  why   0   •  Notebook  protocols   10   20   30   40   Assessed   isn’t  it  taught?   •  Databases    
  26. 26. Barriers  to  Teaching  Data  Management   Too   Not  a   Not   advanced   priority   appropriate   level   Students   Time   don’t  know   No   software   Lab   No   training   Covered   Too   in  Lab   big  
  27. 27. Roadmap   5.  Tools   4.  DCXL       3.  Best  practices  for  scientists   2.  Barriers  to  best  practices  1.  Mistakes  scientists  make    
  28. 28. Best  Practices  for  Data  Management   1.  Planning   2.  Data  collection  &   organization   3.  Quality  control  &  assurance   4.  Metadata   5.  Work@lows   6.  Data  stewardship  &  reuse  
  29. 29. 2.  Data  collection  &  organization  Create  unique  identiTiers   •  Decide  on  naming  scheme  early   •  Create  a  key   •  Different  for  each  sample   From  Flickr  by  zebbie   From  Flickr  by  sjbresnahan  
  30. 30. 2.  Data  collection  &  organization   Standardize   •  Consistent  within  columns   – only  numbers,  dates,  or  text   •  Consistent  names,  codes,  formats  ModiVied  from  K.  Vanderbilt     From  Pink  Floyd,  The  Wall      themurkyfringe.com  
  31. 31. 2.  Data  collection  &  organization   Standardize   •  Reduce   possibility  of   manual  error  by   constraining   entry  choices   Excel  lists   Google  Docs  Data  validataion   Forms   ModiVied  from  K.  Vanderbilt    
  32. 32. 2.  Data  collection  &  organization       Create  parameter  table   Create  a  site  table   From  doi:10.3334/ORNLDAAC/777  From  doi:10.3334/ORNLDAAC/777   From  R  Cook,  ESA  Best  Practices  Workshop  2010  
  33. 33. 2.  Data  collection  &  organization    Use  descriptive  Tile  names   *   •  Unique   •  Re@lect  contents  Bad:    Mydata.xls   Better:  Eaf@inis_nanaimo_2010_counts.xls      2001_data.csv      best  version.txt   Study   Year   organism   Site   name   What  was   measured     *Not  for  everyone   From  R  Cook,  ESA  Best  Practices  Workshop  2010  
  34. 34. 2.  Data  collection  &  organization  Organize  Tiles    logically   Biodiversity   Lake   Experiments   Biodiv_H20_heatExp_2005to2008.csv   Biodiv_H20_predatorExp_2001to2003.csv   …   Field  work   Biodiv_H20_PlanktonCount_2001toActive.csv   Biodiv_H20_ChlAprofiles_2003.csv   …     Grassland   From  S.  Hampton  
  35. 35. 2.  Data  collection  &  organization    Preserve  information   R  script  for  processing  &   analysis   •  Keep  raw  data  raw   •  Use  scripts  to  process  data      &  save  them  with  data   Raw  data  as  .csv  
  36. 36. 2.  Data  collection  &  organization   All  of  the  things  that   make  Excel  great  for  data   organization  are  bad  for   archiving!  What  to  do?  1.  Create  archive-­‐ready  raw  data  2.  Put  it  somewhere  special  3.  Have  your  fun  with  fancy  Excel   techniques  4.  Keep  archiving  in  mind  
  37. 37. 3.  Quality  control  and  quality  assurance   De@ine  &  enforce  standards   Double  data  entry   Document  changes   Minimize  manual  data  entry   No  missing,  impossible,  or  anomalous  values   •  Perform  statistical  summaries   •  Use  illegal  data  @ilter   •  Look  for  outliers   60   50     40   30   20   10   0   0   5   10   15   20   25   30   35  
  38. 38. •  ScientiTic  context   4.  Metadata  basics   •  Scienti@ic  reason  why  the  data  were   collected   •  What  data  were  collected  •  Digital  context   •  What  instruments  (including  model  &   •  Name  of  the  data  set   serial  number)  were  used   •  The  name(s)  of  the  data  @ile(s)  in  the   •  Environmental  conditions  during   data  set   collection   •  Date  the  data  set  was  last  modi@ied   •  Where  collected  &  spatial  resolution   •  Example  data  @ile  records  for  each  data   When  collected  &  temporal  resolution   type  @ile   •  Standards  or  calibrations  used   •  Pertinent  companion  @iles   •  Information  about  parameters   •  List  of  related  or  ancillary  data  sets   •  How  each  was  measured  or  produced   •  Software  (including  version  number)   •  Units  of  measure   used  to  prepare/read    the  data  set   •  Format  used  in  the  data  set   •  Data  processing  that  was  performed   •  Precision  &  accuracy  if  known  •  Personnel  &  stakeholders   •  Information  about  data   •  Who  collected     •  De@initions  of  codes  used   •  Who  to  contact  with  questions   •  Quality  assurance  &  control  measures   •  Funders   •  Known  problems  that  limit  data  use  (e.g.   uncertainty,  sampling  problems)     •  How  to  cite  the  data  set  
  39. 39. 4.  Metadata  basics  •  Provides  structure  to  describe  data   Common  terms    |    deVinitions    |    language    |    structure  •  Lots  of  different  standards    EML  ,  FGDC,  ISO19115,  DarwinCore,…    •  Tools  for  creating  metadata  @iles    Morpho  (EML),  Metavist  (FGDC),  NOAA  MERMaid  (CSGDM)    
  40. 40. 5.  WorkTlows   Simplest  workTlows:  commented  scripts,  Vlow  charts  Temperature   data   Data  import  into  R   Data  in  R   Salinity                 format   data   Quality  control  &   “Clean”  T   data  cleaning   &  S  data   Analysis:  mean,  SD   Summary   statistics   Graph  production  
  41. 41. 5.  WorkTlows  Fancy  Schmancy:  Kepler   Resulting  output   https://kepler-­‐project.org  
  42. 42. 5.  WorkTlows   WorkTlows  enable     From  Flickr  by  merlinprincesse   Reproducibility    can  someone  independently  validate  Vindings?   Transparency      others  can  understand  how  you  arrived  at  your  results   Executability      others  can  re-­‐run  or  re-­‐use  your  analysis    
  43. 43. 6.  Data  stewardship  &  reuse  Use  stable  formats      csv,  txt,  tiff  Create  back-­‐up  copies     original,  near,  far  Periodically  test  ability  to  restore  information   Modified from R. Cook  
  44. 44. 6.  Data  stewardship  &  reuse   Where  do  I  put  it?   Insitutional  archive   Discipline/specialty  archive   DataCite  list  of  repostiories:    www.datacite.org/repolist         From  Flickr  by  torkildr  
  45. 45. 6.  Data  stewardship  &  reuse   Data  Citation:  Why  everyone  should  do  it   Allow  readers  to  @ind  data  products   Get  credit  for  data  and  publications   Promote  reproducibility   Better  measure  of  research  impact   Example:   Sidlauskas,  B.  2007.  Data  from:  Testing  for  unequal  rates  of  morphological   diversi@ication  in  the  absence  of  a  detailed  phylogeny:  a  case  study  from   characiform  @ishes.  Dryad  Digital  Repository.  doi:10.5061/dryad.20    Learn  more  at  www.datacite.org   Modified from R. Cook  
  46. 46. Best  Practices  for  Data  Management   1.  Planning   2.  Data  collection  &   organization   3.  Quality  control  &  assurance   4.  Metadata   5.  Work@lows   6.  Data  stewardship  &  reuse   7.  Planning  
  47. 47. 1.  Planning  What  is  a  data  management  plan?   A  document  that  describes  what  you  will  do  with  your   data  during  and  after  you  complete  your  research  
  48. 48. 1.  Planning   Why  should  scientists  prepare  a  DMP?       Saves  time   Increases  ef@iciency   Easier  to  use  data       Others  can  understand  &  use  data   Credit  for  data  products   Funders  require  it    
  49. 49. A  few  words  about  NSF  Data   Management  Plans  
  50. 50. NSF  DMP  Requirements   From  Grant  Proposal  Guidelines:    DMP  supplement  may  include:   1.  the  types  of  data,  samples,  physical  collections,  software,  curriculum   materials,  and  other  materials  to  be  produced  in  the  course  of  the  project   2.   the  standards  to  be  used  for  data  and  metadata  format  and  content   (where  existing  standards  are  absent  or  deemed  inadequate,  this  should   be  documented  along  with  any  proposed  solutions  or  remedies)   3.   policies  for  access  and  sharing  including  provisions  for  appropriate   protection  of  privacy,  con@identiality,  security,  intellectual  property,  or   other  rights  or  requirements   4.   policies  and  provisions  for  re-­‐use,  re-­‐distribution,  and  the  production  of   derivatives   5.   plans  for  archiving  data,  samples,  and  other  research  products,  and  for   preservation  of  access  to  them  
  51. 51. Don’t  forget:  Budget  •  Costs  of  data  preparation  &  documentation   Hardware,  software   Personnel   Archive  fees  •  How  costs  will  be  paid     Request  funding!   dorrvs.com  
  52. 52. NSF’s  Vision*   DMPs  and  their  evaluation  will  grow  &  change  over   time  (similar  to  broader  impacts)   Peer  review  will  determine  next  steps   Community-­‐driven  guidelines     –  Different  disciplines  have  different  de@initions  of  acceptable   data  sharing   –  Flexibility  at  the  directorate  and  division  levels   –  Tailor  implementation  of  DMP  requirement   Evaluation  will  vary  with  directorate,  division,  &   program  of@icer    *UnofVicially   Help  from  Jennifer  Schopf,  NSF  
  53. 53. NSF’s  Vision*   DMPs  are  a  good  Tirst  step  towards  improving  data   stewardship   –  starting  discussion   –  scientists  learning  about  data  management   Additional  expertise  on  panels  to  effectively   evaluate  DMPs  (?)   Working  group  will  assess  outcomes    *UnofVicially      
  54. 54. dmp.cdlib.org   Step-­‐by-­‐step  wizard  for  generating  DMP   Create    |    edit    |    re-­‐use    |    share    |    save    |    generate     Open  to  community     Links  to  institutional  resources   Directorate  information  &  updates  
  55. 55. Roadmap   5.  Tools   4.  DCXL       3.  Best  practices  for  scientists   2.  Barriers  to  best  practices  1.  Mistakes  scientists  make    
  56. 56. “A  transformation  in  the  conduct  of  a  segment  of  scientiVic   research  by  enabling  and  promoting  publishing,  sharing,   and  archiving  of  tabular  data”  Increase    interoperability   =  Sharing      publishability   =  Publishing      archivability               =  Archiving    Focus  on  atmospheric,  ecological,  hydrological,  and  oceanographic  data  
  57. 57. Open  Source  &  Free     Excel  Add-­‐in    Software  program  that  extends  the  capabilities  of  larger  programs  Complements  basic  Excel  functionality   From  www.webopedia.com   www.ablebits.com  
  58. 58. DCXL  Project  Deliverables  •  Excel  add-­‐in  •  Publicly  available  source  code  •  Technical  documentation  •  End  user  documentation    •  Publicly  available   requirements  
  59. 59. Process  Assess  needs  •  Quantitative   –  Surveys  
  60. 60. Process  Assess  needs  •  Quantitative   ? –  Surveys   –  Quick  poll  •  Qualitative   –  Interviews  
  61. 61. Process  Assess  needs  Gather  requirements     Locations    Conferences    UC  campus  visits    Remote/web-­‐based    
  62. 62. Process  Assess  needs  Gather  requirements     Stakeholders  &  contributors      Libraries    Scientists    Repositories    Experts:  MSR,  GBMF    Personnel  on  related  projects        
  63. 63. Process  Assess  needs  Gather  requirements   !Build  requirements  document   ! "#$%&!"()*+!#,-*)./!0.-!1234+!5-.643)! ! !"#$%&#()$#*#+,"% %%%%%!+)-#$"),.%/0%123)0/$+)2%1($2,)/+%1#+,#$4%123)0/$+)2%5)6),23%7)8$2$.% %%
  64. 64. Requirements  1  Ensure  compatibility  for  Excel  users  without  the  add-­‐in  2  Check  the  data  Tile  for  CSV  compatibility    2.1  Excel  performs  a  CSV  compatibility  check  on  the  data  Vile    2.2  Excel  generates  a  Compatibility  Report    3  Generate  metadata  that  is  linked  to  the  data  Tile    3.1  The  user  opens  an  existing  metadata  document  as  a  template    3.2  The  user  initiates  a  new  metadata  document    3.3  Excel  populates  Level  1  metadata  Vields    3.4  The  user  populates  Level  2  metadata  Vields    3.5  The  user  generates  labels  for  parameter  metadata    3.6  The  user  requests  standards  for  keywords  
  65. 65. Requirements   4  Generate  a  citation  for  the  data  Tile   5  Deposit  into  a  repository   5.1  The  user  authenticates  via  an  existing  relationship  with  the   designated  repository   5.2  The  user  is  directed  to  establish  a  relationship  with  a  repository   5.3  The  user  links  an  identiVier  to  the  data  Vile  via  the  designated   repository   5.4  Excel  performs  Pre-­‐Archiving  Tasks   5.5  The  user  submits  the  Excel  Vile  for  deposition   6  Appendix  A:  Metadata  Types   7  Appendix  B:  Citation  Format   8  Appendix  C:  Dictionary  of  Terms        
  66. 66. Process  Assess  needs  Gather  requirements  Build  requirements  document  Build  community   Libraries   Scientists   Repositories   Programmers/ Developers      
  67. 67. Why  are  you   promoting   Excel?  •  Everyone  uses  it  •  Features  that  make  it  good  for  data  organization  make  it   bad  for  archiving  •  Stopgap  measure  
  68. 68. Get  Involved  dcxl.cdlib.org       @dcxlCDL     www.facebook.com/ DCXLatCDL    
  69. 69. Roadmap   5.  Tools   4.  DCXL       3.  Best  practices  for  scientists   2.  Barriers  to  best  practices  1.  Mistakes  scientists  make    
  70. 70. UC3  Services   Where   should  I  put   Data  Repository   my  data?   Deposit    |    Manage    |    Share    |    Preserve   www.cdlib.org/services/uc3  
  71. 71. UC3  Services   How  do  I  get   a  unique   identiVier?   Create  &  manage  persistent  identi@iers   •  Precise  identi@ication  of  a  dataset   •  Credit  to  data  producers  and  data  publishers   •  A  link  from  the  traditional  literature  to  the   data   •  Research  metrics  for  datasets   www.cdlib.org/services/uc3  
  72. 72. DataONE   www.dataone.org   •  Data  Education  Tutorials   •  Database  of  best  practices     &  software  tools   •  Links  to  DMPTool   •  Primer  on  data   management   From  Flickr  by  Robert  Hruzek  
  73. 73. Data Management 101"dcxl.cdlib.org  •  Data  Education  Tutorials  •  Primer  on  data  management  •  Other  resources  
  74. 74. Toolbox:    DCXL  blog:  dcxl.cdlib.org  
  75. 75. Lisa  Federer    dcxl.cdlib.org  @dcxlCDL  www.facebook.com/DCXLatCDL   www.carlystrasser.net   carlystrasser@gmail.com   @carlystrasser  

×