Data Management for Scientists

Reduce your workload
Reuse your ideas
Recycle your data

From Flickr by Mark McLaughlin

Carly Strasser, PhD
UC Santa Cruz
California Digital Library, UC Office of the President
February 2012
carly.strasser@ucop.edu
www.carlystrasser.net

Roadmap

1.  Background
2.  Data management landscape
3.  How to improve
4.  Toolbox

NSF-funded DataNet Project
Office of Cyberinfrastructure

DataONE


Cyberinfrastructure
Community Engagement & Outreach

From Flickr by wetwebwork
Courtesy of DataONE

What role can libraries play in data education?
What barriers to sharing can we eliminate?
Why don’t people share data?
Is data management being taught?
Do attitudes about sharing differ among disciplines?
How can we promote storing data in repositories?

Roadmap

1.  Background
2.  Data management landscape
3.  How to improve
4.  Toolbox

From Flickr by DW0825
From Flickr by Flickmor
From Flickr by deltaMike

Digital data

www.woodrow.org
C. Strasser
Courtesy of WHOI
From Flickr by US Army Environmental Command

Digital data + Complex analyses

Data                Models

Maximum Likelihood estimation
Matrix Models

Images    Tables    Paper

UGLY TRUTH

Many Earth | Environmental | Ecological scientists…
    are not taught data management
    don’t know what metadata are
    can’t name data centers or repositories
    don’t share data publicly or store it in an archive
    aren’t convinced they should share data

5shortessays.blogspot.com

Data Hangover

What happened?

From Flickr by SteveMcN

Where data end up

Data
Metadata
www

From Flickr by diylibrarian
blog.order2disorder.com
From Flickr by csessums
Recreated from Klump et al. 2006

Who cares?

From Flickr by Redden-McAllister
From Flickr by AJC1
www.rba.gov.au

Where data end up

Data
Metadata
www

From Flickr by diylibrarian
From Flickr by torkildr
Recreated from Klump et al. 2006

Data Reuse
Data Sharing
Data Management

Trends in Data Archiving

Journal publishers
    Joint Data Archiving Agreement

Data Papers etc.
    Ecological Archives, Beyond the PDF

Funders
    Data management requirements

Roadmap

1.  Background
2.  Data management landscape
3.  Best practices
4.  Toolbox

Best Practices for Data Management

   1.  Planning
   2.  Data collection & organization
   3.  Quality control & assurance
   4.  Metadata
   5.  Workflows
   6.  Data stewardship & reuse
   7.  Planning

2 tables                             Random notes

C:\Documents and Settings\hampton\My Documents\NCEAS Distributed Graduate Seminars\[Wash Cres Lake Dec 15 Dont_Use.xls]Sheet1
                   Stable Isotope Data Sheet
              Sampling Site / Identifier: Wash Cresc Lake          Peter's lab     Don't use - old data
                         Sample Type: Algal                        Washed Rocks
                                  Date: Dec. 16
                Tray ID and Sequence: Tray 004

                     Reference statistics: SD for delta 13C = 0.07          SD for delta 15N = 0.15


          Position        SampleID         Weight (mg)           %C       delta 13C   delta 13C_ca         %N               delta 15N   delta 15N_ca   Spec. No.
         A1                            ref    0.98              38.27      -25.05         -24.59           1.96                4.12          3.47       25354
         A2                            ref    0.98              39.78      -25.00         -24.54           2.03                4.01          3.36       25356
         A3                            ref    0.98              40.37      -24.99         -24.53           2.04                4.09          3.44       25358
         A4                            ref    1.01              42.23      -25.06         -24.60           2.17                4.20          3.55       25360           Shore           Avg Con
         A5          ALG01                    3.05              1.88       -24.34         -23.88           0.17               -1.65         -2.30       25362      c        -1.26          -27.22
         A6          Lk Outlet Alg            3.06              31.55      -30.17         -29.71           0.92                0.87          0.22       25364                1.26            0.32
         A7          ALG03                    2.91              6.85       -21.11         -20.65           0.48               -0.97         -1.62       25366      c
         A8          ALG05                    2.91              35.56      -28.05         -27.59           2.30                0.59         -0.06       25368
         A9          ALG07                    3.04              33.49      -29.56         -29.10           1.68                0.79          0.14       25370
         A10         ALG06                    2.95              41.17      -27.32         -26.86           1.97                2.71          2.06       25372
         B1          ALG04                    3.01              43.74      -27.50         -27.04           1.36                0.99          0.34       25374      c
         B2          ALG02                      3               4.51       -22.68         -22.22           0.34                4.31          3.66       25376
         B3          ALG01                    2.99              1.59       -24.58         -24.12           0.15               -1.69         -2.34       25378      c
         B4          ALG03                    2.92              4.37       -21.06         -20.60           0.34               -1.52         -2.17       25380      c
         B5          ALG07                     2.9              33.58      -29.44         -28.98           1.74                0.62         -0.03       25382
         B6                            ref    1.01              44.94      -25.00         -24.54           2.59                3.96          3.31       25384
         B7                            ref    0.99              42.28      -24.87         -24.41           2.37                4.33          3.68       25386
         B8          Lk Outlet Alg            3.04              31.43      -29.69         -29.23           1.07                0.95          0.30       25388
         B9          ALG06                    3.09              35.57      -27.26         -26.80           1.96                2.79          2.14       25390
         B10         ALG02                    3.05              5.52       -22.31         -21.85           0.45                4.72          4.07       25392
         C1          ALG04                    2.98              37.90      -27.42         -26.96           1.36                1.21          0.56       25394      c
         C2          ALG05                    3.04              31.74      -27.93         -27.47           2.40                0.73          0.08       25396
         C3                            ref    0.99              38.46      -25.09         -24.63           2.40                4.37          3.72       25398
                                                                23.78                                      1.17




From Stephanie Hampton (2010), ESA Workshop on Best Practices
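The sheet above mixes two tables and loose notes in one worksheet. As a contrast, here is a minimal sketch of how a few of the same rows might be stored as a single plain-text table with one header row and read by a script. The column names and the "reference" label for the "ref" rows are assumptions made for illustration, not part of the original sheet; the values are copied from it.

```python
import csv
import io

# Hypothetical tidy layout: one table, one header row, one row per tray position.
# Column names are illustrative; the values are copied from the sheet above.
TIDY_CSV = """position,sample_id,sample_type,weight_mg,pct_c,delta_13c,pct_n,delta_15n,spec_no
A1,ref,reference,0.98,38.27,-25.05,1.96,4.12,25354
A2,ref,reference,0.98,39.78,-25.00,2.03,4.01,25356
A5,ALG01,algal,3.05,1.88,-24.34,0.17,-1.65,25362
A6,Lk Outlet Alg,algal,3.06,31.55,-30.17,0.92,0.87,25364
B2,ALG02,algal,3,4.51,-22.68,0.34,4.31,25376
"""

# Columns that hold measurements and should be converted to numbers.
NUMERIC = ("weight_mg", "pct_c", "delta_13c", "pct_n", "delta_15n")


def load_rows(text):
    """Parse the CSV text and convert the measurement columns to floats."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        for name in NUMERIC:
            row[name] = float(row[name])
        rows.append(row)
    return rows


rows = load_rows(TIDY_CSV)

# Reference runs and field samples are separated by a column value,
# not by their position in the sheet.
samples = [r for r in rows if r["sample_type"] != "reference"]
references = [r for r in rows if r["sample_type"] == "reference"]
print(len(samples), "samples,", len(references), "reference runs")
```

In a layout like this, notes such as "Don't use - old data" would go in a separate metadata file or a dedicated column rather than in stray cells beside the data.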
  
Wash Cres Lake Dec 15 Dont_Use.xls

Random stats output

[The same Stable Isotope Data Sheet as above, with this regression output pasted beside the sample rows:]

SUMMARY OUTPUT

Regression Statistics
Multiple R             0.283158
R Square               0.080178
Adjusted R Square     -0.022024
Standard Error         1.906378
Observations          11

ANOVA
                df         SS         MS          F    Significance F
Regression       1   2.851116   2.851116   0.784507          0.398813
Residual         9    32.7085   3.634278
Total           10   35.55962

               Coefficients   Standard Error      t Stat    P-value   Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept         -4.297428         4.671099   -0.920003   0.381568    -14.8642    6.269341      -14.8642      6.269341
X Variable 1      -0.158022          0.17841   -0.885724   0.398813   -0.561612    0.245569     -0.561612      0.245569
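
Pasted statistics like the regression output in this sheet cannot be traced back to the calculation that produced them. As a rough illustration of a scripted alternative, the sketch below recomputes the summary values from the ANOVA sums of squares shown in that output. The variable names are assumptions, and the formulas are the standard ones for a regression with a single predictor.

```python
import math

# Values copied from the ANOVA table in the SUMMARY OUTPUT on the slide.
ss_regression = 2.851116
ss_residual = 32.7085
df_regression = 1
df_residual = 9
n_observations = 11

# Total sum of squares and the share explained by the regression.
ss_total = ss_regression + ss_residual
r_square = ss_regression / ss_total
multiple_r = math.sqrt(r_square)

# Adjusted R^2 penalises for the number of predictors (here one).
adjusted_r_square = 1 - (1 - r_square) * (n_observations - 1) / df_residual

# Mean squares, F statistic, and the standard error of the regression.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_statistic = ms_regression / ms_residual
standard_error = math.sqrt(ms_residual)

print(f"R Square          {r_square:.6f}")
print(f"Multiple R        {multiple_r:.6f}")
print(f"Adjusted R Square {adjusted_r_square:.6f}")
print(f"F                 {f_statistic:.6f}")
print(f"Standard Error    {standard_error:.6f}")
```

The recomputed values agree with the pasted figures to about six decimal places, so the output is internally consistent. A script also records how each number was derived, which the pasted cells do not.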
[The same sheet again, now with a third, transposed table pasted over the sample rows and the regression output:]

SampleID        ALG03    ALG05    ALG07    ALG06    ALG04    ALG02    ALG01    ALG03    ALG07
Weight (mg)     2.91     2.91     3.04     2.95     3.01     3        2.99     2.92     2.9
%C              6.85     35.56    33.49    41.17    43.74    4.51     1.59     4.37     33.58
delta 13C       -21.11   -28.05   -29.56   -27.32   -27.50   -22.68   -24.58   -21.06   -29.44
delta 13C_ca    -20.65   -27.59   -29.10   -26.86   -27.04   -22.22
                                                                                                                                                                                                                  MS  F           -24.12
                                                                                                                                                                                                                               Significance F        -20.60             -28.98
         C2          ALG05                    3.04              31.74      -27.93       -27.47         2.40                  0.73        0.08       25396                  Regression          1 2.851116 2.851116 0.784507 0.398813
         C3                            ref    0.99              38.46      -25.09       -24.63         2.40                  4.37        3.72       25398                  Residual            9 32.7085 3.634278
                                                                23.78             %N                    0.48
                                                                                                       1.17                          2.30                 1.68          1.97
                                                                                                                                                                           Total          1.3610 35.55962 0.34                0.15                     0.34                  1.74
                                                                              delta 15N                  -0.97                       0.59                 0.79          2.71              0.99                 4.31                -1.69              -1.52                  0.62
                                                                                                                                                                                         Coefficients
                                                                                                                                                                                                   Standard Error t Stat  P-value Lower 95%Upper 95%Lower 95.0%
                                                                                                                                                                                                                                                              Upper 95.0%
                                                                             delta 15N_ca                -1.62                      -0.06                 0.14          2.06
                                                                                                                                                                           Intercept       -4.297428 4.671099 3.66
                                                                                                                                                                                            0.34                                    -2.34              -2.17
                                                                                                                                                                                                                -0.920003 0.381568 -14.8642 6.269341 -14.8642 6.269341      -0.03
                                                                                                                                                                               X Variable 1-0.158022 0.17841 -0.885724 0.398813 -0.561612 0.245569 -0.561612 0.245569




                                                                                                                                                                                                                                                   4.00



                                                                                                                                                                                                                                                   3.00



                                                                                                                                                                                                                                                   2.00



                                                                                                                                                                                                                                                   1.00

                                                                                                                                                                                                                                                                      Series1

                                                                                                                                                                                                                                                   0.00
                                                                              -35.00                  -30.00                       -25.00                -20.00                 -15.00                  -10.00                  -5.00                  0.00

                                                                                                                                                                                                                                                  -1.00



                                                                                                                                                                                                                                                  -2.00



                                                                                                                                                                                                                                                  -3.00


                                                                                                                                                                                                                                                                                    27	
  
2. Data collection & organization

Create unique identifiers
   •  Decide on naming scheme early
   •  Create a key
   •  Different for each sample

From Flickr by zebbie; from Flickr by sjbresnahan
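
One way to follow the identifier advice above is to generate IDs from a fixed scheme instead of typing them by hand. This is a minimal sketch; the `SITE-YEAR-NNNN` pattern, the site code `NAN`, and the sample descriptions are assumptions made up for illustration.

```python
from itertools import count

def make_id_factory(site_code, year):
    """Return a function that hands out IDs such as 'NAN-2010-0001'."""
    counter = count(1)

    def next_id():
        return f"{site_code}-{year}-{next(counter):04d}"

    return next_id

next_id = make_id_factory("NAN", 2010)        # hypothetical site code and year
descriptions = ["alga", "sediment", "water"]  # hypothetical samples

# The key maps each unique identifier to what the sample actually is.
key = {next_id(): description for description in descriptions}
```

Because the counter only moves forward, no two samples created by the same factory can share an identifier.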
  
2. Data collection & organization

Standardize
   •  Consistent within columns
        – only numbers, dates, or text
   •  Consistent names, codes, formats

Modified from K. Vanderbilt
From Pink Floyd, The Wall (themurkyfringe.com)
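
A quick check for columns that mix numbers and text might look like the sketch below. The column names and rows are invented for illustration, and the check only separates "number" from "text"; dates would need their own rule.

```python
def column_kinds(rows, column):
    """Return the set of value kinds ('number' or 'text') found in one column."""
    kinds = set()
    for row in rows:
        value = row[column].strip()
        try:
            float(value)
            kinds.add("number")
        except ValueError:
            kinds.add("text")
    return kinds

rows = [
    {"site": "Lake1", "height_cm": "12.5"},
    {"site": "Lake1", "height_cm": "14"},
    {"site": "Lake1", "height_cm": "tall"},  # inconsistent on purpose
]

# Columns holding more than one kind of value need attention.
mixed = [column for column in rows[0] if len(column_kinds(rows, column)) > 1]
```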
  
2. Data collection & organization

Standardize
   •  Reduce possibility of manual error by constraining entry choices

   Excel lists
   Data validation
   Google Docs Forms

Modified from K. Vanderbilt
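
The same idea works outside Excel or Google Docs: accept a value only if it comes from an agreed list or range. A minimal sketch, assuming a hypothetical list of species codes and a hypothetical plausible height range:

```python
ALLOWED_SPECIES = {"ALG02", "ALG04", "ALG05", "ALG06"}  # hypothetical code list
HEIGHT_RANGE_CM = (0.0, 500.0)                          # hypothetical limits

def validate_entry(species_code, height_cm):
    """Return a list of problems; an empty list means the entry is acceptable."""
    errors = []
    if species_code not in ALLOWED_SPECIES:
        errors.append(f"unknown species code: {species_code!r}")
    low, high = HEIGHT_RANGE_CM
    if not low < height_cm < high:
        errors.append(f"height out of range: {height_cm}")
    return errors

good = validate_entry("ALG02", 12.0)
bad = validate_entry("alg2", -1.0)
```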
  	
  
2. Data collection & organization

Identify missing data
   •  Numeric fields: distinct value (e.g. 9999)
   •  Text fields: NULL or NA
   •  Use data flags in a separate column to qualify empty cells

        M1 = missing; no sample collected
        E1 = estimated from grab sample
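
When the data are read back in, the missing-value code has to be turned into a true "no value" before any arithmetic, or 9999 ends up in the mean. The sketch below assumes a small invented CSV with a `temp_c` column and a `flag` column that uses the M1/E1 codes from the slide.

```python
import csv
import io

MISSING_NUMERIC = "9999"  # sentinel value suggested on the slide
FLAGS = {
    "M1": "missing; no sample collected",
    "E1": "estimated from grab sample",
}

# Invented example data; in practice this would be a file on disk.
raw = io.StringIO(
    "sample_id,temp_c,flag\n"
    "S1,12.4,\n"
    "S2,9999,M1\n"
    "S3,11.0,E1\n"
)

records = []
for row in csv.DictReader(raw):
    value = None if row["temp_c"] == MISSING_NUMERIC else float(row["temp_c"])
    records.append((row["sample_id"], value, FLAGS.get(row["flag"], "")))

# Only real measurements go into the summary.
valid = [value for _, value, _ in records if value is not None]
mean_temp = sum(valid) / len(valid)
```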
  
2. Data collection & organization

Create parameter table
Create a site table

From doi:10.3334/ORNLDAAC/777
From R Cook, ESA Best Practices Workshop 2010
  
2. Data collection & organization

SPREADSHEETS: THE GOOD

Quick on the draw
     Clickety-click and you’re ready to fire
Always there in time
     Everyone has Excel
Smarter than he lets on
     Stats, Pivot tables, VB scripts
Cleans up real pretty
     Graphics, fonts, colors, borders

From Mark Schildhauer
  
2. Data collection & organization

SPREADSHEETS: THE BAD

Shoot first ask later
     Click&fire Click&fire Click&fire
No scruples
     Delete row, click&fire, ctrl-x/ctrl-c, click&fire, re-sort, save
Talks a good story but not much education
     Stats

From Mark Schildhauer
  
2. Data collection & organization

SPREADSHEETS: THE UGLY

Ill-mannered
     Takes data prisoner; conflates raw and summary data
Gaudy
     Use of visual cues as metadata: color, font, border
Shifty
     Cross-linking worksheets sets up “invisible” dependencies
Shiftless
     No provenance

The more complicated your spreadsheet, the uglier it gets for use with other software

From Mark Schildhauer
  
2. Data collection & organization

All of the things that make Excel great for data are bad for archiving!

1.    Create archive-ready raw data
2.    Put it somewhere special
3.    Have your fun with fancy Excel techniques
4.    Keep archiving in mind
  
2. Data collection & organization

What about databases?

A relational database is
     A set of tables
     Relationships among the tables
     A language to specify & query the tables

From Mark Schildhauer
  
2. Data collection & organization

Sample sites        Samples           Species
*siteID             *sampleID         *speciesID
site_name           siteID            species_name
latitude            sample_date       common_name
longitude           speciesID         family
description         height            order
                    flowering
                    flag
                    comments

* Denotes the primary key

From Mark Schildhauer
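
The three tables above could be declared roughly as follows. This is a sketch using Python's built-in `sqlite3` module with an in-memory database; the column types and the inserted rows are assumptions, since the slide lists only column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

conn.executescript("""
CREATE TABLE sites (
    siteID      INTEGER PRIMARY KEY,
    site_name   TEXT NOT NULL,
    latitude    REAL,
    longitude   REAL,
    description TEXT
);
CREATE TABLE species (
    speciesID    INTEGER PRIMARY KEY,
    species_name TEXT NOT NULL,
    common_name  TEXT,
    family       TEXT,
    "order"      TEXT  -- quoted because ORDER is an SQL keyword
);
CREATE TABLE samples (
    sampleID    INTEGER PRIMARY KEY,
    siteID      INTEGER NOT NULL REFERENCES sites(siteID),
    sample_date TEXT,
    speciesID   INTEGER NOT NULL REFERENCES species(speciesID),
    height      REAL,
    flowering   INTEGER,
    flag        TEXT,
    comments    TEXT
);
""")

# Invented rows for illustration.
conn.execute("INSERT INTO sites (siteID, site_name) VALUES (1, 'Example site')")
conn.execute("INSERT INTO species (speciesID, species_name) VALUES (1, 'Example species')")
conn.execute("INSERT INTO samples (sampleID, siteID, speciesID, height) VALUES (1, 1, 1, 12.5)")

try:
    # siteID 99 does not exist, so the relationship rejects this row.
    conn.execute("INSERT INTO samples (sampleID, siteID, speciesID) VALUES (2, 99, 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

The last insert fails on purpose: a sample cannot point at a site that was never defined, which is the kind of error a spreadsheet would accept silently.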
  
2. Data collection & organization

Databases often enforce good practice

Must define
     Tables
     Attributes
     Relationships (constraints)

[Figure: two example tables, one with columns A, B, C and one with columns D, E]

Databases provide:
     Scalability: millions+ records
     Features for sub-setting, querying, sorting
     Scripted language: SQL
     Reduced redundancy & potential data entry errors

From Mark Schildhauer
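
Sub-setting, summarizing, and sorting can all be written as one SQL statement. A small sketch with Python's built-in `sqlite3` module; the table contents and the height threshold are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE samples (sampleID INTEGER PRIMARY KEY, siteID INTEGER, height REAL)"
)
conn.executemany(
    "INSERT INTO samples (siteID, height) VALUES (?, ?)",
    [(1, 12.5), (1, 14.0), (2, 9.5), (2, 30.0), (2, 11.5)],
)

# Subset (WHERE), summarize (GROUP BY), and sort (ORDER BY) in one statement.
query = """
    SELECT siteID, COUNT(*) AS n, AVG(height) AS mean_height
    FROM samples
    WHERE height < 20
    GROUP BY siteID
    ORDER BY mean_height DESC
"""
summary = conn.execute(query).fetchall()
```

Because the query is text, it can be saved next to the data and re-run later, unlike a sequence of mouse clicks.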
  
2. Data collection & organization

Spreadsheets
•  Good for simple, self-contained charts, graphs, calculations
•  Handy for collecting raw data
•  Flexible cell content type
But…
•  Hard to subset or sort
•  Lack “record” integrity: can sort a column independently of all others
•  Harder to maintain as complexity and size of data grows

Databases
•  Work well with lots of data
•  Easy to query and subset data
•  Data fields are constrained
•  Columns cannot be sorted independently of each other
•  Normalization reduces data entry and potential for error
But…
•  More to learn
•  Harder to use

From Mark Schildhauer
  
2. Data collection & organization

You should invest time in learning databases if
     your data sets are large or complex

Consider investing time in learning databases if
     your data are small and humble
     you ever intend to share your data
     you are < 30 years old

From Mark Schildhauer
  
2. Data collection & organization

Use descriptive file names

PhDcomics.com
  
2. Data collection & organization

Use descriptive file names*
     •  Unique
     •  Reflect contents

Bad:      Mydata.xls
          2001_data.csv
          best version.txt
Better:   Eaffinis_nanaimo_2010_counts.xls
          (study organism _ site name _ year _ what was measured)

*Not for everyone

From R Cook, ESA Best Practices Workshop 2010
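
A naming pattern like the "Better" example above is easier to keep consistent if a small helper builds the names. This is a sketch; the function name `build_filename` and its cleaning rule are assumptions, not part of the original slides.

```python
import re

def build_filename(organism, site, year, measured, ext="csv"):
    """Join the four descriptive parts with underscores, e.g. organism_site_year_measured.ext."""
    parts = [organism, site, str(year), measured]
    # Keep only letters and digits inside each part so the underscores stay unambiguous.
    cleaned = [re.sub(r"[^A-Za-z0-9]", "", part) for part in parts]
    if not all(cleaned):
        raise ValueError("every part of the name must contain letters or digits")
    return "_".join(cleaned) + "." + ext

name = build_filename("Eaffinis", "nanaimo", 2010, "counts", ext="xls")
```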
  
2. Data collection & organization

Organize files logically

Biodiversity
     Lake
          Experiments
               Biodiv_H20_heatExp_2005to2008.csv
               Biodiv_H20_predatorExp_2001to2003.csv
               …
          Field work
               Biodiv_H20_PlanktonCount_2001toActive.csv
               Biodiv_H20_ChlAprofiles_2003.csv
               …
     Grassland

From S. Hampton
  
2. Data collection & organization

Preserve information
 •  Keep raw data raw
 •  Use scripts to process data & save them with data

[Figure: raw data as .csv; R script for processing & analysis]
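
The slide shows an R script; the same habit in Python might look like the sketch below. The file names and the cleaning rule (drop rows with no count) are invented for illustration. The point is that the script writes a new file and never edits the raw one.

```python
import csv
import tempfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())      # stand-in for a project folder
raw_path = workdir / "raw_counts.csv"    # hypothetical file names
clean_path = workdir / "clean_counts.csv"
raw_path.write_text("sample_id,count\nS1,10\nS2,\nS3,7\n")
raw_before = raw_path.read_bytes()

# Processing step: drop rows with no count. Output goes to a NEW file.
with raw_path.open(newline="") as src, clean_path.open("w", newline="") as dst:
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        if row["count"] != "":
            writer.writerow(row)

raw_unchanged = raw_path.read_bytes() == raw_before
clean_rows = clean_path.read_text().splitlines()
```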
  
Best Practices for Data Management

   1.  Planning
   2.  Data collection & organization
   3.  Quality control & assurance
   4.  Metadata
   5.  Workflows
   6.  Data stewardship & reuse
   7.  Planning
  
3. Quality control and quality assurance

Before data collection
•  Define & enforce standards
•  Assign responsibility for data quality

From Flickr by StacieBee
  
3. Quality control and quality assurance

During data collection/entry
    •  Minimize manual entry
    •  Use double entry
    •  Use text-to-speech program to read data back
    •  Use a database
    •  Document changes

From Flickr by schock
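
Double entry means keying the same data twice and comparing the results. A minimal sketch of the comparison step, with invented rows; the function name `compare_entries` is hypothetical.

```python
def compare_entries(first, second):
    """Return (row number, column, value 1, value 2) wherever two keyings disagree."""
    if len(first) != len(second):
        raise ValueError("the two entries have different numbers of rows")
    mismatches = []
    for number, (row_a, row_b) in enumerate(zip(first, second), start=1):
        for column in row_a:
            if row_a[column] != row_b.get(column):
                mismatches.append((number, column, row_a[column], row_b.get(column)))
    return mismatches

entry_1 = [{"id": "S1", "height": "12.5"}, {"id": "S2", "height": "14.0"}]
entry_2 = [{"id": "S1", "height": "12.5"}, {"id": "S2", "height": "41.0"}]  # digits swapped
diffs = compare_entries(entry_1, entry_2)
```

Each mismatch points to a cell that should be checked against the original field sheet.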
  
3. Quality control and quality assurance

After data entry
•  Check for missing, impossible, anomalous values
•  Perform statistical summaries
•  Look for outliers
        •  Normal probability plots
        •  Regression
        •  Scatter plots
        •  Maps

[Figure: example scatter plot]
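
The slide suggests plots for spotting outliers. As a scriptable complement, the sketch below uses a different technique: a range check for impossible values plus quartile fences (values beyond 1.5 × IQR from the quartiles). The temperature values and the plausible range are invented.

```python
import statistics

def summarize(values):
    """Basic statistical summary of a list of numbers."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
    }

def flag_outliers(values, k=1.5):
    """Return values outside the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in values if v < low or v > high]

temps_c = [11.8, 12.1, 12.4, 12.0, 11.9, 12.3, 48.0, 12.2]  # invented data
impossible = [v for v in temps_c if not -2 <= v <= 40]      # hypothetical plausible range
outliers = flag_outliers(temps_c)
summary = summarize(temps_c)
```

A flagged value is a prompt to go back to the field notes, not an instruction to delete it.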
  




	
  
  
4. Metadata basics

“Why are you promoting Excel?”
“What is metadata?”
  
4. Metadata basics

Metadata = Data reporting

      WHO created the data?
      WHAT is the content of the data set?
      WHEN was it created?
      WHERE was it collected?
      HOW was it developed?
      WHY was it developed?
  
4. Metadata basics

•    Scientific context
      •     Scientific reason why the data were collected
      •     What data were collected
      •     What instruments (including model & serial number) were used
      •     Environmental conditions during collection
      •     Where collected & spatial resolution
      •     When collected & temporal resolution
      •     Standards or calibrations used
•    Digital context
      •     Name of the data set
      •     The name(s) of the data file(s) in the data set
      •     Date the data set was last modified
      •     Example data file records for each data type file
      •     Pertinent companion files
      •     List of related or ancillary data sets
      •     Software (including version number) used to prepare/read the data set
      •     Data processing that was performed
•    Personnel & stakeholders
      •     Who collected
      •     Who to contact with questions
      •     Funders
•    Information about parameters
      •     How each was measured or produced
      •     Units of measure
      •     Format used in the data set
      •     Precision & accuracy if known
•    Information about data
      •     Definitions of codes used
      •     Quality assurance & control measures
      •     Known problems that limit data use (e.g. uncertainty, sampling problems)
•    How to cite the data set
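
Even before choosing a formal standard, the checklist above can be captured in a small machine-readable file kept next to the data. The sketch below writes the who/what/when/where/how/why questions as JSON. This layout is an assumption for illustration, not a metadata standard, and every value is an invented placeholder.

```python
import json

# All values below are invented placeholders.
metadata = {
    "who": {"collected_by": "A. Researcher", "contact": "a.researcher@example.org"},
    "what": "Plankton counts from lake samples",
    "when": "2010-06-01/2010-08-31",
    "where": {"site": "Example Lake", "latitude": 0.0, "longitude": 0.0},
    "how": "Vertical net tows; counts under a dissecting microscope",
    "why": "Track seasonal change in plankton abundance",
    "parameters": [
        {"name": "count", "units": "individuals per litre", "missing_code": "9999"},
    ],
}

# Refuse to call the record complete while any basic question is unanswered.
REQUIRED = ("who", "what", "when", "where", "how", "why")
missing = [field for field in REQUIRED if not metadata.get(field)]

sidecar_text = json.dumps(metadata, indent=2, sort_keys=True)
```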
  
4. Metadata basics

“What is metadata?”

Select the appropriate metadata standard
•  Provides structure to describe data
        Common terms | definitions | language | structure
•  Lots of different standards
        EML, FGDC, ISO19115, DarwinCore, …
•  Tools for creating metadata files
        Morpho (EML), Metavist (FGDC), NOAA MERMaid (CSDGM)
  	
  
     	
  
     	
  
4. Metadata basics

“What is a metadata standard?”
“What does metadata look like?”
  
  
5. Workflows

Workflow: how you get from the raw data to the final products of your research

Simple workflows: flow charts

     Temperature data, Salinity data
          → Data import into R → Data in R format
          → Quality control & data cleaning → “Clean” T & S data
          → Analysis: mean, SD → Summary statistics
          → Graph production
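
The flow chart above uses R; the same steps written as a short Python script might look like this sketch. The data are invented, the cleaning rule (drop incomplete rows) is an assumption, and graph production is left out to keep the example dependency-free.

```python
import statistics

def import_data(text):
    """Step 1: parse 'temperature,salinity' lines into float pairs (None if blank)."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header line
        t, s = line.split(",")
        rows.append((float(t) if t else None, float(s) if s else None))
    return rows

def clean(rows):
    """Step 2: quality control; drop rows with a missing value."""
    return [row for row in rows if None not in row]

def summarize(rows):
    """Step 3: mean and SD for each variable."""
    temps, sals = zip(*rows)
    return {
        "T_mean": statistics.mean(temps), "T_sd": statistics.stdev(temps),
        "S_mean": statistics.mean(sals), "S_sd": statistics.stdev(sals),
    }

raw = "temperature,salinity\n10.0,30.0\n12.0,\n14.0,34.0\n"  # invented data
summary = summarize(clean(import_data(raw)))
```

Each function corresponds to one box in the flow chart, so the script itself documents the workflow.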
  
5. Workflows

Workflow: how you get from the raw data to the final products of your research

Simple workflows: commented scripts
     •  R, SAS, MATLAB
     •  Well-documented code is…
               Easier to review
               Easier to share
               Easier to repeat analysis
  
5. Workflows

Fancy Schmancy workflows: Kepler

[Figure: Kepler workflow and resulting output]

https://kepler-project.org
  
5. Workflows

Workflows enable
     Reproducibility
          can someone independently validate findings?
     Transparency
          others can understand how you arrived at your results
     Executability
          others can re-run or re-use your analysis

From Flickr by merlinprincesse
  
        	
  
5. Workflows

Minimally: document your analysis
     commented code; simple flow-chart

www.littlebytesoflife.com

Emerging workflow applications will…
       −  Link software for executable end-to-end analysis
       −  Provide detailed info about data & analysis
       −  Facilitate re-use & refinement of complex, multi-step analyses
       −  Enable efficient swapping of alternative models & algorithms
       −  Help automate tedious tasks
  
  
6. Data stewardship & reuse

The 20-Year Rule
     The metadata accompanying a data set should be written for a user 20 years into the future
     (National Research Council 1991)

From Flickr by greensambaman
  
6. Data stewardship & reuse

Use stable formats
    csv, txt, tiff
Create back-up copies
    original, near, far
Periodically test ability to restore information

Modified from R. Cook
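
One possible way to act on "create back-up copies" and "periodically test ability to restore" is to compare checksums of the copies against the original. The sketch below is an assumption-laden illustration, not a prescribed tool: the file name, the `near` and `far` folders (which here are only stand-ins inside a temporary directory), and the function names are all hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(original, backup_dirs):
    """Copy `original` into each directory and check each copy's checksum."""
    expected = sha256_of(original)
    results = {}
    for directory in backup_dirs:
        directory = Path(directory)
        directory.mkdir(parents=True, exist_ok=True)
        copy_path = directory / Path(original).name
        shutil.copy2(original, copy_path)
        # Re-read the copy from disk: a match means it can be restored intact.
        results[str(copy_path)] = sha256_of(copy_path) == expected
    return results


def demo():
    """Hypothetical demo: 'near' and 'far' are stand-ins for real locations."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        original = tmp / "site_counts.csv"
        original.write_text("site,count\nA,12\nB,7\n")
        return backup_and_verify(original, [tmp / "near", tmp / "far"])


if __name__ == "__main__":
    for copy_path, ok in demo().items():
        print(copy_path, "OK" if ok else "MISMATCH")
```

In practice the "far" copy would live on a different machine or service; re-running only the checksum comparison on a schedule is one way to test restorability without copying again.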
  
6. Data stewardship & reuse

Store your data in a repository
    Institutional archive
    Discipline/specialty archive
    DataCite list of repositories:
        www.datacite.org/repolist

From Flickr by torkildr

6. Data stewardship & reuse

Data Citation
    Allows readers to find data products
    Get credit for data and publications
    Promotes reproducibility
    Better measure of research impact

Example:
Sidlauskas, B. 2007. Data from: Testing for unequal rates of morphological
diversification in the absence of a detailed phylogeny: a case study from
characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20

Learn more at www.datacite.org

Modified from R. Cook
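
The example citation follows an author, year, title, repository, identifier order. As a rough sketch (the function name and parameter names are assumptions, and this is not an official DataCite format or API), the same string can be assembled from its parts so that every dataset in a project is cited consistently:

```python
def format_data_citation(author, year, title, repository, doi):
    """Assemble a data citation in the order used by the slide's example."""
    return f"{author} {year}. {title}. {repository}. doi:{doi}"


citation = format_data_citation(
    author="Sidlauskas, B.",
    year=2007,
    title=(
        "Data from: Testing for unequal rates of morphological "
        "diversification in the absence of a detailed phylogeny: "
        "a case study from characiform fishes"
    ),
    repository="Dryad Digital Repository",
    doi="10.5061/dryad.20",
)

if __name__ == "__main__":
    print(citation)
```

Storing these fields alongside the dataset's metadata makes it easy to tell others exactly how the data should be cited.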
  
Best Practices for Data Management

   1.  Planning
   2.  Data collection & organization
   3.  Quality control & assurance
   4.  Metadata
   5.  Workflows
   6.  Data stewardship & reuse
   7.  Planning & data management plans in particular

1. Planning

What is a data management plan?

A document that describes what you will do with your data
during your research and after you complete your research

Data Hangover

1. Planning

Why should I prepare a DMP?

        Saves time
        Increases efficiency
        Easier to use data
        Others can understand & use data
        Credit for data products
        Funders require it

NSF DMP Requirements

From Grant Proposal Guidelines:
DMP supplement may include:
  1.  the types of data, samples, physical collections, software, curriculum
      materials, and other materials to be produced in the course of the project
  2.  the standards to be used for data and metadata format and content (where
      existing standards are absent or deemed inadequate, this should be
      documented along with any proposed solutions or remedies)
  3.  policies for access and sharing including provisions for appropriate
      protection of privacy, confidentiality, security, intellectual property, or other
      rights or requirements
  4.  policies and provisions for re-use, re-distribution, and the production of
      derivatives
  5.  plans for archiving data, samples, and other research products, and for
      preservation of access to them

1.  Types of data & other information

•  Types of data produced
•  Relationship to existing data
•  How/when/where will the data be captured or created?
•  How will the data be processed?
•  Quality assurance & quality control measures
•  Security: version control, backing up
•  Who will be responsible for data management during/after project?

Images: C. Strasser; biology.kenyon.edu; From Flickr by Lazurite

2.  Data & metadata standards

•  What metadata are needed to make the data meaningful?
•  How will you create or capture these metadata?
•  Why have you chosen particular standards and approaches for metadata?

Wired.com

3.  Policies for access & sharing
4.  Policies for re-use & re-distribution

•  Are you under any obligation to share data?
•  How, when, & where will you make the data available?
•  What is the process for gaining access to the data?
•  Who owns the copyright and/or intellectual property?
•  Will you retain rights before opening data to wider use? How long?
•  Are permission restrictions necessary?
•  Embargo periods for political/commercial/patent reasons?
•  Ethical and privacy issues?
•  Who are the foreseeable data users?
•  How should your data be cited?

5.  Plans for archiving & preservation

•  What data will be preserved for the long term? For how long?
•  Where will data be preserved?
•  What data transformations need to occur before preservation?
•  What metadata will be submitted alongside the datasets?
•  Who will be responsible for preparing data for preservation? Who will be the
   main contact person for the archived data?

From Flickr by theManWhoSurfedTooMuch

Don’t forget: Budget

•  Costs of data preparation & documentation
           Hardware, software
           Personnel
           Archive fees
•  How costs will be paid
           Request funding!

dorrvs.com

NSF’s Vision*

    DMPs and their evaluation will grow & change over time
    (similar to broader impacts)
    Peer review will determine next steps
    Community-driven guidelines
           –  Different disciplines have different definitions of acceptable
              data sharing
           –  Flexibility at the directorate and division levels
           –  Tailor implementation of DMP requirement

    Evaluation will vary with directorate, division, & program officer

*Unofficially
Help from Jennifer Schopf, NSF

Roadmap

                         4.  Toolbox
                3.  Best practices
         2.  Data management landscape
1.  Background

E-notebooks & online science

•    NoteBook
•    ORNL eNote
•    Evernote
•    Google Docs
•    Blogs
•    wikis
•    TheLabNotebook.com
•    NoteBookMaker

Image: TheLabNotebook.com

DMPTool: dmp.cdlib.org

    Step-by-step wizard for generating DMP
    Create | edit | re-use | share | save | generate
    Open to community
    Links to institutional resources
    Directorate information & updates

CDL Services for UC Community

Where should I put my data?

    Data Repository
    Deposit | Manage | Share | Preserve

www.cdlib.org/services/uc3

CDL Services for UC Community

Create & manage persistent identifiers
    •     Precise identification of a dataset
    •     Credit to data producers and data publishers
    •     A link from the traditional literature to the data
    •     Research metrics for datasets

Example:
Sidlauskas, B. 2007. Data from: Testing for unequal rates of morphological
diversification in the absence of a detailed phylogeny: a case study from
characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20

www.cdlib.org/services/uc3

Why are you promoting Excel?

•    Open source add-in
•    Facilitate data management, sharing, archiving for scientists
•    Focus on atmospheric, ecological, hydrological, and oceanographic data
•    Collecting requirements for add-in from scientists, data centers, libraries

Funders: Gordon and Betty Moore Foundation, Microsoft Research

Why are you promoting Excel?

Everyone uses it
Stopgap measure

Funders: Gordon and Betty Moore Foundation, Microsoft Research

Resources

www.dataone.org
 •    Data Education Tutorials
 •    Database of best practices & software tools
 •    Links to DMPTool
 •    Primer on data management

Data Management 101

dcxl.cdlib.org
www.carlystrasser.net

Slideshare link: this presentation

Handy References

Best Practices for Preparing Environmental Data Sets
to Share and Archive. September 2010. Hook,
Santhana Vannan, Beaty, Cook, & Wilson
http://daac.ornl.gov/PI/BestPractices-2010.pdf

Some Simple Guidelines for Effective Data
Management. Borer, Seabloom, Jones, & Schildhauer.
Bull Ecol Soc Amer, April 2009: 205-214.

Roadmap

                         4.  Toolbox
                3.  Best practices
         2.  Data management landscape
1.  Background

Getting down & dirty with your data

Where to begin?
    1.  Take stock
    2.  Take a time machine
    3.  Break it down
    4.  Get smart

www.catfishingtipstoday.com

1.  Take stock

•  What data do you have?
•  What data are you still generating?
•  What does your workflow look like?
•  Are you backing up?
•  How’s your filing system?
•  Etc…

From Flickr by charlie llewellin

2.  Take a time machine

Knowing what you know now, how would you plan for this project?
    –  File structures
    –  Metadata generation
    –  Naming conventions
Consider writing up a formal data management plan

From Flickr by F1RSTBORN
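
A naming convention is easier to keep if it is written down in a form that can be checked. The sketch below assumes a purely hypothetical convention, `project_site_YYYYMMDD_vNN.csv`; the pattern, the function names, and the example values are illustrations, not a recommended standard.

```python
import re
from datetime import date

# Hypothetical convention: project_site_YYYYMMDD_vNN.csv, all lowercase.
PATTERN = re.compile(r"^[a-z0-9]+_[a-z0-9]+_\d{8}_v\d{2}\.csv$")


def build_name(project, site, collected, version):
    """Build a file name that follows the hypothetical convention."""
    return f"{project.lower()}_{site.lower()}_{collected:%Y%m%d}_v{version:02d}.csv"


def follows_convention(filename):
    """Return True if the file name matches the convention."""
    return PATTERN.match(filename) is not None


if __name__ == "__main__":
    name = build_name("kelp", "SiteA", date(2012, 2, 14), 3)
    print(name, follows_convention(name))
    print("final data NEW (2).csv", follows_convention("final data NEW (2).csv"))
```

Whatever convention a lab picks, recording it in the data management plan and checking new files against it keeps the file structure predictable for future users.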
  
3.  Break it down

You now have a vision.
Break into manageable chunks
     –  Set a final deadline
     –  Set intermediate deadlines
     –  Break down tasks to meet those deadlines
     –  Be reasonable

From www.gonomad.com
From www.collegehumor.com

4.  Get smart

Learn from mistakes
Plan better next time

Remember: good data management takes
        Time
        Thoughtfulness
        Planning
        Resources

static.tvtropes.org

dcxl.cdlib.org
@dcxlCDL
www.facebook.com/DCXLatCDL

www.carlystrasser.net
carlystrasser@gmail.com
@carlystrasser

More Related Content

What's hot

Data Management: The Current Landscape
Data Management: The Current LandscapeData Management: The Current Landscape
Data Management: The Current LandscapeCarly Strasser
 
Open Data & Open Access
Open Data & Open AccessOpen Data & Open Access
Open Data & Open AccessCarly Strasser
 
Data management overview and UC3 tools for IASSIST 2014
Data management overview and UC3 tools for IASSIST 2014Data management overview and UC3 tools for IASSIST 2014
Data management overview and UC3 tools for IASSIST 2014Carly Strasser
 
DCXL Lightning Talk: Archiving Small Datasets
DCXL Lightning Talk: Archiving Small DatasetsDCXL Lightning Talk: Archiving Small Datasets
DCXL Lightning Talk: Archiving Small DatasetsCarly Strasser
 
Data Management from a Scientist's Perspective
Data Management from a Scientist's PerspectiveData Management from a Scientist's Perspective
Data Management from a Scientist's PerspectiveCarly Strasser
 
DMPTool at NNLM Research Lifecycle: Partnering for Success
DMPTool at NNLM Research Lifecycle: Partnering for SuccessDMPTool at NNLM Research Lifecycle: Partnering for Success
DMPTool at NNLM Research Lifecycle: Partnering for SuccessCarly Strasser
 
Rda nitrd 2015 berman - final
Rda nitrd 2015 berman  - finalRda nitrd 2015 berman  - final
Rda nitrd 2015 berman - finalKathy Fontaine
 
Data Management for Mountain Observatories Workshop
Data Management for Mountain Observatories WorkshopData Management for Mountain Observatories Workshop
Data Management for Mountain Observatories WorkshopCarly Strasser
 
DMPTool Webinar 8: Data Curation Profiles and the DMPTool (presented by Jake ...
DMPTool Webinar 8: Data Curation Profiles and the DMPTool (presented by Jake ...DMPTool Webinar 8: Data Curation Profiles and the DMPTool (presented by Jake ...
DMPTool Webinar 8: Data Curation Profiles and the DMPTool (presented by Jake ...University of California Curation Center
 
DMPTool Webinar Series 1: Introduction to DMPTool
DMPTool Webinar Series 1: Introduction to DMPTool DMPTool Webinar Series 1: Introduction to DMPTool
DMPTool Webinar Series 1: Introduction to DMPTool Carly Strasser
 
Big Data Curricula at the UW eScience Institute, JSM 2013
Big Data Curricula at the UW eScience Institute, JSM 2013Big Data Curricula at the UW eScience Institute, JSM 2013
Big Data Curricula at the UW eScience Institute, JSM 2013University of Washington
 
DMPTool for IMLS #WebWise14
DMPTool for IMLS #WebWise14DMPTool for IMLS #WebWise14
DMPTool for IMLS #WebWise14Carly Strasser
 
DMPTool: Data Management Made Easier at CNI 2012
DMPTool: Data Management Made Easier at CNI 2012DMPTool: Data Management Made Easier at CNI 2012
DMPTool: Data Management Made Easier at CNI 2012Carly Strasser
 
RDAP 15: You’re in good company: Unifying campus research data services
RDAP 15: You’re in good company: Unifying campus research data servicesRDAP 15: You’re in good company: Unifying campus research data services
RDAP 15: You’re in good company: Unifying campus research data servicesASIS&T
 

What's hot (20)

Data Management: The Current Landscape
Data Management: The Current LandscapeData Management: The Current Landscape
Data Management: The Current Landscape
 
Open Data & Open Access
Open Data & Open AccessOpen Data & Open Access
Open Data & Open Access
 
Data management overview and UC3 tools for IASSIST 2014
Data management overview and UC3 tools for IASSIST 2014Data management overview and UC3 tools for IASSIST 2014
Data management overview and UC3 tools for IASSIST 2014
 
DCXL Lightning Talk: Archiving Small Datasets
DCXL Lightning Talk: Archiving Small DatasetsDCXL Lightning Talk: Archiving Small Datasets
DCXL Lightning Talk: Archiving Small Datasets
 
Data Management from a Scientist's Perspective
Data Management from a Scientist's PerspectiveData Management from a Scientist's Perspective
Data Management from a Scientist's Perspective
 
DMPTool at NNLM Research Lifecycle: Partnering for Success
DMPTool at NNLM Research Lifecycle: Partnering for SuccessDMPTool at NNLM Research Lifecycle: Partnering for Success
DMPTool at NNLM Research Lifecycle: Partnering for Success
 
Rda nitrd 2015 berman - final
Rda nitrd 2015 berman  - finalRda nitrd 2015 berman  - final
Rda nitrd 2015 berman - final
 
DMPTool webinar 2011-10-19
DMPTool webinar 2011-10-19DMPTool webinar 2011-10-19
DMPTool webinar 2011-10-19
 
Data Management for Mountain Observatories Workshop
Data Management for Mountain Observatories WorkshopData Management for Mountain Observatories Workshop
Data Management for Mountain Observatories Workshop
 
DMPTool Webinar 8: Data Curation Profiles and the DMPTool (presented by Jake ...
DMPTool Webinar 8: Data Curation Profiles and the DMPTool (presented by Jake ...DMPTool Webinar 8: Data Curation Profiles and the DMPTool (presented by Jake ...
DMPTool Webinar 8: Data Curation Profiles and the DMPTool (presented by Jake ...
 
Dash for IASSIST 2014
Dash for IASSIST 2014Dash for IASSIST 2014
Dash for IASSIST 2014
 
DataUp at ACRL 2013
DataUp at ACRL 2013DataUp at ACRL 2013
DataUp at ACRL 2013
 
Domain-specific Knowledge Extraction from the Web of Data
Domain-specific Knowledge Extraction from the Web of DataDomain-specific Knowledge Extraction from the Web of Data
Domain-specific Knowledge Extraction from the Web of Data
 
Polinter13
Polinter13Polinter13
Polinter13
 
DMPTool Webinar Series 1: Introduction to DMPTool
DMPTool Webinar Series 1: Introduction to DMPTool DMPTool Webinar Series 1: Introduction to DMPTool
DMPTool Webinar Series 1: Introduction to DMPTool
 
Big Data Curricula at the UW eScience Institute, JSM 2013
Big Data Curricula at the UW eScience Institute, JSM 2013Big Data Curricula at the UW eScience Institute, JSM 2013
Big Data Curricula at the UW eScience Institute, JSM 2013
 
DMPTool for IMLS #WebWise14
DMPTool for IMLS #WebWise14DMPTool for IMLS #WebWise14
DMPTool for IMLS #WebWise14
 
DMPTool: Data Management Made Easier at CNI 2012
DMPTool: Data Management Made Easier at CNI 2012DMPTool: Data Management Made Easier at CNI 2012
DMPTool: Data Management Made Easier at CNI 2012
 
Urban Data Science at UW
Urban Data Science at UWUrban Data Science at UW
Urban Data Science at UW
 
RDAP 15: You’re in good company: Unifying campus research data services
RDAP 15: You’re in good company: Unifying campus research data servicesRDAP 15: You’re in good company: Unifying campus research data services
RDAP 15: You’re in good company: Unifying campus research data services
 

Viewers also liked

Data Herding for Scientists - IGERT Symposium at UF
Data Herding for Scientists - IGERT Symposium at UFData Herding for Scientists - IGERT Symposium at UF
Data Herding for Scientists - IGERT Symposium at UFCarly Strasser
 
Data101 pmcb retreat_09-20-13_final
Data101 pmcb retreat_09-20-13_finalData101 pmcb retreat_09-20-13_final
Data101 pmcb retreat_09-20-13_finalJackie Wirz, PhD
 
Data Matters for AGU Early Career Conference
Data Matters for AGU Early Career ConferenceData Matters for AGU Early Career Conference
Data Matters for AGU Early Career ConferenceCarly Strasser
 
Love Your Data Locally
Love Your Data LocallyLove Your Data Locally
Love Your Data LocallyErin D. Foster
 
Funders and Publishers: Agents of Change
Funders and Publishers: Agents of ChangeFunders and Publishers: Agents of Change
Funders and Publishers: Agents of ChangeCarly Strasser
 
NGP Retreat Open Science 2015
NGP Retreat Open Science 2015NGP Retreat Open Science 2015
NGP Retreat Open Science 2015Jackie Wirz, PhD
 
AIBS Bioinformatics Workforce Needs Workshop, Dec 2015
AIBS Bioinformatics Workforce Needs Workshop, Dec 2015AIBS Bioinformatics Workforce Needs Workshop, Dec 2015
AIBS Bioinformatics Workforce Needs Workshop, Dec 2015Carly Strasser
 
Deep phenotyping to aid identification of coding & non-coding rare disease v...
Deep phenotyping to aid identification  of coding & non-coding rare disease v...Deep phenotyping to aid identification  of coding & non-coding rare disease v...
Deep phenotyping to aid identification of coding & non-coding rare disease v...mhaendel
 

Viewers also liked (9)

Science101 slideshare
Science101 slideshareScience101 slideshare
Science101 slideshare
 
Data Herding for Scientists - IGERT Symposium at UF
Data Herding for Scientists - IGERT Symposium at UFData Herding for Scientists - IGERT Symposium at UF
Data Herding for Scientists - IGERT Symposium at UF
 
Data101 pmcb retreat_09-20-13_final
Data101 pmcb retreat_09-20-13_finalData101 pmcb retreat_09-20-13_final
Data101 pmcb retreat_09-20-13_final
 
Data Matters for AGU Early Career Conference
Data Matters for AGU Early Career ConferenceData Matters for AGU Early Career Conference
Data Matters for AGU Early Career Conference
 
Love Your Data Locally
Love Your Data LocallyLove Your Data Locally
Love Your Data Locally
 
Funders and Publishers: Agents of Change
Funders and Publishers: Agents of ChangeFunders and Publishers: Agents of Change
Funders and Publishers: Agents of Change
 
NGP Retreat Open Science 2015
NGP Retreat Open Science 2015NGP Retreat Open Science 2015
NGP Retreat Open Science 2015
 
AIBS Bioinformatics Workforce Needs Workshop, Dec 2015
AIBS Bioinformatics Workforce Needs Workshop, Dec 2015AIBS Bioinformatics Workforce Needs Workshop, Dec 2015
AIBS Bioinformatics Workforce Needs Workshop, Dec 2015
 
Deep phenotyping to aid identification of coding & non-coding rare disease v...
Deep phenotyping to aid identification  of coding & non-coding rare disease v...Deep phenotyping to aid identification  of coding & non-coding rare disease v...
Deep phenotyping to aid identification of coding & non-coding rare disease v...
 

Similar to Organize Scientific Data for Reuse

Data Management for Scientists: Workshop at Ocean Sciences 2012
Data Management for Scientists: Workshop at Ocean Sciences 2012Data Management for Scientists: Workshop at Ocean Sciences 2012
Data Management for Scientists: Workshop at Ocean Sciences 2012Carly Strasser
 
Data Herding for Scientists - UC Davis OA Week
Data Herding for Scientists - UC Davis OA WeekData Herding for Scientists - UC Davis OA Week
Data Herding for Scientists - UC Davis OA WeekCarly Strasser
 
Webinar on DataUp: Describe, Manage, and Share Data
Webinar on DataUp: Describe, Manage, and Share DataWebinar on DataUp: Describe, Manage, and Share Data
Webinar on DataUp: Describe, Manage, and Share DataCarly Strasser
 
Landscape of Data Curation - Microsoft eScience 2012
Landscape of Data Curation - Microsoft eScience 2012Landscape of Data Curation - Microsoft eScience 2012
Landscape of Data Curation - Microsoft eScience 2012Carly Strasser
 
UCLA: Data Management for Scientists
UCLA: Data Management for ScientistsUCLA: Data Management for Scientists
UCLA: Data Management for ScientistsCarly Strasser
 
Data Management: Scientist Perspective - DLF 2012
Data Management: Scientist Perspective - DLF 2012Data Management: Scientist Perspective - DLF 2012
Data Management: Scientist Perspective - DLF 2012Carly Strasser
 
Data Management Planning and the DMPTool
Data Management Planning and the DMPToolData Management Planning and the DMPTool
Data Management Planning and the DMPToolCarly Strasser
 
DMPTool Overview for UC Merced Research Week
DMPTool Overview for UC Merced Research WeekDMPTool Overview for UC Merced Research Week
DMPTool Overview for UC Merced Research WeekCarly Strasser
 
The DMPTool: A Resource for Data Management Planning
The DMPTool: A Resource for Data Management Planning The DMPTool: A Resource for Data Management Planning
The DMPTool: A Resource for Data Management Planning Carly Strasser
 
UC Merced: Data Management for Scientists
UC Merced: Data Management for ScientistsUC Merced: Data Management for Scientists
UC Merced: Data Management for ScientistsCarly Strasser
 
UC Riverside: Data Management for Scientists
UC Riverside: Data Management for ScientistsUC Riverside: Data Management for Scientists
UC Riverside: Data Management for ScientistsCarly Strasser
 
DataUp: An overview for the DataONE Users Group
DataUp: An overview for the DataONE Users GroupDataUp: An overview for the DataONE Users Group
DataUp: An overview for the DataONE Users GroupCarly Strasser
 
DataUp Overview: AGU 2012
DataUp Overview: AGU 2012DataUp Overview: AGU 2012
DataUp Overview: AGU 2012Carly Strasser
 
DataUp: Data Curation for Excel
DataUp: Data Curation for Excel DataUp: Data Curation for Excel
DataUp: Data Curation for Excel Carly Strasser
 
Crisis Information Management in the Web 3.0 Age
Crisis Information Management in the Web 3.0 AgeCrisis Information Management in the Web 3.0 Age
Crisis Information Management in the Web 3.0 AgeAxel101
 
Informatics Transform : Re-engineering Libraries for the Data Decade
Informatics Transform : Re-engineering Libraries for the Data DecadeInformatics Transform : Re-engineering Libraries for the Data Decade
Informatics Transform : Re-engineering Libraries for the Data DecadeLiz Lyon
 
Data Science: An Emerging Field for Future Jobs
Data Science: An Emerging Field for Future JobsData Science: An Emerging Field for Future Jobs
Data Science: An Emerging Field for Future JobsJian Qin
 

Similar to Organize Scientific Data for Reuse (20)

Data Management for Scientists: Workshop at Ocean Sciences 2012
Data Management for Scientists: Workshop at Ocean Sciences 2012Data Management for Scientists: Workshop at Ocean Sciences 2012
Data Management for Scientists: Workshop at Ocean Sciences 2012
 
Data Herding for Scientists - UC Davis OA Week
Data Herding for Scientists - UC Davis OA WeekData Herding for Scientists - UC Davis OA Week
Data Herding for Scientists - UC Davis OA Week
 
Webinar on DataUp: Describe, Manage, and Share Data
Webinar on DataUp: Describe, Manage, and Share DataWebinar on DataUp: Describe, Manage, and Share Data
Webinar on DataUp: Describe, Manage, and Share Data
 
Landscape of Data Curation - Microsoft eScience 2012
Landscape of Data Curation - Microsoft eScience 2012Landscape of Data Curation - Microsoft eScience 2012
Landscape of Data Curation - Microsoft eScience 2012
 
Digital Curation for Excel (DCXL)
Digital Curation for Excel (DCXL)Digital Curation for Excel (DCXL)
Digital Curation for Excel (DCXL)
 
UCLA: Data Management for Scientists
UCLA: Data Management for ScientistsUCLA: Data Management for Scientists
UCLA: Data Management for Scientists
 
Data Management: Scientist Perspective - DLF 2012
Data Management: Scientist Perspective - DLF 2012Data Management: Scientist Perspective - DLF 2012
Data Management: Scientist Perspective - DLF 2012
 
Data Management Planning and the DMPTool
Data Management Planning and the DMPToolData Management Planning and the DMPTool
Data Management Planning and the DMPTool
 
DMPTool Overview for UC Merced Research Week
DMPTool Overview for UC Merced Research WeekDMPTool Overview for UC Merced Research Week
DMPTool Overview for UC Merced Research Week
 
The DMPTool: A Resource for Data Management Planning
The DMPTool: A Resource for Data Management Planning The DMPTool: A Resource for Data Management Planning
The DMPTool: A Resource for Data Management Planning
 
UC Merced: Data Management for Scientists
UC Merced: Data Management for ScientistsUC Merced: Data Management for Scientists
UC Merced: Data Management for Scientists
 
UC Riverside: Data Management for Scientists
UC Riverside: Data Management for ScientistsUC Riverside: Data Management for Scientists
UC Riverside: Data Management for Scientists
 
DataUp: An overview for the DataONE Users Group
DataUp: An overview for the DataONE Users GroupDataUp: An overview for the DataONE Users Group
DataUp: An overview for the DataONE Users Group
 
Data Management Plans: Tips, Tricks and Tools
Data Management Plans: Tips, Tricks and ToolsData Management Plans: Tips, Tricks and Tools
Data Management Plans: Tips, Tricks and Tools
 
DataUp Overview: AGU 2012
DataUp Overview: AGU 2012DataUp Overview: AGU 2012
DataUp Overview: AGU 2012
 
DataUp: Data Curation for Excel
DataUp: Data Curation for Excel DataUp: Data Curation for Excel
DataUp: Data Curation for Excel
 
Crisis Information Management in the Web 3.0 Age
Crisis Information Management in the Web 3.0 AgeCrisis Information Management in the Web 3.0 Age
Crisis Information Management in the Web 3.0 Age
 
Informatics Transform : Re-engineering Libraries for the Data Decade
Informatics Transform : Re-engineering Libraries for the Data DecadeInformatics Transform : Re-engineering Libraries for the Data Decade
Informatics Transform : Re-engineering Libraries for the Data Decade
 
Data Science: An Emerging Field for Future Jobs
Data Science: An Emerging Field for Future JobsData Science: An Emerging Field for Future Jobs
Data Science: An Emerging Field for Future Jobs
 
Ijariie1184
Ijariie1184Ijariie1184
Ijariie1184
 

Organize Scientific Data for Reuse

  • 1. Data  Management  for  Scientists     Reduce  your  workload   Reuse  your  ideas   Recycle  your  data     From  Flickr  by  Mark  McLaughlin     Carly  Strasser,  PhD   UC  Santa  Cruz   California  Digital  Library,  UC  Office  of  the  President   February  2012   carly.strasser@ucop.edu   www.carlystrasser.net  
  • 2. Roadmap   4.  Toolbox     3.  How  to  improve   2.  Data  management  landscape   1.  Background    
  • 3. NSF  funded  DataNet  Project   Office  of  Cyberinfrastructure  
  • 4. B   A   C                                                          Pre  DataONE                                                                                  .   DataONE  
  • 5. NSF  funded  DataNet  Project   Office  of  Cyberinfrastructure   Community   Cyberinfrastructure   Engagement  &   Outreach   From  Flickr  by  wetwebwork   Courtesy  of  DataONE  
  • 6. What  role  can   libraries  play  in   data  education?   What  barriers  to  sharing   can  we  eliminate?   Why  don’t  people   share  data?   Is  data  management   Do  attitudes  about   being  taught?   sharing  differ   among  disciplines?   How  can  we  promote  storing   data  in  repositories?  
  • 7.
  • 8. Roadmap   4.  Toolbox     3.  How  to  improve   2.  Data  management  landscape   1.  Background    
  • 9. From Flickr by DW0825   From Flickr by Flickmor   From Flickr by deltaMike   Digital data   www.woodrow.org   C. Strasser   Courtesy of WHOI   From Flickr by US Army Environmental Command
  • 10. Digital  data   +     Complex  analyses  
  • 11. Data   Models   Maximum   Likelihood   estimation   Matrix   Models   Images   Tables   Paper  
  • 12. Data   Models   Maximum   Likelihood   estimation   Matrix   Models   Images   Tables   Paper  
  • 13. UGLY TRUTH Many   Earth  |  Environmental  |  Ecological   scientists…       5shortessays.blogspot.com     are  not  taught  data  management   don’t  know  what  metadata  are   can’t  name  data  centers  or  repositories   don’t  share  data  publicly  or  store  it  in  an  archive   aren’t  convinced  they  should  share  data    
  • 14. Data  Hangover     What  happened?   From  Flickr  by  SteveMcN  
  • 15. Where  data  end  up   From  Flickr  by  diylibrarian   www blog.order2disorder.com   From  Flickr  by  csessums   Data   Metadata   From  Flickr  by  csessums   Recreated  from  Klump  et  al.  2006  
  • 16. Who  cares?     From  Flickr  by  Redden-­‐McAllister   From  Flickr  by  AJC1   www.rba.gov.au  
  • 17. Where  data  end  up   From  Flickr  by  diylibrarian   www Data   www Metadata   From  Flickr  by  torkildr   Recreated  from  Klump  et  al.  2006  
  • 18. Data   Reuse   Data   Sharing   Data   Management  
  • 19. Trends  in  Data  Archiving   Journal  publishers   Joint  Data  Archiving  Agreement  
  • 20. Trends  in  Data  Archiving   Journal  publishers   Joint  Data  Archiving  Agreement     Data  Papers  etc.   Ecological  Archives,  Beyond  the  PDF     Funders   Data  management  requirements    
  • 21. Roadmap   4.  Toolbox     3.  Best  practices   2.  Data  management  landscape   1.  Background    
  • 22. Best  Practices  for  Data  Management   1.  Planning   2.  Data  collection  &  organization   3.  Quality  control  &  assurance   4.  Metadata   5.  Workflows   6.  Data  stewardship  &  reuse  
  • 23. Best  Practices  for  Data  Management   1.  Planning   2.  Data  collection  &  organization   3.  Quality  control  &  assurance   4.  Metadata   5.  Workflows   6.  Data  stewardship  &  reuse   7.  Planning  
  • 24. 2 tables   Random notes   [Spreadsheet screenshot: "Wash Cres Lake Dec 15 Dont_Use.xls", Sheet1, a Stable Isotope Data Sheet marked "Don't use - old data". One worksheet holds two tables plus loose notes. Main table columns: Position, SampleID, Weight (mg), %C, delta 13C, delta 13C_ca, %N, delta 15N, delta 15N_ca, Spec. No.]   From Stephanie Hampton (2010)   ESA Workshop on Best Practices
  • 25. Wash Cres Lake Dec 15 Dont_Use.xls   [Spreadsheet screenshot: Sheet1, a Stable Isotope Data Sheet marked "Don't use - old data", with the file name highlighted. Main table columns: Position, SampleID, Weight (mg), %C, delta 13C, delta 13C_ca, %N, delta 15N, delta 15N_ca, Spec. No.]   From Stephanie Hampton (2010)   ESA Workshop on Best Practices
  • 26. Random stats output   [Spreadsheet screenshot: the stable isotope data sheet from "Wash Cres Lake Dec 15 Dont_Use.xls" with a regression SUMMARY OUTPUT block (regression statistics, ANOVA table, coefficients) pasted beside the sample data]
  • 27. [Spreadsheet screenshot: the stable isotope data sheet from "Wash Cres Lake Dec 15 Dont_Use.xls", now also holding the regression SUMMARY OUTPUT block, a transposed copy of the sample table (SampleID, Weight (mg), %C, delta 13C, delta 13C_ca, %N, delta 15N, delta 15N_ca as rows), and an embedded scatter chart (Series1)]
  • 28. 2.  Data  collection  &  organization   Create  unique  identifiers   •  Decide  on  naming  scheme  early   •  Create  a  key   •  Different  for  each  sample   From  Flickr  by  zebbie   From  Flickr  by  sjbresnahan  
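The naming-scheme advice on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the ID format (site code, sampling date, running number), the function names `make_sample_id` and `assign_ids`, and the example site code and date are all hypothetical and do not come from the slides.

```python
from datetime import date


def make_sample_id(site_code: str, sampled_on: date, sequence: int) -> str:
    """Build an ID such as 'WCL-20101216-003' (hypothetical scheme)."""
    return f"{site_code.upper()}-{sampled_on:%Y%m%d}-{sequence:03d}"


def assign_ids(site_code, sampled_on, descriptions):
    """Return a key that maps each new unique ID to its sample description."""
    key = {}
    for sequence, description in enumerate(descriptions, start=1):
        key[make_sample_id(site_code, sampled_on, sequence)] = description
    return key


# Made-up site code, date, and descriptions, for illustration only.
key = assign_ids("wcl", date(2010, 12, 16), ["algae, shore", "algae, lake outlet"])
for sample_id, description in key.items():
    print(sample_id, "->", description)
```

Deciding on the scheme before collection starts means every sample gets a different ID, and the key can be archived alongside the data.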
  • 29. 2.  Data  collection  &  organization   Standardize   •  Consistent  within  columns   – only  numbers,  dates,  or  text   •  Consistent  names,  codes,  formats   Modified  from  K.  Vanderbilt     From  Pink  Floyd,  The  Wall      themurkyfringe.com  
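One way to check the "consistent within columns" rule is to classify every cell and report columns that mix kinds. The sketch below is an assumption-laden example: the helper names `column_kind` and `mixed_columns`, the column names, and the sample rows are invented for illustration, and only numbers versus text are distinguished.

```python
def column_kind(value: str) -> str:
    """Classify a cell as 'number' if it parses as a float, else 'text'."""
    try:
        float(value)
        return "number"
    except ValueError:
        return "text"


def mixed_columns(rows):
    """Return the names of columns whose cells are not all the same kind."""
    kinds = {}
    for row in rows:
        for name, value in row.items():
            kinds.setdefault(name, set()).add(column_kind(value))
    return sorted(name for name, seen in kinds.items() if len(seen) > 1)


# Hypothetical rows: the second weight was typed as free text.
rows = [
    {"sample_id": "ALG01", "weight_mg": "3.05"},
    {"sample_id": "ALG02", "weight_mg": "about 3"},
    {"sample_id": "ALG03", "weight_mg": "2.91"},
]
print(mixed_columns(rows))
```

Here only `weight_mg` is reported, because it holds both numbers and free text.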
  • 30. 2. Data collection & organization   Standardize   • Reduce possibility of manual error by constraining entry choices   Excel lists   Data validation   Google Docs Forms   Modified from K. Vanderbilt
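Outside Excel or Google Docs, the same idea of constraining entry choices can be sketched in Python. The allowed list and the function name `validate_entry` are hypothetical; a real project would use its own code list.

```python
ALLOWED_SAMPLE_TYPES = {"algae", "sediment", "water"}  # hypothetical code list


def validate_entry(sample_type: str) -> str:
    """Return the normalized entry, or raise ValueError if it is not allowed."""
    cleaned = sample_type.strip().lower()
    if cleaned not in ALLOWED_SAMPLE_TYPES:
        allowed = ", ".join(sorted(ALLOWED_SAMPLE_TYPES))
        raise ValueError(f"{sample_type!r} is not one of: {allowed}")
    return cleaned


# The second entry is a typo and is rejected instead of being stored.
for entry in ["Algae", "alage"]:
    try:
        print(validate_entry(entry))
    except ValueError as error:
        print("rejected:", error)
```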
  • 31. 2.  Data  collection  &  organization   Identify  missing  data   •  Numeric  fields:  distinct  value  (e.g.  9999)   •  Text  fields:  NULL  or  NA     •  Use  data  flags  in  a  separate  column  to  qualify  empty  cells   M1  =  missing;  no   sample  collected   E1  =  estimated  from   grab  sample  
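A short sketch of how the missing-value convention on this slide might be handled when reading data back in. The sentinel 9999 and the flags M1 and E1 come from the slide; the column name `nitrate`, the sample rows, and the function `read_records` are assumptions made for the example.

```python
import csv
import io

MISSING_NUMERIC = "9999"  # sentinel value from the slide's example
FLAG_MEANINGS = {
    "M1": "missing; no sample collected",
    "E1": "estimated from grab sample",
}

# Hypothetical data: a numeric column plus a separate flag column.
raw = """sample_id,nitrate,flag
S01,0.42,
S02,9999,M1
S03,0.38,E1
"""


def read_records(text):
    """Parse CSV text, turning the sentinel into None and keeping the flag."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        value = None if row["nitrate"] == MISSING_NUMERIC else float(row["nitrate"])
        records.append(
            {"sample_id": row["sample_id"], "nitrate": value, "flag": row["flag"] or None}
        )
    return records


records = read_records(raw)
valid = [r["nitrate"] for r in records if r["nitrate"] is not None]
print("mean of non-missing values:", sum(valid) / len(valid))
for r in records:
    if r["flag"]:
        print(r["sample_id"], "is flagged:", FLAG_MEANINGS[r["flag"]])
```

Converting the sentinel to `None` before any arithmetic keeps 9999 from leaking into a mean or a plot.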
  • 32. 2.  Data  collection  &  organization       Create  parameter  table   Create  a  site  table   From  doi:10.3334/ORNLDAAC/777   From  doi:10.3334/ORNLDAAC/777   From  R  Cook,  ESA  Best  Practices  Workshop  2010  
  • 33. 2.  Data  collection  &  organization   SPREADSHEETS: THE GOOD Quick  on  the  draw     Clickety-­‐click  and  you’re  ready  to  fire   Always  there  in  time      Everyone  has  Excel   Smarter  than  he  lets  on    Stats,  Pivot  tables,  VB  scripts   Cleans  up  real  pretty    Graphics,  fonts,  colors,  borders   From  Mark  Schildhauer  
  • 34. 2.  Data  collection  &  organization   SPREADSHEETS: THE BAD Shoot  first  ask  later   Click&fire  Click&fire  Click&fire   No  scruples    Delete  row,  click&fire,  ctrl-­‐x/ctrl-­‐c,  click&fire,  re-­‐sort,  save   Talks  a  good  story  but  not  much  education    Stats   From  Mark  Schildhauer  
  • 35. 2.  Data  collection  &  organization   SPREADSHEETS: THE UGLY Ill-­‐mannered   Takes  data  prisoner;  conflates  raw  and  summary  data   Gaudy   Use  of  visual  cues  as  metadata:  color,  font,  border   Shifty   Cross-­‐linking  worksheets  sets  up  “invisible”  dependencies   Shiftless   No  provenance   The  more  complicated  your  spreadsheet,  the  uglier  it  gets  for   use  with  other  software     From  Mark  Schildhauer  
  • 36. 2.  Data  collection  &  organization   All  of  the  things   that  make  Excel   great  for  data  are   bad  for  archiving!   1.  Create  archive-­‐ready  raw  data   2.  Put  it  somewhere  special   3.  Have  your  fun  with  fancy  Excel  techniques   4.  Keep  archiving  in  mind  
  • 37. 2.  Data  collection  &  organization   What  about   databases?   A  relational  database  is      A  set  of  tables    Relationships  among  the  tables    A  language  to  specify  &  query  the  tables   From  Mark  Schildhauer  
  • 38. 2. Data collection & organization
      Sample sites: *siteID, site_name, latitude, longitude, description
      Samples: *sampleID, siteID, sample_date, speciesID, height, flowering, flag, comments
      Species: *speciesID, species_name, common_name, family, order
      * Denotes the primary key   From Mark Schildhauer
  • 39. 2. Data collection & organization   Databases often enforce good practice   Must define:   Tables   Attributes   Relationships (constraints)   [Diagram: example tables with columns A to E and numbered cells]   Databases provide:   Scalability: millions+ records   Features for sub-setting, querying, sorting   Scripted language: SQL   Reduced redundancy & potential data entry errors   From Mark Schildhauer
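To make "tables, attributes, relationships" concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table and column names follow the sites/samples/species example; the data types, the `CHECK` constraint, and all inserted values are assumptions made for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
con.executescript("""
CREATE TABLE sites   (siteID TEXT PRIMARY KEY, site_name TEXT,
                      latitude REAL, longitude REAL);
CREATE TABLE species (speciesID TEXT PRIMARY KEY, species_name TEXT);
CREATE TABLE samples (
    sampleID    TEXT PRIMARY KEY,
    siteID      TEXT NOT NULL REFERENCES sites(siteID),
    speciesID   TEXT NOT NULL REFERENCES species(speciesID),
    sample_date TEXT,
    height      REAL CHECK (height >= 0)
);
""")

# Hypothetical records
con.execute("INSERT INTO sites VALUES ('L1', 'Lake One', 49.2, -123.9)")
con.execute("INSERT INTO species VALUES ('SP1', 'Example species')")
con.execute("INSERT INTO samples VALUES ('S001', 'L1', 'SP1', '2010-06-01', 1.2)")

# A sample that points at a site that does not exist is refused by the database.
try:
    con.execute("INSERT INTO samples VALUES ('S002', 'XX', 'SP1', '2010-06-02', 0.9)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

result = con.execute("""
    SELECT s.sampleID, t.site_name, p.species_name
    FROM samples s
    JOIN sites t   ON t.siteID = s.siteID
    JOIN species p ON p.speciesID = s.speciesID
""").fetchall()
print(rejected, result)
```

Site and species details are stored once and referenced by key, which is how normalization reduces repeated typing and the errors that come with it.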
  • 40. 2. Data collection & organization
      Spreadsheets
      • Good for simple, self-contained charts, graphs, calculations
      • Handy for collecting raw data
      • Flexible cell content type
      But…
      • Hard to subset or sort
      • Lack "record" integrity: can sort a column independently of all others
      • Harder to maintain as complexity and size of data grows
      Databases
      • Work well with lots of data
      • Easy to query and subset data
      • Data fields are constrained
      • Columns cannot be sorted independently of each other
      • Normalization reduces data entry and potential for error
      But…
      • More to learn
      • Harder to use
      From Mark Schildhauer
  • 41. 2.  Data  collection  &  organization   You  should  invest  time  in  learning  databases  if      your  data  sets  are  large  or  complex     Consider  investing  time  in  learning  databases  if    your  data  are  small  and  humble    you  ever  intend  to  share  your  data    you  are  <  30  years  old   From  Mark  Schildhauer  
  • 42. 2.  Data  collection  &  organization   Use  descriptive  file  names   PhDcomics.com  
  • 43. 2. Data collection & organization   Use descriptive file names*   • Unique   • Reflect contents   Bad: Mydata.xls, 2001_data.csv, best version.txt   Better: Eaffinis_nanaimo_2010_counts.xls (study organism, site name, year, what was measured)   *Not for everyone   From R Cook, ESA Best Practices Workshop 2010
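A descriptive name like the "Better" example can be assembled from its parts so that every file follows the same pattern. The helper below is a hypothetical sketch; the order of the parts and the rule of stripping spaces and punctuation are assumptions.

```python
import re


def build_filename(organism, site, year, measured, ext="csv"):
    """Join organism, site, year, and measurement into one underscore-separated file name."""
    parts = [organism, site, str(year), measured]
    # Drop spaces and punctuation so the name is safe on any operating system.
    cleaned = [re.sub(r"[^A-Za-z0-9]+", "", p) for p in parts]
    if not all(cleaned):
        raise ValueError("every part of the file name must contain letters or digits")
    return "_".join(cleaned) + "." + ext


name = build_filename("Eaffinis", "nanaimo", 2010, "counts", ext="xls")
print(name)
```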
  • 44. 2. Data collection & organization   Organize files logically
      Biodiversity
        Lake
          Experiments
            Biodiv_H20_heatExp_2005to2008.csv
            Biodiv_H20_predatorExp_2001to2003.csv
            …
          Field work
            Biodiv_H20_PlanktonCount_2001toActive.csv
            Biodiv_H20_ChlAprofiles_2003.csv
            …
        Grassland
      From S. Hampton
  • 45. 2.  Data  collection  &  organization    Preserve  information   R  script  for  processing  &   analysis   •  Keep  raw  data  raw   •  Use  scripts  to  process  data      &  save  them  with  data   Raw  data  as  .csv  
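The slide mentions an R script; the same principle is sketched here in Python for illustration. The script reads the raw file, writes a separate processed file, and never modifies the original. The file names, the `count` column, and the rule "drop rows with a blank count" are all assumptions.

```python
import csv
import tempfile
from pathlib import Path


def process(raw_path: Path, out_path: Path) -> int:
    """Copy rows with a non-blank count to a new file; return the number of rows kept."""
    kept = 0
    with raw_path.open(newline="") as src, out_path.open("w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row["count"].strip():
                writer.writerow(row)
                kept += 1
    return kept


# Hypothetical raw data written to a temporary folder for the demonstration.
workdir = Path(tempfile.mkdtemp())
raw_file = workdir / "raw_counts.csv"
raw_text = "sample_id,count\nS1,12\nS2,\nS3,7\n"
raw_file.write_text(raw_text)

kept = process(raw_file, workdir / "processed_counts.csv")
print(kept)
```

Because every change lives in the script, the processing can be repeated or corrected at any time from the untouched raw file.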
  • 46. Best  Practices  for  Data  Management   1.  Planning   2.  Data  collection  &  organization   3.  Quality  control  &  assurance   4.  Metadata   5.  Workflows   6.  Data  stewardship  &  reuse   7.  Planning  
  • 47. 3.  Quality  control  and  quality  assurance   Before  data  collection   •  Define  &  enforce  standards   •  Assign  responsibility  for  data  quality   From  Flickr  by  StacieBee  
  • 48. 3.  Quality  control  and  quality  assurance   During  data  collection/entry   •  Minimize  manual  entry   •  Use  double  entry   •  Use  text-­‐to-­‐speech  program   to  read  data  back   •  Use  a  database   •  Document  changes   From  Flickr  by  schock  
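Double entry means two people (or one person twice) key in the same data sheet, and the two versions are then compared. A minimal comparison might look like the sketch below; the row layout and the example values are hypothetical.

```python
def compare_entries(first, second):
    """Return (row_index, column, first_value, second_value) for every disagreement."""
    if len(first) != len(second):
        raise ValueError("the two entries have different numbers of rows")
    mismatches = []
    for i, (a, b) in enumerate(zip(first, second)):
        for col in a:
            if a[col] != b.get(col):
                mismatches.append((i, col, a[col], b.get(col)))
    return mismatches


entry_1 = [
    {"sample_id": "S1", "temp_c": "12.5"},
    {"sample_id": "S2", "temp_c": "11.0"},
]
entry_2 = [
    {"sample_id": "S1", "temp_c": "12.5"},
    {"sample_id": "S2", "temp_c": "1.10"},  # transposed digits
]
mismatches = compare_entries(entry_1, entry_2)
print(mismatches)
```

Each mismatch is then checked against the original paper record rather than resolved by guessing.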
  • 49. 3. Quality control and quality assurance   After data entry   • Check for missing, impossible, anomalous values   • Perform statistical summaries   • Look for outliers   • Normal probability plots   • Regression   • Scatter plots   • Maps
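Some of these checks are easy to script. The sketch below flags impossible values with a range check and then flags outliers using the median absolute deviation (MAD); the MAD rule, the threshold `k=5`, the plausible range, and the data are illustrative assumptions rather than a method prescribed by the slides.

```python
import statistics


def out_of_range(values, low, high):
    """Return the positions of values that are missing or outside [low, high]."""
    return [i for i, v in enumerate(values) if v is None or not (low <= v <= high)]


def mad_outliers(values, k=5.0):
    """Return values further than k median-absolute-deviations from the median."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread at all, so nothing can be called an outlier
    return [v for v in values if abs(v - med) > k * mad]


temps = [12.1, 11.8, 12.4, 11.9, 12.0, 35.0, -999.0]  # hypothetical water temperatures
impossible = out_of_range(temps, low=-2.0, high=40.0)
plausible = [v for i, v in enumerate(temps) if i not in impossible]
outliers = mad_outliers(plausible)
print(impossible, outliers)
```

A flagged value is a prompt to look at the record, not proof of an error; plots and maps remain useful for the cases a simple rule misses.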
  • 50. Best  Practices  for  Data  Management   1.  Planning   2.  Data  collection  &  organization   3.  Quality  control  &  assurance   4.  Metadata   5.  Workflows   6.  Data  stewardship  &  reuse   7.  Planning  
  • 51. 4. Metadata basics   What is metadata?   Why are you promoting Excel?
  • 52. 4.  Metadata  basics      Metadata  =  Data  reporting     WHO  created  the  data?   WHAT  is  the  content  of  the  data  set?   WHEN  was  it  created?   WHERE  was  it  collected?   HOW  was  it  developed?   WHY  was  it  developed?  
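Even before adopting a formal standard, the six questions can be captured in a small machine-readable record stored next to the data. The sketch below uses plain JSON, which is not one of the metadata standards; the field names and every value are hypothetical.

```python
import json

REQUIRED_FIELDS = ("who", "what", "when", "where", "how", "why")


def missing_fields(metadata):
    """Return the required questions that are unanswered or blank."""
    return [f for f in REQUIRED_FIELDS if not str(metadata.get(f, "")).strip()]


# Hypothetical record with two questions still unanswered.
record = {
    "who": "J. Doe (hypothetical)",
    "what": "Zooplankton counts per sample",
    "when": "2010-06-01 to 2010-08-31",
    "where": "",
    "how": "Net tows, counted under a microscope",
    "why": "",
}

gaps = missing_fields(record)
sidecar_text = json.dumps(record, indent=2)  # text to save beside the data file
print(gaps)
```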
  • 53. 4. Metadata basics
      • Scientific context
        • Scientific reason why the data were collected
        • What data were collected
        • What instruments (including model & serial number) were used
        • Environmental conditions during collection
        • Where collected & spatial resolution
        • When collected & temporal resolution
        • Standards or calibrations used
      • Information about parameters
        • How each was measured or produced
        • Units of measure
        • Format used in the data set
        • Precision & accuracy if known
      • Information about data
        • Definitions of codes used
        • Quality assurance & control measures
        • Known problems that limit data use (e.g. uncertainty, sampling problems)
        • How to cite the data set
      • Digital context
        • Name of the data set
        • The name(s) of the data file(s) in the data set
        • Date the data set was last modified
        • Example data file records for each data type file
        • Pertinent companion files
        • List of related or ancillary data sets
        • Software (including version number) used to prepare/read the data set
        • Data processing that was performed
      • Personnel & stakeholders
        • Who collected
        • Who to contact with questions
        • Funders
  • 54. 4. Metadata basics   What is metadata?   Select the appropriate metadata standard   • Provides structure to describe data: common terms | definitions | language | structure   • Lots of different standards: EML, FGDC, ISO 19115, Darwin Core, …   • Tools for creating metadata files: Morpho (EML), Metavist (FGDC), NOAA MERMaid (CSDGM)
  • 55. 4. Metadata basics   What does a metadata standard look like?
  • 56. Best  Practices  for  Data  Management   1.  Planning   2.  Data  collection  &  organization   3.  Quality  control  &  assurance   4.  Metadata   5.  Workflows   6.  Data  stewardship  &  reuse   7.  Planning  
  • 57. 5. Workflows   Workflow: how you get from the raw data to the final products of your research   Simple workflows: flow charts   Temperature data + Salinity data → Data import into R → Data in R format → Quality control & data cleaning → "Clean" T & S data → Analysis: mean, SD → Summary statistics → Graph production
  • 58. 5. Workflows   Workflow: how you get from the raw data to the final products of your research   Simple workflows: commented scripts   • R, SAS, MATLAB   • Well-documented code is…   Easier to review   Easier to share   Easier to repeat analysis
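A commented script for a temperature and salinity workflow (import, quality control, summary statistics) could look like the following. The slides name R, SAS, and MATLAB; Python is used here purely for illustration, and the column names, the 9999 missing code, and the data are assumptions.

```python
import csv
import io
import statistics

# Hypothetical raw data; in practice this would be read from the raw .csv file.
RAW = """station,temp_c,salinity_psu
A,12.0,30.0
B,14.0,32.0
C,9999,31.0
D,16.0,34.0
"""

# Step 1: import the raw data without changing it.
records = list(csv.DictReader(io.StringIO(RAW)))

# Step 2: quality control. Drop any row that carries the missing-value code.
clean = [r for r in records if "9999" not in (r["temp_c"], r["salinity_psu"])]

# Step 3: analysis. Compute the mean and sample standard deviation per variable.
temps = [float(r["temp_c"]) for r in clean]
sals = [float(r["salinity_psu"]) for r in clean]
summary = {
    "n": len(clean),
    "temp_mean": statistics.mean(temps),
    "temp_sd": statistics.stdev(temps),
    "sal_mean": statistics.mean(sals),
    "sal_sd": statistics.stdev(sals),
}

# Step 4: report. A graphing step would follow here.
print(summary)
```

Each comment corresponds to one box of a flow chart, so the script doubles as documentation of the workflow.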
  • 59. 5.  Workflows   Fancy  Schmancy  workflows:  Kepler   Resulting  output   https://kepler-­‐project.org  
  • 60. 5. Workflows   Workflows enable   Reproducibility: can someone independently validate findings?   Transparency: others can understand how you arrived at your results   Executability: others can re-run or re-use your analysis   From Flickr by merlinprincesse
  • 61. 5.  Workflows   Minimally:  document  your  analysis      commented  code;  simple  flow-­‐chart     www.littlebytesoflife.com   Emerging  workflow  applications  will…   −  Link  software  for  executable  end-­‐to-­‐end  analysis   −  Provide  detailed  info  about  data  &  analysis   −  Facilitate  re-­‐use  &  refinement  of  complex,  multi-­‐step   analyses   −  Enable  efficient  swapping  of  alternative  models  &   algorithms   −  Help  automate  tedious  tasks  
  • 62. Best  Practices  for  Data  Management   1.  Planning   2.  Data  collection  &  organization   3.  Quality  control  &  assurance   4.  Metadata   5.  Workflows   6.  Data  stewardship  &  reuse   7.  Planning  
  • 63. 6. Data stewardship & reuse   The 20-Year Rule   The metadata accompanying a data set should be written for a user 20 years into the future (National Research Council 1991)   From Flickr by greensambaman
  • 64. 6.  Data  stewardship  &  reuse   Use  stable  formats      csv,  txt,  tiff   Create  back-­‐up  copies     original,  near,  far   Periodically  test  ability  to  restore  information   Modified from R. Cook  
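One simple way to test that a back-up can be restored is to compare checksums of the original and the copy. The sketch below uses SHA-256 from Python's standard library; the folder layout and file contents are hypothetical, and a real "far" copy would live on a different machine or service.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks to handle large files."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_and_verify(src: Path, dest_dir: Path) -> bool:
    """Copy src into dest_dir and confirm the copy is byte-for-byte identical."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    copy = dest_dir / src.name
    shutil.copy2(src, copy)
    return sha256_of(src) == sha256_of(copy)


# Demonstration in a temporary folder with a hypothetical data file.
root = Path(tempfile.mkdtemp())
original = root / "counts.csv"
original.write_text("sample_id,count\nS1,12\n")
verified = backup_and_verify(original, root / "backup_near")
print(verified)
```

Storing the checksum alongside the data also lets the same comparison be repeated years later.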
  • 65. 6. Data stewardship & reuse   Store your data in a repository   Institutional archive   Discipline/specialty archive   DataCite list of repositories: www.datacite.org/repolist   From Flickr by torkildr
  • 66. 6.  Data  stewardship  &  reuse   Data  Citation   Allows  readers  to  find  data  products   Get  credit  for  data  and  publications   Promotes  reproducibility   Better  measure  of  research  impact   Example:   Sidlauskas,  B.  2007.  Data  from:  Testing  for  unequal  rates  of  morphological   diversification  in  the  absence  of  a  detailed  phylogeny:  a  case  study  from   characiform  fishes.  Dryad  Digital  Repository.  doi:10.5061/dryad.20     Learn  more  at  www.datacite.org   Modified from R. Cook  
  • 67. Best  Practices  for  Data  Management   1.  Planning   2.  Data  collection  &  organization   3.  Quality  control  &  assurance   4.  Metadata   5.  Workflows   6.  Data  stewardship  &  reuse   7.  Planning   &  data  management  plans  in   particular  
  • 68. 1.  Planning   What  is  a  data  management  plan?   A  document  that  describes  what  you  will  do  with  your  data   during  your  research  and  after  you  complete  your  research   Data   Hangover    
  • 69. 1.  Planning   Why  should  I  prepare  a  DMP?       Saves  time   Increases  efficiency   Easier  to  use  data       Others  can  understand  &  use  data   Credit  for  data  products   Funders  require  it    
  • 70. NSF  DMP  Requirements   From  Grant  Proposal  Guidelines:    DMP  supplement  may  include:   1.  the  types  of  data,  samples,  physical  collections,  software,  curriculum   materials,  and  other  materials  to  be  produced  in  the  course  of  the  project   2.   the  standards  to  be  used  for  data  and  metadata  format  and  content  (where   existing  standards  are  absent  or  deemed  inadequate,  this  should  be   documented  along  with  any  proposed  solutions  or  remedies)   3.   policies  for  access  and  sharing  including  provisions  for  appropriate   protection  of  privacy,  confidentiality,  security,  intellectual  property,  or  other   rights  or  requirements   4.   policies  and  provisions  for  re-­‐use,  re-­‐distribution,  and  the  production  of   derivatives   5.   plans  for  archiving  data,  samples,  and  other  research  products,  and  for   preservation  of  access  to  them  
  • 71. 1. Types of data & other information   • Types of data produced   • Relationship to existing data   • How/when/where will the data be captured or created?   • How will the data be processed?   • Quality assurance & quality control measures   • Security: version control, backing up   • Who will be responsible for data management during/after project?   C. Strasser   biology.kenyon.edu   From Flickr by Lazurite
  • 72. 2. Data & metadata standards   • What metadata are needed to make the data meaningful?   • How will you create or capture these metadata?   • Why have you chosen particular standards and approaches for metadata?   Wired.com
  • 73. 3.  Policies  for  access  &  sharing   4.  Policies  for  re-­‐use  &  re-­‐distribution   •  Are  you  under  any  obligation  to  share  data?     •  How,  when,  &  where  will  you  make  the  data  available?     •  What  is  the  process  for  gaining  access  to  the  data?     •  Who  owns  the  copyright  and/or  intellectual  property?   •  Will  you  retain  rights  before  opening  data  to  wider  use?  How  long?   •  Are  permission  restrictions  necessary?   •  Embargo  periods  for  political/commercial/patent  reasons?     •  Ethical  and  privacy  issues?   •  Who  are  the  foreseeable  data  users?   •  How  should  your  data  be  cited?  
  • 74. 5.  Plans  for  archiving  &  preservation   •  What  data  will  be  preserved  for  the  long  term?  For  how  long?       •  Where  will  data  be  preserved?   •  What  data  transformations  need  to  occur  before   preservation?   •  What  metadata  will  be  submitted   alongside  the  datasets?   •  Who  will  be  responsible  for  preparing   data  for  preservation?  Who  will  be  the   main  contact  person  for  the  archived   data?   From  Flickr  by  theManWhoSurfedTooMuch  
  • 75. Don’t  forget:  Budget   •  Costs  of  data  preparation  &  documentation   Hardware,  software   Personnel   Archive  fees   •  How  costs  will  be  paid     Request  funding!   dorrvs.com  
  • 76. NSF’s  Vision*   DMPs  and  their  evaluation  will  grow  &  change  over  time   (similar  to  broader  impacts)   Peer  review  will  determine  next  steps   Community-­‐driven  guidelines     –  Different  disciplines  have  different  definitions  of  acceptable   data  sharing   –  Flexibility  at  the  directorate  and  division  levels   –  Tailor  implementation  of  DMP  requirement   Evaluation  will  vary  with  directorate,  division,  &  program   officer     *Unofficially   Help  from  Jennifer  Schopf,  NSF  
  • 77. Roadmap   4.  Toolbox     3.  Best  practices   2.  Data  management  landscape   1.  Background    
  • 78. E-notebooks & online science   • NoteBook   • ORNL eNote   • Evernote   • Google Docs   • Blogs   • Wikis   • TheLabNotebook.com   • NoteBookMaker
  • 79. DMPTool:          dmp.cdlib.org   Step-­‐by-­‐step  wizard  for  generating  DMP   Create    |    edit    |    re-­‐use    |    share    |    save    |    generate     Open  to  community     Links  to  institutional  resources   Directorate  information  &  updates  
  • 80. CDL  Services  for  UC  Community   Where   should  I  put   Data  Repository   my  data?   Deposit    |    Manage    |    Share    |    Preserve   www.cdlib.org/services/uc3  
  • 81. CDL  Services  for  UC  Community   Create  &  manage  persistent  identifiers   •  Precise  identification  of  a  dataset   •  Credit  to  data  producers  and  data  publishers   •  A  link  from  the  traditional  literature  to  the  data   •  Research  metrics  for  datasets   Example:   Sidlauskas,  B.  2007.  Data  from:  Testing  for  unequal  rates  of  morphological   diversification  in  the  absence  of  a  detailed  phylogeny:  a  case  study  from   characiform  fishes.  Dryad  Digital  Repository.  doi:10.5061/dryad.20     www.cdlib.org/services/uc3  
  • 82. Why  are  you   promoting   Excel?   •  Open  source  add-­‐in   •  Facilitate  data  management,  sharing,  archiving  for  scientists   •  Focus  on  atmospheric,  ecological,  hydrological,  and   oceanographic  data   •  Collecting  requirements  for  add-­‐in  from  scientists,  data   centers,  libraries   Funders:  Gordon  and  Betty  Moore  Foundation,  Microsoft  Research  
  • 83. Why  are  you   promoting   Excel?   Everyone  uses  it   Stopgap  measure       Funders:  Gordon  and  Betty  Moore  Foundation,  Microsoft  Research  
  • 84. www.dataone.org   •  Data  Education  Tutorials   •  Database  of  best  practices    &  software  tools   •  Links  to  DMPTool   •  Primer  on  data  management  
  • 86. www.carlystrasser.net   Resources   Slideshare link: this presentation
  • 87. Handy  References   Best  Practices  for  Preparing  Environmental  Data  Sets   to  Share  and  Archive.  September  2010.  Hook,   Santhana  Vannan,  Beaty,  Cook,  &  Wilson   http://daac.ornl.gov/PI/BestPractices-­‐2010.pdf   Some  Simple  Guidelines  for  Effective  Data   Management.  Borer,  Seabloom,  Jones,  &  Schildhauer.     Bull  Ecol  Soc  Amer,  April  2009:  205-­‐214.    
  • 88. Roadmap   4.  Toolbox     3.  Best  practices   2.  Data  management  landscape   1.  Background    
  • 89. Getting down & dirty with your data   Where to begin?   1. Take stock   2. Take a time machine   3. Break it down   4. Get smart   www.catfishingtipstoday.com
  • 90. 1.  Take  stock   •  What  data  do  you  have?   •  What  data  are  you  still  generating?   •  What  does  your  workflow  look  like?   •  Are  you  backing  up?   •  How’s  your  filing  system?   •  Etc…   From  Flickr  by  charlie  llewellin  
  • 91. 2.  Take  a  time  machine   Knowing  what  you  know  now,  how  would  you  plan   for  this  project?     –  File  structures   –  Metadata  generation   –  Naming  conventions   Consider  writing  up  a  formal  data  management  plan   From  Flickr  by  F1RSTBORN  
  • 92. 3. Break it down   You now have a vision. Break into manageable chunks   – Set a final deadline   – Set intermediate deadlines   – Break down tasks to meet those deadlines   – Be reasonable   From www.gonomad.com   From www.collegehumor.com
  • 93. 4. Get smart   Learn from mistakes   Plan better next time   Remember: good data management takes   Time   Thoughtfulness   Planning   Resources   static.tvtropes.org
  • 94. dcxl.cdlib.org   @dcxlCDL   www.facebook.com/DCXLatCDL   www.carlystrasser.net   carlystrasser@gmail.com   @carlystrasser