Data Management and the Digital Curation for Excel (DCXL) Project

Carly Strasser
University of California Curation Center at CDL
NSF funded DataNet Project
Office of Cyberinfrastructure


Enabling universal access to data about life on earth and the environment that sustains it
Cyberinfrastructure
Community Engagement & Outreach




From Flickr by wetwebwork
Courtesy of DataONE
What role can libraries play in data education?
Why don’t people share data?
What barriers to sharing can we eliminate?
Is data management being taught?
Do attitudes about sharing differ among disciplines?
How can we promote storing data in repositories?
Roadmap

1.  Mistakes scientists make
2.  Barriers to best practices
3.  Best practices for scientists
4.  DCXL
5.  Tools
	
  
From Flickr by DW0825
From Flickr by Flickmor
From Flickr by deltaMike
Digital data
www.woodrow.org
C. Strasser
Courtesy of WHOI
From Flickr by US Army Environmental Command
Digital data + Complex analyses
Data
Models
Maximum Likelihood estimation
Matrix Models
Images
Tables
Paper
2 tables
C:\Documents and Settings\hampton\My Documents\NCEAS Distributed Graduate Seminars\[Wash Cres Lake Dec 15 Dont_Use.xls]Sheet1
                   Stable Isotope Data Sheet
              Sampling Site / Identifier: Wash Cresc Lake                                                                                                              Peter's lab    Don't use - old data
                         Sample Type: Algal                                                                                                                            Washed Rocks
                                  Date: Dec. 16
                Tray ID and Sequence: Tray 004

                     Reference statistics: SD for delta 13C = 0.07                              SD for delta 15N = 0.15


          Position        SampleID         Weight (mg)           %C       delta 13C   delta 13C_ca         %N               delta 15N   delta 15N_ca   Spec. No.
         A1                            ref    0.98              38.27      -25.05         -24.59           1.96                4.12          3.47       25354
         A2                            ref    0.98              39.78      -25.00         -24.54           2.03                4.01          3.36       25356
         A3                            ref    0.98              40.37      -24.99         -24.53           2.04                4.09          3.44       25358
         A4                            ref    1.01              42.23      -25.06         -24.60           2.17                4.20          3.55       25360          Shore          Avg Con
         A5          ALG01                    3.05              1.88       -24.34         -23.88           0.17               -1.65         -2.30       25362      c       -1.26         -27.22
         A6          Lk Outlet Alg            3.06              31.55      -30.17         -29.71           0.92                0.87          0.22       25364               1.26           0.32
         A7          ALG03                    2.91              6.85       -21.11         -20.65           0.48               -0.97         -1.62       25366      c
         A8          ALG05                    2.91              35.56      -28.05         -27.59           2.30                0.59         -0.06       25368
         A9          ALG07                    3.04              33.49      -29.56         -29.10           1.68                0.79          0.14       25370
         A10         ALG06                    2.95              41.17      -27.32         -26.86           1.97                2.71          2.06       25372
         B1          ALG04                    3.01              43.74      -27.50         -27.04           1.36                0.99          0.34       25374      c
         B2          ALG02                      3               4.51       -22.68         -22.22           0.34                4.31          3.66       25376
         B3          ALG01                    2.99              1.59       -24.58         -24.12           0.15               -1.69         -2.34       25378      c
         B4          ALG03                    2.92              4.37       -21.06         -20.60           0.34               -1.52         -2.17       25380      c
         B5          ALG07                     2.9              33.58      -29.44         -28.98           1.74                0.62         -0.03       25382
         B6                            ref    1.01              44.94      -25.00         -24.54           2.59                3.96          3.31       25384
         B7                            ref    0.99              42.28      -24.87         -24.41           2.37                4.33          3.68       25386
         B8          Lk Outlet Alg            3.04              31.43      -29.69         -29.23           1.07                0.95          0.30       25388
         B9          ALG06                    3.09              35.57      -27.26         -26.80           1.96                2.79          2.14       25390
         B10         ALG02                    3.05              5.52       -22.31         -21.85           0.45                4.72          4.07       25392
         C1          ALG04                    2.98              37.90      -27.42         -26.96           1.36                1.21          0.56       25394      c
         C2          ALG05                    3.04              31.74      -27.93         -27.47           2.40                0.73          0.08       25396
         C3                            ref    0.99              38.46      -25.09         -24.63           2.40                4.37          3.72       25398
                                                                23.78                                      1.17




From Stephanie Hampton (2010), ESA Workshop on Best Practices
Random notes
Wash Cres Lake Dec 15 Dont_Use.xls
Collaboration and Data Sharing
[Same spreadsheet as above, overlaid with a transposed copy of rows A7-B5, a regression SUMMARY OUTPUT block, and a chart]

          SampleID        ALG03    ALG05    ALG07    ALG06    ALG04    ALG02    ALG01    ALG03    ALG07
          Weight (mg)      2.91     2.91     3.04     2.95     3.01        3     2.99     2.92      2.9
          %C               6.85    35.56    33.49    41.17    43.74     4.51     1.59     4.37    33.58
          delta 13C      -21.11   -28.05   -29.56   -27.32   -27.50   -22.68   -24.58   -21.06   -29.44
          delta 13C_ca   -20.65   -27.59   -29.10   -26.86   -27.04   -22.22   -24.12   -20.60   -28.98
          %N               0.48     2.30     1.68     1.97     1.36     0.34     0.15     0.34     1.74
          delta 15N       -0.97     0.59     0.79     2.71     0.99     4.31    -1.69    -1.52     0.62
          delta 15N_ca    -1.62    -0.06     0.14     2.06     0.34     3.66    -2.34    -2.17    -0.03

          SUMMARY OUTPUT

          Regression Statistics
          Multiple R           0.283158
          R Square             0.080178
          Adjusted R Square   -0.022024
          Standard Error       1.906378
          Observations               11

          ANOVA
                        df         SS         MS          F   Significance F
          Regression     1   2.851116   2.851116   0.784507         0.398813
          Residual       9    32.7085   3.634278
          Total         10   35.55962

                         Coefficients   Standard Error      t Stat    P-value   Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
          Intercept         -4.297428         4.671099   -0.920003   0.381568    -14.8642    6.269341      -14.8642      6.269341
          X Variable 1      -0.158022          0.17841   -0.885724   0.398813   -0.561612    0.245569     -0.561612      0.245569

[Chart: Series1; x-axis from -35.00 to 0.00, y-axis from -3.00 to 4.00]
  
Random	
  stats	
  


C:\Documents and Settings\hampton\My Documents\NCEAS Distributed Graduate Seminars\[Wash Cres Lake Dec 15 Dont_Use.xls]Sheet1
                   Stable Isotope Data Sheet
              Sampling Site / Identifier: Wash Cresc Lake                                                                                               Peter's lab              Don't use - old data
                         Sample Type: Algal                                                                                                             Washed Rocks
                                  Date: Dec. 16
                Tray ID and Sequence: Tray 004

                                                     13                                                   15
                     Reference statistics: SD for delta C = 0.07                              SD for delta N = 0.15


          Position        SampleID        Weight (mg)      %C      delta 13C   delta 13C_ca        %N          delta 15N   delta 15N_ca Spec. No.
         A1                           ref    0.98         38.27     -25.05         -24.59          1.96           4.12          3.47     25354
         A2                           ref    0.98         39.78     -25.00         -24.54          2.03           4.01          3.36     25356
         A3                           ref    0.98         40.37     -24.99         -24.53          2.04           4.09          3.44     25358
         A4                           ref    1.01         42.23     -25.06         -24.60          2.17           4.20          3.55     25360          Shore                    Avg Con
         A5          ALG01                   3.05         1.88      -24.34         -23.88          0.17          -1.65         -2.30     25362      c       -1.26                   -27.22
         A6          Lk Outlet Alg           3.06         31.55     -30.17         -29.71          0.92           0.87          0.22     25364               1.26                     0.32
         A7          ALG03                   2.91         6.85      -21.11         -20.65          0.48          -0.97         -1.62     25366      c
         A8          ALG05                   2.91         35.56     -28.05         -27.59          2.30           0.59         -0.06     25368
         A9          ALG07                   3.04         33.49     -29.56         -29.10          1.68           0.79          0.14     25370
         A10         ALG06                   2.95         41.17     -27.32         -26.86          1.97           2.71          2.06     25372
         B1          ALG04                   3.01         43.74     -27.50         -27.04          1.36           0.99          0.34     25374      c               SUMMARY OUTPUT
         B2          ALG02                     3          4.51      -22.68         -22.22          0.34           4.31          3.66     25376
         B3          ALG01                   2.99         1.59      -24.58         -24.12          0.15          -1.69         -2.34     25378      c                Regression Statistics
         B4          ALG03                   2.92         4.37      -21.06         -20.60          0.34          -1.52         -2.17     25380      c               Multiple R 0.283158
         B5          ALG07                    2.9         33.58     -29.44         -28.98          1.74           0.62         -0.03     25382                      R Square 0.080178
         B6                           ref    1.01         44.94     -25.00         -24.54          2.59           3.96          3.31     25384                      Adjusted R Square -0.022024
         B7                           ref    0.99         42.28     -24.87         -24.41          2.37           4.33          3.68     25386                      Standard Error 1.906378
         B8          Lk Outlet Alg           3.04         31.43     -29.69         -29.23          1.07           0.95          0.30     25388                      Observations         11
         B9          ALG06                   3.09         35.57     -27.26         -26.80          1.96           2.79          2.14     25390
         B10         ALG02                   3.05         5.52      -22.31         -21.85          0.45           4.72          4.07     25392                      ANOVA
         C1          ALG04                   2.98         37.90     -27.42         -26.96          1.36           1.21          0.56     25394      c                                df         SS      MS        F Significance F
         C2          ALG05                   3.04         31.74     -27.93         -27.47          2.40           0.73          0.08     25396                      Regression             1 2.851116 2.851116 0.784507 0.398813
         C3                           ref    0.99         38.46     -25.09         -24.63          2.40           4.37          3.72     25398                      Residual               9 32.7085 3.634278
                                                          23.78                                    1.17                                                             Total                 10 35.55962

                                                                                                                                                                                        Coefficients  Standard Error  t Stat     P-value   Lower 95%  Upper 95%  Lower 95.0%  Upper 95.0%
                                                                                                                                                                          Intercept     -4.297428     4.671099        -0.920003  0.381568  -14.8642   6.269341   -14.8642     6.269341
                                                                                                                                                                          X Variable 1  -0.158022     0.17841         -0.885724  0.398813  -0.561612  0.245569   -0.561612    0.245569
Where data end up

From Flickr by diylibrarian
www
blog.order2disorder.com
From Flickr by csessums
Data
Metadata
From Flickr by csessums
Recreated from Klump et al. 2006
  
Who cares?

From Flickr by Redden-McAllister
From Flickr by AJC1
www.rba.gov.au
  
Where data end up

From Flickr by diylibrarian
www
Data
www
Metadata
Recreated from Klump et al. 2006
  
Data Reuse
Data Sharing
Data Management
  
UGLY TRUTH

Many Earth | Environmental | Ecological scientists…
    are not taught data management
    don’t know what metadata are
    can’t name data centers or repositories
    don’t share data publicly or store it in an archive
    aren’t convinced they should share data

5shortessays.blogspot.com
  

                                                                         	
  
Roadmap

                                        5.  Tools
                            4.  DCXL
                      3.  Best practices for scientists
           2.  Barriers to best practices
1.  Mistakes scientists make
  
	
  
Barriers

Cost
    Time
    Personnel
    Software, hardware

cultblender.wordpress.com
  
Barriers

Cost: time, personnel, software, hardware
Culture of Science
  •  Not the norm
  •  Lack of training
  •  Disparate data
  
Barriers

Cost: time, personnel, software, hardware
Culture of Science
Loss of rights or benefits
    Misuse of data
    Missed opportunities
    Conflict
  
Barriers

Cost: time, personnel, software, hardware
Culture of Science
Loss of rights or benefits
Lack of incentives
    Time consuming & expensive
    Reward structure
    Few requirements
  
Are Undergrads Learning About Data Management?

 •  Metadata generation
 •  Software choice
 •  File naming
 •  QAQC
 •  Backing up
 •  Workflows
 •  Data sharing
 •  Data re-use
 •  Meta-analysis
 •  Reproducibility
 •  Notebook protocols
 •  Databases

[Figure: Importance Versus Assessment; x-axis "Assessed" (0 to 40), y-axis "Important" (0 to 40)]

If it’s important, why isn’t it taught?
  	
  
Barriers to Teaching Data Management

    Too advanced
    Not a priority
    Not appropriate level
    Students don’t know software
    Time
    No Lab
    No training
    Covered in Lab
    Too big
  
Roadmap

                                        5.  Tools
                            4.  DCXL
                      3.  Best practices for scientists
           2.  Barriers to best practices
1.  Mistakes scientists make
  
	
  
Best Practices for Data Management

   1.  Planning
   2.  Data collection & organization
   3.  Quality control & assurance
   4.  Metadata
   5.  Workflows
   6.  Data stewardship & reuse
  
2. Data collection & organization

Create unique identifiers
    •  Decide on naming scheme early
    •  Create a key
    •  Different for each sample

From Flickr by zebbie
From Flickr by sjbresnahan
  
2. Data collection & organization

Standardize
    •  Consistent within columns
         – only numbers, dates, or text
    •  Consistent names, codes, formats

Modified from K. Vanderbilt
From Pink Floyd, The Wall (themurkyfringe.com)
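
The "consistent within columns" rule can be checked mechanically. The following is a minimal sketch in Python, not part of the original slides: the function names, the three categories, the ISO date format, and the sample rows are all assumptions chosen for illustration.

```python
from datetime import datetime


def classify(value: str) -> str:
    """Classify one cell as 'number', 'date', or 'text' (hypothetical categories)."""
    text = value.strip()
    try:
        float(text)
        return "number"
    except ValueError:
        pass
    try:
        # Assumes ISO-style dates (YYYY-MM-DD); adjust to the project's convention.
        datetime.strptime(text, "%Y-%m-%d")
        return "date"
    except ValueError:
        return "text"


def mixed_columns(rows: list[dict[str, str]]) -> dict[str, set[str]]:
    """Return the columns whose non-empty cells fall into more than one category."""
    kinds: dict[str, set[str]] = {}
    for row in rows:
        for column, value in row.items():
            if value.strip():
                kinds.setdefault(column, set()).add(classify(value))
    return {column: found for column, found in kinds.items() if len(found) > 1}


# Made-up rows: the second row breaks the date and count columns.
rows = [
    {"site": "Lake1", "date": "2010-06-01", "count": "12"},
    {"site": "Lake2", "date": "June 2", "count": "about 15"},
]
problems = mixed_columns(rows)
print(problems)
```

Columns reported here mix categories and would need cleaning before analysis or archiving.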
  
2. Data collection & organization

Standardize
    •  Reduce possibility of manual error by constraining entry choices

Excel lists           Google Docs
Data validation       Forms

Modified from K. Vanderbilt
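
Outside of Excel lists or Google Forms, the same idea of constraining entry choices can be approximated after the fact by checking entries against an agreed code list. This is a rough sketch under stated assumptions: the allowed codes and the sample entries are hypothetical, and `check_entries` is an illustrative helper, not an existing tool.

```python
# Hypothetical code list agreed on before data entry begins.
ALLOWED_SITES = {"Shore", "Lake Outlet", "Washed Rocks"}


def check_entries(values, allowed):
    """Return (row_number, value) pairs for entries that are not in the allowed list."""
    return [(i, v) for i, v in enumerate(values, start=1) if v not in allowed]


# Made-up entries: a lowercase variant and an abbreviation slip through manual entry.
entries = ["Shore", "shore", "Lake Outlet", "Lk Outlet"]
rejected = check_entries(entries, ALLOWED_SITES)
print(rejected)
```

The check is deliberately strict (case and spelling must match), which is what a drop-down list would enforce at entry time.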
  	
  
2. Data collection & organization

Create parameter table
Create a site table

From doi:10.3334/ORNLDAAC/777
From R Cook, ESA Best Practices Workshop 2010
  
2. Data collection & organization

Use descriptive file names *
    •  Unique
    •  Reflect contents

Bad:     Mydata.xls
         2001_data.csv
         best version.txt
Better:  Eaffinis_nanaimo_2010_counts.xls
         (study organism _ site name _ year _ what was measured)

*Not for everyone
From R Cook, ESA Best Practices Workshop 2010
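
One way to keep file names consistent is to build them from their components rather than typing them by hand. The sketch below assumes the four-part pattern shown on the slide (study organism, site name, year, what was measured); `build_file_name` is a hypothetical helper written for this illustration.

```python
import re


def build_file_name(organism, site, year, measurement, extension="csv"):
    """Join the four components with underscores after stripping spaces and punctuation."""
    parts = [organism, site, str(year), measurement]
    cleaned = [re.sub(r"[^A-Za-z0-9]+", "", part) for part in parts]
    if not all(cleaned):
        raise ValueError("every component must contain at least one letter or digit")
    return "_".join(cleaned) + "." + extension


name = build_file_name("Eaffinis", "nanaimo", 2010, "counts", "xls")
print(name)
```

Stripping spaces and punctuation inside each component keeps the underscore as the only separator, so the name can later be split back into its parts.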
  
2. Data collection & organization

Organize files logically

Biodiversity
    Lake
        Experiments
            Biodiv_H20_heatExp_2005to2008.csv
            Biodiv_H20_predatorExp_2001to2003.csv
            …
        Field work
            Biodiv_H20_PlanktonCount_2001toActive.csv
            Biodiv_H20_ChlAprofiles_2003.csv
            …
    Grassland

From S. Hampton
  
2. Data collection & organization

Preserve information
 •  Keep raw data raw
 •  Use scripts to process data & save them with data

[Figure labels: Raw data as .csv; R script for processing & analysis]
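
The slide shows an R script; the sketch below illustrates the same habit in Python. It assumes a raw CSV with hypothetical columns `sample_id` and `count`, uses made-up values, and opens the raw file for reading only, writing the processed result to a separate file.

```python
import csv
import tempfile
from pathlib import Path


def process(raw_path: Path, out_path: Path) -> int:
    """Copy rows from the raw file to a new file, dropping rows with an empty 'count'.

    The raw file is only ever opened for reading.
    """
    kept = 0
    with raw_path.open(newline="") as src, out_path.open("w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            if row["count"].strip():
                writer.writerow(row)
                kept += 1
    return kept


# Demonstration in a throwaway directory with made-up values.
workdir = Path(tempfile.mkdtemp())
raw = workdir / "raw_counts.csv"
raw.write_text("sample_id,count\nALG01,12\nALG02,\nALG03,7\n")
raw_before = raw.read_text()
kept = process(raw, workdir / "processed_counts.csv")
print(kept, "rows kept; raw file unchanged:", raw.read_text() == raw_before)
```

Saving a script like this next to the data records exactly how the processed file was derived from the raw one.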
  
2. Data collection & organization

All of the things that make Excel great for data organization are bad for archiving! What to do?

1.  Create archive-ready raw data
2.  Put it somewhere special
3.  Have your fun with fancy Excel techniques
4.  Keep archiving in mind
  
3. Quality control and quality assurance

 Define & enforce standards
 Double data entry
 Document changes
 Minimize manual data entry
 No missing, impossible, or anomalous values
        •  Perform statistical summaries
        •  Use illegal data filter
        •  Look for outliers

[Figure: scatter plot; x-axis 0 to 35, y-axis 0 to 60]
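
The three checks above (statistical summaries, an illegal data filter, and an outlier search) can be sketched in a few lines of Python. The value ranges, the sample readings, and the choice of Tukey's 1.5 × IQR rule are assumptions for illustration; the slides do not prescribe a particular outlier method.

```python
import statistics


def illegal_values(values, low, high):
    """Return (index, value) pairs that are missing (None) or outside [low, high]."""
    return [(i, v) for i, v in enumerate(values) if v is None or not low <= v <= high]


def outliers(values, k=1.5):
    """Flag values more than k interquartile ranges beyond the quartiles (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    return [v for v in values if v < q1 - k * spread or v > q3 + k * spread]


# Made-up percentages: one missing entry and one impossible value.
percent_n = [1.96, 2.03, 0.17, None, 230.0]
flagged = illegal_values(percent_n, 0.0, 100.0)

# Made-up counts with one suspicious reading.
counts = [10, 12, 11, 13, 12, 11, 50]
suspicious = outliers(counts)

print("summary:", min(counts), max(counts), round(statistics.mean(counts), 2))
print("illegal:", flagged)
print("outliers:", suspicious)
```

Flagged values are reported rather than deleted, so that any correction can be documented as a change.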
  
4. Metadata basics

•  Digital context
      •  Name of the data set
      •  The name(s) of the data file(s) in the data set
      •  Date the data set was last modified
      •  Example data file records for each data type file
      •  Pertinent companion files
      •  List of related or ancillary data sets
      •  Software (including version number) used to prepare/read the data set
      •  Data processing that was performed
•  Personnel & stakeholders
      •  Who collected
      •  Who to contact with questions
      •  Funders
•  Scientific context
      •  Scientific reason why the data were collected
      •  What data were collected
      •  What instruments (including model & serial number) were used
      •  Environmental conditions during collection
      •  Where collected & spatial resolution
      •  When collected & temporal resolution
      •  Standards or calibrations used
•  Information about parameters
      •  How each was measured or produced
      •  Units of measure
      •  Format used in the data set
      •  Precision & accuracy if known
•  Information about data
      •  Definitions of codes used
      •  Quality assurance & control measures
      •  Known problems that limit data use (e.g. uncertainty, sampling problems)
•  How to cite the data set
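
A checklist like this can be turned into a simple completeness check on a metadata record. The sketch below is only an illustration: the field names are hypothetical and cover a small part of the list, the record holds placeholder values, and real projects would normally use a metadata standard rather than an ad hoc dictionary.

```python
import json

# Hypothetical field names, loosely following the checklist above.
REQUIRED_FIELDS = [
    "data_set_name", "file_names", "last_modified",
    "collected_by", "contact", "units", "how_to_cite",
]


def missing_fields(record: dict) -> list[str]:
    """Return the required fields that are absent or empty in a metadata record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Placeholder values for illustration only.
record = {
    "data_set_name": "Example lake algae isotopes",
    "file_names": ["example_counts.csv"],
    "last_modified": "2010-12-16",
    "collected_by": "A. Researcher",
    "contact": "",
    "units": {"weight": "mg"},
}

gaps = missing_fields(record)
print("missing:", gaps)
print(json.dumps(record, indent=2))
```

Writing the record to a plain-text file (here as JSON) and storing it next to the data keeps the description and the data together.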
  
4. Metadata basics

•  Provides structure to describe data
      Common terms | definitions | language | structure
•  Lots of different standards
      EML, FGDC, ISO19115, DarwinCore,…
•  Tools for creating metadata files
      Morpho (EML), Metavist (FGDC), NOAA MERMaid (CSDGM)
  	
  
5. Workflows

 Simplest workflows: commented scripts, flow charts

[Flow chart: Temperature data and Salinity data → Data import into R → Data in R format → Quality control & data cleaning → “Clean” T & S data → Analysis: mean, SD → Summary statistics → Graph production]
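
A commented script is the simplest form of workflow. The sketch below follows the steps of the flow chart, but in Python rather than the R shown on the slide; the readings and the plausible-value ranges are made up for illustration, and graph production is left as a comment.

```python
import statistics

# Step 1: data import (made-up readings stand in for the temperature and salinity files).
temperature = [12.1, 12.4, None, 11.9, 99.9]
salinity = [33.1, 33.0, 32.8, None, 33.2]


# Step 2: quality control and data cleaning.
def clean(values, low, high):
    """Drop missing readings and readings outside the plausible range."""
    return [v for v in values if v is not None and low <= v <= high]


clean_t = clean(temperature, -2.0, 40.0)  # assumed plausible range, degrees C
clean_s = clean(salinity, 0.0, 42.0)      # assumed plausible range


# Step 3: analysis (mean and standard deviation).
def summarize(values):
    """Return the count, mean, and sample standard deviation of the cleaned values."""
    return {"n": len(values), "mean": statistics.mean(values), "sd": statistics.stdev(values)}


summary = {"temperature": summarize(clean_t), "salinity": summarize(clean_s)}

# Step 4: report summary statistics (graph production would follow here).
for name, stats in summary.items():
    print(f"{name}: n={stats['n']} mean={stats['mean']:.2f} sd={stats['sd']:.2f}")
```

Because each step is a named, commented block, someone else can rerun the script and see how the summary statistics were produced.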
  
5. Workflows

Fancy Schmancy: Kepler
Resulting output
https://kepler-project.org
  
5. Workflows

 Workflows enable
        Reproducibility
               can someone independently validate findings?
        Transparency
               others can understand how you arrived at your results
        Executability
               others can re-run or re-use your analysis

From Flickr by merlinprincesse
  
        	
  
6. Data stewardship & reuse

Use stable formats
    csv, txt, tiff
Create back-up copies
    original, near, far
Periodically test ability to restore information

Modified from R. Cook
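
One way to test periodically that back-up copies can still be restored is to compare checksums of the copies against the original. This is a minimal sketch assuming all three copies are reachable as local files; the file names are hypothetical, and the "far" copy is deliberately corrupted to show a failed check.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256(path: Path) -> str:
    """Return the SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copies(original: Path, copies: list[Path]) -> dict[str, bool]:
    """Map each copy to True if it exists and matches the original byte for byte."""
    expected = sha256(original)
    return {str(copy): copy.exists() and sha256(copy) == expected for copy in copies}


# Demonstration in a throwaway directory with made-up contents.
workdir = Path(tempfile.mkdtemp())
original = workdir / "counts.csv"
original.write_text("sample_id,count\nALG01,12\n")
near = workdir / "near_copy.csv"
far = workdir / "far_copy.csv"
shutil.copyfile(original, near)
far.write_text("sample_id,count\nALG01,21\n")  # simulate a corrupted copy
report = verify_copies(original, [near, far])
print(report)
```

A copy that fails the check should be replaced from a copy that passes, and the event noted alongside the data.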
  
6. Data stewardship & reuse

Where do I put it?
    Institutional archive
    Discipline/specialty archive
    DataCite list of repositories:
        www.datacite.org/repolist

From Flickr by torkildr
  
6. Data stewardship & reuse

Data Citation: Why everyone should do it
    Allow readers to find data products
    Get credit for data and publications
    Promote reproducibility
    Better measure of research impact

Example:
Sidlauskas, B. 2007. Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20

Learn more at www.datacite.org
Modified from R. Cook
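
The example citation follows a simple pattern: author, year, title, repository, DOI. The sketch below reproduces that pattern with a hypothetical `format_citation` helper; it mirrors the layout of the example above and is not an official DataCite or Dryad format specification.

```python
def format_citation(author: str, year: int, title: str, repository: str, doi: str) -> str:
    """Assemble a data citation as 'Author Year. Title. Repository. doi:DOI'."""
    return f"{author} {year}. {title}. {repository}. doi:{doi}"


citation = format_citation(
    author="Sidlauskas, B.",
    year=2007,
    title=(
        "Data from: Testing for unequal rates of morphological diversification "
        "in the absence of a detailed phylogeny: a case study from characiform fishes"
    ),
    repository="Dryad Digital Repository",
    doi="10.5061/dryad.20",
)
print(citation)
```

Keeping the components separate makes it easy to store them as metadata and generate the citation string on demand.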
  
Best Practices for Data Management

   1.  Planning
   2.  Data collection & organization
   3.  Quality control & assurance
   4.  Metadata
   5.  Workflows
   6.  Data stewardship & reuse
  
1. Planning

What is a data management plan?
   A document that describes what you will do with your data during and after you complete your research
  
1. Planning

Why should scientists prepare a DMP?
         Saves time
         Increases efficiency
         Easier to use data
         Others can understand & use data
         Credit for data products
         Funders require it
  
	
  
A few words about NSF Data Management Plans
  
NSF DMP Requirements

From Grant Proposal Guidelines:
DMP supplement may include:
  1.  the types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project
  2.  the standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies)
  3.  policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements
  4.  policies and provisions for re-use, re-distribution, and the production of derivatives
  5.  plans for archiving data, samples, and other research products, and for preservation of access to them
  
Don’t forget: Budget

•  Costs of data preparation & documentation
          Hardware, software
          Personnel
          Archive fees
•  How costs will be paid
          Request funding!

dorrvs.com
  
NSF’s Vision*

    DMPs and their evaluation will grow & change over time (similar to broader impacts)
    Peer review will determine next steps
    Community-driven guidelines
           –  Different disciplines have different definitions of acceptable data sharing
           –  Flexibility at the directorate and division levels
           –  Tailor implementation of DMP requirement
    Evaluation will vary with directorate, division, & program officer

*Unofficially
Help from Jennifer Schopf, NSF
  
NSF’s Vision*

  DMPs are a good first step towards improving data stewardship
         –  starting discussion
         –  scientists learning about data management
  Additional expertise on panels to effectively evaluate DMPs (?)
  Working group will assess outcomes

*Unofficially
  
 	
  

  	
  
dmp.cdlib.org

Step-by-step wizard for generating DMP
Create | edit | re-use | share | save | generate
Open to community
Links to institutional resources
Directorate information & updates
  
Roadmap

                                        5.  Tools
                            4.  DCXL
                      3.  Best practices for scientists
           2.  Barriers to best practices
1.  Mistakes scientists make
  
	
  
“A transformation in the conduct of a segment of scientific research by enabling and promoting publishing, sharing, and archiving of tabular data”

Increase  interoperability  =  Sharing
          publishability    =  Publishing
          archivability     =  Archiving

Focus on atmospheric, ecological, hydrological, and oceanographic data
  
Open Source & Free
Excel Add-in

Software program that extends the capabilities of larger programs
Complements basic Excel functionality
From www.webopedia.com

www.ablebits.com
  
DCXL Project Deliverables

•  Excel add-in
•  Publicly available source code
•  Technical documentation
•  End user documentation
•  Publicly available requirements
  
Process

Assess needs
•  Quantitative
   – Surveys
  
Process

Assess needs
•  Quantitative
   – Surveys
   – Quick poll
•  Qualitative
   – Interviews
  
Process

Assess needs
Gather requirements
   Locations
      Conferences
      UC campus visits
      Remote/web-based
  
Process

Assess needs
Gather requirements
   Stakeholders & contributors
      Libraries
      Scientists
      Repositories
      Experts: MSR, GBMF
      Personnel on related projects
  
Process

Assess needs
Gather requirements
Build requirements document

DCXL: Digital Curation for Excel Project
User Requirements
University of California Curation Center, California Digital Library

Requirements

1  Ensure compatibility for Excel users without the add-in
2  Check the data file for CSV compatibility
   2.1  Excel performs a CSV compatibility check on the data file
   2.2  Excel generates a Compatibility Report
3  Generate metadata that is linked to the data file
   3.1  The user opens an existing metadata document as a template
   3.2  The user initiates a new metadata document
   3.3  Excel populates Level 1 metadata fields
   3.4  The user populates Level 2 metadata fields
   3.5  The user generates labels for parameter metadata
   3.6  The user requests standards for keywords

Requirements

4  Generate a citation for the data file
5  Deposit into a repository
   5.1  The user authenticates via an existing relationship with the designated repository
   5.2  The user is directed to establish a relationship with a repository
   5.3  The user links an identifier to the data file via the designated repository
   5.4  Excel performs Pre-Archiving Tasks
   5.5  The user submits the Excel file for deposition
6  Appendix A: Metadata Types
7  Appendix B: Citation Format
8  Appendix C: Dictionary of Terms
  
Process

Assess needs
Gather requirements
Build requirements document
Build community
   Libraries
   Scientists
   Repositories
   Programmers/Developers
  
Why are you promoting Excel?

•  Everyone uses it
•  Features that make it good for data organization make it bad for archiving
•  Stopgap measure
  
Get Involved

dcxl.cdlib.org
@dcxlCDL
www.facebook.com/DCXLatCDL
  
Roadmap

5.  Tools
4.  DCXL
3.  Best practices for scientists
2.  Barriers to best practices
1.  Mistakes scientists make
  
UC3 Services

Where should I put my data?
Data Repository
Deposit | Manage | Share | Preserve

www.cdlib.org/services/uc3
  
UC3 Services

How do I get a unique identifier?
Create & manage persistent identifiers
•  Precise identification of a dataset
•  Credit to data producers and data publishers
•  A link from the traditional literature to the data
•  Research metrics for datasets

www.cdlib.org/services/uc3
  
DataONE

www.dataone.org

•  Data Education Tutorials
•  Database of best practices & software tools
•  Links to DMPTool
•  Primer on data management

From Flickr by Robert Hruzek
  
Data Management 101

dcxl.cdlib.org
•  Data Education Tutorials
•  Primer on data management
•  Other resources
  
Toolbox:
DCXL blog: dcxl.cdlib.org
Lisa Federer
  
                                                      	
  



dcxl.cdlib.org
@dcxlCDL
www.facebook.com/DCXLatCDL

www.carlystrasser.net
carlystrasser@gmail.com
@carlystrasser
  

UCLA: Data Management for Librarians

  • 1. Data  Management  and  the  Digital  Curation   for  Excel  (DCXL)  Project     Carly  Strasser     University  of  California  Curation  Center  at  CDL  
  • 2. NSF  funded  DataNet  Project   Of@ice  of  Cyberinfrastructure   Enabling  universal  access  to  data  about  life  on  earth   and  the  environment  that  sustains  it  
  • 3. B   A   C  
  • 4. NSF  funded  DataNet  Project   Of@ice  of  Cyberinfrastructure   Community   Cyberinfrastructure   Engagement  &   Outreach   From  Flickr  by  wetwebwork   Courtesy  of  DataONE  
  • 5. What  role  can   libraries  play  in   data  education?   Why  don’t  people   What  barriers  to   share  data?   sharing  can  we   eliminate?   Is  data  management   Do  attitudes  about   being  taught?   sharing  differ   among   disciplines?   How  can  we  promote   storing  data  in   repositories?  
  • 6. Roadmap   5.  Tools   4.  DCXL       3.  Best  practices  for  scientists   2.  Barriers  to  best  practices   1.  Mistakes  scientists  make    
  • 7. From  Flickr  by    DW0825   From  Flickr  by  Flickmor   From  Flickr  by    deltaMike   Digital  data   www.woodrow.org   C.  Strasser   Courtesey  of  WHOI   From  Flickr  by  US  Army  Environmental  Command  
  • 8. Digital  data   +     Complex  analyses  
  • 9. Data   Models   Maximum   Likelihood   estimation   Matrix   Models   Images   Tables   Paper  
  • 10. 2  tables   C:Documents and SettingshamptonMy DocumentsNCEAS Distributed Graduate Seminars[Wash Cres Lake Dec 15 Dont_Use.xls]Sheet1 Stable Isotope Data Sheet Sampling Site / Identifier: Wash Cresc Lake Peter's lab Don't use - old data Sample Type: Algal Washed Rocks Date: Dec. 16 Tray ID and Sequence: Tray 004 13 15 Reference statistics: SD for delta C = 0.07 SD for delta N = 0.15 Position SampleID Weight (mg) %C delta 13C delta 13C_ca %N delta 15N delta 15N_ca Spec. No. A1 ref 0.98 38.27 -25.05 -24.59 1.96 4.12 3.47 25354 A2 ref 0.98 39.78 -25.00 -24.54 2.03 4.01 3.36 25356 A3 ref 0.98 40.37 -24.99 -24.53 2.04 4.09 3.44 25358 A4 ref 1.01 42.23 -25.06 -24.60 2.17 4.20 3.55 25360 Shore Avg Con A5 ALG01 3.05 1.88 -24.34 -23.88 0.17 -1.65 -2.30 25362 c -1.26 -27.22 A6 Lk Outlet Alg 3.06 31.55 -30.17 -29.71 0.92 0.87 0.22 25364 1.26 0.32 A7 ALG03 2.91 6.85 -21.11 -20.65 0.48 -0.97 -1.62 25366 c A8 ALG05 2.91 35.56 -28.05 -27.59 2.30 0.59 -0.06 25368 A9 ALG07 3.04 33.49 -29.56 -29.10 1.68 0.79 0.14 25370 A10 ALG06 2.95 41.17 -27.32 -26.86 1.97 2.71 2.06 25372 B1 ALG04 3.01 43.74 -27.50 -27.04 1.36 0.99 0.34 25374 c B2 ALG02 3 4.51 -22.68 -22.22 0.34 4.31 3.66 25376 B3 ALG01 2.99 1.59 -24.58 -24.12 0.15 -1.69 -2.34 25378 c B4 ALG03 2.92 4.37 -21.06 -20.60 0.34 -1.52 -2.17 25380 c B5 ALG07 2.9 33.58 -29.44 -28.98 1.74 0.62 -0.03 25382 B6 ref 1.01 44.94 -25.00 -24.54 2.59 3.96 3.31 25384 B7 ref 0.99 42.28 -24.87 -24.41 2.37 4.33 3.68 25386 B8 Lk Outlet Alg 3.04 31.43 -29.69 -29.23 1.07 0.95 0.30 25388 B9 ALG06 3.09 35.57 -27.26 -26.80 1.96 2.79 2.14 25390 B10 ALG02 3.05 5.52 -22.31 -21.85 0.45 4.72 4.07 25392 C1 ALG04 2.98 37.90 -27.42 -26.96 1.36 1.21 0.56 25394 c C2 ALG05 3.04 31.74 -27.93 -27.47 2.40 0.73 0.08 25396 C3 ref 0.99 38.46 -25.09 -24.63 2.40 4.37 3.72 25398 23.78 1.17 From  Stephanie  Hampton  (2010)       ESA  Workshop  on  Best  Practices  
  • 11. Random  notes   C:Documents and SettingshamptonMy DocumentsNCEAS Distributed Graduate Seminars[Wash Cres Lake Dec 15 Dont_Use.xls]Sheet1 Stable Isotope Data Sheet Sampling Site / Identifier: Wash Cresc Lake Peter's lab Don't use - old data Sample Type: Algal Washed Rocks Date: Dec. 16 Tray ID and Sequence: Tray 004 13 15 Reference statistics: SD for delta C = 0.07 SD for delta N = 0.15 Position SampleID Weight (mg) %C delta 13C delta 13C_ca %N delta 15N delta 15N_ca Spec. No. A1 ref 0.98 38.27 -25.05 -24.59 1.96 4.12 3.47 25354 A2 ref 0.98 39.78 -25.00 -24.54 2.03 4.01 3.36 25356 A3 ref 0.98 40.37 -24.99 -24.53 2.04 4.09 3.44 25358 A4 ref 1.01 42.23 -25.06 -24.60 2.17 4.20 3.55 25360 Shore Avg Con A5 ALG01 3.05 1.88 -24.34 -23.88 0.17 -1.65 -2.30 25362 c -1.26 -27.22 A6 Lk Outlet Alg 3.06 31.55 -30.17 -29.71 0.92 0.87 0.22 25364 1.26 0.32 A7 ALG03 2.91 6.85 -21.11 -20.65 0.48 -0.97 -1.62 25366 c A8 ALG05 2.91 35.56 -28.05 -27.59 2.30 0.59 -0.06 25368 A9 ALG07 3.04 33.49 -29.56 -29.10 1.68 0.79 0.14 25370 A10 ALG06 2.95 41.17 -27.32 -26.86 1.97 2.71 2.06 25372 B1 ALG04 3.01 43.74 -27.50 -27.04 1.36 0.99 0.34 25374 c B2 ALG02 3 4.51 -22.68 -22.22 0.34 4.31 3.66 25376 B3 ALG01 2.99 1.59 -24.58 -24.12 0.15 -1.69 -2.34 25378 c B4 ALG03 2.92 4.37 -21.06 -20.60 0.34 -1.52 -2.17 25380 c B5 ALG07 2.9 33.58 -29.44 -28.98 1.74 0.62 -0.03 25382 B6 ref 1.01 44.94 -25.00 -24.54 2.59 3.96 3.31 25384 B7 ref 0.99 42.28 -24.87 -24.41 2.37 4.33 3.68 25386 B8 Lk Outlet Alg 3.04 31.43 -29.69 -29.23 1.07 0.95 0.30 25388 B9 ALG06 3.09 35.57 -27.26 -26.80 1.96 2.79 2.14 25390 B10 ALG02 3.05 5.52 -22.31 -21.85 0.45 4.72 4.07 25392 C1 ALG04 2.98 37.90 -27.42 -26.96 1.36 1.21 0.56 25394 c C2 ALG05 3.04 31.74 -27.93 -27.47 2.40 0.73 0.08 25396 C3 ref 0.99 38.46 -25.09 -24.63 2.40 4.37 3.72 25398 23.78 1.17 From  Stephanie  Hampton  (2010)       ESA  Workshop  on  Best  Practices  
  • 12. Wash Cres Lake Dec 15 Dont_Use.xls   [Same spreadsheet as on slide 10; the file is stored as NCEAS Distributed Graduate Seminars[Wash Cres Lake Dec 15 Dont_Use.xls]Sheet1.]   From Stephanie Hampton (2010)   ESA Workshop on Best Practices
  • 13. Collaboration and Data Sharing   [Same spreadsheet as on slide 10, now overlaid with a regression SUMMARY OUTPUT block (Multiple R 0.283158, R Square 0.080178, 11 observations), a transposed copy of the sample table (SampleID, Weight (mg), %C, delta 13C, delta 13C_ca, %N, delta 15N, delta 15N_ca), and a scatter chart (Series1; x-axis -35.00 to 0.00, y-axis -3.00 to 4.00).]
  • 14. Random stats   [Same spreadsheet as on slide 10, with a regression output pasted beside the data.]   SUMMARY OUTPUT   Regression Statistics: Multiple R 0.283158; R Square 0.080178; Adjusted R Square -0.022024; Standard Error 1.906378; Observations 11   ANOVA: Regression (df 1, SS 2.851116, MS 2.851116, F 0.784507, Significance F 0.398813); Residual (df 9, SS 32.7085, MS 3.634278); Total (df 10, SS 35.55962)   Coefficients: Intercept -4.297428 (Standard Error 4.671099, t Stat -0.920003, P-value 0.381568, Lower 95% -14.8642, Upper 95% 6.269341); X Variable 1 -0.158022 (Standard Error 0.17841, t Stat -0.885724, P-value 0.398813, Lower 95% -0.561612, Upper 95% 0.245569)
  • 15. Where  data  end  up   From  Flickr  by  diylibrarian   www blog.order2disorder.com   From  Flickr  by  csessums   Data   Metadata   From  Flickr  by  csessums   Recreated  from  Klump  et  al.  2006  
  • 16. Who cares?   From Flickr by Redden-McAllister   From Flickr by AJC1   www.rba.gov.au
  • 17. Where  data  end  up   From  Flickr  by  diylibrarian   www Data   www Metadata   Recreated  from  Klump  et  al.  2006  
  • 18. Data   Reuse   Data   Sharing   Data   Management  
  • 19. UGLY TRUTH   Many Earth | Environmental | Ecological scientists…   • are not taught data management   • don’t know what metadata are   • can’t name data centers or repositories   • don’t share data publicly or store it in an archive   • aren’t convinced they should share data   5shortessays.blogspot.com
  • 20. Roadmap   5.  Tools   4.  DCXL       3.  Best  practices  for  scientists   2.  Barriers  to  best  practices   1.  Mistakes  scientists  make    
  • 21. Barriers   Cost: time, personnel, software, hardware   cultblender.wordpress.com
  • 22. Barriers   Cost:  time,  personnel,  software,  hardware   Culture  of  Science   •  Not  the  norm   •  Lack  of  training   •  Disparate  data  
  • 23. Barriers   Cost: time, personnel, software, hardware   Culture of Science   Loss of rights or benefits   • Misuse of data   • Missed opportunities   • Conflict
  • 24. Barriers   Cost: time, personnel, software, hardware   Culture of Science   Loss of rights or benefits   Lack of incentives   • Time consuming & expensive   • Reward structure   • Few requirements
  • 25. Are Undergrads Learning About Data Management?   Importance Versus Assessment   • Metadata generation   • Software choice   • File naming   • QAQC   • Backing up   • Workflows   • Data sharing   • Data re-use   • Meta-analysis   • Reproducibility   • Notebook protocols   • Databases   [Scatter plot: Important (y-axis, 0 to 40) versus Assessed (x-axis, 0 to 40)]   If it’s important, why isn’t it taught?
  • 26. Barriers to Teaching Data Management   • Too advanced   • Not a priority   • Not appropriate level   • Students don’t know software   • Time   • No lab   • No training   • Covered in lab   • Too big
  • 27. Roadmap   5.  Tools   4.  DCXL       3.  Best  practices  for  scientists   2.  Barriers  to  best  practices   1.  Mistakes  scientists  make    
  • 28. Best Practices for Data Management   1. Planning   2. Data collection & organization   3. Quality control & assurance   4. Metadata   5. Workflows   6. Data stewardship & reuse
  • 29. 2. Data collection & organization   Create unique identifiers   • Decide on naming scheme early   • Create a key   • Different for each sample   From Flickr by zebbie   From Flickr by sjbresnahan
  • 30. 2. Data collection & organization   Standardize   • Consistent within columns   – only numbers, dates, or text   • Consistent names, codes, formats   Modified from K. Vanderbilt   From Pink Floyd, The Wall   themurkyfringe.com
  • 31. 2. Data collection & organization   Standardize   • Reduce possibility of manual error by constraining entry choices   – Excel lists and data validation   – Google Docs Forms   Modified from K. Vanderbilt
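The idea of constraining entry choices can also be applied after the fact, by checking a column against an agreed list of codes. The following is a minimal Python sketch under stated assumptions: the vocabulary `ALLOWED_SAMPLE_TYPES` and the function name `find_invalid_entries` are hypothetical and do not come from the slides, which point to Excel lists, data validation, and Google Docs Forms instead.

```python
# Hypothetical controlled vocabulary for a "sample type" column (an assumption).
ALLOWED_SAMPLE_TYPES = {"algae", "sediment", "water"}


def find_invalid_entries(values, allowed):
    """Return (row_index, value) pairs whose normalized value is not in `allowed`.

    Values are compared after trimming whitespace and lower-casing, so
    "Algae " is accepted but the misspelling "algea" is reported.
    """
    invalid = []
    for index, value in enumerate(values):
        if value.strip().lower() not in allowed:
            invalid.append((index, value))
    return invalid


if __name__ == "__main__":
    entries = ["algae", "Algae ", "algea", "water", "rock"]
    for index, value in find_invalid_entries(entries, ALLOWED_SAMPLE_TYPES):
        print(f"row {index}: unexpected value {value!r}")
```

A check like this only finds problems once they exist; validation at entry time, as the slide suggests, keeps them from being typed in the first place.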
  • 32. 2.  Data  collection  &  organization       Create  parameter  table   Create  a  site  table   From  doi:10.3334/ORNLDAAC/777   From  doi:10.3334/ORNLDAAC/777   From  R  Cook,  ESA  Best  Practices  Workshop  2010  
  • 33. 2. Data collection & organization   Use descriptive file names*   • Unique   • Reflect contents   Bad: Mydata.xls, 2001_data.csv, best version.txt   Better: Eaffinis_nanaimo_2010_counts.xls (study organism, site name, year, what was measured)   *Not for everyone   From R Cook, ESA Best Practices Workshop 2010
  • 34. 2. Data collection & organization   Organize files logically   Biodiversity   – Lake   – Experiments: Biodiv_H20_heatExp_2005to2008.csv, Biodiv_H20_predatorExp_2001to2003.csv, …   – Field work: Biodiv_H20_PlanktonCount_2001toActive.csv, Biodiv_H20_ChlAprofiles_2003.csv, …   – Grassland   From S. Hampton
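A naming scheme is easier to keep consistent when a small helper assembles the name from its parts. This is a sketch only: the function `build_filename` and its parameter names are hypothetical, and the component order (study organism, site name, year, what was measured) is read off the slide's "Better" example.

```python
import re


def build_filename(study, site, year, measured, extension):
    """Join the descriptive components with underscores.

    Characters other than letters and digits are dropped from each component
    so the result contains no spaces or punctuation.
    """
    parts = [study, site, str(year), measured]
    cleaned = [re.sub(r"[^A-Za-z0-9]+", "", part) for part in parts]
    if not all(cleaned):
        raise ValueError("every component must contain a letter or digit")
    return "_".join(cleaned) + "." + extension.lstrip(".")


if __name__ == "__main__":
    print(build_filename("Eaffinis", "nanaimo", 2010, "counts", "xls"))
```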
  • 35. 2.  Data  collection  &  organization    Preserve  information   R  script  for  processing  &   analysis   •  Keep  raw  data  raw   •  Use  scripts  to  process  data      &  save  them  with  data   Raw  data  as  .csv  
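"Keep raw data raw" can be sketched as a script that only ever reads the raw file and writes its results somewhere else. The slide shows an R script; the example below uses Python instead, and the function name `process_raw_file`, the file names, and the whitespace-trimming step are all assumptions chosen for illustration.

```python
import csv
import tempfile
from pathlib import Path


def process_raw_file(raw_path, processed_path):
    """Write a cleaned copy of a raw CSV; the raw file is opened read-only."""
    raw_path, processed_path = Path(raw_path), Path(processed_path)
    if raw_path.resolve() == processed_path.resolve():
        raise ValueError("refusing to overwrite the raw data file")
    with raw_path.open(newline="") as source, \
            processed_path.open("w", newline="") as target:
        writer = csv.writer(target)
        for row in csv.reader(source):
            # Example processing step: trim stray whitespace in every cell.
            writer.writerow([cell.strip() for cell in row])
    return processed_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        raw = Path(folder) / "raw_counts.csv"
        raw.write_text("site, count\nnanaimo , 12\n")
        cleaned = process_raw_file(raw, Path(folder) / "clean_counts.csv")
        print(cleaned.read_text())
```

Saving a script like this next to the data documents every change made between the raw file and the analysis-ready file.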
  • 36. 2. Data collection & organization   All of the things that make Excel great for data organization are bad for archiving! What to do?   1. Create archive-ready raw data   2. Put it somewhere special   3. Have your fun with fancy Excel techniques   4. Keep archiving in mind
  • 37. 3. Quality control and quality assurance   Define & enforce standards   Double data entry   Document changes   Minimize manual data entry   No missing, impossible, or anomalous values   • Perform statistical summaries   • Use illegal data filter   • Look for outliers   [Scatter plot: x-axis 0 to 35, y-axis 0 to 60]
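The three checks listed on this slide (statistical summaries, an illegal data filter, and a look for outliers) could be sketched in Python as follows. The function names, the valid range, and the z-score threshold of 2 are assumptions made for illustration; the slides do not prescribe a particular method or cutoff.

```python
import statistics


def summarize(values):
    """Basic statistical summary of a numeric column."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
    }


def illegal_values(values, lower, upper):
    """Return (index, value) pairs that fall outside the range [lower, upper]."""
    return [(i, v) for i, v in enumerate(values) if not lower <= v <= upper]


def outliers(values, threshold=2.0):
    """Return (index, value) pairs more than `threshold` sample SDs from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return []
    return [(i, v) for i, v in enumerate(values) if abs(v - mean) / sd > threshold]


if __name__ == "__main__":
    percent_cover = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
    print(summarize(percent_cover))
    print(illegal_values([5.0, 12.5, -3.0, 101.0], 0, 100))
    print(outliers(percent_cover))
```

Flagged values are candidates for review, not automatic deletions; any correction should be documented, as the slide's "Document changes" item says.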
  • 38. 4. Metadata basics   • Scientific context   – Scientific reason why the data were collected   – What data were collected   – What instruments (including model & serial number) were used   – Environmental conditions during collection   – Where collected & spatial resolution   – When collected & temporal resolution   – Standards or calibrations used   • Information about parameters   – How each was measured or produced   – Units of measure   – Format used in the data set   – Precision & accuracy if known   • Information about data   – Definitions of codes used   – Quality assurance & control measures   – Known problems that limit data use (e.g. uncertainty, sampling problems)   – How to cite the data set   • Digital context   – Name of the data set   – The name(s) of the data file(s) in the data set   – Date the data set was last modified   – Example data file records for each data type file   – Pertinent companion files   – List of related or ancillary data sets   – Software (including version number) used to prepare/read the data set   – Data processing that was performed   • Personnel & stakeholders   – Who collected   – Who to contact with questions   – Funders
  • 39. 4. Metadata basics   • Provides structure to describe data   Common terms | definitions | language | structure   • Lots of different standards   EML, FGDC, ISO19115, DarwinCore, …   • Tools for creating metadata files   Morpho (EML), Metavist (FGDC), NOAA MERMaid (CSDGM)
  • 40. 5. Workflows   Simplest workflows: commented scripts, flow charts   Temperature data + Salinity data → Data import into R → Data in R format → Quality control & data cleaning → “Clean” T & S data → Analysis: mean, SD → Summary statistics → Graph production
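The "commented script" form of this workflow might look like the sketch below, with one function per box in the flow chart. The slide's workflow runs in R; Python is used here instead, and the column names, valid ranges, and sample values are made-up assumptions for illustration. Graph production is left out to keep the sketch dependency-free.

```python
import csv
import io
import statistics


def import_data(csv_text):
    """Step 1: import the raw text into a list of records (dicts)."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def clean(records, column, lower, upper):
    """Step 2: quality control. Keep numeric values inside [lower, upper]."""
    cleaned = []
    for record in records:
        try:
            value = float(record[column])
        except (TypeError, ValueError):
            continue  # skip missing or non-numeric entries such as "NA"
        if lower <= value <= upper:
            cleaned.append(value)
    return cleaned


def summary_statistics(values):
    """Step 3: analysis. Mean and sample standard deviation."""
    return {"mean": statistics.mean(values), "sd": statistics.stdev(values)}


if __name__ == "__main__":
    raw = "temperature,salinity\n10.0,35.0\n12.0,34.0\nNA,36.0\n14.0,-1\n"
    records = import_data(raw)
    print(summary_statistics(clean(records, "temperature", -2, 40)))
    print(summary_statistics(clean(records, "salinity", 0, 45)))
```

Because every step is a named function, someone else can rerun the same sequence on new data, which is the point the later workflow slides make about reproducibility.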
  • 41. 5. Workflows   Fancy Schmancy: Kepler   Resulting output   https://kepler-project.org
  • 42. 5. Workflows   Workflows enable   Reproducibility: can someone independently validate findings?   Transparency: others can understand how you arrived at your results   Executability: others can re-run or re-use your analysis   From Flickr by merlinprincesse
  • 43. 6. Data stewardship & reuse   Use stable formats: csv, txt, tiff   Create back-up copies: original, near, far   Periodically test ability to restore information   Modified from R. Cook
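One way to test that a back-up copy can actually be restored is to compare checksums of the original and the copy. The sketch below assumes SHA-256 as the checksum and uses hypothetical function names (`sha256_of`, `back_up_and_verify`); the slides do not name a specific tool or method.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up_and_verify(original, backup_dir):
    """Copy `original` into `backup_dir` and confirm the copy is identical."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    copy_path = backup_dir / Path(original).name
    shutil.copy2(original, copy_path)
    if sha256_of(original) != sha256_of(copy_path):
        raise IOError(f"backup {copy_path} does not match the original")
    return copy_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        data_file = Path(folder) / "counts.csv"
        data_file.write_text("site,count\nnanaimo,12\n")
        print(back_up_and_verify(data_file, Path(folder) / "near"))
```

Rerunning `sha256_of` on stored copies from time to time, and comparing against a recorded digest, is a simple version of the periodic restore test the slide recommends.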
  • 44. 6. Data stewardship & reuse   Where do I put it?   Institutional archive   Discipline/specialty archive   DataCite list of repositories: www.datacite.org/repolist   From Flickr by torkildr
  • 45. 6. Data stewardship & reuse   Data Citation: Why everyone should do it   Allow readers to find data products   Get credit for data and publications   Promote reproducibility   Better measure of research impact   Example:   Sidlauskas, B. 2007. Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20   Learn more at www.datacite.org   Modified from R. Cook
  • 46. Best Practices for Data Management   1. Planning   2. Data collection & organization   3. Quality control & assurance   4. Metadata   5. Workflows   6. Data stewardship & reuse
  • 47. 1.  Planning   What  is  a  data  management  plan?   A  document  that  describes  what  you  will  do  with  your   data  during  and  after  you  complete  your  research  
  • 48. 1. Planning   Why should scientists prepare a DMP?   Saves time   Increases efficiency   Easier to use data   Others can understand & use data   Credit for data products   Funders require it
  • 49. A  few  words  about  NSF  Data   Management  Plans  
  • 50. NSF DMP Requirements   From Grant Proposal Guidelines: DMP supplement may include:   1. the types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project   2. the standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies)   3. policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements   4. policies and provisions for re-use, re-distribution, and the production of derivatives   5. plans for archiving data, samples, and other research products, and for preservation of access to them
  • 51. Don’t  forget:  Budget   •  Costs  of  data  preparation  &  documentation   Hardware,  software   Personnel   Archive  fees   •  How  costs  will  be  paid     Request  funding!   dorrvs.com  
  • 52. NSF’s Vision*   DMPs and their evaluation will grow & change over time (similar to broader impacts)   Peer review will determine next steps   Community-driven guidelines   – Different disciplines have different definitions of acceptable data sharing   – Flexibility at the directorate and division levels   – Tailor implementation of DMP requirement   Evaluation will vary with directorate, division, & program officer   *Unofficially   Help from Jennifer Schopf, NSF
  • 53. NSF’s Vision*   DMPs are a good first step towards improving data stewardship   – starting discussion   – scientists learning about data management   Additional expertise on panels to effectively evaluate DMPs (?)   Working group will assess outcomes   *Unofficially
  • 54. dmp.cdlib.org   Step-by-step wizard for generating DMP   Create | edit | re-use | share | save | generate   Open to community   Links to institutional resources   Directorate information & updates
  • 55. Roadmap   5.  Tools   4.  DCXL       3.  Best  practices  for  scientists   2.  Barriers  to  best  practices   1.  Mistakes  scientists  make    
  • 56. “A transformation in the conduct of a segment of scientific research by enabling and promoting publishing, sharing, and archiving of tabular data”   Increase   interoperability = Sharing   publishability = Publishing   archivability = Archiving   Focus on atmospheric, ecological, hydrological, and oceanographic data
  • 57. Open Source & Free   Excel Add-in   Software program that extends the capabilities of larger programs   Complements basic Excel functionality   From www.webopedia.com   www.ablebits.com
  • 58. DCXL Project Deliverables   • Excel add-in   • Publicly available source code   • Technical documentation   • End user documentation   • Publicly available requirements
  • 59. Process   Assess  needs   •  Quantitative   –  Surveys  
  • 60. Process   Assess  needs   •  Quantitative   ? –  Surveys   –  Quick  poll   •  Qualitative   –  Interviews  
  • 61. Process   Assess needs   Gather requirements   Locations   • Conferences   • UC campus visits   • Remote/web-based
  • 62. Process   Assess  needs   Gather  requirements     Stakeholders  &  contributors      Libraries    Scientists    Repositories    Experts:  MSR,  GBMF    Personnel  on  related  projects        
  • 63. Process   Assess needs   Gather requirements   Build requirements document   [Cover page of the requirements document: DCXL: Digital Curation for Excel Project   User Requirements   University of California Curation Center, California Digital Library]
  • 64. Requirements   1 Ensure compatibility for Excel users without the add-in   2 Check the data file for CSV compatibility   2.1 Excel performs a CSV compatibility check on the data file   2.2 Excel generates a Compatibility Report   3 Generate metadata that is linked to the data file   3.1 The user opens an existing metadata document as a template   3.2 The user initiates a new metadata document   3.3 Excel populates Level 1 metadata fields   3.4 The user populates Level 2 metadata fields   3.5 The user generates labels for parameter metadata   3.6 The user requests standards for keywords
  • 65. Requirements   4 Generate a citation for the data file   5 Deposit into a repository   5.1 The user authenticates via an existing relationship with the designated repository   5.2 The user is directed to establish a relationship with a repository   5.3 The user links an identifier to the data file via the designated repository   5.4 Excel performs Pre-Archiving Tasks   5.5 The user submits the Excel file for deposition   6 Appendix A: Metadata Types   7 Appendix B: Citation Format   8 Appendix C: Dictionary of Terms
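Requirement 2 asks for a CSV compatibility check and a report. The slides do not say which checks the DCXL add-in performs, so the following Python sketch is purely illustrative: the function name `csv_compatibility_report` and the four checks it runs (empty sheet, blank headers, duplicate headers, ragged rows or line breaks inside cells) are assumptions about problems that commonly make a flat export harder to reuse.

```python
def csv_compatibility_report(rows):
    """Return a list of human-readable problems found in a sheet.

    `rows` is a list of rows, each a list of cell strings; the first row is
    treated as the header. An empty list means no problems were found.
    """
    if not rows:
        return ["sheet is empty"]
    problems = []
    header = rows[0]
    for position, name in enumerate(header, start=1):
        if not name.strip():
            problems.append(f"column {position} has no header")
    if len(set(header)) != len(header):
        problems.append("duplicate column headers")
    for number, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"row {number} has {len(row)} cells, expected {len(header)}"
            )
        if any("\n" in cell for cell in row):
            problems.append(f"row {number} contains a line break inside a cell")
    return problems


if __name__ == "__main__":
    sheet = [["site", "count", ""], ["nanaimo", "12", "x"], ["victoria", "7"]]
    for problem in csv_compatibility_report(sheet):
        print(problem)
```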
  • 66. Process   Assess  needs   Gather  requirements   Build  requirements  document   Build  community   Libraries   Scientists   Repositories   Programmers/ Developers      
  • 67. Why  are  you   promoting   Excel?   •  Everyone  uses  it   •  Features  that  make  it  good  for  data  organization  make  it   bad  for  archiving   •  Stopgap  measure  
  • 68. Get  Involved   dcxl.cdlib.org       @dcxlCDL     www.facebook.com/ DCXLatCDL    
  • 69. Roadmap   5.  Tools   4.  DCXL       3.  Best  practices  for  scientists   2.  Barriers  to  best  practices   1.  Mistakes  scientists  make    
  • 70. UC3 Services   Where should I put my data?   Data Repository   Deposit | Manage | Share | Preserve   www.cdlib.org/services/uc3
  • 71. UC3 Services   How do I get a unique identifier?   Create & manage persistent identifiers   • Precise identification of a dataset   • Credit to data producers and data publishers   • A link from the traditional literature to the data   • Research metrics for datasets   www.cdlib.org/services/uc3
  • 72. DataONE   www.dataone.org   •  Data  Education  Tutorials   •  Database  of  best  practices     &  software  tools   •  Links  to  DMPTool   •  Primer  on  data   management   From  Flickr  by  Robert  Hruzek  
  • 73. Data Management 101   dcxl.cdlib.org   • Data Education Tutorials   • Primer on data management   • Other resources
  • 74. Toolbox:    DCXL  blog:  dcxl.cdlib.org  
  • 75. Lisa  Federer     dcxl.cdlib.org   @dcxlCDL   www.facebook.com/DCXLatCDL   www.carlystrasser.net   carlystrasser@gmail.com   @carlystrasser