Data Management for Scientists

Reduce your workload
Reuse your ideas
Recycle your data

www.oddee.com

Carly Strasser, PhD
California Digital Library, UC Office of the President
carly.strasser@ucop.edu
www.carlystrasser.net
Roadmap
1. Who are you?
2. Chaos
3. Control
4. Toolbox
NSF-funded DataNet Project
Office of Cyberinfrastructure

Cyberinfrastructure
Community Engagement & Outreach

From Flickr by ThomasThomas
From Flickr by Langwitches
What role can libraries play in data education?
Why don’t people share data?
What barriers to sharing can we eliminate?
Is data management being taught?
Do attitudes about sharing differ among disciplines?
How can we promote storing data in repositories?
Digital data + Complex workflows
Data → Models (Maximum Likelihood estimation, Matrix Models) → Images, Tables, Paper
UGLY TRUTH
Many Earth | Environmental | Ecological scientists…
are not taught data management
don’t know what metadata are
can’t name data centers or repositories
don’t share data publicly or store it in an archive
aren’t convinced they should share data

5shortessays.blogspot.com
2 tables
Random notes
C:\Documents and Settings\hampton\My Documents\NCEAS Distributed Graduate Seminars\[Wash Cres Lake Dec 15 Dont_Use.xls]Sheet1
                   Stable Isotope Data Sheet
              Sampling Site / Identifier: Wash Cresc Lake                                                                                                               Peter's lab      Don't use - old data
                         Sample Type: Algal                                                                                                                             Washed Rocks
                                  Date: Dec. 16
                Tray ID and Sequence: Tray 004

                     Reference statistics: SD for delta 13C = 0.07                              SD for delta 15N = 0.15


          Position        SampleID         Weight (mg)           %C       delta 13C   delta 13C_ca         %N               delta 15N   delta 15N_ca   Spec. No.
         A1                            ref    0.98              38.27      -25.05         -24.59           1.96                4.12          3.47       25354
         A2                            ref    0.98              39.78      -25.00         -24.54           2.03                4.01          3.36       25356
         A3                            ref    0.98              40.37      -24.99         -24.53           2.04                4.09          3.44       25358
         A4                            ref    1.01              42.23      -25.06         -24.60           2.17                4.20          3.55       25360           Shore            Avg Con
         A5          ALG01                    3.05              1.88       -24.34         -23.88           0.17               -1.65         -2.30       25362      c        -1.26           -27.22
         A6          Lk Outlet Alg            3.06              31.55      -30.17         -29.71           0.92                0.87          0.22       25364                1.26             0.32
         A7          ALG03                    2.91              6.85       -21.11         -20.65           0.48               -0.97         -1.62       25366      c
         A8          ALG05                    2.91              35.56      -28.05         -27.59           2.30                0.59         -0.06       25368
         A9          ALG07                    3.04              33.49      -29.56         -29.10           1.68                0.79          0.14       25370
         A10         ALG06                    2.95              41.17      -27.32         -26.86           1.97                2.71          2.06       25372
         B1          ALG04                    3.01              43.74      -27.50         -27.04           1.36                0.99          0.34       25374      c
         B2          ALG02                      3               4.51       -22.68         -22.22           0.34                4.31          3.66       25376
         B3          ALG01                    2.99              1.59       -24.58         -24.12           0.15               -1.69         -2.34       25378      c
         B4          ALG03                    2.92              4.37       -21.06         -20.60           0.34               -1.52         -2.17       25380      c
         B5          ALG07                     2.9              33.58      -29.44         -28.98           1.74                0.62         -0.03       25382
         B6                            ref    1.01              44.94      -25.00         -24.54           2.59                3.96          3.31       25384
         B7                            ref    0.99              42.28      -24.87         -24.41           2.37                4.33          3.68       25386
         B8          Lk Outlet Alg            3.04              31.43      -29.69         -29.23           1.07                0.95          0.30       25388
         B9          ALG06                    3.09              35.57      -27.26         -26.80           1.96                2.79          2.14       25390
         B10         ALG02                    3.05              5.52       -22.31         -21.85           0.45                4.72          4.07       25392
         C1          ALG04                    2.98              37.90      -27.42         -26.96           1.36                1.21          0.56       25394      c
         C2          ALG05                    3.04              31.74      -27.93         -27.47           2.40                0.73          0.08       25396
         C3                            ref    0.99              38.46      -25.09         -24.63           2.40                4.37          3.72       25398
                                                                23.78                                      1.17




From Stephanie Hampton (2010), ESA Workshop on Best Practices
Modified from Stephanie Hampton
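The sheet never says how its derived numbers were produced, so they can only be reverse-engineered. The Python sketch below assumes (an inference, not something the sheet states) that the "Reference statistics" are sample standard deviations of the seven "ref" runs, and checks whether the "_ca" columns differ from the raw delta columns by a constant offset. The values are copied from the table; `REF_ROWS`, `reference_sd` and `calibration_offsets` are hypothetical names used only for illustration.

```python
import statistics

# Reference ("ref") runs copied from the Tray 004 sheet:
# (position, delta 13C, delta 13C_ca, delta 15N, delta 15N_ca)
REF_ROWS = [
    ("A1", -25.05, -24.59, 4.12, 3.47),
    ("A2", -25.00, -24.54, 4.01, 3.36),
    ("A3", -24.99, -24.53, 4.09, 3.44),
    ("A4", -25.06, -24.60, 4.20, 3.55),
    ("B6", -25.00, -24.54, 3.96, 3.31),
    ("B7", -24.87, -24.41, 4.33, 3.68),
    ("C3", -25.09, -24.63, 4.37, 3.72),
]

RAW_13C, CAL_13C, RAW_15N, CAL_15N = 1, 2, 3, 4  # column indices in REF_ROWS


def reference_sd(rows, col):
    """Sample standard deviation (n - 1 denominator) of one column."""
    return statistics.stdev([row[col] for row in rows])


def calibration_offsets(rows, raw_col, cal_col):
    """Distinct (calibrated - raw) differences, rounded to the sheet's 2 decimals."""
    return {round(row[cal_col] - row[raw_col], 2) for row in rows}


sd_c = reference_sd(REF_ROWS, RAW_13C)
sd_n = reference_sd(REF_ROWS, RAW_15N)
print(f"SD delta 13C = {sd_c:.2f}, SD delta 15N = {sd_n:.2f}")
print(calibration_offsets(REF_ROWS, RAW_13C, CAL_13C),
      calibration_offsets(REF_ROWS, RAW_15N, CAL_15N))
```

With these rows the sketch prints SDs of 0.07 and 0.15, matching the sheet, and a single offset per isotope (+0.46 for delta 13C, -0.65 for delta 15N). What those offsets correct for is not recorded anywhere on the sheet.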
  
Wash Cres Lake Dec 15 Dont_Use.xls
[Same Wash Cres Lake stable isotope data sheet (Tray 004) as above, with an Excel regression SUMMARY OUTPUT, a transposed copy of samples A7–B5, and a chart pasted beside the data]

SUMMARY OUTPUT

Regression Statistics
Multiple R            0.283158
R Square              0.080178
Adjusted R Square    -0.022024
Standard Error        1.906378
Observations          11

ANOVA
              df   SS          MS          F          Significance F
Regression     1   2.851116    2.851116    0.784507   0.398813
Residual       9   32.7085     3.634278
Total         10   35.55962

               Coefficients   Standard Error   t Stat      P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept      -4.297428      4.671099         -0.920003   0.381568   -14.8642    6.269341    -14.8642      6.269341
X Variable 1   -0.158022      0.17841          -0.885724   0.398813   -0.561612   0.245569    -0.561612     0.245569
[Chart pasted into the sheet: legend "Series1"; x-axis from -35.00 to 0.00, y-axis from -3.00 to 4.00]
Modified from Stephanie Hampton
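The regression block is internally consistent, yet the sheet does not record which columns were regressed (the predictor is labeled only "X Variable 1"). As a minimal Python sketch, assuming ordinary least squares with one predictor and an intercept (the output has an Intercept row), the Regression Statistics and the F value can be rebuilt from the two ANOVA sums of squares. `summary_from_anova` is a hypothetical helper, not an Excel function.

```python
import math

# Values copied from the Excel SUMMARY OUTPUT on the slide.
N_OBS = 11          # "Observations"
K_PRED = 1          # one predictor ("X Variable 1")
SS_REG = 2.851116   # ANOVA: Regression SS
SS_RES = 32.7085    # ANOVA: Residual SS
SLOPE = -0.158022   # X Variable 1 coefficient
SLOPE_SE = 0.17841  # X Variable 1 standard error


def summary_from_anova(ss_reg, ss_res, n, k):
    """Rebuild Excel's 'Regression Statistics' and F from ANOVA sums of squares.

    Assumes ordinary least squares with an intercept and k predictors.
    """
    ss_tot = ss_reg + ss_res
    df_res = n - k - 1          # residual degrees of freedom (9 here)
    ms_reg = ss_reg / k
    ms_res = ss_res / df_res
    r2 = ss_reg / ss_tot
    return {
        "multiple_r": math.sqrt(r2),
        "r_square": r2,
        "adj_r_square": 1 - (1 - r2) * (n - 1) / df_res,
        "standard_error": math.sqrt(ms_res),  # root mean squared residual
        "F": ms_reg / ms_res,
        "ss_total": ss_tot,
    }


summary = summary_from_anova(SS_REG, SS_RES, N_OBS, K_PRED)
t_slope = SLOPE / SLOPE_SE  # Excel's "t Stat" for the slope

for name, value in summary.items():
    print(f"{name:>15}: {value:.6f}")
print(f"{'t (slope)':>15}: {t_slope:.6f}   t^2 = {t_slope ** 2:.6f}")
```

Up to rounding, this reproduces Multiple R 0.283158, R Square 0.080178, Adjusted R Square -0.022024, Standard Error 1.906378 and F 0.784507. The slope's squared t statistic also equals F, as it must with a single predictor.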
  
What is this?


C:\Documents and Settings\hampton\My Documents\NCEAS Distributed Graduate Seminars\[Wash Cres Lake Dec 15 Dont_Use.xls]Sheet1
                   Stable Isotope Data Sheet
              Sampling Site / Identifier: Wash Cresc Lake                                                                                               Peter's lab              Don't use - old data
                         Sample Type: Algal                                                                                                             Washed Rocks
                                  Date: Dec. 16
                Tray ID and Sequence: Tray 004

                     Reference statistics: SD for delta 13C = 0.07                              SD for delta 15N = 0.15


          Position        SampleID        Weight (mg)      %C      delta 13C   delta 13C_ca        %N          delta 15N   delta 15N_ca Spec. No.
         A1                           ref    0.98         38.27     -25.05         -24.59          1.96           4.12          3.47     25354
         A2                           ref    0.98         39.78     -25.00         -24.54          2.03           4.01          3.36     25356
         A3                           ref    0.98         40.37     -24.99         -24.53          2.04           4.09          3.44     25358
         A4                           ref    1.01         42.23     -25.06         -24.60          2.17           4.20          3.55     25360          Shore                    Avg Con
         A5          ALG01                   3.05         1.88      -24.34         -23.88          0.17          -1.65         -2.30     25362      c       -1.26                   -27.22
         A6          Lk Outlet Alg           3.06         31.55     -30.17         -29.71          0.92           0.87          0.22     25364               1.26                     0.32
         A7          ALG03                   2.91         6.85      -21.11         -20.65          0.48          -0.97         -1.62     25366      c
         A8          ALG05                   2.91         35.56     -28.05         -27.59          2.30           0.59         -0.06     25368
         A9          ALG07                   3.04         33.49     -29.56         -29.10          1.68           0.79          0.14     25370
         A10         ALG06                   2.95         41.17     -27.32         -26.86          1.97           2.71          2.06     25372
         B1          ALG04                   3.01         43.74     -27.50         -27.04          1.36           0.99          0.34     25374      c               SUMMARY OUTPUT
         B2          ALG02                     3          4.51      -22.68         -22.22          0.34           4.31          3.66     25376
         B3          ALG01                   2.99         1.59      -24.58         -24.12          0.15          -1.69         -2.34     25378      c                Regression Statistics
         B4          ALG03                   2.92         4.37      -21.06         -20.60          0.34          -1.52         -2.17     25380      c               Multiple R 0.283158
         B5          ALG07                    2.9         33.58     -29.44         -28.98          1.74           0.62         -0.03     25382                      R Square 0.080178
         B6                           ref    1.01         44.94     -25.00         -24.54          2.59           3.96          3.31     25384                      Adjusted R Square
                                                                                                                                                                                -0.022024
         B7                           ref    0.99         42.28     -24.87         -24.41          2.37           4.33          3.68     25386                      Standard Error
                                                                                                                                                                                 1.906378
         B8          Lk Outlet Alg           3.04         31.43     -29.69         -29.23          1.07           0.95          0.30     25388                      Observations         11
         B9          ALG06                   3.09         35.57     -27.26         -26.80          1.96           2.79          2.14     25390
         B10         ALG02                   3.05         5.52      -22.31         -21.85          0.45           4.72          4.07     25392                      ANOVA
         C1          ALG04                   2.98         37.90     -27.42         -26.96          1.36           1.21          0.56     25394      c                                df         SS      MS        F Significance F
         C2          ALG05                   3.04         31.74     -27.93         -27.47          2.40           0.73          0.08     25396                      Regression             1 2.851116 2.851116 0.784507 0.398813
         C3                           ref    0.99         38.46     -25.09         -24.63          2.40           4.37          3.72     25398                      Residual               9 32.7085 3.634278
                                                          23.78                                    1.17                                                             Total                 10 35.55962

                                                                                                                                                                              Coefficients
                                                                                                                                                                                        Standard Error t Stat P-value Lower 95%Upper 95%Lower 95.0%
                                                                                                                                                                                                                                                  Upper 95.0%
                                                                                                                                                                    Intercept -4.297428 4.671099 -0.920003 0.381568 -14.8642 6.269341 -14.8642 6.269341
                                                                                                                                                                    X Variable 1-0.158022 0.17841 -0.885724 0.398813 -0.561612 0.245569 -0.561612 0.245569




                                                                                                                                                                                                    Modified	
  from	
  Stephanie	
  Hampton	
  
The path of research products
(Figure labels: Data, Metadata, www)
Recreated from Klump et al. 2006
Data Reuse
Data Sharing
Data Management
Roadmap
1. Who are you?
2. Chaos
3. Control
4. Toolbox
•  Unrestricted access to articles* via internet
      – digital
      – online
      – free of charge
      – free of most copyright/licensing restrictions
•  Compatible with conventional scholarly literature
•  Bills not paid by readers: no barriers to access

*Open access easily extends to data
Best Practices for Data Management
  1. Planning
  2. Data collection & organization
  3. Quality control & assurance
  4. Metadata
  5. Workflows
  6. Data stewardship & reuse
1. Planning
What is a data management plan?
A document that describes what you will do with your data during and after you complete your research.
From Flickr by Ikelee
1. Planning
Why should I prepare a DMP?
   •  Saves time
   •  Increases efficiency
   •  Easier to use data
   •  Others can understand & use data
   •  Credit for data products
   •  Funders protect their investment
1. Planning
Components of a DMP
1. Information about data & data format
2. Metadata content and format
3. Policies for access, sharing and re-use
4. Long-term storage and data management
5. Budget
1. Planning
dmp.cdlib.org
Step-by-step wizard for generating a DMP
Create | edit | re-use | share | save | generate
Open to community
Links to institutional resources
Directorate information & updates
2. Data collection & organization
Personal data management problems build up over time, & in collaboration
plumbinghelptoday.com
2. Data collection & organization
Standardize
   •  Consistent within columns
        – only numbers, dates, or text
   •  Consistent names, codes, formats
Modified from K. Vanderbilt
From Pink Floyd, The Wall   themurkyfringe.com
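The "consistent within columns" rule can be checked before any analysis. A minimal Python sketch, assuming the data are a CSV with a header row and that dates are written as ISO 8601 (YYYY-MM-DD); the function names and the sample sheet are hypothetical:

```python
import csv
import io
from datetime import date


def classify(value: str) -> str:
    """Return 'empty', 'number', 'date' (ISO 8601 assumed), or 'text' for one cell."""
    value = value.strip()
    if not value:
        return "empty"
    try:
        float(value)
        return "number"
    except ValueError:
        pass
    try:
        date.fromisoformat(value)
        return "date"
    except ValueError:
        return "text"


def mixed_columns(csv_text: str) -> list[str]:
    """Return the names of columns that mix numbers, dates, and text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kinds = {name: set() for name in reader.fieldnames or []}
    for row in reader:
        for name in kinds:
            kind = classify(row.get(name) or "")
            if kind != "empty":  # blank cells are a separate (missing-value) problem
                kinds[name].add(kind)
    return [name for name, found in kinds.items() if len(found) > 1]


# Hypothetical field sheet: one free-text date and one "n/a" in a numeric column.
sample = (
    "site,date,temp_c\n"
    "Lake,2010-06-01,12.5\n"
    "Lake,June 2,13.1\n"
    "Lake,2010-06-03,n/a\n"
)
print(mixed_columns(sample))  # ['date', 'temp_c']
```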
  
2. Data collection & organization
Standardize
   •  Reduce possibility of manual error by constraining entry choices
(Excel lists | Data validation | Google Docs Forms)
Modified from K. Vanderbilt
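Constraining entry choices amounts to checking every value against an agreed list of codes. A minimal Python sketch of that idea, assuming each column has a hypothetical set of allowed values (the same role a spreadsheet drop-down list plays):

```python
# Hypothetical controlled vocabularies: the allowed codes for each column,
# playing the role of a drop-down list in a spreadsheet.
ALLOWED = {
    "site": {"NANAIMO", "WASH_CRESC_LAKE"},
    "sample_type": {"algal", "sediment", "water"},
}


def invalid_entries(rows, allowed=ALLOWED):
    """Yield (row_number, column, value) for each value not in its column's allowed set."""
    for row_number, row in enumerate(rows, start=1):
        for column, choices in allowed.items():
            value = row.get(column)
            if value not in choices:
                yield row_number, column, value


rows = [
    {"site": "NANAIMO", "sample_type": "algal"},
    {"site": "Nanaimo", "sample_type": "algal"},   # inconsistent capitalization
    {"site": "NANAIMO", "sample_type": "algae"},   # typo
]
for problem in invalid_entries(rows):
    print(problem)
# (2, 'site', 'Nanaimo')
# (3, 'sample_type', 'algae')
```

A missing column is reported with the value `None`, so blank entries are caught by the same check.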
  	
  
2. Data collection & organization
Create parameter table
Create a site table
From doi:10.3334/ORNLDAAC/777
From R Cook, ESA Best Practices Workshop 2010
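A parameter table and a site table let the data table carry short codes while units, definitions, and locations are recorded once. A minimal Python sketch; the site codes, coordinates, and parameter definitions are made up for illustration:

```python
# Hypothetical site table: each site is described once.
SITES = {
    "S1": {"name": "Lake outlet", "lat": 49.16, "lon": -123.94},
    "S2": {"name": "Shore", "lat": 49.17, "lon": -123.93},
}

# Hypothetical parameter table: units and definition of each measured variable.
PARAMETERS = {
    "temp": {"units": "degrees C", "definition": "Water temperature at 1 m depth"},
    "sal": {"units": "PSU", "definition": "Practical salinity"},
}

# The data table itself only needs short codes.
measurements = [("S1", "temp", 12.5), ("S2", "sal", 28.1)]


def undefined_codes(rows):
    """Return site or parameter codes used in the data but missing from the lookup tables."""
    missing = set()
    for site, param, _value in rows:
        if site not in SITES:
            missing.add(site)
        if param not in PARAMETERS:
            missing.add(param)
    return missing


def describe(site, param, value):
    """Expand one coded measurement using the site and parameter tables."""
    s, p = SITES[site], PARAMETERS[param]
    return f"{s['name']} ({s['lat']}, {s['lon']}): {p['definition']} = {value} {p['units']}"


problems = undefined_codes(measurements)
if problems:
    raise ValueError(f"codes missing from lookup tables: {sorted(problems)}")
for row in measurements:
    print(describe(*row))
```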
  
2. Data collection & organization
Use descriptive file names
PhDcomics.com
2. Data collection & organization
Use descriptive file names
   •  Unique
   •  Reflect contents
Bad:     Mydata.xls
         2001_data.csv
         best version.txt
Better:  Eaffinis_nanaimo_2010_counts.xls
         (study organism _ site name _ year _ what was measured)
From R Cook, ESA Best Practices Workshop 2010
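The naming pattern in the "Better" example (study organism, site name, year, what was measured) can be generated instead of typed by hand. A minimal Python sketch; the function name and the character-cleaning rule are assumptions, not part of the original slide:

```python
import re


def descriptive_filename(organism: str, site: str, year: int, measured: str, ext: str) -> str:
    """Build <organism>_<site>_<year>_<measured>.<ext>, as in the 'Better' example."""
    parts = [organism, site, str(year), measured]
    # Assumed cleaning rule: keep letters, digits, and hyphens; drop spaces and punctuation.
    cleaned = [re.sub(r"[^A-Za-z0-9-]+", "", part) for part in parts]
    if not all(cleaned):
        raise ValueError("every part of the file name must contain letters or digits")
    return "_".join(cleaned) + "." + ext.lstrip(".")


print(descriptive_filename("E. affinis", "nanaimo", 2010, "counts", "xls"))
# Eaffinis_nanaimo_2010_counts.xls
```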
  
2. Data collection & organization
Organize files logically
Biodiversity
   Lake
      Experiments
         Biodiv_H20_heatExp_2005to2008.csv
         Biodiv_H20_predatorExp_2001to2003.csv
         …
      Field work
         Biodiv_H20_PlanktonCount_2001toActive.csv
         Biodiv_H20_ChlAprofiles_2003.csv
         …
   Grassland
From S. Hampton
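The folder hierarchy above can be created by script so every project starts with the same layout. A minimal Python sketch using `pathlib`; the nested dictionary simply mirrors the example tree and is hypothetical:

```python
import tempfile
from pathlib import Path

# Hypothetical layout mirroring the example tree: project / habitat / kind of work.
LAYOUT = {
    "Biodiversity": {
        "Lake": ["Experiments", "Field work"],
        "Grassland": [],
    },
}


def make_tree(root: Path, layout: dict = LAYOUT) -> list[Path]:
    """Create the folder hierarchy under root and return the leaf folders in order."""
    created = []
    for project, habitats in layout.items():
        for habitat, kinds in habitats.items():
            leaves = [root / project / habitat / kind for kind in kinds] or [root / project / habitat]
            for folder in leaves:
                folder.mkdir(parents=True, exist_ok=True)
                created.append(folder)
    return created


with tempfile.TemporaryDirectory() as tmp:
    for folder in make_tree(Path(tmp)):
        print(folder.relative_to(tmp).as_posix())
# Biodiversity/Lake/Experiments
# Biodiversity/Lake/Field work
# Biodiversity/Grassland
```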
  
2. Data collection & organization
Preserve information
 •  Keep raw data raw
 •  Use scripts to process data & save them with data
(Figure: R script for processing & analysis; Raw data as .csv)
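Keeping raw data raw means a processing script reads the raw file and writes its results somewhere else. The slide uses an R script; the sketch below shows the same idea in Python, with a hypothetical Fahrenheit-to-Celsius conversion standing in for real processing and a guard against overwriting the raw file:

```python
import csv
import tempfile
from pathlib import Path


def process(raw_path: Path, out_path: Path) -> int:
    """Read raw data, write derived values to a new file, and return the row count.

    The raw file is only ever opened for reading.
    """
    if out_path.resolve() == raw_path.resolve():
        raise ValueError("refusing to overwrite the raw data file")
    with raw_path.open(newline="") as src, out_path.open("w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=["site", "temp_c"])
        writer.writeheader()
        count = 0
        for row in csv.DictReader(src):
            # Hypothetical processing step: convert Fahrenheit to Celsius.
            temp_c = (float(row["temp_f"]) - 32) * 5 / 9
            writer.writerow({"site": row["site"], "temp_c": f"{temp_c:.2f}"})
            count += 1
    return count


with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "temps_raw.csv"
    raw.write_text("site,temp_f\nLake,50\nShore,59\n")
    process(raw, Path(tmp) / "temps_processed.csv")
    print((Path(tmp) / "temps_processed.csv").read_text())
```

Saving this script next to the raw .csv documents exactly how the processed file was produced.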
  
3. Quality control and quality assurance
 •  Define & enforce standards
 •  Double data entry
 •  Document changes
 •  No missing, impossible, or anomalous values
        – Perform statistical summaries
        – Use illegal data filter
        – Look for outliers
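The three checks in the last bullet (summaries, an illegal data filter, an outlier search) can be combined into one report. A minimal Python sketch, assuming a hypothetical plausible range for water temperature and using Tukey's 1.5 × IQR rule as one common outlier screen (the slide does not specify a method):

```python
import statistics


def qc_report(values, lower, upper):
    """Flag problem values by list index and summarize the plausible ones.

    missing:    None entries
    impossible: outside the plausible range [lower, upper] (an illegal data filter)
    outliers:   plausible values beyond 1.5 x IQR from the quartiles (Tukey's rule)
    """
    missing = [i for i, v in enumerate(values) if v is None]
    impossible = [i for i, v in enumerate(values) if v is not None and not lower <= v <= upper]
    plausible = {i: v for i, v in enumerate(values) if v is not None and lower <= v <= upper}
    kept = list(plausible.values())

    q1, _, q3 = statistics.quantiles(kept, n=4)  # needs at least two values
    spread = 1.5 * (q3 - q1)
    outliers = [i for i, v in plausible.items() if v < q1 - spread or v > q3 + spread]

    return {
        "missing": missing,
        "impossible": impossible,
        "outliers": outliers,
        "n_plausible": len(kept),
        "mean": statistics.mean(kept),
        "median": statistics.median(kept),
    }


# Hypothetical water temperatures (deg C): -99 is a missing-value code that slipped in.
temps = [12.1, 12.4, 11.9, None, 12.6, 45.0, 12.2, -99.0, 12.3, 19.8]
report = qc_report(temps, lower=-2.0, upper=40.0)
print(report["missing"], report["impossible"], report["outliers"])  # [3] [5, 7] [9]
```

Flagged values are reported rather than deleted, so each correction can be documented separately.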
  
4. Metadata basics
What is metadata?
Data reporting
   •  WHO created the data?
   •  WHAT is the content of the data set?
   •  WHEN was it created?
   •  WHERE was it collected?
   •  HOW was it developed?
   •  WHY was it developed?
4. Metadata basics
•  Scientific context
      – Scientific reason why the data were collected
      – What data were collected
      – What instruments (including model & serial number) were used
      – Environmental conditions during collection
      – Where collected & spatial resolution
      – When collected & temporal resolution
      – Standards or calibrations used
•  Information about parameters
      – How each was measured or produced
      – Units of measure
      – Format used in the data set
      – Precision & accuracy if known
•  Information about data
      – Definitions of codes used
      – Quality assurance & control measures
      – Known problems that limit data use (e.g. uncertainty, sampling problems)
•  How to cite the data set
•  Digital context
      – Name of the data set
      – The name(s) of the data file(s) in the data set
      – Date the data set was last modified
      – Example data file records for each data type file
      – Pertinent companion files
      – List of related or ancillary data sets
      – Software (including version number) used to prepare/read the data set
      – Data processing that was performed
•  Personnel & stakeholders
      – Who collected
      – Who to contact with questions
      – Funders
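Part of the checklist above can be enforced automatically by listing required fields and reporting which ones a record leaves blank. A minimal Python sketch; the field names are hypothetical shorthand for checklist items, not terms from any metadata standard:

```python
# Hypothetical shorthand field names, each standing for one checklist item above.
REQUIRED_FIELDS = {
    "title": "Name of the data set",
    "files": "Name(s) of the data file(s)",
    "last_modified": "Date the data set was last modified",
    "collected_by": "Who collected",
    "contact": "Who to contact with questions",
    "funders": "Funders",
    "purpose": "Scientific reason why the data were collected",
    "units": "Units of measure",
    "citation": "How to cite the data set",
}


def is_blank(value) -> bool:
    """Treat None, whitespace-only strings, and empty lists/dicts as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, tuple, set, dict)):
        return len(value) == 0
    return False


def missing_metadata(record: dict) -> list[str]:
    """Return the checklist items a metadata record leaves blank."""
    return [item for field, item in REQUIRED_FIELDS.items() if is_blank(record.get(field))]


record = {
    "title": "Algal stable isotopes, Wash Cresc Lake",
    "files": ["algae_isotopes.csv"],
    "collected_by": "A. Researcher",
    "contact": "  ",
    "units": {"delta 13C": "per mil", "delta 15N": "per mil"},
}
for item in missing_metadata(record):
    print("Missing:", item)
```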
  
4. Metadata basics
What is a metadata standard?
•  Provides structure to describe data
      Common terms | definitions | language | structure
•  Lots of different standards
      EML, FGDC, ISO 19115, DarwinCore, …
•  Tools for creating metadata files
      Morpho (EML), Metavist (FGDC), NOAA MERMaid (CSDGM)
4. Metadata basics
What does a metadata record look like?
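As a rough illustration only, a small XML record can be assembled with Python's standard `xml.etree.ElementTree`. Element names such as `dataset`, `title`, `creator`, `dataTable`, and `attributeName` are borrowed from EML, but the structure is simplified (for example, the `unit` element sits directly under each attribute) and the result is not a schema-valid EML document; a dedicated tool such as Morpho is the safer route to real EML. All values are hypothetical:

```python
import xml.etree.ElementTree as ET


def build_record(title, surname, email, file_name, attributes):
    """Build a small XML metadata record with EML-like element names (not schema-valid EML)."""
    root = ET.Element("metadata")
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = title

    # Who created the data, and who to contact.
    name = ET.SubElement(ET.SubElement(dataset, "creator"), "individualName")
    ET.SubElement(name, "surName").text = surname
    ET.SubElement(ET.SubElement(dataset, "contact"), "electronicMailAddress").text = email

    # The data file and its columns (simplified: real EML nests units more deeply).
    table = ET.SubElement(dataset, "dataTable")
    ET.SubElement(table, "entityName").text = file_name
    attribute_list = ET.SubElement(table, "attributeList")
    for attr_name, unit in attributes:
        attribute = ET.SubElement(attribute_list, "attribute")
        ET.SubElement(attribute, "attributeName").text = attr_name
        ET.SubElement(attribute, "unit").text = unit
    return root


record = build_record(
    "Algal stable isotopes, Wash Cresc Lake",
    "Researcher",
    "researcher@example.org",
    "algae_isotopes.csv",
    [("delta_13C", "per mil"), ("weight", "milligram")],
)
print(ET.tostring(record, encoding="unicode"))
```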
  
5. Workflows
Simplest workflows: commented scripts, flow charts
   Temperature data, Salinity data
   → Data import into R → Data in R format
   → Quality control & data cleaning → “Clean” T & S data
   → Analysis: mean, SD → Summary statistics
   → Graph production
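The flow chart above can be written as a commented script. The slide's workflow runs in R; the sketch below follows the same steps in Python, with hypothetical inline data, `-99` assumed as a missing-value code, and graph production left as a comment:

```python
import csv
import io
import statistics

# Hypothetical raw files, inlined so the sketch runs on its own; -99 marks a failed reading.
TEMPERATURE_CSV = "time,temp_c\n1,12.1\n2,12.4\n3,-99\n4,12.0\n"
SALINITY_CSV = "time,salinity\n1,28.0\n2,28.4\n3,28.2\n4,\n"
MISSING_CODE = -99.0


def load(text, column):
    """Step 1, data import: read one column into {time: value}; blank cells become None."""
    values = {}
    for row in csv.DictReader(io.StringIO(text)):
        cell = row[column].strip()
        values[int(row["time"])] = float(cell) if cell else None
    return values


def clean(temp, sal):
    """Step 2, quality control & cleaning: keep times where both T and S are valid."""
    paired = {}
    for t in sorted(temp.keys() & sal.keys()):
        pair = (temp[t], sal[t])
        if None in pair or MISSING_CODE in pair:
            continue
        paired[t] = pair
    return paired


def summarize(paired):
    """Step 3, analysis: mean and SD of the clean temperature and salinity data."""
    temps = [temp for temp, _ in paired.values()]
    sals = [sal for _, sal in paired.values()]
    return {
        "n": len(paired),
        "temp_mean": statistics.mean(temps),
        "temp_sd": statistics.stdev(temps),
        "sal_mean": statistics.mean(sals),
        "sal_sd": statistics.stdev(sals),
    }


clean_ts = clean(load(TEMPERATURE_CSV, "temp_c"), load(SALINITY_CSV, "salinity"))
summary = summarize(clean_ts)
print(summary)
# Step 4, graph production, would plot clean_ts or summary (for example with matplotlib).
```

Keeping one function per box in the flow chart makes the script double as the workflow's documentation.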
  
5. Workflows
Fancy Schmancy: Kepler
(Figure label: Resulting output)
https://kepler-project.org
5. Workflows
Workflows enable
   Reproducibility: can someone independently validate findings?
   Transparency: others can understand how you arrived at your results
   Executability: others can re-run or re-use your analysis
From Flickr by merlinprincesse
6. Data stewardship & reuse
Data Reuse
Data Sharing
Data Management
6. Data stewardship & reuse
The 20-Year Rule
The metadata accompanying a data set should be written for a user 20 years into the future.
(National Research Council 1991)
From Flickr by greensambaman
6. Data stewardship & reuse
Use stable formats
     csv, txt, tiff
Create back-up copies
     original, near, far
Periodically test ability to restore information
Modified from R. Cook
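Testing that backups can be restored can be as simple as recording a checksum when the copies are made and re-checking it later. A minimal Python sketch using SHA-256 from `hashlib`; the "near" and "far" folders are local stand-ins for an on-site and an off-site location, and the file contents are hypothetical:

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256(path: Path) -> str:
    """Checksum a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_backups(original: Path, destinations: list[Path]) -> dict[Path, str]:
    """Copy the original into each destination; return {copy: checksum of the original}."""
    expected = sha256(original)
    copies = {}
    for dest in destinations:
        dest.mkdir(parents=True, exist_ok=True)
        copies[Path(shutil.copy2(original, dest / original.name))] = expected
    return copies


def failed_restores(copies: dict[Path, str]) -> list[Path]:
    """Periodic check: copies that are missing or no longer match their recorded checksum."""
    return [copy for copy, expected in copies.items()
            if not copy.exists() or sha256(copy) != expected]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    original = root / "original" / "counts_2010.csv"
    original.parent.mkdir()
    original.write_text("site,count\nA,12\n")
    copies = make_backups(original, [root / "near", root / "far"])
    (root / "far" / original.name).write_text("site,count\nA,1\n")  # simulate silent corruption
    print([copy.parent.name for copy in failed_restores(copies)])  # ['far']
```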
  
6. Data stewardship & reuse
Where do I put it?
   Institutional archive
   Discipline/specialty archive
   DataCite list of repositories: www.datacite.org/repolist
From Flickr by torkildr
6. Data stewardship & reuse
Data Citation: Why everyone should do it
   Allow readers to find data products
   Get credit for data and publications
   Promote reproducibility
   Better measure of research impact
Example:
   Sidlauskas, B. 2007. Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20
Modified from R. Cook
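The citation pattern in the example can be assembled from its parts, and a DOI becomes a web link by prefixing `https://doi.org/`. A minimal Python sketch; the function names are hypothetical:

```python
def format_data_citation(creators: str, year: int, title: str, repository: str, doi: str) -> str:
    """Creator(s). Year. Title. Repository. doi:<DOI>, the pattern of the Dryad example."""
    return f"{creators.rstrip('.')}. {year}. {title.rstrip('.')}. {repository}. doi:{doi}"


def doi_url(doi: str) -> str:
    """DOIs resolve through the doi.org proxy."""
    return f"https://doi.org/{doi}"


citation = format_data_citation(
    "Sidlauskas, B.",
    2007,
    "Data from: Testing for unequal rates of morphological diversification in the absence "
    "of a detailed phylogeny: a case study from characiform fishes",
    "Dryad Digital Repository",
    "10.5061/dryad.20",
)
print(citation)
print(doi_url("10.5061/dryad.20"))  # https://doi.org/10.5061/dryad.20
```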
  
Roadmap
1. Who are you?
2. Bad scientists
3. How to be good
4. Toolbox
NSF funded DataNet Project
Office of Cyberinfrastructure
Enabling universal access to data about life on earth and the environment that sustains it
www.dataone.org
•  Data Education Tutorials
•  Primer on data management
•  Database of best practices & software tools
•  List of repositories & metadata standards
•  Links to DMP Tool
Investigator Toolkit
•  ONE-R
•  ONE-Mercury
•  ONE-Drive
E-notebooks
  •  NoteBook
  •  ORNL eNote
  •  Evernote
  •  Google Docs
  •  Blogs
  •  wikis
  •  TheLabNotebook.com
  •  iPad ELN
  •  NoteBookMaker

CDL Services for UC Community
•  Precise identification of a dataset
•  Credit to data producers and data publishers
•  A link from the traditional literature to the data
•  Research metrics for datasets

•  Deposit content (i.e. data)
•  Manage (metadata, versions, etc.)
•  Share
•  Access
•  Preserve
www.cdlib.org/services/uc3

•  Open source add-in
•  Facilitate data management, sharing, archiving for scientists
•  Part of DataONE investigator toolkit
•  Collecting requirements for add-in from scientists, data centers, libraries
dcxl.cdlib.org
Funders: Gordon and Betty Moore Foundation, Microsoft Research
Christy Hightower
Katie Forney
Ann Hubble
Cynthia Moriconi

www.carlystrasser.net
carlystrasser@gmail.com
@carlystrasser
dcxl.cdlib.org
@dcxlCDL
www.facebook.com/DCXLatCDL
More Related Content

Similar to Data Management for Scientists: Reduce, Reuse, Recycle Your Data

UC Riverside: Data Management for Scientists
UC Riverside: Data Management for ScientistsUC Riverside: Data Management for Scientists
UC Riverside: Data Management for ScientistsCarly Strasser
 
Data Management: Scientist Perspective - UC3 Data Curation Workshop
Data Management: Scientist Perspective - UC3 Data Curation WorkshopData Management: Scientist Perspective - UC3 Data Curation Workshop
Data Management: Scientist Perspective - UC3 Data Curation WorkshopCarly Strasser
 
Gwi data management
Gwi data managementGwi data management
Gwi data managementsusan borda
 
UC Merced: Data Management for Scientists
UC Merced: Data Management for ScientistsUC Merced: Data Management for Scientists
UC Merced: Data Management for ScientistsCarly Strasser
 
Data Stewardship for Researchers, SPATIAL course
Data Stewardship for Researchers, SPATIAL courseData Stewardship for Researchers, SPATIAL course
Data Stewardship for Researchers, SPATIAL courseCarly Strasser
 
DataUp Overview for UC Merced Research Week
DataUp Overview for UC Merced Research WeekDataUp Overview for UC Merced Research Week
DataUp Overview for UC Merced Research WeekCarly Strasser
 
Data Management: Scientist Perspective - DLF 2012
Data Management: Scientist Perspective - DLF 2012Data Management: Scientist Perspective - DLF 2012
Data Management: Scientist Perspective - DLF 2012Carly Strasser
 
Data Stewardship for Scientists, for CLIR Postdoc Workshop
Data Stewardship for Scientists, for CLIR Postdoc WorkshopData Stewardship for Scientists, for CLIR Postdoc Workshop
Data Stewardship for Scientists, for CLIR Postdoc WorkshopCarly Strasser
 
UC Santa Cruz: Data Management for Scientists
UC Santa Cruz: Data Management for ScientistsUC Santa Cruz: Data Management for Scientists
UC Santa Cruz: Data Management for ScientistsCarly Strasser
 
Data Management for Scientists: Workshop at Ocean Sciences 2012
Data Management for Scientists: Workshop at Ocean Sciences 2012Data Management for Scientists: Workshop at Ocean Sciences 2012
Data Management for Scientists: Workshop at Ocean Sciences 2012Carly Strasser
 
Machine Learning Summary for Caltech2
Machine Learning Summary for Caltech2Machine Learning Summary for Caltech2
Machine Learning Summary for Caltech2Lukas Mandrake
 
CARLI Usage Stats Keynote 20130325
CARLI Usage Stats Keynote 20130325CARLI Usage Stats Keynote 20130325
CARLI Usage Stats Keynote 20130325Jason Price, PhD
 
How Can Software Engineering Support AI
How Can Software Engineering Support AIHow Can Software Engineering Support AI
How Can Software Engineering Support AIWalid Maalej
 
Introduction to Data Mining for Newbies
Introduction to Data Mining for NewbiesIntroduction to Data Mining for Newbies
Introduction to Data Mining for NewbiesEunjeong (Lucy) Park
 
Cal Poly - Data Management for Researchers
Cal Poly - Data Management for ResearchersCal Poly - Data Management for Researchers
Cal Poly - Data Management for ResearchersCarly Strasser
 
Protecting Attribute Disclosure for High Dimensionality and Preserving Publis...
Protecting Attribute Disclosure for High Dimensionality and Preserving Publis...Protecting Attribute Disclosure for High Dimensionality and Preserving Publis...
Protecting Attribute Disclosure for High Dimensionality and Preserving Publis...IOSR Journals
 

Similar to Data Management for Scientists: Reduce, Reuse, Recycle Your Data (20)

UC Riverside: Data Management for Scientists
UC Riverside: Data Management for ScientistsUC Riverside: Data Management for Scientists
UC Riverside: Data Management for Scientists
 
Data Management: Scientist Perspective - UC3 Data Curation Workshop
Data Management: Scientist Perspective - UC3 Data Curation WorkshopData Management: Scientist Perspective - UC3 Data Curation Workshop
Data Management: Scientist Perspective - UC3 Data Curation Workshop
 
Gwi data management
Gwi data managementGwi data management
Gwi data management
 
UC Merced: Data Management for Scientists
UC Merced: Data Management for ScientistsUC Merced: Data Management for Scientists
UC Merced: Data Management for Scientists
 
Digital Curation for Excel (DCXL)
Digital Curation for Excel (DCXL)Digital Curation for Excel (DCXL)
Digital Curation for Excel (DCXL)
 
DataUp for USGS CDI
DataUp for USGS CDIDataUp for USGS CDI
DataUp for USGS CDI
 
Data Stewardship for Researchers, SPATIAL course
Data Stewardship for Researchers, SPATIAL courseData Stewardship for Researchers, SPATIAL course
Data Stewardship for Researchers, SPATIAL course
 
DataUp Overview for UC Merced Research Week
DataUp Overview for UC Merced Research WeekDataUp Overview for UC Merced Research Week
DataUp Overview for UC Merced Research Week
 
Data Management: Scientist Perspective - DLF 2012
Data Management: Scientist Perspective - DLF 2012Data Management: Scientist Perspective - DLF 2012
Data Management: Scientist Perspective - DLF 2012
 
Data Stewardship for Scientists, for CLIR Postdoc Workshop
Data Stewardship for Scientists, for CLIR Postdoc WorkshopData Stewardship for Scientists, for CLIR Postdoc Workshop
Data Stewardship for Scientists, for CLIR Postdoc Workshop
 
UC Santa Cruz: Data Management for Scientists
UC Santa Cruz: Data Management for ScientistsUC Santa Cruz: Data Management for Scientists
UC Santa Cruz: Data Management for Scientists
 
Data Management for Scientists: Workshop at Ocean Sciences 2012
Data Management for Scientists: Workshop at Ocean Sciences 2012Data Management for Scientists: Workshop at Ocean Sciences 2012
Data Management for Scientists: Workshop at Ocean Sciences 2012
 
Machine Learning Summary for Caltech2
Machine Learning Summary for Caltech2Machine Learning Summary for Caltech2
Machine Learning Summary for Caltech2
 
CARLI Usage Stats Keynote 20130325
CARLI Usage Stats Keynote 20130325CARLI Usage Stats Keynote 20130325
CARLI Usage Stats Keynote 20130325
 
How Can Software Engineering Support AI
How Can Software Engineering Support AIHow Can Software Engineering Support AI
How Can Software Engineering Support AI
 
Introduction to Data Mining for Newbies
Introduction to Data Mining for NewbiesIntroduction to Data Mining for Newbies
Introduction to Data Mining for Newbies
 
Cal Poly - Data Management for Researchers
Cal Poly - Data Management for ResearchersCal Poly - Data Management for Researchers
Cal Poly - Data Management for Researchers
 
Weka
WekaWeka
Weka
 
Weka_ITB
Weka_ITBWeka_ITB
Weka_ITB
 
Protecting Attribute Disclosure for High Dimensionality and Preserving Publis...
Protecting Attribute Disclosure for High Dimensionality and Preserving Publis...Protecting Attribute Disclosure for High Dimensionality and Preserving Publis...
Protecting Attribute Disclosure for High Dimensionality and Preserving Publis...
 

Data Management for Scientists: Reduce, Reuse, Recycle Your Data

  • 1. Data Management for Scientists: Reduce your workload. Reuse your ideas. Recycle your data. (Image: www.oddee.com) Carly Strasser, PhD, California Digital Library, UC Office of the President. carly.strasser@ucop.edu | www.carlystrasser.net
  • 2. Roadmap: 1. Who are you? 2. Chaos 3. Control 4. Toolbox
  • 3. Roadmap: 1. Who are you? 2. Chaos 3. Control 4. Toolbox
  • 4. NSF-funded DataNet Project, Office of Cyberinfrastructure: Cyberinfrastructure | Community Engagement & Outreach (images from Flickr by ThomasThomas and Langwitches)
  • 5. What role can libraries play in data education? Why don't people share data? What barriers to sharing can we eliminate? Is data management being taught? Do attitudes about sharing differ among disciplines? How can we promote storing data in repositories?
  • 6. Roadmap: 1. Who are you? 2. Chaos 3. Control 4. Toolbox
  • 7. Digital data + complex workflows
  • 8. Data | Models | Maximum likelihood estimation | Matrix models | Images | Tables | Paper
  • 9. Data | Models | Maximum likelihood estimation | Matrix models | Images | Tables | Paper
  • 10. UGLY TRUTH: Many Earth | Environmental | Ecological scientists…
    - are not taught data management
    - don't know what metadata are
    - can't name data centers or repositories
    - don't share data publicly or store it in an archive
    - aren't convinced they should share data
    (Image: 5shortessays.blogspot.com)
  • 11. [Spreadsheet screenshot, annotated "2 tables" and "Random notes": a stable isotope data sheet (sampling site "Wash Cresc Lake", "Peter's lab", "Don't use - old data", sample type algal washed rocks, date Dec. 16, tray 004) listing sample weights, %C, delta 13C, %N and delta 15N for algal samples and reference standards, with stray notes and averages beside the table.] Modified from Stephanie Hampton (2010), ESA Workshop on Best Practices
  • 12. Wash Cres Lake Dec 15 Dont_Use.xls [Same spreadsheet screenshot, labeled with its file name.] Modified from Stephanie Hampton (2010), ESA Workshop on Best Practices
  • 13. [Same spreadsheet with an Excel regression SUMMARY OUTPUT (Multiple R 0.283, R Square 0.080, 11 observations, Significance F 0.399) and an unlabeled scatter chart ("Series1") pasted over the data.] Modified from Stephanie Hampton
  • 14. What is this? [Same spreadsheet and regression output; the predictor is labeled only "X Variable 1".] Modified from Stephanie Hampton
  • 15. The path of research products: Data | Metadata | www (recreated from Klump et al. 2006)
  • 16. The path of research products: Data | www | Metadata | www (recreated from Klump et al. 2006)
  • 17. Data Reuse | Data Sharing | Data Management
  • 18. Roadmap: 1. Who are you? 2. Chaos 3. Control 4. Toolbox
  • 19. Roadmap: 1. Who are you? 2. Chaos 3. Control 4. Toolbox
  • 20. Unrestricted access to articles* via the internet (digital, online, free of charge, free of most copyright/licensing restrictions); compatible with conventional scholarly literature; bills not paid by readers, so there are no barriers to access. *Open access easily extends to data.
  • 21. Roadmap: 1. Who are you? 2. Chaos 3. Control 4. Toolbox
  • 22. Best Practices for Data Management: 1. Planning 2. Data collection & organization 3. Quality control & assurance 4. Metadata 5. Workflows 6. Data stewardship & reuse
  • 23. 1. Planning. What is a data management plan? A document that describes what you will do with your data during and after you complete your research. (Image from Flickr by Ikelee)
  • 24. 1. Planning. Why should I prepare a DMP? Saves time; increases efficiency; easier to use data; others can understand & use data; credit for data products; funders protect their investment.
  • 25. 1. Planning. Components of a DMP: 1. Information about data & data format 2. Metadata content and format 3. Policies for access, sharing, and re-use 4. Long-term storage and data management 5. Budget
  • 26. 1. Planning. dmp.cdlib.org: step-by-step wizard for generating a DMP (create | edit | re-use | share | save | generate); open to the community; links to institutional resources; directorate information & updates
  • 27. 2. Data collection & organization. Personal data management problems build up over time, and in collaboration. (Image: plumbinghelptoday.com)
  • 28. 2. Data collection & organization. Standardize: be consistent within columns (only numbers, dates, or text); use consistent names, codes, and formats. Modified from K. Vanderbilt. (Images: Pink Floyd, The Wall; themurkyfringe.com)
  • 29. 2. Data collection & organization. Standardize: reduce the possibility of manual error by constraining entry choices (Excel lists, data validation, Google Docs forms). Modified from K. Vanderbilt
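Constraining entry choices can also be done in a script that checks each entered row. The sketch below uses Python rather than a spreadsheet; the column names (`site`, `date`, `count`) and the site codes in `ALLOWED_SITES` are hypothetical, chosen only to mirror the "only numbers, dates, or text" and "consistent codes" advice above.

```python
from datetime import date

# Hypothetical controlled vocabulary: the only site codes a data-entry form would offer.
ALLOWED_SITES = {"NAN01", "NAN02", "NAN03"}


def validate_row(row: dict) -> list:
    """Return the problems found in one entered row (an empty list means the row passes)."""
    problems = []

    # Site: must be chosen from the fixed list, like an Excel drop-down list.
    site = row.get("site")
    if site not in ALLOWED_SITES:
        problems.append(f"site {site!r} is not an allowed code")

    # Date: one consistent format (ISO 8601, YYYY-MM-DD) for the whole column.
    try:
        date.fromisoformat(str(row.get("date")))
    except ValueError:
        problems.append(f"date {row.get('date')!r} is not YYYY-MM-DD")

    # Count: numbers only, and only non-negative whole numbers (bool is excluded explicitly).
    count = row.get("count")
    if isinstance(count, bool) or not isinstance(count, int) or count < 0:
        problems.append(f"count {count!r} is not a non-negative integer")

    return problems


if __name__ == "__main__":
    entries = [
        {"site": "NAN01", "date": "2010-06-14", "count": 12},
        {"site": "nanaimo", "date": "6/14/10", "count": "twelve"},
    ]
    for i, entry in enumerate(entries):
        print(i, validate_row(entry))
```

The first entry passes; the second is rejected on all three columns, which is the kind of free-text drift that a drop-down list or form validation prevents at entry time.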
  • 30. 2. Data collection & organization. Create a parameter table; create a site table (examples from doi:10.3334/ORNLDAAC/777). From R. Cook, ESA Best Practices Workshop 2010
  • 31. 2. Data collection & organization. Use descriptive file names (image: PhDcomics.com)
  • 32. 2. Data collection & organization. Use descriptive file names that are unique and reflect the contents. Bad: Mydata.xls, 2001_data.csv, best version.txt. Better: Eaffinis_nanaimo_2010_counts.xls (study organism, site name, year, what was measured). From R. Cook, ESA Best Practices Workshop 2010
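The naming pattern in the "Better" example (organism, site, year, what was measured) is easy to generate consistently. A minimal Python sketch, assuming that part order; the helper name `descriptive_filename` and its `existing` parameter are hypothetical:

```python
import re


def descriptive_filename(organism, site, year, measured, ext="csv", existing=()):
    """Build organism_site_year_measured.ext, e.g. Eaffinis_nanaimo_2010_counts.csv.

    Each part is reduced to letters, digits, and hyphens so the name has no spaces
    or punctuation; a name already in `existing` is rejected so names stay unique.
    """
    parts = [re.sub(r"[^A-Za-z0-9-]", "", str(p)) for p in (organism, site, year, measured)]
    if not all(parts):
        raise ValueError("every part of the name must contain a letter or digit")
    name = "_".join(parts) + "." + ext.lstrip(".")
    if name in existing:
        raise ValueError(f"{name} already exists; file names must be unique")
    return name


if __name__ == "__main__":
    print(descriptive_filename("E. affinis", "nanaimo", 2010, "counts", ext="xls"))
```

Generating names from their parts, rather than typing them, keeps the order and separators identical across every file in a project.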
  • 33. 2. Data collection & organization. Organize files logically (from S. Hampton):
    Biodiversity
      Lake
        Experiments
          Biodiv_H20_heatExp_2005to2008.csv
          Biodiv_H20_predatorExp_2001to2003.csv
          …
        Field work
          Biodiv_H20_PlanktonCount_2001toActive.csv
          Biodiv_H20_ChlAprofiles_2003.csv
          …
      Grassland
  • 34. 2. Data collection & organization. Preserve information: keep raw data raw; use scripts to process data & save them with the data. (Diagram: raw data as .csv; R script for processing & analysis)
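"Keep raw data raw" means a script reads the raw file and writes its results somewhere else. The slide pairs raw .csv data with an R script; the sketch below shows the same idea in Python. The column names `temp_F` and `temp_C` and the Fahrenheit-to-Celsius step are hypothetical stand-ins for whatever processing a project needs.

```python
import csv
from pathlib import Path


def process_raw(raw_path, out_path):
    """Derive a Celsius column from a raw Fahrenheit column and write it to a new file.

    The raw file is opened read-only and is never overwritten. The column names
    temp_F and temp_C are hypothetical. Returns the number of rows written.
    """
    raw_path, out_path = Path(raw_path), Path(out_path)
    if raw_path.resolve() == out_path.resolve():
        raise ValueError("refusing to overwrite the raw data file")

    # Read every raw row; the header tells us which columns to carry forward.
    with raw_path.open(newline="") as src:
        reader = csv.DictReader(src)
        fieldnames = list(reader.fieldnames or []) + ["temp_C"]
        rows = list(reader)

    # Write the processed copy, keeping the raw columns and adding the derived one.
    with out_path.open("w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        for row in rows:
            temp_c = (float(row["temp_F"]) - 32) * 5 / 9
            writer.writerow({**row, "temp_C": f"{temp_c:.2f}"})
    return len(rows)
```

Saving this script next to the raw file means the processed file can always be regenerated, so it never needs to be edited by hand.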
  • 35. 3. Quality control and quality assurance. Define & enforce standards; double data entry; document changes; no missing, impossible, or anomalous values:
    - Perform statistical summaries
    - Use an illegal data filter
    - Look for outliers
    [Figure: example scatter plot]
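The three checks on this slide (statistical summaries, an illegal data filter, and an outlier search) can be combined in one pass over a column. This is a minimal Python sketch; the plausible range (`low`, `high`) and the z-score cutoff `z_cut` are hypothetical parameters a user would set per variable, and a z-score rule is only one of several possible outlier screens.

```python
import statistics


def qc_report(values, low, high, z_cut=3.0):
    """Summarize one numeric column and flag missing, impossible, and anomalous values.

    values: list of numbers, with None marking a missing entry.
    low, high: the physically possible range (values outside it are 'illegal').
    z_cut: hypothetical z-score threshold above which a legal value is flagged.
    Returns positions in `values` for each kind of problem, plus summary statistics.
    """
    missing = [i for i, v in enumerate(values) if v is None]
    present = [(i, v) for i, v in enumerate(values) if v is not None]
    illegal = [i for i, v in present if not low <= v <= high]
    legal = [(i, v) for i, v in present if low <= v <= high]

    # Summary statistics use only the legal values (raises StatisticsError if none remain).
    nums = [v for _, v in legal]
    mean = statistics.mean(nums)
    sd = statistics.stdev(nums) if len(nums) > 1 else 0.0
    outliers = [i for i, v in legal if sd > 0 and abs(v - mean) / sd > z_cut]

    return {
        "n": len(values), "missing": missing, "illegal": illegal, "outliers": outliers,
        "mean": mean, "sd": sd, "min": min(nums), "max": max(nums),
    }


if __name__ == "__main__":
    temps = [10, 11, 9, 10, 12, 10, 11, 9, 10, 35, None, -5]
    print(qc_report(temps, low=0, high=40, z_cut=2.5))
```

One caution on the cutoff: with the sample standard deviation, no value in a sample of n can have a z-score above (n - 1)/√n, about 2.85 for n = 10, so a cutoff of 3 can never flag anything in very small samples. Plotting the data, as the slide suggests, remains a useful complement.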
  • 36. 4. Metadata basics. What is metadata? Data reporting:
    - WHO created the data?
    - WHAT is the content of the data set?
    - WHEN was it created?
    - WHERE was it collected?
    - HOW was it developed?
    - WHY was it developed?
  • 37. 4. Metadata basics.
    Scientific context:
      - Scientific reason why the data were collected
      - What data were collected
      - What instruments (including model & serial number) were used
      - Environmental conditions during collection
      - Where collected & spatial resolution
      - When collected & temporal resolution
      - Standards or calibrations used
    Information about parameters:
      - How each was measured or produced
      - Units of measure
      - Format used in the data set
      - Precision & accuracy if known
    Information about data:
      - Definitions of codes used
      - Quality assurance & control measures
      - Known problems that limit data use (e.g. uncertainty, sampling problems)
    Digital context:
      - Name of the data set
      - The name(s) of the data file(s) in the data set
      - Date the data set was last modified
      - Example data file records for each data type file
      - Pertinent companion files
      - List of related or ancillary data sets
      - Software (including version number) used to prepare/read the data set
      - Data processing that was performed
    Personnel & stakeholders:
      - Who collected
      - Who to contact with questions
      - Funders
    How to cite the data set
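The who/what/when/where/how/why questions work as a simple completeness check on a metadata record. A minimal Python sketch; the field names in `REQUIRED_FIELDS` are hypothetical and not drawn from any particular metadata standard:

```python
# Hypothetical field names, each answering one of the questions on slide 36.
REQUIRED_FIELDS = {
    "creator": "WHO created the data?",
    "content": "WHAT is the content of the data set?",
    "date_created": "WHEN was it created?",
    "location": "WHERE was it collected?",
    "methods": "HOW was it developed?",
    "purpose": "WHY was it developed?",
}


def unanswered_questions(record):
    """Return the questions that a metadata record leaves blank or omits entirely."""
    return [
        question
        for field, question in REQUIRED_FIELDS.items()
        if not str(record.get(field) or "").strip()
    ]


if __name__ == "__main__":
    draft = {
        "creator": "J. Doe",  # hypothetical person
        "content": "Copepod counts",
        "date_created": "2010-06-14",
        "location": "   ",
        "methods": "Vertical net tows",
    }
    print(unanswered_questions(draft))
```

A longer list of required fields, such as units of measure or contact information from slide 37, can be added to the same dictionary.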
  • 38. 4. Metadata basics. What is a metadata standard? It provides structure to describe data (common terms | definitions | language | structure). There are lots of different standards: EML, FGDC, ISO 19115, Darwin Core, … Tools for creating metadata files: Morpho (EML), Metavist (FGDC), NOAA MERMaid (CSDGM)
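Tools such as Morpho write EML as XML. To show what the structure looks like, the Python sketch below builds a bare-bones EML-style skeleton with the standard library. It assumes the EML 2.2.0 namespace and a small subset of elements (`dataset`, `title`, `creator`, `contact`); it has not been validated against the EML schema, and the package ID and person in the usage example are hypothetical.

```python
import xml.etree.ElementTree as ET

EML_NS = "https://eml.ecoinformatics.org/eml-2.2.0"


def eml_skeleton(package_id, title, given_name, surname, email):
    """Build a bare-bones EML-style record: title, creator, and contact only.

    Not validated against the EML schema; a real record also needs, for example,
    methods, coverage, and a dataTable description with attribute units.
    """
    ET.register_namespace("eml", EML_NS)
    root = ET.Element(f"{{{EML_NS}}}eml", {"packageId": package_id, "system": "local"})
    dataset = ET.SubElement(root, "dataset")
    ET.SubElement(dataset, "title").text = title
    # The same person is recorded as both creator and contact in this sketch.
    for role in ("creator", "contact"):
        party = ET.SubElement(dataset, role)
        name = ET.SubElement(party, "individualName")
        ET.SubElement(name, "givenName").text = given_name
        ET.SubElement(name, "surName").text = surname
        ET.SubElement(party, "electronicMailAddress").text = email
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(eml_skeleton("example.1.1", "Hypothetical example data set", "Jane", "Doe", "jane@example.org"))
```

In practice a dedicated editor or library is the safer route to a valid record; the sketch only illustrates that a standard is a fixed set of named, nested fields.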
  • 39. 4. Metadata basics. What does a metadata record look like? [Screenshot of an example metadata record]
  • 40. 5. Workflows. Simplest workflows: commented scripts, flow charts. [Flow chart: temperature data and salinity data; data import into R; data in R format; quality control & data cleaning; "clean" T & S data; analysis: mean, SD; summary statistics; graph production]
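The flow chart on slide 40 maps naturally onto a commented script with one function per box. The slide uses R; this is a Python sketch of the same steps. The column names `temp_C` and `salinity_psu` and the plausibility ranges in `clean` are hypothetical, not published QC thresholds, and graph production is left out to keep the example to the standard library.

```python
import csv
import io
import statistics


def import_data(csv_text):
    """Step 1: import temperature and salinity records from CSV text."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def clean(rows, temp_range=(-2.0, 40.0), sal_range=(0.0, 42.0)):
    """Step 2: quality control, dropping rows with blank, non-numeric, or out-of-range values."""
    kept = []
    for row in rows:
        try:
            t, s = float(row["temp_C"]), float(row["salinity_psu"])
        except (KeyError, TypeError, ValueError):
            continue  # missing column or blank/non-numeric entry
        if temp_range[0] <= t <= temp_range[1] and sal_range[0] <= s <= sal_range[1]:
            kept.append((t, s))
    return kept


def summarize(clean_rows):
    """Step 3: summary statistics (mean, SD) for each variable."""
    temps = [t for t, _ in clean_rows]
    sals = [s for _, s in clean_rows]
    return {
        "temp_C": (statistics.mean(temps), statistics.stdev(temps)),
        "salinity_psu": (statistics.mean(sals), statistics.stdev(sals)),
    }


if __name__ == "__main__":
    raw = "temp_C,salinity_psu\n10.0,30.0\n12.0,32.0\n,31.0\n99.0,30.5\n14.0,34.0\n"
    print(summarize(clean(import_data(raw))))
```

Because each step is a named function, the script itself documents the workflow, which is the point of the "commented scripts" option on the slide.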
  • 41. 5. Workflows
  Fancy schmancy: Kepler (https://kepler-project.org)
  [Screenshot: resulting output]
  • 42. 5. Workflows
  Workflows enable:
    •  Reproducibility: can someone independently validate findings?
    •  Transparency: others can understand how you arrived at your results
    •  Executability: others can re-run or re-use your analysis
  (Image from Flickr by merlinprincesse)
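Reproducibility is easier when a workflow records exactly what it ran on. As one possible approach (an assumption, not a method from the slides), the sketch below saves a small provenance record: a SHA-256 fingerprint of the input, the software version, a timestamp, and the list of steps. The input bytes and step names are hypothetical.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone


def provenance_record(input_bytes: bytes, steps: list) -> dict:
    """Collect details another person would need to re-run an analysis."""
    return {
        # Fingerprint of the exact input, so a re-run can confirm it used the same data.
        "input_sha256": hashlib.sha256(input_bytes).hexdigest(),
        # Software context, in the spirit of "software (including version number)".
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "run_at_utc": datetime.now(timezone.utc).isoformat(),
        "steps": steps,
    }


# Hypothetical input and step names.
raw = b"station,temperature_c\nA,12.1\nB,12.4\n"
record = provenance_record(raw, ["import", "quality control", "summary statistics"])
print(json.dumps(record, indent=2))
```

Saving this JSON next to the results gives a reader something concrete to check when trying to validate or re-run the analysis.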
  • 43. 6. Data stewardship & reuse
  Data Reuse
  Data Sharing
  Data Management
  • 44. 6. Data stewardship & reuse
  The 20-Year Rule: The metadata accompanying a data set should be written for a user 20 years into the future (National Research Council 1991).
  (Image from Flickr by greensambaman)
  • 45. 6. Data stewardship & reuse
    •  Use stable formats: csv, txt, tiff
    •  Create back-up copies: original, near, far
    •  Periodically test your ability to restore information
  (Modified from R. Cook)
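A simple, partial way to test back-up copies is to compare checksums against the original. The sketch below assumes SHA-256 checksums and uses hypothetical file names inside a temporary directory; the "far" copy is deliberately altered to show a failed check. A full restore test would also open the restored files in the software you actually use.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large data files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_copies(original: Path, copies: list) -> dict:
    """Map each copy's path to True if it exists and matches the original's checksum."""
    expected = sha256_of(original)
    return {str(c): c.exists() and sha256_of(c) == expected for c in copies}


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    original = base / "ts_survey.csv"          # hypothetical data file
    original.write_text("station,temperature_c\nA,12.1\n")
    near = base / "near_copy.csv"              # stands in for e.g. an external drive
    far = base / "far_copy.csv"                # stands in for e.g. off-site storage
    shutil.copy2(original, near)
    shutil.copy2(original, far)
    far.write_text("station,temperature_c\nA,12.7\n")  # simulate a corrupted copy
    results = verify_copies(original, [near, far])
    print(results)
```

Running a check like this on a schedule turns "periodically test" into a routine rather than something discovered after data are lost.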
  • 46. 6. Data stewardship & reuse
  Where do I put it?
    •  Institutional archive
    •  Discipline/specialty archive
    •  DataCite list of repositories: www.datacite.org/repolist
  (Image from Flickr by torkildr)
  • 47. 6. Data stewardship & reuse
  Data citation: why everyone should do it
    •  Allows readers to find data products
    •  Gives you credit for data and publications
    •  Promotes reproducibility
    •  Provides a better measure of research impact
  Example:
    Sidlauskas, B. 2007. Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20
  (Modified from R. Cook)
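The example citation above has a regular structure, so it can be assembled from its parts. The helper below simply mirrors that pattern; it is not a formal citation style, and the "Data from:" prefix follows the Dryad example and may not suit other repositories. The second helper builds a link through the standard doi.org resolver.

```python
def format_data_citation(author: str, year: int, title: str, repository: str, doi: str) -> str:
    """Assemble a data citation following the pattern of the Dryad example."""
    return f"{author}. {year}. Data from: {title}. {repository}. doi:{doi}"


def doi_url(doi: str) -> str:
    """Return a resolvable link for a DOI via the doi.org resolver."""
    return f"https://doi.org/{doi}"


citation = format_data_citation(
    author="Sidlauskas, B",
    year=2007,
    title=("Testing for unequal rates of morphological diversification in the absence "
           "of a detailed phylogeny: a case study from characiform fishes"),
    repository="Dryad Digital Repository",
    doi="10.5061/dryad.20",
)
print(citation)
print(doi_url("10.5061/dryad.20"))
```

Because the DOI resolves to the data set itself, readers can go straight from the citation to the data.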
  • 48. Roadmap
    1. Who are you?
    2. Bad scientists
    3. How to be good
    4. Toolbox
  • 49. NSF-funded DataNet Project, Office of Cyberinfrastructure
  Enabling universal access to data about life on earth and the environment that sustains it
  • 50-52. [Diagram with three components labeled A, B, C]
  • 55. www.dataone.org
    •  Data education tutorials
    •  Primer on data management
    •  Database of best practices & software tools
    •  List of repositories & metadata standards
    •  Links to DMP Tool
  Investigator Toolkit
    •  ONE-R
    •  ONE-Mercury
    •  ONE-Drive
  • 56. E-notebooks
    •  NoteBook
    •  ORNL eNote
    •  Evernote
    •  Google Docs
    •  Blogs
    •  Wikis
    •  TheLabNotebook.com
    •  iPad ELN
    •  NoteBookMaker
  • 57. CDL services for the UC community
    •  Precise identification of a dataset
    •  Credit to data producers and data publishers
    •  A link from the traditional literature to the data
    •  Research metrics for datasets
    •  Deposit content (i.e. data)
    •  Manage (metadata, versions, etc.)
    •  Share
    •  Access
    •  Preserve
  www.cdlib.org/services/uc3
  • 58.
    •  Open-source add-in
    •  Facilitates data management, sharing, and archiving for scientists
    •  Part of the DataONE Investigator Toolkit
    •  Collecting requirements for the add-in from scientists, data centers, and libraries
  dcxl.cdlib.org
  Funders: Gordon and Betty Moore Foundation, Microsoft Research
  • 59. Christy Hightower, Katie Forney, Ann Hubble, Cynthia Moriconi
  www.carlystrasser.net | carlystrasser@gmail.com | dcxl.cdlib.org
  @carlystrasser | @dcxlCDL | www.facebook.com/DCXLatCDL