Facilitating data stewardship practices for scientists

Carly Strasser | carly.strasser@ucop.edu | www.carlystrasser.net
Open Access symposium | University of North Texas | May 2012

UGLY TRUTH

Many Earth | Environmental | Ecological scientists…
• are not taught data management
• don’t know what metadata are
• can’t name data centers or repositories
• don’t share data publicly or store it in an archive
• aren’t convinced they should share data

Image: 5shortessays.blogspot.com

Where data end up

Figure (recreated from Klump et al. 2006), labels: www, Data, Metadata.
Images: From Flickr by diylibrarian; blog.order2disorder.com; From Flickr by csessums

Where data end up

Figure (recreated from Klump et al. 2006), labels: www, Data, www, Metadata.
Images: From Flickr by diylibrarian; From Flickr by torkildr

Intercept the researchers where they already work:

Figure: Frequency of Excel use. Percent of respondents (0 to 100) who use Excel for each task (organizing data, visualizing data, statistics, sharing data), by frequency of use: rare or occasional use; moderate use; every day or almost every day.

Facilitate:
• Data management & organization
• Sharing
• Archiving
• Publishing
• Data reuse
• Reproducibility

• Open source add-in & web application
• Facilitate data management, sharing, and archiving for scientists
• Focus on atmospheric, ecological, hydrological, and oceanographic data
• Collect requirements for the add-in from scientists, data centers, and libraries

Add-in & Web Application?

Add-in
• Little pieces of software
• Download to extend the capabilities of Excel
• Appear as a “ribbon” in Excel
• Only work with Windows Excel 2007+
• Available offline, but updates are difficult

Image: www.ablebits.com

Web-based application
• Websites that do something with info/files provided by the user
• Examples: Facebook, YouTube
• No program download required, and updates are easy
• New user interface to learn

What will DCXL do?
What do scientists need?

~150 scientists
• No data preservation
  – Unaware of archives
  – Resistant to sharing
• Poor data documentation
• 90% use other programs along with Excel

Requirements
1. Must work for Excel users without the add-in
2. No additional software necessary
3. Can be used offline
4. Perform CSV compatibility checks, reporting, and automated fixes
5. Add metadata to the data file
   a. Can use existing metadata as a template
   b. Add-in can automatically generate some of the metadata where the info is available from the file
6. Generate a citation for the data file
7. Deposit data and metadata in a repository
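Requirement 4 does not spell out which checks DCXL performs. As a rough illustration only, the Python sketch below assumes a handful of common CSV pitfalls (missing or duplicate column headers, ragged rows, stray whitespace, line breaks inside cells, non-ASCII characters) and reports each one with its row and column. `Issue`, `check_table`, and `auto_fix` are hypothetical names, not the add-in's actual API, and the rule set is an assumption.

```python
import csv
import io
import sys
from dataclasses import dataclass


@dataclass
class Issue:
    row: int  # 0-based row index; row 0 is the header
    col: int  # 0-based column index, or -1 for a whole-row problem
    message: str


def check_table(rows: list[list[str]]) -> list[Issue]:
    """Report cells and rows likely to cause trouble in a plain CSV (assumed rule set)."""
    if not rows:
        return [Issue(0, -1, "table is empty")]
    issues: list[Issue] = []
    header = rows[0]
    seen: set[str] = set()
    for c, name in enumerate(header):
        if not name.strip():
            issues.append(Issue(0, c, "missing column header"))
        elif name in seen:
            issues.append(Issue(0, c, f"duplicate column header {name!r}"))
        seen.add(name)
    for r, row in enumerate(rows):
        # Every row should have exactly as many cells as the header.
        if len(row) != len(header):
            issues.append(Issue(r, -1, f"expected {len(header)} cells, found {len(row)}"))
        for c, cell in enumerate(row):
            if cell != cell.strip():
                issues.append(Issue(r, c, "leading/trailing whitespace"))
            if "\n" in cell or "\r" in cell:
                issues.append(Issue(r, c, "line break inside cell"))
            if not cell.isascii():
                issues.append(Issue(r, c, "non-ASCII character"))
    return issues


def auto_fix(rows: list[list[str]]) -> list[list[str]]:
    """Apply conservative fixes: collapse whitespace (including line breaks) in every
    cell, name blank headers, and pad short rows. Long rows are left for manual review."""
    if not rows:
        return []
    header = [" ".join(h.split()) or f"column_{i + 1}" for i, h in enumerate(rows[0])]
    fixed = [header]
    for row in rows[1:]:
        cells = [" ".join(cell.split()) for cell in row]
        fixed.append(cells + [""] * (len(header) - len(cells)))
    return fixed


if __name__ == "__main__":
    raw = "site,temp ,\nA1,12.5\nB2, 13.1,x\n"
    rows = list(csv.reader(io.StringIO(raw)))
    for issue in check_table(rows):  # the report step
        print(issue)
    csv.writer(sys.stdout).writerows(auto_fix(rows))  # the automated-fix step
```

Reporting and fixing are kept separate so a user can review the report before accepting any automatic change. Rows longer than the header are deliberately left alone, since there is no safe way to guess which cell is extra.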
  
	
  
Requirements

Features
1. Compatibility check
2. Generate metadata
3. Generate citation
4. Post data to repository
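Features 2 and 3 can be pictured together: once the table is parsed, some metadata (row count, column names, whether a column is numeric, and its range) can be read straight from the file, while fields such as title and creators come from the user. The sketch below is hypothetical: `describe_columns`, `build_metadata`, and `format_citation` are illustrative names, the field names are not a real metadata schema, and the citation pattern is only loosely modeled on the DataCite-style "Creator (Year): Title. Publisher. Identifier" form.

```python
import csv
import io
import json


def describe_columns(rows: list[list[str]]) -> list[dict]:
    """Infer simple per-column metadata from the data (hypothetical field names)."""
    header, body = rows[0], rows[1:]
    columns = []
    for i, name in enumerate(header):
        # Non-blank values in this column; short rows simply contribute nothing.
        values = [row[i] for row in body if i < len(row) and row[i].strip()]
        numbers = []
        for value in values:
            try:
                numbers.append(float(value))
            except ValueError:
                break  # one non-numeric value makes the whole column text
        entry = {"name": name, "non_empty": len(values)}
        if values and len(numbers) == len(values):
            entry.update(type="numeric", min=min(numbers), max=max(numbers))
        else:
            entry["type"] = "text"
        columns.append(entry)
    return columns


def build_metadata(rows, title, creators, year):
    """Combine user-supplied fields with fields derived from the file itself."""
    return {
        "title": title,        # supplied by the user
        "creators": creators,  # supplied by the user
        "year": year,          # supplied by the user
        "row_count": len(rows) - 1,         # derived: data rows below the header
        "columns": describe_columns(rows),  # derived: names, types, ranges
    }


def format_citation(meta, publisher, identifier=None):
    """Build 'Creator (Year): Title. Publisher. Identifier'; the identifier is
    omitted when none has been assigned yet."""
    authors = "; ".join(meta["creators"])
    parts = [f"{authors} ({meta['year']}): {meta['title']}.", f"{publisher}."]
    if identifier:
        parts.append(identifier)
    return " ".join(parts)


if __name__ == "__main__":
    raw = "site,temp\nA1,12.5\nB2,13.1\n"
    rows = list(csv.reader(io.StringIO(raw)))
    meta = build_metadata(rows, "Example stream temperatures", ["Doe, J."], 2012)
    print(json.dumps(meta, indent=2))
    # Placeholder identifier; a real one would be assigned when the data are deposited.
    print(format_citation(meta, "Example Repository", "doi:10.5072/example"))
```

The identifier is optional because, in the workflow above, it would only exist after the data are posted to a repository (feature 4).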
  
DCXL Add-in Ribbon

Open Access?

Vision for Future
• Community adoption
• Extension to other programs
  – Google Docs, OpenOffice
• Incorporation of other metadata schemas
• Repository adoption
• Partnerships: FigShare, F1000, USGS, etc.

Website: dcxl.cdlib.org

dcxl.cdlib.org
@dcxlCDL
www.facebook.com/DCXLatCDL

www.carlystrasser.net
carlystrasser@gmail.com
@carlystrasser

DataUp: Data Curation for Excel

  • 1.
    Facilitating  data  stewardship   practices  for  scientists     Carly  Strasser  |  carly.strasser@ucop.edu  |  www.carlystrasser.net   Open  Access  symposium  |  University  of  North  Texas  |  May  2012  
  • 2.
    UGLY  TRUTH   Many   Earth  |  Environmental  |  Ecological   scientists…       5shortessays.blogspot.com     are  not  taught  data  management   don’t  know  what  metadata  are   can’t  name  data  centers  or  repositories   don’t  share  data  publicly  or  store  it  in  an  archive   aren’t  convinced  they  should  share  data    
  • 3.
    Where  data  end  up   From  Flickr  by  diylibrarian   www blog.order2disorder.com   From  Flickr  by  csessums   Data   Metadata   From  Flickr  by  csessums   Recreated  from  Klump  et  al.  2006  
  • 4.
    Where  data  end  up   From  Flickr  by  diylibrarian   www Data   www Metadata   From  Flickr  by  torkildr   Recreated  from  Klump  et  al.  2006  
  • 5.
    Intercept  the   researchers  where   they  already  work:  
  • 6.
    Frequency  of   Excel  use   Rare  or   occasional   use   Moderate   use   Percent  of  respondents  who  use   Excel  for  these  tasks   100   Every  day   90   or  almost   80   every  day   70   60   50   40   30   20   10   0   Organizing   Visualizing   Sta:s:cs   Sharing  data   data   data  
  • 8.
    Facilitate   Archiving   Data   Data  Reuse   management   Sharing   &  organization   Reproducibility   Publishing  
  • 9.
    •  Open  source  add-­‐in  &  web  application   •  Facilitate  data  management,  sharing,  archiving  for  scientists   •  Focus  on  atmospheric,  ecological,  hydrological,  and   oceanographic  data   •  Collect  requirements  for  add-­‐in  from  scientists,  data   centers,  libraries  
  • 10.
    Add-­‐in  &  Web  Application?   Add-­‐in     •  Little  pieces  of  software     •  Download  to  extend  the  capabilities  of  Excel   •  Appear  as  “ribbon”  in  Excel   •  Only  work  with  Windows  Excel  2007+   •  Available  offline  but  updates  difficult   www.ablebits.com  
  • 11.
    Add-­‐in  &  Web  Application?   Add-­‐in     •  Little  pieces  of  software     •  Download  to  extend  the  capabilities  of  Excel   •  Appear  as  “ribbon”  in  Excel   •  Only  work  with  Windows  Excel  2007+   •  Available  offline  but  updates  difficult   Web-­‐based  application     •  Websites  that  do  something  with  info/files  provided  by  user   •  Examples:  Facebook,  YouTube   •  No  program  download  required  but  updates  easy   •  New  user  interface  to  learn  
  • 12.
    What  will  DCXL  do?   What  do  scientists   need?  
  • 13.
    ~ 150  scientists   •  No  data  preservation   –  Unaware  of  archives   –  Resistant  to  sharing   •  Poor  data  documentation   •  90%  use  other  programs  along  with  Excel  
  • 14.
    Requirements   1.  Must  work  for  Excel  users  without  the  add-­‐in   2.  No  additional  software  necessary   3.  Can  be  used  offline   4.  Perform  CSV  compatibility  checks,  reporting,  and  automated  fixes   5.  Add  Metadata  to  data  file   a.  Can  use  existing  metadata  as  a  template   b.  Add-­‐in  can  automatically  generate  some  of  the  metadata   where  the  info  is  available  from  the  file   6.  Generate  a  citation  for  the  data  file   7.  Deposit  data  and  metadata  in  a  repository    
  • 15.
    Requirements   Features   1. Compatibility  Check   2.  Generate  metadata   3.  Generate  citation   4.  Post  data  to  repository  
  • 16.
  • 17.
  • 18.
    Vision  for  Future   •  Community  adoption   •  Extension  to  other  programs   –  Google  Docs,  OpenOffice   •  Incorporation  of  other  metadata  schemas   •  Repository  adoption   •  Partnerships:  FigShare,  F1000,  USGS,  etc.  
  • 19.
  • 20.
    dcxl.cdlib.org   @dcxlCDL   www.facebook.com/DCXLatCDL   www.carlystrasser.net   carlystrasser@gmail.com   @carlystrasser