Digital Curation for Excel (DCXL)

CDL has recently launched a new project dubbed Digital Curation for Excel (DCXL), funded by the Gordon and Betty Moore Foundation and Microsoft Research. The goal of the DCXL project is to facilitate data management, sharing, and archiving for earth, environmental, and ecological scientists. The main result from the project will be an open source add-in for Microsoft Excel that will assist scientists in preparing their Excel data for sharing.

Published in: Technology, Education

  1. DCXL: Digital Curation for Excel. Funders: Gordon & Betty Moore Foundation, Microsoft Research. Carly Strasser, UC3, California Digital Library, carly.strasser@ucop.edu. 22 Sept 2011, UC3 Webinar Series, California Digital Library.
  2. Community Engagement: build on existing cyberinfrastructure; create new cyberinfrastructure; support communities.
  3. Roadmap
      1. An overview: why is DCXL needed?
      2. Goals of DCXL project
      3. Progress & future plans
      4. How to get involved in DCXL
  4. Digital data + complex workflows
  5. Data, Models, Maximum Likelihood estimation, Matrix Models, Images, Tables, Paper
  6. UGLY TRUTH: Most Earth | Environmental | Ecological scientists…
      • are not taught data management
      • don’t know what metadata are
      • can’t name data centers or repositories
      • don’t share data publicly or store it in an archive
      • aren’t convinced they should share data
      (Image: 5shortessays.blogspot.com)
  7. 2 tables; random notes. From Stephanie Hampton (2010), ESA Workshop on Best Practices.
  8. Wash Cres Lake Dec 15 Dont_Use.xls. From Stephanie Hampton (2010), ESA Workshop on Best Practices.
  9. Collaboration and Data Sharing
  10. What is this?
  11. The path of research products (diagram): Data, Metadata. Recreated from Klump et al. 2006. (Images: www.noaa.gov, www.collectionconnection.alcts.ala.org, www.flickr.com/photos/csessums, blog.disorder2order.com, blog.seattlepi.com)
  12. Data Reuse, Data Sharing, Data Management
  13. The path of research products (diagram): Data, Metadata. Recreated from Klump et al. 2006. (Images: www.noaa.gov, www.collectionconnection.alcts.ala.org, www.digital-servers.com)
  14. Barriers. Cost: time; personnel; software, hardware. (Images: ttatteredntornprims.blogspot.com/, cultblender.wordpress.com)
  15. Barriers. Cost: time, personnel, software, hardware. Culture of Science:
      • Not the norm
      • Lack of training
      • Disparate data
      (Image: free-photos.biz)
  16. Barriers. Cost: time, personnel, software, hardware. Culture of Science. Loss of rights or benefits: misuse of data; missed opportunities; conflict. (Images: wattsupwiththat.com, colouringbook.org)
  17. Barriers. Cost: time, personnel, software, hardware. Culture of Science. Loss of rights or benefits. Lack of incentives: time consuming & expensive; reward structure; few requirements. (Image: georgevanantwerp.com)
  18. Roadmap
      1. An overview: why is DCXL needed?
      2. DCXL project overview
      3. Progress & future plans
      4. How to get involved in DCXL
  19. DCXL Project Goals: “A transformation in the conduct of a segment of scientific research by enabling and promoting publishing, sharing, and archiving of tabular data”
      • Increase interoperability (= Sharing), publishability (= Publishing), archivability (= Archiving)
      • Focus on atmospheric, ecological, hydrological, and oceanographic data
  20. DCXL Project Goals: Open Source & Free Excel Add-in. An add-in is a software program that extends the capabilities of larger programs; it complements basic Excel functionality. (Sources: www.webopedia.com, www.ablebits.com)
  21. DCXL Add-in Goals: Archiving, Sharing, Publishing (diagram with a scale from Easier to Harder)
  22. DCXL Project Deliverables
      • Excel add-in
      • Publicly available source code
      • Technical documentation
      • End user documentation
      • Publicly available requirements
      • Community
      (Image: storageplusgulfport.com)
  23. DCXL Project Outcomes
      • Enable citation & allow credit
      • Enable policy enactment
      • Enable re-use by eliminating barriers
      • Save time for researcher
      • Encourage creation of extensions
  24. Process: Assess needs
      • Quantitative
        – Surveys
  25. Process: Assess needs
      • Quantitative
        – Surveys
        – Quick poll
  26. Process: Assess needs
      • Quantitative
        – Surveys
        – Quick poll
      • Qualitative
        – Interviews
  27. Process: Assess needs, Gather requirements. Recruitment tools: DCXL/data management seminars; listservs & email; blog, Facebook, Twitter; face-to-face interactions; flyers.
  28. Process: Assess needs, Gather requirements. Locations: conferences; UC campus visits; remote/web-based.
  29. Process: Assess needs, Gather requirements. Stakeholders & contributors: libraries; scientists; repositories; experts (MSR, GBMF); personnel on related projects.
  30. Process (diagram): CDL connects with Libraries, Data Centers, Scientists, Related projects, and Funders via social media, email, campus visits, seminars, flyers, quick polls, surveys, and interviews; the results feed into Requirements.
  31. Implementation: Assess needs, Gather requirements, Build requirements document
  32. Implementation: Assess needs, Gather requirements, Build requirements document, Build community (libraries, scientists, repositories, programmers/developers)
  33. Timeline
      26 Sept: DCXL Kickoff Meeting
      7 Oct: Finalize Requirements Gathering Framework
      9 Nov: 1st draft of Requirements to MSR
      30 Nov: 2nd draft of Requirements to MSR
      5-9 Dec: AGU Meeting, San Francisco
      15 Dec: Final Requirements to MSR
      2012
      16 Jan: Receive Excel Add-in Version 1
      23 Jan: Rollout Excel Add-in Version 1
      16-19 Feb: AAAS meeting: Add-in user testing
      20-24 Feb: Ocean Sciences meeting: Add-in user testing
      26 Feb: 1st Draft of updated Requirements based on Version 1 to MSR
      2 Apr: Deliver updated Requirements based on Version 1 to MSR
      28 May: Receive Excel Add-in Version 2
      29 May-24 Jun: User testing of Version 2
      25 Jun: Rollout Excel Add-in Version 2
      7-10 July: CSEE meeting: Add-in debut & demo
      13 July: Final code, technical documentation, and requirements published
      31 July: End user documentation published
  34. Roadmap
      1. An overview: why is DCXL needed?
      2. DCXL project overview
      3. Progress & future plans
      4. How to get involved in DCXL
  35. Ecological Society of America, Summer 2011 Meeting
  36. ESA Overview
      • Everyone uses Excel
        – Most use Excel for organizing raw data
        – Most import spreadsheets into other programs for analysis
        – ~75% are embarrassed about using Excel
      • Excitement about open source
      • Minimal knowledge about data management, organization, and archiving
      • 55 surveys from diverse group
  37. Operating System (bar chart; categories: Mac, PC, Linux)
  38. Use Excel for... (bar chart; categories: Sharing, Other Analyses, Statistics, Visualization, Organization; axis: # respondents, out of 55)
  39. How often do you use Excel? (bar chart; axis: # respondents; categories include Never, Rarely, and Every day; one further label is incompletely extracted)
  40. What features are used in Excel? (bar chart; categories: Comments, Cell shading, Macros, Embedded formulas, Headers, Pivot Tables, Multiple Tabs, Multiple Tables; axis: percent)
  41. American Fisheries Society, Summer 2011 Meeting. (Image: Ray Troll, trollart.com)
  42. AFS Overview
      • Everyone uses Excel
      • Most use it only for data organization and sharing
      • 36 surveys from diverse group
      • Heavy MS Access use
      • 100% PC
  43. How often do you use Excel? (bar chart; axis: # respondents; categories: Rarely, Every day)
  44. Tasks performed in Excel? (bar chart; categories: Sharing data, Simple Calculations, Statistics, Visualizing data, Organizing data; axis: % respondents, n = 36)
  45. What should the add-in help you do? (bar chart; axis: % respondents; categories: Organize my data for my own use, Organize my data for others to use more easily, Archive my data, Create metadata, Share my data publicly, No opinion)
  46. AFS Overview
      • Everyone uses Excel
      • Most use it only for data organization and sharing
      • 36 surveys from diverse group
      • Heavy MS Access use
      • 100% PC
      • Data hoarders
      (Image: Myoverstuffedbookshelf.blogspot.com)
  47. Roadmap
      1. An overview: why is DCXL needed?
      2. DCXL project overview
      3. Progress & future plans
      4. How to get involved in DCXL
  48. Get Involved: dcxl.cdlib.org. Now: general info, blog, forum, calendar. Later: requirements, documentation.
  49. Get Involved: @dcxlCDL; www.facebook.com/DCXLatCDL
  50. Acknowledgements
      • CDL: Rachael Hu, Trisha Cruse, John Kunze, Tracy Seneca
      • MSR: Lee Dirks
      • GBMF: Chris Mentzel
      Carly Strasser, carly.strasser@ucop.edu