Digital Curation for Excel (DCXL)
The California Digital Library (CDL) has recently launched a new project, Digital Curation for Excel (DCXL), funded by the Gordon and Betty Moore Foundation and Microsoft Research. The goal of DCXL is to facilitate data management, sharing, and archiving for earth, environmental, and ecological scientists. The project's main product will be an open source add-in for Microsoft Excel that helps scientists prepare their Excel data for sharing.

Usage rights: © All Rights Reserved


Digital Curation for Excel (DCXL) Presentation Transcript

  • 1. DCXL: Digital Curation for Excel
    Funders: Gordon & Betty Moore Foundation, Microsoft Research
    Carly Strasser, UC3, California Digital Library (carly.strasser@ucop.edu)
    22 Sept 2011, UC3 Webinar Series
  • 2. Community engagement: build on existing cyberinfrastructure; create new cyberinfrastructure; support communities
  • 3. Roadmap
    1. An overview: why is DCXL needed?
    2. Goals of DCXL project
    3. Progress & future plans
    4. How to get involved in DCXL
  • 4. Digital data + complex workflows
  • 5. Data: models, maximum likelihood estimation, matrix models, images, tables, paper
  • 6. UGLY TRUTH: Most earth | environmental | ecological scientists…
    – are not taught data management
    – don’t know what metadata are
    – can’t name data centers or repositories
    – don’t share data publicly or store it in an archive
    – aren’t convinced they should share data
    (Image: 5shortessays.blogspot.com)
  • 7. Two tables; random notes. From Stephanie Hampton (2010), ESA Workshop on Best Practices
  • 8. Wash Cres Lake Dec 15 Dont_Use.xls. From Stephanie Hampton (2010), ESA Workshop on Best Practices
  • 9. Collaboration and Data Sharing
  • 10. What  is  this?  
  • 11. The path of research products: Data, Metadata. Recreated from Klump et al. 2006
    (Images: www.noaa.gov, www.collectionconnection.alcts.ala.org, www.flickr.com/photos/csessums, blog.disorder2order.com, blog.seattlepi.com)
  • 12. Data Reuse / Data Sharing / Data Management
  • 13. The path of research products: Data, Metadata. Recreated from Klump et al. 2006
    (Images: www.noaa.gov, www.collectionconnection.alcts.ala.org, www.digital-servers.com)
  • 14. Barriers
    – Cost: time; personnel; software, hardware
    (Images: ttatteredntornprims.blogspot.com, cultblender.wordpress.com)
  • 15. Barriers
    – Cost: time, personnel, software, hardware
    – Culture of science: not the norm; lack of training; disparate data
    (Image: free-photos.biz)
  • 16. Barriers
    – Cost: time, personnel, software, hardware
    – Culture of science
    – Loss of rights or benefits: misuse of data; missed opportunities; conflict
    (Images: wattsupwiththat.com, colouringbook.org)
  • 17. Barriers
    – Cost: time, personnel, software, hardware
    – Culture of science
    – Loss of rights or benefits
    – Lack of incentives: time-consuming & expensive; reward structure; few requirements
    (Image: georgevanantwerp.com)
  • 18. Roadmap
    1. An overview: why is DCXL needed?
    2. DCXL project overview
    3. Progress & future plans
    4. How to get involved in DCXL
  • 19. DCXL Project Goals
    “A transformation in the conduct of a segment of scientific research by enabling and promoting publishing, sharing, and archiving of tabular data”
    • Increase:
      – interoperability = Sharing
      – publishability = Publishing
      – archivability = Archiving
    • Focus on atmospheric, ecological, hydrological, and oceanographic data
  • 20. DCXL Project Goals: an open source & free Excel add-in
    – Add-in: a software program that extends the capabilities of larger programs (from www.webopedia.com)
    – Complements basic Excel functionality
    (Image: www.ablebits.com)
  • 21. DCXL Add-in Goals: archiving, sharing, publishing (on a scale from easier to harder)
  • 22. DCXL Project Deliverables
    • Excel add-in
    • Publicly available source code
    • Technical documentation
    • End user documentation
    • Publicly available requirements
    • Community
    (Image: storageplusgulfport.com)
  • 23. DCXL Project Outcomes
    • Enable citation & allow credit
    • Enable policy enactment
    • Enable re-use by eliminating barriers
    • Save time for researchers
    • Encourage creation of extensions
  • 24. Process: Assess needs
    • Quantitative
      – Surveys
  • 25. Process: Assess needs
    • Quantitative
      – Surveys
      – Quick poll
  • 26. Process: Assess needs
    • Quantitative
      – Surveys
      – Quick poll
    • Qualitative
      – Interviews
  • 27. Process: Assess needs → Gather requirements
    Recruitment tools: DCXL/data management seminars; listservs & email; blog, Facebook, Twitter; face-to-face interactions; flyers
  • 28. Process: Assess needs → Gather requirements
    Locations: conferences; UC campus visits; remote/web-based
  • 29. Process: Assess needs → Gather requirements
    Stakeholders & contributors: libraries; scientists; repositories; experts (MSR, GBMF); personnel on related projects
  • 30. Process (diagram): CDL reaches libraries, data centers, scientists, related projects, and funders through social media, email, seminars, flyers, and campus visits; input gathered via quick poll, survey, and interviews feeds into the requirements.
  • 31. Implementation: Assess needs → Gather requirements → Build requirements document
  • 32. Implementation: Assess needs → Gather requirements → Build requirements document → Build community (libraries, scientists, repositories, programmers/developers)
  • 33. Timeline
    2011
    – 26 Sept: DCXL kickoff meeting
    – 7 Oct: Finalize requirements gathering framework
    – 9 Nov: 1st draft of requirements to MSR
    – 30 Nov: 2nd draft of requirements to MSR
    – 5–9 Dec: AGU Meeting, San Francisco
    – 15 Dec: Final requirements to MSR
    2012
    – 16 Jan: Receive Excel add-in Version 1
    – 23 Jan: Roll out Excel add-in Version 1
    – 16–19 Feb: AAAS meeting: add-in user testing
    – 20–24 Feb: Ocean Sciences meeting: add-in user testing
    – 26 Feb: 1st draft of updated requirements based on Version 1 to MSR
    – 2 Apr: Deliver updated requirements based on Version 1 to MSR
    – 28 May: Receive Excel add-in Version 2
    – 29 May–24 Jun: User testing of Version 2
    – 25 Jun: Roll out Excel add-in Version 2
    – 7–10 July: CSEE meeting: add-in debut & demo
    – 13 July: Final code, technical documentation, and requirements published
    – 31 July: End user documentation published
  • 34. Roadmap
    1. An overview: why is DCXL needed?
    2. DCXL project overview
    3. Progress & future plans
    4. How to get involved in DCXL
  • 35. Ecological Society of America Summer 2011 Meeting
  • 36. ESA Overview
    • Everyone uses Excel
      – Most use Excel for organizing raw data
      – Most import spreadsheets into other programs for analysis
      – ~75% are embarrassed about using Excel
    • Excitement about open source
    • Minimal knowledge about data management, organization, and archiving
    • 55 surveys from a diverse group
  • 37. [Chart: operating system of respondents (Mac, PC, Linux)]
  • 38. [Chart: What do you use Excel for? (organization, visualization, statistics, other analyses, sharing); x-axis: number of respondents (out of 55)]
  • 39. [Chart: How often do you use Excel? (responses from "Never" to "Every day"); y-axis: number of respondents]
  • 40. [Chart: What features are used in Excel? (comments, cell shading, macros, embedded formulas, headers, pivot tables, multiple tabs, multiple tables); x-axis: percent of respondents]
  • 41. American Fisheries Society Summer 2011 Meeting (Image: Ray Troll, trollart.com)
  • 42. AFS Overview
    • Everyone uses Excel
    • Most use it only for data organization and sharing
    • 36 surveys from a diverse group
    • Heavy MS Access use
    • 100% PC
  • 43. [Chart: How often do you use Excel? (responses from "Rarely" to "Every day"); y-axis: number of respondents]
  • 44. [Chart: Tasks performed in Excel (organizing data, visualizing data, statistics, simple calculations, sharing data); x-axis: % respondents (n = 36)]
  • 45. [Chart: What should the add-in help you do? (organize my data for my own use; organize my data for others to use more easily; archive my data; create metadata; share my data publicly; no opinion); y-axis: % respondents]
  • 46. AFS Overview
    • Everyone uses Excel
    • Most use it only for data organization and sharing
    • 36 surveys from a diverse group
    • Heavy MS Access use
    • 100% PC
    • Data hoarders
    (Image: myoverstuffedbookshelf.blogspot.com)
  • 47. Roadmap
    1. An overview: why is DCXL needed?
    2. DCXL project overview
    3. Progress & future plans
    4. How to get involved in DCXL
  • 48. Get Involved: dcxl.cdlib.org
    – Now: general info, blog, forum, calendar
    – Later: requirements, documentation
  • 49. Get Involved: @dcxlCDL; www.facebook.com/DCXLatCDL
  • 50. Acknowledgements
    • CDL: Rachael Hu, Trisha Cruse, John Kunze, Tracy Seneca
    • MSR: Lee Dirks
    • GBMF: Chris Mentzel
    Carly Strasser, carly.strasser@ucop.edu