Cal Poly - Data Management and the DMPTool


Published on

October 17, 2013 @ Robert E. Kennedy Library, Data Studio, California Polytechnic State University.
Many funders now require researchers to submit a Data Management Plan alongside their project proposals. The DMPTool is a free, online wizard that helps you create a data management plan specific to your project, and provides you with links and resources for ensuring your plan is successful.

Published in: Technology, Education
  • Be the first to comment

  • Be the first to like this

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Cal Poly - Data Management and the DMPTool

  1. 1. From  Flickr  by  OZinOH   The  Data  Management   Planning  Tool   DMPTool       Carly  Strasser      |  @carlystrasser   California  Digital  Library     Cal  Poly   Oct  2013  
  2. 2. Roadmap   Background   Current  DMP  tools   A  walkthrough  of   DMPTool   From  Flickr  by    (Luciano)   DMPTool2  
  3. 3. From  Flickr  by  US  Army  Environmental  Command   From  Flickr  by    deltaMike   From  Flickr  by    DW0825   C.  Strasser   From  Flickr  by  Flickmor   Courtesey  of  WHOI  
  4. 4. Data  management   Documentation   Reproducibility   From  Flickr  by  ~Minnea~  
  5. 5. From  Flickr  by    ahockey   DMP s!  
  6. 6. What  is  a  data   management  plan?   A  document  that  describes   what  you  will  do  with  your   data  both   during  your  research     and  after  you  complete   your  project   From  Flickr  by  Barbies  Land  
  7. 7. From  Flickr  by  401(K)  2013   For  funders:   A  short  plan  submitted   alongside  grant  applications    An  outline  of     –  –  –  –  –  –  what  will  be  collected   methods   But  they   Standards   all  have   different  requirements   Metadata   and  express  them  in   sharing/access   different  ways   long-­‐term  storage    Includes  how  and  why  
  8. 8. Why  prepare  a  DMP?   From  Flickr  by  natalinha     •  Saves  time   •  Increases  research   efficiency   •  Satisfies  requirements  
  9. 9. When  do  you  plan?   Yes.  
  10. 10. When  do  you  plan?   Plan   Proposal   writing   Research   Ideas   Collect   Analyze   Assure   Integrate   Discover   Publication   Describe   Preserve  
  11. 11. Remember!   From  Flickr  by  gmacfadyen   •  Keep  your  plan  current   •  Incorporate  changes     •  Use  as  a  guide  for  daily   activities    
  12. 12. Where  to  start?   From  Flickr  by  celikins  
  13. 13. Small  &  Simple   •  Document  what  you  know  now   •  Share  the  plan  with  your  team   •  Avoid  procrastination  and  immobilization     Where  to  start?  
  14. 14. Courtesy  of  Martin  Donnelly   Research  Support   Office   Data  Library  /  Repository   Researcher   DMP?   Unruly   Data   Computing   Support   Faculty  Ethics   Committee   Etc...  
  15. 15. Who:  Support  Services  &   Collaborators   Plan  section   Support   Information  about  data   PI,  co-­‐PIs,  research  staff   Metadata  content  &  format   Librarians,  data  repositories   Policies  for  access,  sharing,   reuse   Funder,  institute,  HIPPA,  IRB,   users   Long-­‐term  storage  and  data   management   Librarians,  IT  staff,  data   repositories   Budget   Sponsored  programs  office,   funder  
  16. 16. DMPs:  A  Short  History   Liz  Lyon:  Dealing   with  Data     2008   UK  funder  expectations   2009   2009-­‐10  
  17. 17. DMPs:  A  Short  History   Across  the  Pond…   Federal  Funding  Accountability   and  Transparency  Act   2006     2010  – present     2010  
  18. 18. NSF  DMP  Requirements   From  Grant  Proposal  Guidelines:    DMP  supplement  may  include:   1.  the  types  of  data,  samples,  physical  collections,  software,  curriculum   materials,  and  other  materials  to  be  produced  in  the  course  of  the  project   2.   the  standards  to  be  used  for  data  and  metadata  format  and  content   (where  existing  standards  are  absent  or  deemed  inadequate,  this  should  be   documented  along  with  any  proposed  solutions  or  remedies)   3.   policies  for  access  and  sharing  including  provisions  for  appropriate   protection  of  privacy,  confidentiality,  security,  intellectual  property,  or  other   rights  or  requirements   4.   policies  and  provisions  for  re-­‐use,  re-­‐distribution,  and  the  production  of   derivatives   5.   plans  for  archiving  data,  samples,  and  other  research  products,  and  for   preservation  of  access  to  them  
  19. 19. 1.  Types  of  data  &  other  information   •  Types  of  data  produced   •  Relationship  to  existing  data   •  How/when/where  will  the  data  be  captured  or   created?   C.  Strasser   •  How  will  the  data  be  processed?   •  Quality  assurance  &  quality  control  measures   •  Security:  version  control,  backing  up   •  Who  will  be  responsible  for  data  management   during/after  project?   From  Flickr  by  Lazurite  
  20. 20. 2.  Data  &  metadata  standards   •  What  metadata  are  needed  to  make  the  data  meaningful?   •  How  will  you  create  or  capture  these  metadata?   •  Why  have  you  chosen  particular  standards  and  approaches   for  metadata?  
  21. 21. 3.  Policies  for  access  &  sharing   4.  Policies  for  re-­‐use  &  re-­‐distribution   •  Are  you  under  any  obligation  to  share  data?     •  How,  when,  &  where  will  you  make  the  data  available?     •  What  is  the  process  for  gaining  access  to  the  data?     •  Who  owns  the  copyright  and/or  intellectual  property?   •  •  •  •  •  •  Will  you  retain  rights  before  opening  data  to  wider  use?  How  long?   Are  permission  restrictions  necessary?   Embargo  periods  for  political/commercial/patent  reasons?     Ethical  and  privacy  issues?   Who  are  the  foreseeable  data  users?   How  should  your  data  be  cited?  
  22. 22. 5.  Plans  for  archiving  &  preservation   •  What  data  will  be  preserved  for  the  long  term?  For  how  long?       •  Where  will  data  be  preserved?   •  What  data  transformations  need  to  occur  before   preservation?   •  What  metadata  will  be  submitted   alongside  the  datasets?   •  Who  will  be  responsible  for  preparing   data  for  preservation?  Who  will  be  the   main  contact  person  for  the  archived   data?   From  Flickr  by  theManWhoSurfedTooMuch  
  23. 23. Don’t  forget:  Budget   •  Costs  of  data  preparation  &  documentation   Hardware,  software   Personnel   Archive  fees   •  How  costs  will  be  paid     Request  funding!  
  24. 24. *   NSF’s  Vision DMPs  and  their  evaluation  will  grow  &   change  over  time     Peer  review  will  determine  next  steps   Community-­‐driven  guidelines     Evaluation  will  vary  with  directorate,   division,  &  program  officer     *Unofficially  
  25. 25. A  DMP  Example  (1)   •  •  Project  name:  Effects  of  temperature  and  salinity  on  population  growth  of  the  estuarine   copepod,  Eurytemora  affinis   Project  participants  and  affiliations:     Carly  Strasser  (University  of  Alberta  and  Dalhousie  University)   Mark  Lewis  (University  of  Alberta)   Claudio  DiBacco  (Dalhousie  University  and  Bedford  Institute  of     Oceanography)   •  Funding  agency:  CAISN  (Canadian  Aquatic  Invasive  Species  Network)     •      •  Description  of  project  aims  and  purpose:   •    We  will  rear  populations  of  E.  affinis  in  the  laboratory  at  three  temperatures  and  three   salinities  (9  treatments  total).  We  will  document  the  population  from  hatching  to  death,   noting  the  proportion  of  individuals  in  each  stage  over  time.  The  data  collected  will  be  used   to  parameterize  population  models  of  E.  affinis.  We  will  build  a  model  of  population  growth   as  a  function  of  temperature  and  salinity.  This  will  be  useful  for  studies  of  invasive  copepod   populations  in  the  Northeast  Pacific.     •  Video  Source:  Plankton  Copepods.  Video.    Encyclopædia  Britannica  Online.    Web.  13  Jun.  2011    
  26. 26. A  DMP  Example  (2)   •  •  •  1.  Information  about  data      Every  two  days,  we  will  subsample  E.  affinis  populations  growing  at  our   treatment  conditions.    We  will  use  a  microscope  to  identify  the  stage  and   sex  of  the  subsampled  individuals.    We  will  document  the  information  first   in  a  laboratory  notebook,  then  copy  the  data  into  an  Excel  spreadsheet.  For   quality  control,  values  will  be  entered  separately  by  two  different  people  to   ensure  accuracy.    The  Excel  spreadsheet  will  be  saved  as  a  comma-­‐ separated  value  (.csv)  file  daily  and  backed  up  to  a  server.  After  all  data  are   collected,  the  Excel  spreadsheet  will  be  saved  as  a  .csv  file  and  imported   into  the  program  R  for  statistical  analysis.  Strasser  will  be  responsible  for  all   data  management  during  and  after  data  collection.      Our  short-­‐term  data  storage  plan,  which  will  be  used  during  the   experiment,  will  be  to  save  copies  of  1)  the  .txt  metadata  file  and  2)  the   Excel  spreadsheet  as  .csv  files  to  an  external  drive,  and  to  take  the  external   drive  off  site  nightly.    We  will  use  the  Subversion  version  control  system  to   update  our  data  and  metadata  files  daily  on  the  University  of  Alberta   Mathematics  Department  server.  We  will  also  have  the  laboratory  notebook   as  a  hard  copy  backup.  
  27. 27. A  DMP  Example  (3)   •  2.      Metadata  format  &  content   •    We  will  first  document  our  metadata  by  taking  careful  notes  in  the  laboratory   notebook  that  refer  to  specific  data  files  and  describe  all  columns,  units,   abbreviations,  and  missing  value  identifiers.    These  notes  will  be  transcribed  into   a  .txt  document  that  will  be  stored  with  the  data  file.    After  all  of  the  data  are   collected,  we  will  then  use  EML  (Ecological  Metadata  Language)  to  digitize  our   metadata.  EML  is  on  of  the  accepted  formats  used  in  Ecology,  and  works  well  for  the   type  of  data  we  will  be  producing.  We  will  create  these  metadata  using  Morpho   software,  available  through  the  Knowledge  Network  for  Biocomplexity  (KNB).  The   documentation  and  metadata  will  describe  the  data  files  and  the  context  of  the   measurements.  
  28. 28. A  DMP  Example  (4)   3.    Policies  for  access,  sharing  &  reuse   •    We  are  required  to  share  our  data  with  the  CAISN  network  after  all  data   have  been  collected  and  metadata  have  been  generated.  This  should  be  no   more  than  6  months  after  the  experiments  are  completed.    In  order  to  gain   access  to  CAISN  data,  interested  parties  must  contact  the  CAISN  data   manager  (  or  the  authors  and  explain  their  intended  use.       Data  requests  will  be  approved  by  the  authors  after  review  of  the  proposed   use.       •    The  authors  will  retain  rights  to  the  data  until  the  resulting  publication  is   produced,  within  two  years  of  data  production.    After  publication  (or  after   two  years,  whichever  is  first),  the  authors  will  open  data  to  public  use.    After   publication,  we  will  submit  our  data  to  the  KNB  allowing  discovery  and  use   by  the  wider  scientific  community.  Interested  parties  will  be  able  to   download  the  data  directly  from  KNB  without  contacting  the  authors,  but   will  still  be  encouraged  to  give  credit  to  the  authors  for  the  data  used  by   citing  a  KNB  accession  number  either  in  the  publication’s  text  or  in  the   references  list.  
  29. 29. A  DMP  Example  (5)   4.      Long-­‐term  storage  and  data  management    The  data  set  will  be  submitted  to  KNB  for  long-­‐term  preservation  and   storage.    The  authors  will  submit  metadata  in  EML  format  along  with   the  data  to  facilitate  its  reuse.  Strasser  will  be  responsible  for   updating  metadata  and  data  author  contact  information  in  the  KNB.         •  5.    Budget     •    A  tablet  computer  will  be  used  for  data  collection  in  the  field,  which   will  cost  approximately  $500.    Data  documentation  and  preparation   for  reuse  and  storage  will  require  approximately  one  month  of  salary   for  one  technician.  The  technician  will  be  responsible  for  data  entry,   quality  control  and  assurance,  and  metadata  generation.  These  costs   are  included  in  the  budget  in  lines  12-­‐16.  
  30. 30. From  Flickr  by  thewmatt  
  31. 31. Roadmap   Background   Current  DMP  tools   A  walkthrough  of   DMPTool   From  Flickr  by    (Luciano)   DMPTool2  
  32. 32. DMPonline:   Step-­‐by-­‐step  wizard  for  generating  DMP   Create  |  edit  |  re-­‐use  |  share  |  save  |  generate     Open  to  community    
  33. 33. From  Flickr  by  Max  Chu                    
  34. 34. DMPTool  Project   •  Partners  started  working  in  January  2011   •  Developed  requirements,  divided  work   •  Self-­‐funded  /  In-­‐kind  
  35. 35.                     •  Free     •  Guides  through  creating  a  DMP   •  Helps  meet  funder  requirements     •  Supplies  questions     •  Includes  explanation/context  provided  by   the  agency   •  Provides  links  to  the  agency  website    
  36. 36. Wait!   Data  management  planning   is  complex  &  requires  dialog     Range  of  support  &   understanding     Our  focus:     •  simplify  &  scale  the  common  parts   •  develop  community   •  provide  incremental  improvement   in  functionality   From  Flickr  by  ChrisGoldNY  
  37. 37. Roadmap   Background     Current  DMP  tools   A  walkthrough  of   DMPTool   From  Flickr  by    (Luciano)   DMPTool2  
  38. 38. Access   DMPTool  can  be  added  to   campus  single  sign-­‐on   service   Researchers  use  campus   login  to  access  tool   From  Flickr  by  Clonny   Researchers   like  it  here  
  39. 39. Institution-­‐specific…   •  Help  text   •  Links  to  resources  &  services   •  Suggested  answers     …at  different  levels   •  All  DMPs   •  All  DMPs  for  a  particular   funding  agency   •  Question  within  a  data   management  plan     Customized   Resources   From  Flickr  by  lumachrome  
  40. 40. From  Flickr  by  OZinOH   DMPTool2    
  41. 41. DMPTool  Uptake   1000   900   6000   800   5000   700   600   4000   500   3000   400   300   2000   1000   0   Unique  Users   Plans   Institutions   200   100   0   Number  of  Institutions   Number  of  Plans  (solid)  &  Unique  Users  (dashed)   7000  
  42. 42. DMPTool  2:     Responding  to  the  Community     ming   Co oon!   S
  43. 43. 2 Plan   Creators   &   Plan   Administrators  
  44. 44. 2 Improvements  for  Plan  Creators   •  Collaborative  plan  creation   •  Role-­‐based  user  authorization  &  access   •  Better  plan  templates  &  resources  
  45. 45. 2 New  administrator  Interface   •  Template  creation:   •  Better  plan  template  granularity        discipline,  funder,  question   What  tgranularity   •  Better  institution  his  means  for  plan   creators:      department,  college,  lab  group,  …   •  Better  plans   •  Enhanced  search  and  browse  of  plans   •  Access  to  m•  More  granular  help   etrics  for  reporting  &  follow-­‐up   •  Local  input  &  assistance  
  46. 46. 2 Open  RESTful  API   ?  
  47. 47. Yelp   APIs   Google  Maps  
  48. 48. DMPTool   API   Other  stuff   •  Carefully  thought  out  code     •  Invisible  to  user   •  Expose  specific  functionality   and/or  data   •  Other  functionality/data   protected  
  49. 49. API  Benefits   Interactions   Improve  functionality   Add  more  functionality   Combine  with  their  services     Popularity  
  50. 50. IMLS  Grant   Improving  Data   Stewardship  with  the   DMPTool   Provide  librarians  with  the     tools  and  resources     to  claim  the  data  management  education  space  
  51. 51.  
  52. 52.  
  53. 53. Questions?   Website   Twitter   Blog   @TheDMPTool     Email   Tweet  me   My  slides   @carlystrasser