Sally Rumsey, Janet McKnight, James A.J. Wilson - Research data management for the humanities: a non-Procrustean infrastructure

  • 1. Research data management for the humanities: a non-Procrustean infrastructure. James A. J. Wilson, Sally Rumsey, Janet McKnight, University of Oxford. Image: File:Theseus_Prokroustes_Staatliche_Antikensammlungen_2325.jpg
  • 2. Procrustes: “a brigand who lived between Eleusis and Athens. Having overcome his victims he would force them to lie down on a bed, or on one of two beds; if they were too short, he would hammer them out or rack them with weights to fit the longer bed, if too tall he would cut them to fit the shorter. Theseus disposed of him in like manner.” (Oxford Classical Dictionary)
  • 3. Oxford RDM Projects • EIDCSR (2009-10) – Heart imaging scientists – Computational biologists • SUDAMIH (2010-11) – Humanities researchers • DAMARO (2012-2013) – All academic Divisions and disciplines
  • 4. Oxford RDM Principles • Modular – Different business models for different components – May be extended (or reduced) • Researcher-focused – Caters for different disciplines and working practices • Intra-institutional – Requires input from multiple support departments and Academic Divisions
  • 5. Humanities research data • Difficult to define what constitutes ‘data’ • Extremely diverse • Value tends not to depreciate over time • Tends to be compiled from existing sources, not created from scratch – Frequently incomplete or inconsistent due to inconsistent sources – Frequently partial or specific according to research focus – Frequently involves interpretation and assessment • Sometimes not in optimal format for analysis • Life’s work – projects frequently build on earlier projects • Hard to generalize! • Many issues not restricted to the humanities
  • 6. Humanities data formats • 95% work with textual data • 45% with images • 48% use tables or spreadsheets • 23% use relational databases • 6% use XML text mark-up (Charts: “What kinds of data do you work with?” and “How are your data stored or structured?”; based on 2012 survey responses from researchers working with data)
  • 7. Humanities Data Research Practices • Least likely to conduct research as part of a team – idiosyncratic practices – limited sharing of best practice • Least likely to be externally funded • Least likely out of all academic divisions to describe RDM as ‘essential’ to their research (49%) • Least likely to have deposited data in a data repository • Lowest awareness of Oxford’s RDM Policy • 73% happy (at least in theory) to freely share at least some of their research data (2nd most open after MPLS) (Chart: “Do you conduct your research as part of a team or as an individual?” by division: Humanities; Mathematical, Physical and Life Sciences; Medical Sciences; Social Sciences. Responses: as part of a team, with our research data managed by the team; as part of a team, but each member of the team looks after their own data; as an individual; some research as part of a team, some independently. Based on 2012 survey responses from researchers working with data)
  • 8. Conclusions for the Institution • Humanities researchers amongst the hardest to reach • Need to offer long-term curation • Need to encourage cultural change through training and support – Particularly improving documentation and spreading good practice • Few requirements unique to Humanities, however • Need to offer flexible RDM solutions – whilst also focusing first on the most widely shared problems across disciplines
  • 9. Research data services for the University of Oxford. © Sally Rumsey
  • 10. Taking advantage of Bodleian expertise • Collection • Retention • Curation. Photo: hollin.elizabeth, http:// photos/44904853@N00/2560177858* (*This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 2.0 UK: England & Wales License.) © Francis Rumsey, Sally Rumsey
  • 11. Bodleian: discovery and finding aids. Image: Steve Hankins. Towards a Union Catalogue of Correspondence: Early Modern Letters Online
  • 12. Access to collections: physically and increasingly remotely
  • 13. Remit
  • 14. Increasingly cross-disciplinary
  • 15. Consultancy services: Bodleian text technologies • TCP partners have used this corpus to: … • The Text Creation Partnership is a significant data set for innovative digital humanities research • “With titles on subjects ranging from literature to geography, diplomacy to slavery, poetry to science, it will be, without question, the most important digital resource ever created for the study of the early modern period.” Stephen Ramsay, Associate Professor of English, University of Nebraska-Lincoln
  • 16. Consultancy services: Bodleian metadata expertise
  • 17. Consultancy services: Bodleian digital preservation. Image credits: kxtells, some rights reserved (-nd/2.0/); 1SarahSmith (-nd/2.0/deed.en_GB); 17425845@N00/9373725762/; Windell Oskay
  • 18. Oxford Research Data Chain (diagram labels: DMPonline; DataStage; DataBank; DataFinder; Publication repository; Direction of metadata; Project initiation; Grant awarded; ‘Live’ data; Local store; Links to/from publications; Discovery; Citation; Archival storage; Published version)
  • 19. RDM services: IT Services
  • 20. • Research data archive, discovery [& access] • Building a flexible solution • DOIs – citation • Preservation for long-term access • Located with Bodleian digital collections
  • 21. Oxford DataBank environment • Metadata • Data input: manual, mediated, harvested • External store • Application A, Application B, Application C, Application D • What questions do I want to answer using my data? • Digitised and born-digital • Experimental and non-experimental
  • 22. Item types • Images – still & moving. Image credits: Oxford Examination Schools; no known copyright restrictions, Cornell University Library
  • 23. Reproduced with kind permission of the Boethius Commentary Project, funded by The Leverhulme Trust, 2007-12, and based at the Faculty of English, University of Oxford. cernebat : et P3; .i. inquirebat A B1 C4 Ge O P P9; mentis intuitu pulori(!) V4; suo acumine F2 Ma M1 P7 P9 T V5 rosei lumina solis : astronomicam rationem A C4 Ge O P P9; uel splendorem iustitiae K1 P7 T; uel] om. K1 T. solis uel lunae defectus. et hoc peryfrasin dictum M2; astronomicam rationem nam et sol unus est ex vii. planetis B1 rosei: rubicundi B P3; crocei V5; pulchri F2 K1 P9 T; epitethon(!) Es; solis pulchri Ma; rosei rubicundi siue pulchri quia roseum ponitur saepe pro pulchre A C4 Ge O; rosei] rosei .i. O; rosei et croceum/ A. saepe] om. O. pulchre] pulchro Ge O. roseum et croceum pro pulchro accipiuntur F2 Ma M1 P7 P9 T V4; accipiuntur] gloss written over by late hand F2. pulcri quia rosei coloris est in suo ortu P7
  • 24. Audio • Image: Fritzi Scheff (1879-1954), Vienna-born American vocalist; no known copyright restrictions • Tick1AudioCorpus
  • 25. Packages • Makes DataBank flexible • Ideal for data • Bundle different files together – Metadata – Licence – Read me – Software • Unpack zip and other types of compound files
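The packaging idea on this slide, bundling data with its metadata, licence, readme and software and unpacking zip files on ingest, can be sketched with Python's standard-library `zipfile` module. This is a minimal illustration only: `build_package`, `unpack_package` and the internal layout (`metadata.json`, `data/`, `software/`) are hypothetical and do not describe DataBank's actual package format.

```python
import io
import json
import zipfile


def build_package(data_files, software_files, metadata, licence_text, readme_text):
    """Bundle data, software, metadata, licence and readme into one zip (in memory).

    data_files and software_files map file names to bytes. The layout used here
    (metadata.json, LICENCE.txt, README.txt, data/, software/) is hypothetical.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Descriptive files travel with the data so the package is self-describing.
        zf.writestr("metadata.json", json.dumps(metadata, indent=2))
        zf.writestr("LICENCE.txt", licence_text)
        zf.writestr("README.txt", readme_text)
        for name, content in data_files.items():
            zf.writestr(f"data/{name}", content)
        for name, content in software_files.items():
            zf.writestr(f"software/{name}", content)
    return buffer.getvalue()


def unpack_package(package_bytes):
    """Unpack a package: return parsed metadata and all other members as bytes."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        metadata = json.loads(zf.read("metadata.json"))
        members = {
            name: zf.read(name)
            for name in zf.namelist()
            if name != "metadata.json"
        }
    return metadata, members
```

Keeping the metadata as a separate member means an ingest step can read it without touching the data files themselves.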
  • 26. Metadata describing data for Oxford data services • Sources: manually entered and harvested • Data citation • Person [unique ID] • Geo-coordinates • Any metadata schema can be uploaded • Subject headings (FAST) & keywords • Link publications and data • Other related works
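The metadata elements listed on this slide can be pictured as a single record. The Python sketch below is a hypothetical data structure, not the actual Oxford DataBank schema: the class and field names are assumptions, an ORCID iD is assumed as the person identifier, and the citation layout is a simplified pattern chosen for illustration rather than a formal citation standard.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Person:
    name: str
    # Unique person identifier; an ORCID iD is assumed here for illustration.
    orcid: Optional[str] = None


@dataclass
class DatasetRecord:
    title: str
    creators: List[Person]
    year: int
    publisher: str
    doi: str
    # (latitude, longitude) in decimal degrees, if the data has a location.
    geo: Optional[Tuple[float, float]] = None
    fast_subjects: List[str] = field(default_factory=list)  # FAST subject headings
    keywords: List[str] = field(default_factory=list)
    related_publications: List[str] = field(default_factory=list)  # e.g. article DOIs
    related_works: List[str] = field(default_factory=list)
    source: str = "manual"  # "manual" or "harvested"

    def citation(self) -> str:
        """Build a simple 'Creators (Year): Title. Publisher. DOI link' string."""
        names = "; ".join(p.name for p in self.creators)
        return (f"{names} ({self.year}): {self.title}. "
                f"{self.publisher}. https://doi.org/{self.doi}")
```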
  • 27. DataReporter • Will generate standard reports – Institutional and departmental reports – Click-throughs & downloads – Personal data publication reports – Records lacking key metadata – Statistics for REF • Admin-only in first instance
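One of the planned DataReporter outputs, a report of records lacking key metadata, amounts to a completeness check over catalogue records, optionally summarised by department. The Python sketch below assumes a hypothetical set of "key" fields (`KEY_FIELDS`) and a hypothetical `Record` type; the slide does not say which fields DataReporter treats as key.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical choice of "key" fields; DataReporter's real criteria are not specified.
KEY_FIELDS = ("title", "creator", "doi", "licence")


@dataclass
class Record:
    record_id: str
    department: str
    metadata: dict = field(default_factory=dict)


def _is_blank(value):
    """Treat missing values, None and whitespace-only strings as absent."""
    return value is None or (isinstance(value, str) and not value.strip())


def missing_key_metadata(record, key_fields=KEY_FIELDS):
    """List the key fields that are absent or blank in one record."""
    return [name for name in key_fields if _is_blank(record.metadata.get(name))]


def lacking_key_metadata_report(records, key_fields=KEY_FIELDS):
    """Map record_id -> missing key fields, for records with at least one gap."""
    report = {}
    for rec in records:
        missing = missing_key_metadata(rec, key_fields)
        if missing:
            report[rec.record_id] = missing
    return report


def count_by_department(records, key_fields=KEY_FIELDS):
    """Count incomplete records per department, for a departmental summary."""
    counts = Counter(
        rec.department for rec in records if missing_key_metadata(rec, key_fields)
    )
    return dict(counts)
```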
  • 28. Conclusion: We believe Oxford humanities will be well served by the Oxford model. Image: The Bodleian Libraries, MS. Arch. Selden B. 26
  • 29. Janet Fell • Also one of you • Main objectives: • Test and refine guidelines & procedures • Sort out data • Ingest data • Examine processes • Have humanities data in the Bodleian archival data store • Data management – planning ahead • The humanities projects included in the work – variety; working across disciplines • “assess the strengths and weaknesses of the current arrangements for Arts and Humanities data curation and sharing” (from the RDMF website)
  • 30. DHARMa: Digital Humanities Archives for Research Materials • Enabling Digital Humanities research through effective data preservation
  • 31. Direction of travel • Surveying the landscape – Humanities research practices, funding requirements, etc. • Building the infrastructure – IT systems, planned processes and workflows • Roads and roadmaps – Real processes, instructions, guidelines, and human guides through the maze
  • 32. Where we’re going • Outline the workflow • Dive into the data • Ingest into DataBank • Use what we’ve learned • Plan for the future. Photo: Astolath
  • 33. Finding our way: Before you impose a workflow on someone, you should walk a mile in their shoes… Photo: juggzy_malone
  • 34. Variety is the spice of life • Variety of projects: – Different stages – Different sizes and scales – Different materials – Different subject areas • Across departments: – Bodleian Libraries – IT Services – Humanities Division / TORCH – Research Services – Faculties and departments. Photo: Joanna Bourne
  • 35. Potential problems • Motivation • Ownership • Confusion! • Grey areas • Sustainability. Photo: Manic Street Preacher
  • 36. How we’ll know we’ve got there • Principal outputs will be: – Comprehensive guidelines and procedures – A strengthened set of DH projects – An exemplary archive of data in DataBank. Photo: jayneandd
  • 37. But also… • Better communication • Better networks • Better sharing of knowledge
  • 38. Thank you! • DHARMa Project
  • 39. Questions
  • 40. Linking publications and data