Dc sheridan dlf_2011_final

803 views
734 views

Published on

Digital Library Federation presentation of Data Conservancy and Johns Hopkins University Sheridan Libraries Data Management Services (JHU DMS).

Published in: Education, Technology
0 Comments
1 Like
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total views
803
On SlideShare
0
From Embeds
0
Number of Embeds
8
Actions
Shares
0
Downloads
5
Comments
0
Likes
1
Embeds 0
No embeds

No notes for slide

Dc sheridan dlf_2011_final

  1. 1. Johns  Hopkins  University  Data  Management  Services   Sayeed  Choudhury  and  Barbara  Pralle   Digital  Library  Federation  –  October  31,  2011   Johns  Hopkins  University  Sheridan  Libraries  
  2. 2. Powered  by  Data  Conservancy  •  JHU  Data  Management  Service  (DMS)   represents  the  culmination  of  two  years  of   research,  design,  development  and   implementation  of  Data  Conservancy  •  Service  launched  in  July  2011  •  DC  instance  launched  in  October  2011  •  Important,  essential  foundations  in  place  •  There  remains  work  to  be  done   Johns  Hopkins  University  Sheridan  Libraries  
  3. 3. Data  Conservancy  •  The  Data  Conservancy  has  developed  a  blueprint   for  institutions  that  view  scientific  data  curation   as  a  means  to  collect,  organize,  validate  and   preserve  data  so  that  scientists  can  find  new   ways  to  address  the  grand  research  challenges   that  face  society.    
  4. 4. Technology-­‐related  Accomplishments    •  Preservation-­‐based,  flexible  data  model   (inspired  by  PLANETS  project)  •  Demonstration  of  modularity  through  Archival   Storage  framework  •  Feature  Extraction  Framework  •  Demonstration  of  interoperability  with  NSIDC   glacier  photo  service,  arXiv,  IVOA  and  Sakai  •  Development  of  DCS  instance   Johns  Hopkins  University  Sheridan  Libraries  
  5. 5. Notion  of  the  DCS  Instance  (1)  •  An  instance  of  the  DCS  (Data  Conservancy  System)   software  stack  •  A  deployment  infrastructure  (hardware,  servers,  etc.)   for  the  DCS  •  A  defined  (sub)set  of  DCS  services  to  be  exposed/ provided  •  A  defined/understood/expected  “context”  to  be   addressed  (e.g.  science  domain,  institutional  domain,     etc.)   5
  6. 6. Notion  of  the  DCS  Instance  (2)  •  A  local  policy  framework  in  place  for  the  System   Operation  •  Personnel/staffing  to  setup,  manage  and  support   the  continued  operation  of  the  system  •  A  sustainability/business  plan  in  place  to  operate   the  instance  post  NSF  funding  (cost  model,  user   base,  etc.)  •  Key  integrators  such  as  spatial,  temporal  and   taxonomic  queries  or  data  replication   6
  7. 7. Definition  of  Data  Preservation  •  “Data  preservation  involves  providing  enough   representation  information,  context,  metadata,   fixity,  etc.  such  that  someone  other  than  the   original  data  producer  can  use  and  interpret  the   data.”   -  Ruth  Duerr,  National  Snow  and  Ice  Data  Center   7
  8. 8. Architecture  mapped  to  OAIS  Open  Archival  Information  System   Functional  Entities   Data  Conservancy  Service   Architecture  Block  Diagram   Johns  Hopkins  University  Sheridan  Libraries  
  9. 9. Data  Model:  Research  •  Versions  vs.  (format)   migration   -  Modification  of  significant   properties  versus   transformation  of  data  •  Replication  and  provenance   (verifiable  snapshots)  •  Layered  data  model   -  Preservation  view,  Persistence   view,  Information  view  
  10. 10. Data  Model:  Application  •  Multiple  Data  Models  •  Content  models  for   describing  the  contents  of  a   Manifestation  •  General  Model  used  to   correlate  model  entities   across  heterogeneous   datasets   -  geo-­‐reference,  time  of   observation,  etc…  
  11. 11. Feature  Extraction  Framework:  Design  •  Must  accommodate  a   variety  of  data  formats  •  No  assumption  made   regarding  the  form  of  data   input  or  output  •  Not  coupled  to  a  specific   execution  model  
  12. 12. Feature  Extraction  Framework:  Application  •  Subsetting   -  Returning  a  portion  of  a   dataset  •  Indexing   -  Output  suitable  for  indexing  by   the  Query  Framework  •  Workflows   -  Process  Orchestration,   Meandre,  Taverna,  Kepler  •  Execution  environment  for   analysis   -  Stateless  Mappings  basis  for   MapReduce  
  13. 13. Implementation:  Archival  Storage  •  Responsible  for  long  term  storage    of  content  and  metadata  (AIP)   -  Evolving  storage  technologies     (including  cloud)   -  Policy  implementation  (e.g.  multiple  copies)  •  Archival  Storage  API  abstracts   underlying  technology   -  Fedora  (object  based),  ELM  (file  based)  •  Persistent  Storage  Framework  (Y2)   -  Instrumented  file  systems   -  File  semantics  over  block,  cloud,  content-­‐ addressed  storage   -  Storage  services  (e.g.  integrity  checking)  
  14. 14. Data  Management  Layers  Layers   Examples   Implication  for  PI   Implication  relative   to  NSF  Curation   Future  JHU  Data  Archive   •  Feature  Extraction   •  Competitive   and  other  DCS  instances   •  New  query   advantage   capabilities   •  New   •  Cross-­‐disciplinary   opportunities  Preservation   JHU  Data  Archive   •  Ability  to  use  own   •  Satisfies  NSF   Portico   data  in  the  future   needs  across   ICPSR   (e.g.  5  yrs)   directorates   •  Data  sharing      Archiving   CUAHSI   •  Provides  identifiers   •  Could  satisfy   NEES   for  sharing,   most  NSF   Dataverse   references,  etc.   requirements  Storage   Server  in  Lab   •  Responsible  for:   •  Could  be  enough   Website   •  Restore   for  now  but  not   Amazon  S3   •  Sharing   near-­‐term  future   •  Staffing   Johns  Hopkins  University  Sheridan  Libraries  
  15. 15. Defining  Sustainability   •  “Ensuring  that  valuable   digital  assets  will  be   available  for  future  use  is   not  simply  a  matter  of   finding  sufficient  funds.  It   is  about  mobilizing   resources—human,   technical,  and  financial— across  a  spectrum  of   stakeholders  diffuse  over   both  space  and  time.”  
  16. 16. Establishing  the  JHU  DMS    •  May  2010  NSF  announces  DMP  expectations  •  Services  incubated  and  scoped  summer/fall  2010   -  Build  on  Data  Conservancy  expertise    •  Proposed  in  January  and  launched  in  July  2011   -  Consultative  data  management  planning  services  to   support  NSF  proposals   -  Post  award  data  management  services   Johns  Hopkins  University  Sheridan  Libraries  
  17. 17. Resources  Needed  for  Launch    •  Talented  team  and  expertise  •  Flexible  process  •  Tools  that  support  service  provision  •  Systems  developed  by  the  DC  to  manage  data  •  Marketing  and  outreach  partners  •  Mechanism  for  supporting  financial  model   Johns  Hopkins  University  Sheridan  Libraries  
  18. 18. People  and  Expertise    •  Create  data  management  consultant  position    •  Position  description  recognizes  diversity  of   experience  and  talent  that  can  support  service  •  Recruited  people  with  skills  that  match  with  the   JHU  DMS  objectives  and  services  •  Knowledge  transfer  through  hands  on  work  and   interactions  with  DC  partners   Johns  Hopkins  University  Sheridan  Libraries  
  19. 19. Day  in  the  Life  of  a  JHU  DM  Consultant  (1)  •  Consultative  process  emulates  a  reference   interview  •  Adapt  to  PI  timeframe  and  deadlines  •  Gather  of  information,  identify  gaps,  understand   options,  prepare  and  iterate  plan  •  Support  better  data  management  by   encouraging  systematic  cultural  change  through   PI  interactions   Johns  Hopkins  University  Sheridan  Libraries  
  20. 20. Day  in  the  Life  of  a  JHU  DM  Consultant  (2)  •  In-­‐depth  planning  after  award  received  •  Helping  PI  understand  service  levels  within  JHU   Data  Archive  •  Planning  and  preparing  data  for  deposit  •  Reviewing  data  management  plan  and   identifying  solutions  for  next  phase  of  data   management   Johns  Hopkins  University  Sheridan  Libraries  
  21. 21. Challenges  •  Timing  of  consultative  support,  deadlines,  and   marketing  of  service  •  Building  awareness  of  data  management  value  •  Establishing  common  vocabulary  ex.  One  PIs   ‘storage’  is  another  PIs  ‘archiving’  •  Condensing  key  information  into  two  pages  •  Navigating  different  data  retention  policies  •  Responding  to  widely  ranging  domains   Johns  Hopkins  University  Sheridan  Libraries  
  22. 22. Opportunities  •  Grow  the  researcher/grad  student   understanding  of  data  management   -  Support  good  stewardship  of  data  across  institution  •  Establish  an  archive  specifically  designed  for   data,  enabling  future  discovery  and  use    •  Build  our  collective  expertise  in  data   management   Johns  Hopkins  University  Sheridan  Libraries  
  23. 23. Acknowledgements  and  Resources  •  NSF  Award  OCI-­‐0830976  •  Sheridan  Libraries  financial  support  •  Johns  Hopkins  University  financial  support  •  Elliot  Metsger  for  infrastructure  slides  •  Tim  DiLauro  for  inspiration  about  layers  •  Data  Conservancy  colleagues  for  their  exceptional  work   and  patience  •  http://dataconservancy.org  •  http://dmp.data.jhu.edu   Johns  Hopkins  University  Sheridan  Libraries  

×