NISO Webinar: Part 2: Managing Data for Scholarly Communications


The explosion of data creation across all scholarly disciplines calls for corresponding efforts to create new solutions for its management and use. Ever-growing repositories, and the datasets within them, require organization, identification, description, publication, discovery, citation, preservation, and curation if these materials are to realize their potential in support of data-driven, often interdisciplinary research. What infrastructures and technical environments are required for this work? Can new approaches, specifications, standards and best practices be created? Are there partnerships and collaborations that exist or can be pursued? This webinar, Part 2 of a two-part NISO series on data, will explore these and other questions.

Published in: Education

  1. 1. Managing Data for Scholarly Communications • PART 2: Technical Management • October 19, 2011 • Speakers: Joan Starr, Mark McFarland, and MacKenzie Smith
  2. 2. Dataset Identification & Citation: DataCite and EZID • Joan Starr, California Digital Library • October 2011
  3. 3. Dataset Identification & Citation • Introduction • The Researchers’ Challenge: Identifiers are a tool for researchers • DataCite: “Helping you find, access and reuse data.” • EZID: Easy creation and management of DataCite DOIs and other identifiers. • Next steps: For DataCite, EZID and you!
  4. 4. California Digital Library (CDL)
  5. 5. The Researchers’ Challenge
  6. 6. Early in the research life cycle • Data-intensive research + Writing up the results • Where’s the data? What if I move it? • PERSISTENT IDENTIFIERS make the difference • Photo by Dave Rogers, http://-rogers/2815036285/
  7. 7. Working on a federated team • Data-intensive research + Regional research center + Aging infrastructure • Where’s the data? We have to move it! • PERSISTENT IDENTIFIERS make the difference • © All rights reserved by University of California, http://
  8. 8. Making a career move • Data-intensive research + Researcher(s) on the move • I know where my data is and I’m taking it with me! • PERSISTENT IDENTIFIERS make the difference • © All rights reserved by University of California, http://
  9. 9. Meeting funder requirements • Data-intensive research + Grantor requirements for data management plan • What do we put here? How do we track the data? • PERSISTENT IDENTIFIERS make the difference • Photo by David Mellis, http://
  10. 10. DataCite members • German National Library of Economics (ZBW) • German National Library of Science and Technology (TIB) • German National Library of Medicine (ZB MED) • GESIS - Leibniz Institute for the Social Sciences, Germany • Australian National Data Service (ANDS) • ETH Zurich, Switzerland • The Swedish National Data Service (SNDS) • California Digital Library (CDL), USA • Purdue University Library • Canada Institute for Scientific and Technical Information (CISTI) • Technical Information Center of Denmark • Institute for Scientific & Technical Information (INIST-CNRS), France • TU Delft Library, The Netherlands • The British Library, UK • Office of Scientific & Technical Information (OSTI), USA
  11. 11. DataCite Metadata V. 2.2 • Small required set = citation elements • Optional descriptive set: – extendable lists – can refer to other standards, schemes – domain-neutral – rich ability to describe relationships to other digital objects • Metadata Search (MDS) is full-text indexed
  12. 12. DataCite Metadata V. 2.2 • Required properties: 1. Identifier (with type attribute) 2. Creator (with name identifier attributes) 3. Title (with optional type attribute) 4. Publisher 5. PublicationYear • Optional properties: 6. Subject (with schema attribute) 7. Contributor (with type & name identifier attributes) 8. Date (with type attribute) 9. Language 10. ResourceType (with description attribute) 11. AlternateIdentifier (with type attribute) 12. RelatedIdentifier (with type & relation type attributes) 13. Size 14. Format 15. Version 16. Rights 17. Description (with type attribute)
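The five required properties on this slide are described as the citation elements, which can be illustrated with a small check. The sketch below is only an illustration: the property names come from the slide, but the flat dictionary layout, the function names, and the citation string layout are assumptions made for this example and are not taken from the DataCite schema itself.

```python
# Property names as listed on the slide (DataCite Metadata v2.2, required set).
REQUIRED = ("Identifier", "Creator", "Title", "Publisher", "PublicationYear")


def missing_required(record):
    """Return the required property names that are absent or empty in `record`."""
    return [name for name in REQUIRED if not record.get(name)]


def format_citation(record):
    """Build a citation string from the required properties.

    The layout used here is an assumption for illustration only.
    """
    creators = "; ".join(record["Creator"])
    return (
        f'{creators} ({record["PublicationYear"]}): '
        f'{record["Title"]}. {record["Publisher"]}. {record["Identifier"]}'
    )


# A hypothetical record; every value is made up for this example.
record = {
    "Identifier": "doi:10.1234/example",
    "Creator": ["Doe, Jane"],
    "Title": "Example dataset",
    "Publisher": "Example Repository",
}

print(missing_required(record))  # ['PublicationYear']

record["PublicationYear"] = 2011
print(format_citation(record))
```

Once all five properties are present, the record carries enough information to produce a citation, which is the point the slide makes about the small required set.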
  13. 13. • Get identifiers • Add location • Add metadata • Update location • Update metadata
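The five operations on this slide, together with the earlier "What if I move it?" scenarios, can be sketched with a toy in-memory registry. This is not the EZID API: the class, its method names, the identifier format, and the example URLs are all hypothetical and exist only to show why an identifier stays stable while the location behind it changes.

```python
import itertools


class IdentifierRegistry:
    """Toy in-memory registry (hypothetical; not the EZID service or its API)."""

    def __init__(self, prefix):
        self._prefix = prefix
        self._counter = itertools.count(1)
        self._records = {}

    def mint(self):
        """Get an identifier: create a new, empty record and return its name."""
        identifier = f"{self._prefix}/{next(self._counter):06d}"
        self._records[identifier] = {"location": None, "metadata": {}}
        return identifier

    def set_location(self, identifier, url):
        """Add or update the location the identifier points to."""
        self._records[identifier]["location"] = url

    def update_metadata(self, identifier, **fields):
        """Add or update descriptive metadata for the identifier."""
        self._records[identifier]["metadata"].update(fields)

    def resolve(self, identifier):
        """Return the current location for the identifier."""
        return self._records[identifier]["location"]

    def metadata(self, identifier):
        """Return a copy of the metadata stored for the identifier."""
        return dict(self._records[identifier]["metadata"])


registry = IdentifierRegistry("demo")
dataset_id = registry.mint()
registry.set_location(dataset_id, "http://old-server.example.org/data/1")
registry.update_metadata(dataset_id, Title="Example dataset")

# The data moves; the identifier that was cited does not change.
registry.set_location(dataset_id, "http://new-server.example.org/data/1")
print(dataset_id, "->", registry.resolve(dataset_id))
```

A citation that uses `dataset_id` keeps working after the move, because only the registry entry is updated.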
  14. 14. http://
  15. 15. http://
  16. 16. http://
  17. 17. http://
  18. 18. http://
  19. 19. http://
  20. 20. http://
  21. 21. What this means…
  22. 22. What this means…
  23. 23. Next Steps • DataCite: – Dublin Core application profile – Content Service – Metadata v. 2.3 • EZID: – UI redesign – Automated link checking – Exposure for citations • Photo by Nicola Whitaker, http://
  24. 24. Next Steps for you • Get more information, and • Try EZID for yourself! • Photo by Nicola Whitaker, http://
  25. 25. For more information • EZID: – EZID application: http:// – EZID website: http:// – UC3 website: http:// • DataCite: – DataCite Home: http:// – DataCite Metadata Schema: http://-2.2/index.html – DataCite Metadata Search: http:// • Contact Joan Starr at
  26. 26. Questions? • Photo by Horia Varlan, http://
  27. 27. Digital Library Services in the Cloud • Mark McFarland, Director, Texas Digital Library
  28. 28. Outline • Who: Texas Digital Library • Where: on the cloud • Why: motivations • When: late 2010 • What: lessons learned • June 2011
  29. 29. Who: Texas Digital Library • Consortium of higher education institutions in Texas • Current services include: – Institution: IR (DSpace), ETD system – Faculty: OJS, OCS, blogs, wikis – Approximately 70 customer-facing service instances • Legacy hardware included: – Compute servers – Storage servers – Network support devices
  30. 30. Where: on the cloud • Migrated customer-facing services to AWS – 50 AWS VM instances • Maintained some services on local hardware • Simplified and consolidated system architecture
  31. 31. Why: motivations / When: late 2010 • Disaster recovery plan – Prepare for data center move • Elastic capacity – New members, collections • Personnel savings – Fewer competencies, responsibilities • Began Oct 2010
  32. 32. What: lessons learned • The Good – Elastic capacity; customers did not notice change – No hardware purchase cycle • The Mixed – Lower personnel costs; failover • The Unexpected – Development tools; concerns about AWS being in U.S.; excellent management console
  33. 33. Future • Preservation – DuraCloud • Continue to evaluate – AWS is flexible and feature rich, but may still not be cost effective
  34. 34. For more information about the TDL, please visit the Texas Digital Library website at http:// or contact us at
  35. 35. Data Governance and Legal Interoperability • MacKenzie Smith, Science Fellow • © Creative Commons, 2011. This work is licensed under a Creative Commons Attribution 3.0 United States License.
  36. 36. Why Data Sharing is Good • research reproducibility • fiscal responsibility • broadest possible impact • large-scale data interoperability – includes technical, social, legal and policy aspects – usual focus on technical/social – focus here on legal/policy aspects
  37. 37. Why Data Sharing is Hard • No incentives to improve data quality, provide missing documentation • Confidentiality and privacy concerns (e.g. HIPAA, endangered species) • Patents and commercial potential • Closed Access to journal articles (i.e. results) • IP issues very complicated
  38. 38. Definitions • Data governance is the system of decision rights and accountabilities that describe who can take what actions with what data, and when, under what circumstances, using what methods. It covers: • strategies for data quality control and management, and processes that ensure important data assets are formally managed throughout an organization; – organizations can be legal entities like universities, or virtual organizations (e.g. distributed research collaborations) – includes business processes and risk management; • laws and policies associated with data; • ensuring that data can be trusted and that people are accountable for actions affecting the data
  39. 39. Definitions • Attribution is legally imposed; the remedy is a lawsuit • Credit is what researchers want • Citation is the norm in scholarly communication, to provide supporting evidence; it is now a proxy for credit • Attribution does not ensure credit or citation.
  40. 40. Legal Mechanisms for Sharing Data • 1. licenses • 2. contracts (licenses and contracts: require attribution) • 3. waivers (no attribution requirement)
  41. 41. Copyright for Data • Does not apply to facts, e.g., most scientific data • Can apply to a collection of facts, but only to original aspects, not facts themselves • Can extract facts from a copyrighted database without infringing
  42. 42. Licenses • Licenses are not contracts – depend on underlying rights, e.g. copyright or sui generis rights – Copyright is a bundle of rights, automatic when fixed, limited in scope and duration • US and EU differ (EU has sui generis data rights), so different licenses cover copyright, sui generis rights, or both
  43. 43. Licenses • Creative Commons (CC-BY) example – applies to data and databases to the extent they’re copyrightable – Only data uses that implicate copyright trigger the attribution requirement – uses of data that do not implicate copyright, e.g. data in the public domain, do not trigger attribution
  44. 44. Licenses • Hard to assess copyright for particular data and databases • Hard to know when a license applies, which creates risks: – data provider may be misled – data user will under- or over-comply
  45. 45. Licenses • Attribution requirements are inflexible, causing absurd situations – e.g. providing attribution to 1,000 providers in 1,000 different ways – known as ‘attribution stacking’ • Could provide attribution and still not satisfy norms or expectations
  46. 46. Contracts
  47. 47. Contracts • Do not require underlying right – rely on offer/acceptance, click through, terms of use – require formalities, e.g. attribution • Downsides – confusing obligations, no standardization, each user agreement can have different requirements • Researchers may avoid data if they can’t understand the terms of use
  48. 48. Contracts • Unlike licenses, contracts bind only the parties • If someone obtains licensed data and shares it, anyone who obtains data from that user is still bound by the license • If data had been shared by contract, anyone obtaining data from the second party is not bound by the contract since they aren’t a party to the contract • In this respect, contracts are more limited than licenses
  49. 49. Contracts • Have broader reach than licenses – not tied to a legal right – can take away rights of public
  50. 50. Example
  51. 51. Waivers • Provide legal certainty – No need to decipher copyright protection or sift through confusing legalese – Better than silence, to avoid forcing people to guess what their risks are • Mean loss of control – Can’t require attribution or other terms • Avoid problems and rely on scholarly norms – no attribution stacking or inappropriate obligations
  52. 52. 3 levels: Waiver, Fall-back license, Non-assertion pledge
  53. 53. Summary • Law is messy, each approach has consequences • Licenses – (1) legal uncertainty about scope, (2) requirements can be inconsistent with norms • Contracts – (1) burdensome requirements with custom terms, (2) exceed scope of rights with requirements that take away normal rights • Waivers – (1) avoid problems, but (2) lose control and rely on norms
  54. 54. Summary • Each approach requires loss of control • No mechanism imposes legally-binding obligations in a way that perfectly maps to scholarly credit, e.g. citation • Ideal solution creates the least friction to scientific progress while giving credit where due, i.e., waivers and norms (the community governs itself)