Data Repositories and Services
Xiamen University Library
June 8, 2012

Jian Qin
School of Information Studies
Syracuse University
http://eslib.ischool.syr.edu/jqin/
  
Agenda

•  What is a repository? Repository software?
•  What does it do?
•  How does it work?
•  Case studies:
      –  Dryad: an international repository of data and publications for basic and applied biosciences
      –  Dataverse: a data repository system
  




6/8/12    Data repositories and services    2
  
What is a data repository?

•  Data Repository is a logical (and sometimes physical) partitioning of data where multiple databases which apply to specific applications or sets of applications reside.
   http://www.learn.geekinterview.com/data-warehouse/dw-basics/what-is-data-repository.html
•  Repository commonly refers to a location for storage, often for safety or preservation.
   http://en.wikipedia.org/wiki/Repository
  	
  




  
WHAT CAN WE EXPECT IN A DATA REPOSITORY?
  




  
Technical features

•  Standards
      –  OAI-PMH
      –  Z39.50 protocol
      –  Open source license
•  Hardware
      –  Minimum hardware requirements
      –  SAN support
•  Software
      –  OS
      –  Programming language
      –  Database
      –  Web server
      –  Java servlet engine
      –  Search engine
      –  Other
•  Staff requirements
      –  UNIX systems administrator
      –  Java programmer
      –  PERL programmer
      –  Python programmer

Open Society Institute. (2004). A guide to institutional repository software. 3rd ed. http://www.soros.org/openaccess/pdf/OSI_Guide_to_IR_Software_v3.pdf
  
Features and functions

•  Repository & system administration
      –  User registration, authentication & password administration
      –  Module-level APIs
•  Content submission administration
      –  Define multiple collections within the same instance of the system
      –  Submission stages
      –  Submission support
      –  System-generated usage stats and reports

Open Society Institute. (2004). A guide to institutional repository software. 3rd ed. http://www.soros.org/openaccess/pdf/OSI_Guide_to_IR_Software_v3.pdf
  	
  	
  

  
Functions of repositories

•  Content management
      –  Content import/export
      –  Document/object formats
      –  Metadata
      –  Real-time updating and indexing of accepted content
•  Dissemination
      –  User interface
      –  Search capability
          •  Full text
          •  All descriptive metadata
          •  Selected metadata fields
          •  Browse
          •  Sort search results
      –  Indexed by Google/other search engines
•  Archiving
      –  Persistent document identification
      –  Data preservation report
      –  Object history/version control
•  System maintenance
      –  System support
          •  Documentation/manual
          •  Listserv
          •  Bug track/feature request system
          •  Formal support/help desk

Open Society Institute. (2004). A guide to institutional repository software. 3rd ed. http://www.soros.org/openaccess/pdf/OSI_Guide_to_IR_Software_v3.pdf
  	
  	
  
  
The context of repositories

[Diagram labels: Research community; Institutional repository (publications, presentations, reports, etc.); Data repository (datasets); Disciplines; Standards; Technology]
  
  
Institutional repositories

[Diagram: Institutional repository (publications, presentations, reports, etc.)]

•  An institutional repository (IR) consists of formally organized and managed collections of digital content generated by faculty, staff, and students at an institution
•  Types of IRs:
      –  Collection-based digital repositories managed by library professionals
      –  Course management systems and associated file stores
      –  Collection of research data and reports managed by research units (centers, laboratories, etc.)
      –  Student academic portfolio systems
      –  Institutional file storage systems
      –  Digital asset management workflow systems
      –  Web content management systems used by institutions or departments to store and stage web content

EDUCAUSE Evolving Technologies Committee. (2003). Institutional repositories: Enhancing teaching, learning, and research. http://net.educause.edu/ir/library/pdf/DEC0303.pdf
  	
  
  
Data repositories

[Diagram: Data repository (datasets)]

•  No one agreed-upon definition
•  Characteristics:
      –  A repository operated by an academic institution/unit or a research organization
      –  A system for storing, managing, preserving, and providing access to data
      –  Centered on a discipline or a research field involving multiple disciplines
      –  Policies governing the intellectual property rights, management, access, sharing, and citation
  

  
Dryad: a repository for data and publications
http://datadryad.org/

•  As a data repository, Dryad provides a platform to associate data with underlying publications.
•  Content acquisition: user submission
•  How to motivate users to submit data?
      •  Make it simple and rewarding
      •  Provide detailed support information about:
          •  Depositing data
          •  Managing data
          •  Intellectual property rights (CC0)
          •  Download data packages
          •  View usage statistics
  
  
Dryad metadata record example
http://datadryad.org/handle/10255/dryad.8085
  




  
Dryad metadata record example (cont'd)

Individual files in the data package. The metadata shows:
•  # of downloads
•  File technical data
•  Copyright type
•  Documentation for the data file
  




  
Dryad Backend

•  Uses core features of DSpace, some of them modified or completely replaced
•  Uses OAI-PMH to allow metadata harvesting
      –  Metadata formats available for harvesting include
          •  METS/MODS, OAI-DC (Dublin Core), OAI-ORE/Atom, and RDF/DC
•  Uses DOIs to identify Dryad data packages and files

http://wiki.datadryad.org/Category:Technical_Documentation
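
To make the harvesting bullet concrete, here is a minimal Python sketch of what a harvester does with an OAI-PMH `ListRecords` response in the OAI-DC (Dublin Core) format. It is an illustration under stated assumptions, not Dryad's own code: the sample response is hand-written, its identifiers and title are invented, and the function name `parse_list_records` is hypothetical. Only the XML namespace URIs come from the OAI-PMH 2.0 and Dublin Core specifications.

```python
import xml.etree.ElementTree as ET

# Namespace URIs defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written sample response; every value in it is invented for illustration.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:item-1</identifier>
        <datestamp>2012-06-08</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example data package</dc:title>
          <dc:identifier>doi:10.9999/example.1</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>
"""


def parse_list_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for record in root.findall("oai:ListRecords/oai:record", NS):
        records.append({
            # Identifier the repository assigns to the record itself.
            "oai_identifier": record.findtext(
                "oai:header/oai:identifier", namespaces=NS),
            # Dublin Core elements may repeat, so collect them as lists.
            "titles": [e.text for e in record.findall(".//dc:title", NS)],
            "identifiers": [e.text for e in record.findall(".//dc:identifier", NS)],
        })
    # A non-empty resumptionToken means more pages of records are available.
    token = root.findtext("oai:ListRecords/oai:resumptionToken", namespaces=NS)
    return records, (token or None)


if __name__ == "__main__":
    harvested, next_token = parse_list_records(SAMPLE_RESPONSE)
    for item in harvested:
        print(item["oai_identifier"], item["titles"], item["identifiers"])
    print("next page token:", next_token)
```

A real harvester would fetch the XML over HTTP and repeat the request with the returned token until none comes back; that loop is left out so the sketch runs without a network connection.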
  	
  

  
DOI Examples

•  Data packages
      –  doi:10.5061/dryad.1664
      –  doi:10.5061/dryad.642
      –  doi:10.5061/dryad.1307
•  Data files
      –  doi:10.5061/dryad.1664/1
      –  doi:10.5061/dryad.642/1
      –  doi:10.5061/dryad.1307/1
      –  doi:10.5061/dryad.1307/2
      –  doi:10.5061/dryad.1307/3
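
The examples suggest that a data file DOI is its data package DOI plus a `/n` file number. The Python sketch below splits a DOI along those lines and builds a resolver link. The pattern is inferred only from the examples on this slide, so real Dryad DOIs may take other forms; the function names `split_dryad_doi` and `resolver_url` are hypothetical, and `https://doi.org/` is the general DOI resolver rather than anything Dryad-specific.

```python
import re

# Pattern inferred from the slide's examples only: an optional "doi:" label,
# the 10.5061 prefix, a "dryad.<id>" package suffix, and an optional "/<n>"
# file number. This is an assumption, not a documented Dryad rule.
DRYAD_DOI = re.compile(r"^(?:doi:)?(10\.5061/dryad\.[A-Za-z0-9]+)(?:/(\d+))?$")


def split_dryad_doi(doi):
    """Return (package_doi, file_number); file_number is None for a package."""
    match = DRYAD_DOI.match(doi.strip())
    if match is None:
        raise ValueError(f"not a Dryad-style DOI: {doi!r}")
    package, file_number = match.groups()
    return package, (int(file_number) if file_number else None)


def resolver_url(doi):
    """Build a link through the general DOI resolver at doi.org."""
    package, file_number = split_dryad_doi(doi)
    suffix = f"/{file_number}" if file_number is not None else ""
    return f"https://doi.org/{package}{suffix}"


if __name__ == "__main__":
    for example in ("doi:10.5061/dryad.1307", "doi:10.5061/dryad.1307/2"):
        print(example, "->", split_dryad_doi(example), resolver_url(example))
```

Keeping the package DOI as the stem of every file DOI lets a citation point either at the whole package or at one file within it.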
  
  
DATA REPOSITORY SOFTWARE
  


  
Dataverse metadata editing interface
  




  
Dataverse metadata editing interface (cont'd)
  




  
Standards and tools for repositories

•  Open Archives Initiative (OAI) and its Protocol for Metadata Harvesting (OAI-PMH)
•  Tools (open source):
      –  DSpace (http://www.dspace.org)
      –  Fedora (http://www.fedora-commons.org/)
      –  Dataverse (http://thedata.org/)
      –  EPrints (http://www.eprints.org/)
      –  More: http://oad.simmons.edu/oadwiki/Free_and_open-source_repository_software
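
An OAI-PMH request is an ordinary HTTP request: a base URL plus a `verb` parameter and a few verb-specific arguments. The Python sketch below builds such URLs and rejects anything that is not one of the six verbs the protocol defines. The base URL `https://repository.example.org/oai` and the function name `build_request_url` are hypothetical; each of the tools listed above publishes its own endpoint path, which this sketch does not try to guess.

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH 2.0.
OAI_PMH_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def build_request_url(base_url, verb, **arguments):
    """Return an OAI-PMH request URL for the given verb and arguments."""
    if verb not in OAI_PMH_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb!r}")
    query = {"verb": verb}
    # Skip arguments left as None so optional parameters can be passed freely.
    query.update({k: v for k, v in arguments.items() if v is not None})
    return f"{base_url}?{urlencode(query)}"


if __name__ == "__main__":
    base = "https://repository.example.org/oai"  # hypothetical endpoint
    print(build_request_url(base, "Identify"))
    print(build_request_url(base, "ListRecords", metadataPrefix="oai_dc"))
```

Because `from` is a reserved word in Python, the protocol's `from` date argument has to be passed as `**{"from": "2012-01-01"}` rather than as a plain keyword.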
  	
  



  
