Export/Import in DSpace & Backup
ARD Prasad
Where DSpace stores data

The /dspace/assetstore directory holds all the
− Bitstreams and licenses

The PostgreSQL database contains information on
− Metadata
− Information about Communities
− Information about Collections
− Information about e-groups & authorizations
− Information about E-persons & authorizations
− Host of other information
Export/Import in DSpace

Export and import deal only with bitstreams,
metadata, licenses and handles.

They do NOT cover information about communities,
collections, members, reviewers etc., or access
permissions/restrictions.

You can export or import
− An item or
− All items in a collection
Export command syntax
/dspace/bin/dsrun org.dspace.app.itemexport.ItemExport \
    --type=COLLECTION --id=collID \
    --dest=dest_dir --number=seq_num
Where
--type can have either the value COLLECTION or ITEM
--id is the handle of the collection or item, e.g. 1849/2
(or 123456789/2 if you do not have a registered handle prefix)
--dest is the destination directory
(the directory must be created before running the script)
--number is the sequence number; it can be just 1
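
The slides show a collection export; the same options should work for a single item by switching --type to ITEM. A minimal sketch, assuming a helper named build_export_cmd (hypothetical, not part of DSpace) and an illustrative item handle 123456789/15. It only prints the command so it can be checked before it is run:

```shell
#!/bin/bash
# Sketch: print (do not run) the ItemExport command for one item or collection.
# build_export_cmd is a hypothetical helper, not part of DSpace.
build_export_cmd() {
    local type="$1" handle="$2" dest="$3" number="${4:-1}"
    # Only the two values named on the slide are accepted
    case "$type" in
        ITEM|COLLECTION) ;;
        *) echo "type must be ITEM or COLLECTION" >&2; return 1 ;;
    esac
    printf '%s %s --type=%s --id=%s --dest=%s --number=%s\n' \
        /dspace/bin/dsrun org.dspace.app.itemexport.ItemExport \
        "$type" "$handle" "$dest" "$number"
}

# Example: a single item with the illustrative handle 123456789/15
build_export_cmd ITEM 123456789/15 /tmp/export
```

Once the printed line looks right, it can be pasted into a shell on the DSpace server.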
Shell Script for exporting
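#!/bin/bash
# Export all items of each listed collection into its own subdirectory.
if test $# != 1
then
    echo "Usage: $0 <export-directoryname>"
    exit 1
fi
# Collection ids: the number after the handle prefix
declare -a collection_id=(2 3 4 5 6 7)
for ((i=0; i<${#collection_id[@]}; i++))
do
    mkdir -p "$1/${collection_id[$i]}"
    /dspace/bin/dsrun org.dspace.app.itemexport.ItemExport \
        --type=COLLECTION \
        --id=1849/${collection_id[$i]} \
        --dest="$1/${collection_id[$i]}" \
        --number=1
done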
In the shell script...

Look for the line that declares the collection_id
array; it ends with (2 3 4 5 6 7)

Replace 2 3 4 etc. with your collection ids

Clue: a collection id is the number that appears in the
browser URL after the handle prefix, i.e. if you have not
registered with CNRI, the number that appears after
123456789/

Also create the directory where the data should be
exported to
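
The clue above can be sketched in the shell: the collection id is whatever follows the last slash of the handle URL. The function name collection_id_from_url and the URL are assumptions for illustration only:

```shell
#!/bin/bash
# Sketch: take the collection id from the end of a handle URL.
# collection_id_from_url is a hypothetical helper name.
collection_id_from_url() {
    local url="${1%/}"           # drop a trailing slash, if any
    printf '%s\n' "${url##*/}"   # keep what follows the last slash
}

# Illustrative URL of an unregistered installation (prefix 123456789)
collection_id_from_url "http://localhost:8080/jspui/handle/123456789/2"
```

The ids collected this way are the numbers to put into the collection_id array of the script.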
Shell Script for Import
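#!/bin/bash
# Import the previously exported items, one collection at a time.
if test $# != 1
then
    echo "Usage: $0 <export-directoryname>"
    exit 1
fi
declare -a collection_id=(2 3 4 5 6 7)
for ((i=0; i<${#collection_id[@]}; i++))
do
    # -a add items, -e e-person e-mail, -c target collection,
    # -s source directory, -m map file (here one per collection)
    /dspace/bin/dsrun org.dspace.app.itemimport.ItemImport \
        -a -e dspace@localhost.localdomain \
        -c 123456789/${collection_id[$i]} \
        -s "$1/${collection_id[$i]}" \
        -m mapfile_${collection_id[$i]}
done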

Here also, change the collection ids in the import
program

The -e option should have the DSpace admin id (i.e. the
e-mail address)
What is exported

The following files will be created for every item
− dublin_core.xml (metadata)
− handle (one line containing the handle number)
− license.txt
− The actual file (bitstream: could be a pdf, doc or
image file)
− contents (two lines: the license file name and the
actual bitstream name)
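
After an export it can be useful to confirm that an item directory holds the files listed above. A minimal sketch, assuming the file names are written in lowercase (dublin_core.xml, contents, handle) and using a hypothetical function name check_item_dir. The license and bitstream names vary per item, so they are not checked here:

```shell
#!/bin/bash
# Sketch: report which expected files are missing from one exported item directory.
# check_item_dir is a hypothetical helper; lowercase file names are an assumption.
check_item_dir() {
    local dir="$1" missing=0 f
    for f in dublin_core.xml contents handle; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f"
            missing=1
        fi
    done
    # Returns 0 when all three files are present, 1 otherwise
    return "$missing"
}
```

Call it with the path of one exported item directory; it prints a line for each missing file and returns a non-zero status if any is absent.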
However

Import and export are meant for data exchange

They can, however, be used for a partial backup

They take care of items only

They do not back up
− Your communities, collections, e-groups, e-persons
How to back up PostgreSQL

Run pg_dump as the dspace user

Example:

$ pg_dump dspace > backupfile

Note: dspace is the name of the database

The backup file will have all the table definitions and
contents.

pg_dump has lots of options
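
When dumps are taken regularly, a dated file name keeps older backups from being overwritten. A minimal sketch, assuming a hypothetical helper backup_file_name and /tmp as an example target directory. The pg_dump command is only printed, not run:

```shell
#!/bin/bash
# Sketch: build a dated file name for a dump and print the pg_dump command.
# backup_file_name and the /tmp directory are assumptions for illustration.
backup_file_name() {
    local db="$1" dir="$2"
    # e.g. <dir>/<db>-YYYYMMDD.sql
    printf '%s/%s-%s.sql\n' "$dir" "$db" "$(date +%Y%m%d)"
}

target=$(backup_file_name dspace /tmp)
echo "pg_dump dspace > $target"   # printed only; run it as the dspace user
```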
How to restore the database

psql -d dspace -f dumpedfile

Note: psql has lots of options; to know more
about the options, you can use
Alternative (using tar)

To dump a database called mydb that contains
large objects to a tar file:

$ pg_dump -Ft -b mydb > db.tar

To reload this database (with large objects)
to an existing database called newdb:

$ pg_restore -d newdb db.tar
Upgrading

This backup procedure should be the first step when you
are upgrading DSpace to a newer version

Even if the upgrade fails, you have a backup to fall
back on
Upgrading Tip

Use a different database and a different user, so
that you do not have to touch the existing DSpace
installation
Extra care

It is a good idea to take a tape (or hard disk) backup
of
− The entire /dspace directory
− The pg_dump output file
− The export directory
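
The three pieces listed above can be bundled into one archive before it is copied to tape or another disk. A minimal sketch using tar, with a hypothetical helper make_archive. The paths in the usage note are examples only:

```shell
#!/bin/bash
# Sketch: bundle several paths into one compressed tar archive.
# make_archive is a hypothetical helper name.
make_archive() {
    local archive="$1"
    shift
    # Remaining arguments are the files and directories to include
    tar -czf "$archive" "$@"
}
```

A call might look like: make_archive /mnt/backup/dspace-full.tar.gz /dspace /backup/dspace.sql /backup/export (all four paths are assumptions; substitute your own).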
Final Lesson

Learning DSpace is easy.
− Can be learnt in a week
− Can be mastered in a month

Creating content is continuous and long-term,
perhaps never-ending

Be more careful with the content
Thank You
