CDL provides an ETD service to UC campuses with two main components: ETDs are preserved in Merritt (CDL’s preservation repository) with no public access, and ETDs are published in eScholarship (CDL’s institutional repository).
eScholarship processes the Proquest XML metadata file that accompanies the ETD, including any embargo information. If an ETD is embargoed, eScholarship creates a landing page with no access to the thesis until the embargo has expired.
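The embargo rule described above can be sketched as a small date check. This is only an illustration: the function names, the `None` convention for "no embargo", and the choice to release on the end date itself are assumptions, not details taken from eScholarship.

```python
from datetime import date
from typing import Optional


def is_embargoed(embargo_end: Optional[date], today: date) -> bool:
    """Return True while an embargo is still in force.

    embargo_end is None when the metadata carries no embargo (assumed convention).
    """
    if embargo_end is None:
        return False
    # Assumption: the thesis becomes accessible on the end date itself.
    return today < embargo_end


def page_kind(embargo_end: Optional[date], today: date) -> str:
    """Pick what to serve: a landing page only, or the full thesis."""
    return "landing_page_only" if is_embargoed(embargo_end, today) else "full_text"
```

For example, `page_kind(date(2018, 1, 1), date(2017, 7, 20))` selects the landing page, since the hypothetical embargo has not yet expired.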
Most UC campuses participate in the ETD Service. Some exceptions: UC Davis publishes its ETDs only in Proquest. UC Santa Barbara now uses its own IR, Alexandria, to preserve and publish its ETDs. UC Berkeley continues to receive ETDs from Proquest and submit them to Merritt, after which they are published in eScholarship. UCLA receives ETDs from Proquest, catalogs them, and then sends them to CDL via SFTP.
On average, each ETD in eScholarship is used more than 4 times per month, or more than 50 times per year.
The basic workflow:
• The student submits the thesis and accompanying files to the Grad Division on their campus, using software provided by Proquest (“ETD Administrator”: www.etdadmin.com). The student fills out a form with basic information (author, title, abstract) that is used to generate an XML metadata file.
• The Grad Division gathers approvals for the thesis and grants the degree. It then submits the ETD to Proquest (using the ETD Administrator). It may submit ETDs on a rolling basis every day, or wait until the end of term to submit all approved ETDs for that term.
• Proquest sends the ETD package as a zip container to CDL’s SFTP server.
• The ETD Service retrieves any zip containers from the SFTP server once a day and submits them to Merritt.
• Later that same day, eScholarship retrieves an Atom feed from Merritt listing all objects in ETD collections. If it finds new submissions, it harvests the files and stores them on its local file system. eScholarship indexes the metadata and PDF, and adds its persistent URL.
• Later that day, the ETD Service requests a report from eScholarship listing all of the ETDs, linking the eScholarship URL with the Merritt ARK.
• The ETD Service extracts the XML metadata from the zip container, adds the eScholarship URL, and generates a MARC record.
• The ETD Service delivers the MARC record.
Looking more closely at the first of the two components of the ETD Service:
• The ETD Service retrieves files from the SFTP server. It checks that the files have been retrieved correctly, and if so, deletes them from the remote SFTP server.
• It extracts the XML metadata file from the zip container and looks for the institutional code assigned by Proquest.
• It uses the institutional code to submit the zip container to the appropriate Merritt collection. It also extracts the author and title from the XML to include in the Merritt submission.
• The ETD Service updates a SQLite table with information about the thesis from the Proquest metadata (author, title, etc.) and Merritt (ARK identifier).
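The extraction, routing, and bookkeeping steps above could look roughly like the following. This is a minimal sketch, not the ETD Service's actual code: the XML element names (`inst_code`, `author`, `title`), the code-to-collection mapping, the table layout, and the example ARK are all hypothetical stand-ins, and the real Proquest schema is not reproduced here.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET
import zipfile

# Hypothetical mapping from institutional code to Merritt collection name.
COLLECTIONS = {"0001": "campus_a_etd", "0002": "campus_b_etd"}


def read_metadata(zip_bytes: bytes) -> dict:
    """Pull code, author, and title from the first .xml file in the zip."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        xml_name = next(n for n in zf.namelist() if n.lower().endswith(".xml"))
        root = ET.fromstring(zf.read(xml_name))
    # Element names below are assumptions made for this sketch.
    return {
        "inst_code": root.findtext("inst_code"),
        "author": root.findtext("author"),
        "title": root.findtext("title"),
    }


def collection_for(inst_code: str) -> str:
    """Route a submission to a collection, failing loudly on unknown codes."""
    try:
        return COLLECTIONS[inst_code]
    except KeyError:
        raise ValueError(f"no collection configured for code {inst_code!r}") from None


def record_submission(conn: sqlite3.Connection, meta: dict, ark: str) -> None:
    """Store the metadata alongside the ARK returned by the repository."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS etd ("
        "ark TEXT PRIMARY KEY, inst_code TEXT, author TEXT, title TEXT, "
        "escholarship_url TEXT)"
    )
    conn.execute(
        "INSERT INTO etd (ark, inst_code, author, title) VALUES (?, ?, ?, ?)",
        (ark, meta["inst_code"], meta["author"], meta["title"]),
    )
    conn.commit()
```

Raising on an unknown institutional code, rather than falling back to a default collection, keeps a misrouted thesis from landing silently in another campus's collection.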
Looking at the second component of the ETD Service:
• eScholarship retrieves the Atom feed for ETD collections at 10am every day. If it notices new submissions, it harvests them from Merritt. The PDF, XML metadata, and any auxiliary files are stored on the local eScholarship file system. eScholarship indexes these files and creates a persistent link.
• Later that same day, the ETD Service requests a report from eScholarship listing all ETDs. The report includes both the eScholarship URL and the Merritt ARK. The ETD Service updates a table in the SQLite database with this information.
• The ETD Service extracts the XML metadata file from the zip container (still stored on the local ETD Service file system) and uses information from the XML and the database to generate a MARC record conforming to local campus cataloging rules and containing the eScholarship link.
• The MARC record and/or a CSV report about submissions is delivered to the campus, either via email or SFTP. MARC records and CSV reports are also submitted to a restricted Merritt collection.
• Once the record is generated, the zip container is deleted from the local ETD Service file system.
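The "notice new submissions in the Atom feed" step might be sketched as below. It assumes, purely for illustration, that each feed entry's `id` carries the object's identifier and that file locations appear as `link` elements; the layout of the actual Merritt feed is not shown in this document, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Atom syndication format.
ATOM = "{http://www.w3.org/2005/Atom}"


def new_entries(feed_xml: str, already_harvested: set) -> list:
    """Return feed entries whose id has not been harvested yet."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for entry in root.iter(f"{ATOM}entry"):
        entry_id = entry.findtext(f"{ATOM}id")
        if not entry_id or entry_id in already_harvested:
            continue
        # Collect every link href so the caller can fetch the files.
        links = [
            link.get("href")
            for link in entry.findall(f"{ATOM}link")
            if link.get("href")
        ]
        fresh.append({"id": entry_id, "links": links})
    return fresh
```

Comparing against a set of already-harvested identifiers makes the daily run idempotent: rerunning it on the same feed yields no duplicate harvests.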
ETDs: Electronic Thesis and Dissertation Service at the University of California
California Digital Library
July 20, 2017
Outline of ETD service
• preserved in Merritt
• published in eScholarship
Features of ETD Service
• MARC records with eScholarship links
ETDs by Campus
[Figure labels: In Merritt; In eScholarship; Recd via SFTP from Proquest † / In Merritt; In eScholarship]
* (Nothing since 2013)
~6500 in 2016
† UCLA receives their ETDs from Proquest, then delivers them to CDL via SFTP
100,000 uses/month in eScholarship
[Workflow diagram labels: UCLA and UCR; Grad Division; Library; Merritt Collection 1; Merritt Collection 2; Merritt Collection 3; SFTP or Email?]