This presentation is about making a citation-only database more useful to researchers
by moving it to DSpace and adding multiple types of access links, such as OpenURL, a DOI
resolver, Memento links, and WorldCat searches.
1
I’ll cover:
1. A brief background of the legacy dataset
2. The goals for enhancement & migration
3. Details about the migration and enriching citations
4. The final product and results
5. The take-home message
2
The resource that was enriched and migrated is EthxWeb, a diverse open
database of 340,000 metadata records used as a catalog of materials collected
by the Bioethics Research Library over 40 years. There are metadata records for
many format types. It is worth noting that some material types, such as gray
literature, do not fit as neatly into the publishing and link resolver methods
I’ll be describing as articles and monographs do, which have a more developed
infrastructure for bibliographic description, discovery, and resolving.
3
While they do not provide content (which requires publisher permission or licensing), citation
databases do provide value, especially in interdisciplinary fields. Web of Science in
particular is often used to assess the impact factor of academic journals. By using open
frameworks such as DOI and OpenURL, existing citation databases can be extended
to provide more value.
4
• The original platform was SunOS running Open Text Livelink Discovery
Server
• It had a search-only interface, no browsing capability
• There were no permanent URIs for individual records
• It was not integrated with discovery platforms
• The metadata were not using a recognized schema
• There were no links to access cited articles from publishers or libraries
• Overall it had lots of room for improvement … oh, and it is still being used
by the United States Patent Office.
5
The goals for improving this resource were to:
• Integrate with external platforms for content access
• Add interoperability with discovery platforms through OAI-PMH
• Adopt a metadata standard
• Mitigate link rot and “content drift” for records with URIs
• Promote discovery of content using indexing techniques
• Create unique pages for each record
• Transform the database from a library catalog into a global resource for those without
physical access to the collection
To give a sense of the history of this database, here is a scan of an original
typewritten record from 1980. The styles of citations varied over time.
6
Early on, I extended the legacy database with an external link resolver system.
I created links, displayed next to records, that forwarded to OpenURL or WorldCat.
This approach improved usability by linking to full-text searches but did not increase
traffic to the resource, which stayed at about 120,000 searches per year.
7
• Last year I began the process of switching to DSpace, which hosts our
institutional repository:
• I exported the data in XML format using the native mode interface and then
imported the data dumps into MySQL
• To improve the dataset so I would be comfortable adding it to our IR, I
cleaned up inconsistent metadata practices using PHP scripts, SQL queries,
and, for some edge cases, hand corrections
• Then I added automatic indexing of title & abstract fields using our
controlled vocabulary
• And finally I added the OpenURL, DOI, and Memento links, which I will
cover in more detail next
8
• Normalizing metadata is a bit like folding origami... You want to use the
original object but transform it into a new shape by applying knowledge
and techniques, not necessarily adding or subtracting anything that wasn’t
there intrinsically (other than an object identifier, later)
• To do this, some of the techniques I used were to:
• Split journal citations into journal, volume, issue, date and page ranges
• Split author comma lists apart into their own fields
• Make corrections for “et al” and “eds” so they don’t display as authors
• Correct typos: double periods or invalid date ranges
• Convert language codes to the 2-character ISO format
• Apply a consistent title case function and remove all caps (sometimes articles
use unusual capitalization that is actually correct: if the entire
citation is in all caps, convert it, but if only some words are in all caps, leave them,
as they could be acronyms or intentional emphasis)
• Assign only one format type (e.g. monograph or article)
• However, keep the original “Bibliographic Citation” field intact from the original
record
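A few of the cleanup steps above can be sketched in Python. This is only an illustration, not the PHP and SQL actually used for the migration: the function names are hypothetical, and it assumes authors are stored as a comma-separated list of "Surname Initials" entries, which the talk does not specify.

```python
import re

# Markers that are not people and should not be stored as authors.
NON_AUTHOR = re.compile(r"^(et\.?\s*al\.?|eds?\.?)$", re.IGNORECASE)


def split_authors(raw: str) -> list[str]:
    """Split a comma-separated author string into one entry per author.

    Assumes each author looks like "Smith J" (no comma inside a name).
    """
    parts = [part.strip() for part in raw.split(",")]
    return [p for p in parts if p and not NON_AUTHOR.match(p)]


def normalize_title(title: str) -> str:
    """Convert a title only when every letter in it is upper case.

    Mixed-case titles are returned unchanged, so acronyms and
    intentional emphasis in otherwise normal titles survive.
    """
    letters = [c for c in title if c.isalpha()]
    if letters and all(c.isupper() for c in letters):
        return " ".join(word.capitalize() for word in title.split())
    return title


def fix_double_periods(text: str) -> str:
    """Collapse exactly two consecutive periods; leave ellipses alone."""
    return re.sub(r"(?<!\.)\.\.(?!\.)", ".", text)
```

Note that converting a fully upper-case title also lower-cases any acronyms inside it, which is one reason to keep the original citation field untouched alongside the cleaned fields.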
9
• We have a controlled vocabulary of about 1,500 keywords, so I added them to our
records.
• I did this with simple scripted pattern matching against the title and abstract fields
• This replaced the existing subject keywords and MeSH terms in the metadata, which were
prone to typos
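As a rough illustration of this kind of pattern matching, here is a minimal Python sketch. The function name and the sample terms are hypothetical, and the whole-word, case-insensitive matching rule is an assumption; the talk does not describe the exact rules used.

```python
import re


def index_keywords(text: str, vocabulary: list[str]) -> list[str]:
    """Return the vocabulary terms that occur in the text as whole words.

    Matching is case-insensitive, and terms are returned in vocabulary
    order so the output stays consistent between records.
    """
    found = []
    for term in vocabulary:
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(term)
    return found


# Hypothetical vocabulary terms, for illustration only.
vocabulary = ["informed consent", "euthanasia", "gene therapy"]
title_and_abstract = "Informed consent and the ethics of gene therapy trials"
print(index_keywords(title_and_abstract, vocabulary))
```

Matching on word boundaries keeps a short term such as "gene" from being assigned to a record that only mentions "genetic".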
10
Next, to add information that helps with access to full text, I used the Crossref API to locate
digital object identifiers. Some of the legacy records had a DOI listed, but they were
susceptible to typos. Using this technique I found 76,000 identifiers, and since the
migration in January another 1,500 newly minted DOIs have been found.
For the resolver we are using dx.doi.org links. It is possible to use this
identifier with other resolvers, although there are not many alternatives available.
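A minimal Python sketch of a lookup along these lines follows. It assumes the public Crossref REST API `works` endpoint with its `query.bibliographic` parameter; the talk does not say which Crossref endpoint or query form was actually used, and the function names are hypothetical. The code only builds URLs and reads an already-decoded JSON response, so fetching is left to whatever HTTP client is at hand.

```python
from typing import Optional
from urllib.parse import quote, urlencode

CROSSREF_WORKS = "https://api.crossref.org/works"
DOI_RESOLVER = "http://dx.doi.org/"  # resolver named in the talk


def crossref_query_url(citation: str) -> str:
    """Build a Crossref search URL for a free-text citation."""
    params = {"query.bibliographic": citation, "rows": 1}
    return CROSSREF_WORKS + "?" + urlencode(params)


def first_doi(response: dict) -> Optional[str]:
    """Pull the DOI of the top hit out of a decoded Crossref response."""
    items = response.get("message", {}).get("items", [])
    return items[0].get("DOI") if items else None


def doi_link(doi: str) -> str:
    """Turn a bare DOI into a resolver link."""
    return DOI_RESOLVER + quote(doi, safe="/")
```

A top hit from a free-text search is only a candidate match, so its title and year should be compared with the legacy record before the DOI is stored.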
11
Here is an example of a monograph DOI resolving from the shortDOI link to the publisher
page. It’s interesting to note that this publisher page also includes an OpenURL
link and a WorldCat title search link.
12
Here’s an overview of the number of DOIs per publication year for the records in EthxWeb. DOI
was introduced in 2000, but publishers are retroactively assigning DOIs to previously
published content. The overall volume reflects the funding for the work of
hand-indexing materials, which was supported through 2000. However, there is a trend: 30% of
works published after 2002 have a Digital Object Identifier, while earlier years
are closer to 20%.
13
The approach to using OpenURL is to format links with embedded metadata as
key-value pairs. This approach works well for articles and uses link resolvers to pair citation
metadata with access through a list of licensed and open access resources managed by
local institutions. OpenURL also supports books and journals, although the handling of
those material types is not as fluid as in catalog systems. For OpenURL we linked to the OCLC
OpenURL gateway, which works with more than 1,000 registered link resolvers
worldwide. Here is an example link:
http://worldcatlibraries.org/registry/gateway?version=1.0&url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&atitle=Incomplete+financial+disclosures+in+a+letter+on+reducing+opioid+abuse+and+diversion.&title=JAMA+:+the+journal+of+the+American+Medical+Association+&volume=306&issue=13&date=2011-10&au=Fine,+Perry+G
It is also possible to simply pass a DOI, as follows:
http://worldcatlibraries.org/registry/gateway?version=1.0&url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&id=doi:10.1001/jama.2011.1397
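Links like these can be generated from the cleaned metadata fields. The Python sketch below reuses the gateway address and key names from the example links; the function names are hypothetical, and this is not the script used for the migration.

```python
from urllib.parse import urlencode

GATEWAY = "http://worldcatlibraries.org/registry/gateway"

# Parameters shared by every link, copied from the example links.
BASE_PARAMS = {
    "version": "1.0",
    "url_ver": "Z39.88-2004",
    "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
}


def _build(params: dict) -> str:
    # Keep ":", "/" and "," readable, as in the example links;
    # spaces become "+" and everything else is percent-encoded.
    return GATEWAY + "?" + urlencode({**BASE_PARAMS, **params}, safe=":/,")


def openurl_for_article(atitle, journal, volume, issue, date, author) -> str:
    """Build a key-value OpenURL from article citation fields."""
    return _build({
        "atitle": atitle,
        "title": journal,
        "volume": volume,
        "issue": issue,
        "date": date,
        "au": author,
    })


def openurl_for_doi(doi: str) -> str:
    """Build the shorter form that passes only a DOI."""
    return _build({"id": "doi:" + doi})


print(openurl_for_doi("10.1001/jama.2011.1397"))
```

Letting `urlencode` do the escaping means titles containing characters such as `&` or `?` cannot break the query string.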
14
Here is an example of an OpenURL resolving with a framed local resolver showing the citation
and links to content. Resolvers usually list a “request through interlibrary loan” link
when full-text access is not available, passing the metadata forward to ILL systems.
15
As an example, here is the same book using the OpenURL provided by Cambridge
University Press, used directly with the Georgetown University resolver. They passed the
type, title, and ISBN, but because the resolver is focused on electronic full text, no
actions to actually get a copy of the book were found.
16
• This is why, for book records, we use a WorldCat search rather than an
OpenURL link. WorldCat provides better integration with library catalogs to
provide a useful next step to scholars.
17
One more enrichment was using Memento (mementoweb.org) prefixes to replace any direct
links in the citations. This integrates with web archiving systems such as the
Wayback Machine, avoiding dead links.
It also helps to mitigate “content drift” by assigning a date for the web
snapshot.
It couldn’t be easier to implement! A simple URL structure consisting of a prefix, a date, and
the URI is all you need.
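The prefix, date, and URI structure can be sketched in a few lines of Python. The prefix below is an assumption based on the Memento Time Travel service's URL pattern; the talk does not give the exact prefix that was used, and the function name is hypothetical.

```python
from datetime import date

# Assumed prefix (Memento Time Travel service); not stated in the talk.
MEMENTO_PREFIX = "http://timetravel.mementoweb.org/memento/"


def memento_link(uri: str, snapshot: date) -> str:
    """Prefix a cited URI so it resolves to an archived copy near a date."""
    return f"{MEMENTO_PREFIX}{snapshot:%Y%m%d}/{uri}"


print(memento_link("http://example.org/report.html", date(2015, 1, 15)))
```

The date used for the snapshot would normally be the date the record cites for the web resource, so the archived copy reflects what the indexer originally saw.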
18
Here is the summary of links added to citations for the major content types in our
database. OpenURL links were created for journal and newspaper articles, and books
were given WorldCat searches. Adding WorldCat searches for other formats is a
possibility in the future as well.
DOI works well, but identifiers were not found for about 75% of the citations, so
OpenURL, WorldCat searches, and direct links through Memento are still useful to
direct people to next steps for accessing materials.
19
As publishers mint DOIs for their back catalogs, I am also regularly checking
citations without a DOI to see if an identifier has been added. With the DSpace
metadata update tool and the admin interface created by Terry Brady, this is really easy
to do.
20
Quality assessment was done before migrating the dataset to DSpace. Both the DOIs
and OpenURLs were checked for accuracy, and any transformations to the metadata
were verified to have been done correctly. We used a test instance of DSpace to check
before loading into production.
21
• To host the updated records we created a new collection in our IR,
DigitalGeorgetown
• Why did we use DSpace? Integration with GU platforms and tools was the
primary driver
• Integration with existing infrastructure helped with discovery and avoided the
additional effort of maintaining another system.
• The suite of admin tools also makes maintaining the data very easy.
22
Here is a screenshot of the new collection page on DigitalGeorgetown.
23
• Here are screenshots of important sections to show how DSpace
accommodates citations
• Offsite links are renamed for patrons as “Find in a Library” (OpenURL /
WorldCat) and “Full Text from Publisher” (DOI)
• DSpace features such as subject browsing and related items show on the page
• Can later add digital objects to the record if we are able to get rights to
digitize materials
24
Here is a look at the full metadata record, which uses different Dublin Core fields to house
all the information. The added links are in `dc.identifier.uri` and the original citation is in
`dc.identifier.bibliographicCitation`.
25
• We completed the ingest into DigitalGeorgetown this January.
• Since then the new collection has had 232,000 visits from February to
March
• This is more visits in the first 4 months than the previous legacy
database in Livelink had in 2013 and 2014 combined
• This is now the most accessed collection on DigitalGeorgetown
• We’ve received positive feedback on bonus DSpace features such as
“related items”
• And as a bonus we turned off the legacy SunOS server!
26
Go for it: invest time and effort in carrying unique datasets forward into
interoperable infrastructures, using a synergy of features to create successful outcomes.
Provide “next steps” for users to access content in whatever way is appropriate and global,
including more than one way.
Open citations as part of a research index can be valuable
even without content.
Linking externally to DOI, OpenURL, and Memento provides value.
Discovery of content is really helpful to scholars and can save them time.
I think citations will become more common in repositories in the future as platforms
begin to support importing citations from ORCID to get a list of all publications by
authors.
27
• Here is a list of references if anyone would like to see more
information about this project. Also, I would like to thank Terry Brady
and the rest of the Georgetown Library team for their help with this
project!
28
Thank you!
The Bioethics Research Library is collaborating with Georgetown’s University Library to
digitize, preserve, and extend the history of bioethics.
29

Making a citation database more useful with DSpace, DOIs, and OpenURL
