Overview of issues and tools to ensure long-term access to scholarly content. Presented at II Seminário sobre Informação na Internet in Brasilia, 3 - 6 August 2015.
Ensuring Continuity of Access To Our Published Heritage
1. Ensuring Continuity of Access To Our Published Heritage
http://www.flickr.com/photos/shinez/5000985919/
Preserving Streams of
Issued Content
2. 1. Scottish Education Data Archive, 1979 - mid ‘80s
– Survey statistician: school leavers, YTS & 16-19 cohort surveys
• In Centre for Educational Sociology
2. Edinburgh University Data Library, 1984 onwards
– Manager: set-up and development
– President of IASSIST, 2000 – 2004 : social science data professionals
3. Graduate School, Faculty of Social Science, 1987 – 1997
– Senior Lecturer, teaching quantitative/survey methods
• In Research Centre for Social Sciences
4. ESRC Regional Research Laboratory for Scotland, 1986/90
– Co-director: early days of Geographical Information Systems (GIS)
• With University’s Department of Geography
5. EDINA, 1995/6 to present - main focus as day job
– Director: set-up and continuous development
– Jisc-designated centre for service delivery & digital expertise
6. Digital Curation Centre, 2004/05
– Director for set-up & definition of ‘data curation + digital preservation’
• With University’s School of Informatics
‘Data person’: now with focus on scholarly record
3. 3-Part Talk
1. What EDINA Does
– University of Edinburgh & Jisc
– Celebrating 20 years of online services
2. Adapting to Digital Realities
– Challenges to the integrity & continuity of
the scholarly and cultural record
3. What Needs to Happen: what you can do!
– Leadership for national actions within the
international context of scholarship
4. Part 1: What EDINA Does
Helps fulfil the mission of the University of Edinburgh
• EDINA & Data Library Division, in Information Services Group
– As Director, I report to: Chief Information Officer & Librarian to University
** My duty to say something about Jisc and the University **
1983: Early beginnings as Data Library for University
joined IASSISTdata.org
1995, University won open competition as UK datacentre
25 January 2016 marks the 20th Anniversary of launch as EDINA
with a Burns Night Supper
develops & delivers online services for research
& education in the UK, and beyond
• 85+ staff (inc. librarians, GIS specialists & 35 software engineers)
• About 2/3rds of funding for EDINA as a Jisc-designated UK centre
for digital expertise and service delivery
5. Once JISC, part of the UK Funding Councils for Higher Education,
Jisc is now a shared services organisation, a charity owned by
Universities UK, the Association of Colleges and Guild HE
Jisc.ac.uk
6. Ranked 17th= in the World by QS World University
Rankings 2014-15
Over 1,000 international research collaborations
Our scientists:
created Dolly the Sheep
developed the first genetically engineered
hepatitis B vaccine
pioneered the first automated industrial
assembly robot
devised technology used in today's
smartphones
discovered how particles acquire their mass
(Higgs boson)
Associated with 16 Nobel Prize winners - in areas
such as Physics, Medicine, Economics
The University: History & Prestige
The Edinburgh
Experience
World’s first UNESCO
City of Literature
World Heritage Site
Founded in 1583,
as the first ‘civic’ university,
in UK, and perhaps in the world
The Library was
started in 1580
it is older than
the University
ed.ac.uk
MoUs with FAPESP, Universidade de São Paulo, Universidade Federal de São Carlos, UFRGS and
UNIFESP. The University's Latin America office is in Santiago, Chile
8. Many available ‘openly’ for
international use:
• National Union Catalogue
• Keepers Registry
• Open Access Depot
• Research Data Management
• MANTRA
Others ‘restricted’ for use
only by staff & students in UK
universities & colleges:
• Digimap geospatial data
• Access Management
Special focus here on
• Keepers Registry
• Hiberlink: Reference Rot
• SafeNet: Post-Cancellation
Access
9. Part 2: Adapting to Digital Realities
Key Message:
• What is digital and online
(somewhere) is at risk of loss
10. what was once available in print,
on-shelf locally …
… is now online & accessed
remotely,
‘anytime/anywhere’
But what of Continuity
of Access?
Ease of Access is so much better
11. Digital back copy is not in the custody of libraries
Picture credit: http://somanybooksblog.com/2009/03/27/library-tour/
Libraries boast of ‘e-collections’,
but do they only have ‘e-connections’?
If not custodians, then only customers?
12. More Bad News: ‘The Digital Realities’
Risks inherent in digital media & formats:
• Lacks ‘fixity’ – it can be changed from what it was!
• ‘digital decay’: format obsolescence & bit rot
And single points of failure:
• natural disasters (earthquake, fire and flood)
• human folly (criminal and political action): hacking
+ risks associated with commercial events in the
publisher/supply chain
Web Today, Gone Tomorrow!
<References to what is on the Web rot over time: hiberlink.org>
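The loss of ‘fixity’ and ‘bit rot’ listed among the risks can be made concrete with a checksum comparison. The following is a minimal sketch, assuming an archive records a SHA-256 digest when it ingests a copy and re-checks it later; the function names and the sample content are hypothetical and are not taken from any archiving agency's software.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def fixity_ok(data: bytes, recorded_digest: str) -> bool:
    """True if the content still matches the digest recorded at ingest."""
    return sha256_digest(data) == recorded_digest


# Hypothetical content, used only for illustration.
original = b"Volume 1, Issue 1: full text of the article"
recorded = sha256_digest(original)  # stored when the copy was ingested

damaged = bytearray(original)
damaged[0] ^= 0x01  # flip a single bit to simulate bit rot

intact_result = fixity_ok(original, recorded)
damaged_result = fixity_ok(bytes(damaged), recorded)
```

A check like this only detects change; repairing a damaged copy still depends on another intact copy being held somewhere else.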
13. Language Technology Group
Funded by the Andrew W. Mellon Foundation
‘Reference Rot’
When what was referenced & cited
ceases to say the same thing, or ‘has ceased to be’
http://www.snorgtees.com/this-parrot-has-ceased-to-be
Reference Rot = Link Rot + Content Drift
“when links to web resources
no longer point to what they once did”
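The definition Reference Rot = Link Rot + Content Drift can be sketched as a simple classification. This is an illustrative simplification, not the Hiberlink method: it assumes a copy of the page as it was when cited is available, and it treats any byte-level difference as drift, whereas real studies would use a more tolerant similarity measure. The function name and sample pages are hypothetical.

```python
import hashlib
from typing import Optional


def classify_reference(cited_body: bytes, status_now: int,
                       body_now: Optional[bytes]) -> str:
    """Classify a web reference as 'link rot', 'content drift' or 'intact'."""
    # Link rot: the link no longer returns any content.
    if status_now >= 400 or body_now is None:
        return "link rot"
    # Content drift: the link still works but the content has changed.
    if hashlib.sha256(body_now).digest() != hashlib.sha256(cited_body).digest():
        return "content drift"
    return "intact"


# Hypothetical page content, used only for illustration.
as_cited = b"<html>Project results, 2012</html>"

gone = classify_reference(as_cited, 404, None)
changed = classify_reference(as_cited, 200, b"<html>Domain for sale</html>")
same = classify_reference(as_cited, 200, as_cited)
```

Either of the first two outcomes counts as reference rot in the sense used here.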
Breaking News: Yet Another Threat (& some Remedy)
15. Identifying what is published and issued online as a ‘continuing resource’
Online Continuing Resources (ISSN): ‘our published heritage’, ‘resources needed for scholarship’
• Issued in Parts (Serials): ‘e-journals’, conference proceedings, ‘e-magazines’, ‘e-newsmedia’
• Content changes over time (Integrating): websites, databases, repositories
• Also shown: ‘book-length work’, ‘Gov Docs’
16. Some Good News: digital shelving is available
① Web-scale not-for-profit archiving agencies:
② National libraries …
③ Research libraries: consortia & specialist centres …
National Science Library,
Chinese Academy of Sciences
Different models
Organisations that ingest content with archival intent …
17. Many archiving organisations a Good Thing
“Digital information is best preserved by replicating it at multiple
archives run by autonomous organizations”
B. Cooper and H. Garcia-Molina (2002)
Bad stuff will happen!
Lots Of Copies Keeps Stuff Safe (LOCKSS)
But how do we know
who is keeping what?
18. ISSN Register at the heart of the Data Model
(Taken from Figure 1 in reference paper in Serials, March 2009)
[Diagram: E-J Preservation Registry Service. The ISSN Register supplies METADATA on extant e-journals; Digital Preservation Agencies (e.g. CLOCKSS, Portico; BL, KB; UK LOCKSS Alliance etc.) supply METADATA on preservation action. Both feed the E-Journal Preservation Registry, a data dependency with ISSN-L as kernel field.]
Project: thekeepers.org
19. … to discover who is looking after what
thekeepers.org
20. What’s the (scale of the) Present Danger?
The Keepers Registry reports titles ‘ingested &
archived’ by at least 1 ‘keeper’:
16,558 in 2011
21,557 in 2013
27,463 as at May 2015
9,785 'ingested & archived' by 3 or more
More archives reporting into Registry & more archiving!
ISSN assigned for ‘e’
35,000 in 2009
100,000 in 2012
160,000 in 2015
21. Two Key Performance Indicators (KPIs)
‘Ingest Ratio’ = titles ingested by one or more Keeper
/ ‘online serials’ in ISSN Register
= 28,103 / 165,949 [as of June 2015]
=> 17%
‘KeepSafe Ratio’ = titles being ingested by 3+ Keepers
/ ‘online serials’ in ISSN Register
= 9,836 / 165,949
=> 6%
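The two KPIs are plain ratios, and the arithmetic can be checked directly. A minimal sketch using the June 2015 figures quoted on this slide; the helper name `ratio_percent` is hypothetical.

```python
def ratio_percent(numerator: int, denominator: int) -> int:
    """Return numerator / denominator as a percentage rounded to a whole number."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return round(100 * numerator / denominator)


# Figures as quoted on the slide (June 2015).
ONLINE_SERIALS_IN_ISSN_REGISTER = 165_949
INGESTED_BY_ONE_OR_MORE_KEEPERS = 28_103
INGESTED_BY_THREE_OR_MORE_KEEPERS = 9_836

ingest_ratio = ratio_percent(INGESTED_BY_ONE_OR_MORE_KEEPERS,
                             ONLINE_SERIALS_IN_ISSN_REGISTER)
keepsafe_ratio = ratio_percent(INGESTED_BY_THREE_OR_MORE_KEEPERS,
                               ONLINE_SERIALS_IN_ISSN_REGISTER)

print(f"Ingest Ratio: {ingest_ratio}%")      # 17%
print(f"KeepSafe Ratio: {keepsafe_ratio}%")  # 6%
```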
22. Bad News: Most of what libraries care about
may be missing …
Using Title List Comparison tool in Members Area of Keepers Registry
As reported in: P. Burnhill (2013) Tales from The Keepers Registry: Serial Issues About Archiving & the
Web. Serials Review 39 (1), 3–20. http://www.sciencedirect.com/science/article/pii/S0098791313000178,
& https://www.era.lib.ed.ac.uk/handle/1842/6682
In 2011/12 three major research libraries in the USA
(Columbia, Cornell & Duke)
checked archival status of serial titles regarded as important
against the Keepers Registry
‘Ingest Ratio’ = 22% to 28%, i.e. about a quarter
=> fate of c.75% is unknown
Evidence from the USA
23. … logs for the UK OpenURL Router*
• 8.5m full text requests in UK during 2012
=> 53,311 online titles requested
Analysis in 2013:
‘Ingest Ratio’ = 32% (16,985/53,311)
=> over two thirds, 68% (36,326 titles), held by none!
Not much better for what Researchers Use
* As reported in Keepers Registry Blog, OpenURL Router passes ‘discovery’ requests to commercial OpenURL
resolver services; developed & delivered by EDINA as part of Jisc support for UK universities & colleges
Evidence from the UK
24. 3. What needs to happen
Key Message:
• Act now to take responsibility
for archiving digital streams
of Issued Content
25. Using the ISSN to identify serial content
Researchers (& libraries/publishers) in any one country are
dependent upon content written and published as serials in
countries other than their own
%age of 132,806 ISSN assigned for e-serials (December 2013)
* now 160,000 *
US: 20%
UK: 9%
Canada: 5%
Spain: 5%
Brazil: 5%
Germany: 5%
India: 3%
26. Known Archival Status of Online Continuing Resources
assigned ISSN, by Country, June 2015
Elsevier
Springer
27. BIG publishers act early but incompletely
Priority:
find economic way to archive content from very many ‘at risk’ e-journals
from many (small & not so small) publishers
28. Known Archival Status of Online Continuing Resources
published in Brazil (ISSN), June 2015
Perhaps more is being kept safe?
& IBICT has project to archive all open access scientific e-journals,
see http://oasisbr.ibict.br
[+ Cariniana implementing a LOCKSS Network: 1023 subscribed titles]
Cariniana & IBICT are planning to tell the Keepers Registry!
29. New Service: [just launched last week]
Title List Comparison
You can use the Keepers Registry to check
the archival status of the journals that are
of key importance to you
• Upload list of ISSN & titles
• Receive back report on what is
being archived & what is not
Register now for Member Services:
http://thekeepers.org
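The idea behind a title list comparison (submit a list of ISSN, get back what is being archived and what is not) can be sketched locally. This is not the Keepers Registry service or its interface: the `holdings` table, the keeper names and the ISSN values are hypothetical placeholders, and the normalisation is deliberately simple.

```python
from typing import Dict, List, Set


def normalise_issn(raw: str) -> str:
    """Normalise an ISSN string to the NNNN-NNNC form, e.g. '0000 000x' -> '0000-000X'."""
    chars = [c for c in raw.upper() if c.isdigit() or c == "X"]
    if len(chars) != 8:
        raise ValueError(f"not an 8-character ISSN: {raw!r}")
    return "".join(chars[:4]) + "-" + "".join(chars[4:])


def compare_title_list(title_list: List[str],
                       holdings: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Map each submitted ISSN to the sorted list of keepers reporting it."""
    report = {}
    for raw in title_list:
        issn = normalise_issn(raw)
        report[issn] = sorted(holdings.get(issn, set()))
    return report


# Hypothetical registry data: placeholder ISSN and keeper names.
holdings = {
    "0000-0001": {"Keeper A", "Keeper B", "Keeper C"},
    "0000-0002": {"Keeper A"},
}
my_titles = ["0000-0001", "00000002", "0000 000x"]

report = compare_title_list(my_titles, holdings)
not_archived = [issn for issn, keepers in report.items() if not keepers]
held_by_three_plus = [issn for issn, keepers in report.items() if len(keepers) >= 3]
```

Titles in `not_archived` are the ones to raise with publishers and archiving agencies; `held_by_three_plus` corresponds to the 3+ Keepers threshold used for the ‘KeepSafe Ratio’.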
30. Imagine Sinpred 2020
• Best Case scenario
– Publishers (& Libraries) have acted
– ‘Adapting to Digital Realities’
– Together with the Keepers they have ensured
that all the e-journal content used by
researchers this year (in 2015) has been
preserved and can be used successfully in 2020
31. Imagine Sinpred 2020
• Worst Case scenario
– Publishers (& Libraries) failed to act sufficiently!
– ‘Not Adapting to Digital Realities’
– Important literature has been lost
– Citizens & scholars (rightly) complain of neglect
32. “the values [academic libraries] hold are of
immense importance to a world in which …
• knowledge has been transformed into intellectual property,
• the Web has been turned into a shopping platform, and
• social interaction online is used to collect and monetize our
lives […].
As the invisible infrastructure of our technological
future is taking shape, society needs library values
more than ever.”
(Fister, 2015)
33. Ensuring access to digital back copy:
Whose responsibility to archive content?
Do we leave it to the publishers?
What then is a library?
Should each research library act on its own?
Benefit of acting as consortia of research libraries
What is the role of national/state libraries?
More than a national, it is a trans-national challenge!
34. What should be done?
Accept responsibility for stewardship of collections
1. Use the Keepers Registry
2. Commit financial support for web-scale agencies,
• such as CLOCKSS: invest 1%
3. Contribute your collection development expertise
** use the Title List Comparison Tool in the Keepers Registry **
4. Tell publishers, archiving agencies & national library to commit to
archive their content with a Keeper
5. Consider options for collaborative action
• By associations of research & university libraries [Cariniana]
• With National Library of Brazil? Another?
6. Avoid the 2020 Vision where you get the blame!
What can I do?
35. Thank You / Obrigado!
edina@ed.ac.uk
‘Take The Long View’
7th September 2015
Edinburgh
thekeepers.org