4. Why is healthy data important?
Good-quality, healthy data can be
used to gain insight into customers
and business relationships, and to support
strategic planning, decision making,
and ongoing business operations.
But when it’s unhealthy…
5. Poor data has real consequences
Hard to get a true picture of relationships with institutions
Lack of quality author (and affiliation) data
Inability to see overlap between authors, members and
customers
Inaccurate holdings and revenue reports
Protracted time and effort taken to analyse data
Everything becomes more difficult, and less accurate
6. Healthy records are:
Complete
Accurate
Free of duplicates
Current
Consistent
Compliant with standards
7. Unique Identifiers
What are they? How can they help?
Numeric or alpha-numeric designations which are associated with
a single entity
Entities can be an institution, person, or piece of content
Enable the disambiguation of each entity
Proper understanding of the customer, author, reader or
institution
Proper identification of content object, article, product, or
package
Can be used internally or in conjunction with external partners
9. Why should we worry about data now?
Number of researchers increasing by 3% per annum*
Number of articles increasing by 3% per annum, current
output is 1.8-1.9 million per year*
Number of journals increasing by 3.5% per annum*
Growth in China has been in double digits for over 15 years*
Increased demand for anytime/anywhere access
Library budgets are frozen or being cut, less money for more
content means we have to work smarter
* Ware, M and Mabe, M, The STM Report, 2012
11. What are Institutional Identifiers for?
Disambiguating:
UCL:
University College London (UK)
Université Catholique de Louvain (Belgium)
Universidad Cristiana Latinoamericana (Ecuador)
University College Lillebælt (Denmark)
Centro Universitario Celso Lisboa (Brazil)
Union County Library (USA)
NPL:
National Physical Laboratory (UK)
National Physical Laboratory (India)
York University:
University of York (UK)
York University (Canada)
Northeastern University:
Northeastern University (Boston, USA)
Northeastern University (Shenyang, China)
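As a rough illustration of the disambiguation above, the sketch below keys each institution on an identifier rather than on its name. The IDs (`INST-0001` and so on), the `REGISTRY` table and the `candidates_for` helper are hypothetical, invented for this example; they are not real identifiers from any registry.

```python
# Hypothetical registry: the IDs are invented for illustration only.
REGISTRY = {
    "INST-0001": {"name": "University College London", "country": "UK", "abbr": "UCL"},
    "INST-0002": {"name": "Université Catholique de Louvain", "country": "Belgium", "abbr": "UCL"},
    "INST-0003": {"name": "University of York", "country": "UK", "abbr": None},
    "INST-0004": {"name": "York University", "country": "Canada", "abbr": None},
}


def candidates_for(text):
    """Return the IDs of every institution whose name or abbreviation matches text."""
    needle = text.casefold()
    return sorted(
        inst_id
        for inst_id, rec in REGISTRY.items()
        if needle in rec["name"].casefold() or needle == (rec["abbr"] or "").casefold()
    )


# A name or abbreviation on its own is ambiguous: several candidates come back.
ucl_matches = candidates_for("UCL")
york_matches = candidates_for("York")

# An identifier resolves to exactly one record.
resolved = REGISTRY["INST-0002"]
```

The point of the sketch is only that a free-text lookup returns a list that someone has to choose from, while a lookup by identifier returns one record.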
12. What are Institutional Identifiers for?
Consolidating:
University of Oxford
Univ. Oxford
Oxford University
Library, Oxford Univ.
Radcliffe Science Library
Bodleian Library
Bodleian, Oxford
Oxford, University of
Hierarchy View:
University of Northampton
Northampton Business School
School of Education
School of Health
School of Science and Technology
Division of Computing
Division of Engineering
Environmental & Geographical Sciences
Institute for Creative Leather Technologies
School of Social Sciences
School of The Arts
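One possible way to picture consolidation and the hierarchy view together is sketched below: name variants map to a single identifier, and a parent table lets each record roll up to its top-level institution. All IDs and the `NAME_VARIANTS`, `PARENT`, `top_level` and `consolidate` names are hypothetical and exist only for this example.

```python
# Hypothetical IDs, invented for illustration; not real registry identifiers.
NAME_VARIANTS = {
    "University of Oxford": "INST-1000",
    "Univ. Oxford": "INST-1000",
    "Oxford University": "INST-1000",
    "Oxford, University of": "INST-1000",
    "Bodleian Library": "INST-1001",
    "Bodleian, Oxford": "INST-1001",
    "University of Northampton": "INST-2000",
    "Northampton Business School": "INST-2001",
}

# Child ID -> parent ID; top-level institutions have no entry.
PARENT = {
    "INST-1001": "INST-1000",
    "INST-2001": "INST-2000",
}


def top_level(inst_id):
    """Walk up the hierarchy until an institution with no parent is reached."""
    while inst_id in PARENT:
        inst_id = PARENT[inst_id]
    return inst_id


def consolidate(names):
    """Group free-text names under the top-level institution they belong to."""
    groups = {}
    for name in names:
        inst_id = NAME_VARIANTS.get(name)
        key = top_level(inst_id) if inst_id else "UNMATCHED"
        groups.setdefault(key, []).append(name)
    return groups


grouped = consolidate(
    ["Univ. Oxford", "Bodleian, Oxford", "Northampton Business School", "Unknown College"]
)
```

Names that cannot be matched are kept in an `UNMATCHED` group rather than dropped, so they can be reviewed by hand.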
13. Use cases – the why
Identifiers enforce uniqueness
Disambiguate institutional records
Eradicate duplication of data
Ensure correct delivery, entitlements and access rights
Better understand your customer base and relationships with
institutions
Improve “trust” in data
Map institutions into their hierarchy
15. Common data problems
Most publishers have problems with data:
Multiple accounts for each customer
Multiple internal IT systems for different purposes
Data entry without standard names or ID numbers
Lack of hierarchy information
No formal manner to track customers across systems
16. The challenge: Data Sources
Multiple data sources – ‘system’ data silos
Multiple locations – ‘geographic’ data silos
Data entered by different people for different purposes
Data from third parties in the supply chain
Data from bought-in sources
17. The challenge: Data Sources
Typical publisher systems:
Financial system
CRM/Sales database
Authentication system
Fulfilment
Usage statistics
Submissions system
Author database
Document Storage (contracts and licences)
…
Data can be entered by:
Organisation staff
Authors
Society members
Agents in the supply chain
3rd party organisations
…
19. Implementing a data governance plan
Important considerations:
What data is held, where is it held and how is it accessed?
How can the data be used to further benefit different
departments, processes or activities?
Could the use of current or planned systems be expanded for
further benefit?
Is data highly accurate and consolidated or in need of cleansing?
Are there applications of data that have not been explored?
What requirements are there for additional data?
20. Improve data capture
If you can – use web forms
Implement required fields
Data validation – at a minimum use naming conventions
Address validation – postcode lookup
Institution validation – institution lookup
Web form consistency across systems
Avoid free-text fields
Make institutional identifiers a requirement
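The capture rules above could be enforced along the lines of the following minimal sketch. It assumes a form arrives as a dictionary; the field names, the `KNOWN_INSTITUTIONS` table (standing in for an institution lookup service) and the `validate_submission` function are all hypothetical.

```python
# Hypothetical lookup table standing in for an institution lookup service.
KNOWN_INSTITUTIONS = {"INST-0001": "University College London"}

# Hypothetical required fields for a web form.
REQUIRED_FIELDS = ("contact_name", "email", "institution_id", "postcode")


def validate_submission(form):
    """Return a list of problems; an empty list means the form passed."""
    errors = []
    # Required fields: reject missing, None or whitespace-only values.
    for field in REQUIRED_FIELDS:
        if not str(form.get(field) or "").strip():
            errors.append(f"{field} is required")
    # Institution validation: the ID must exist in the lookup table.
    inst_id = str(form.get("institution_id") or "").strip()
    if inst_id and inst_id not in KNOWN_INSTITUTIONS:
        errors.append(f"unknown institution_id: {inst_id}")
    return errors


good = validate_submission(
    {"contact_name": "A. Reader", "email": "a.reader@example.org",
     "institution_id": "INST-0001", "postcode": "AB1 2CD"}
)
bad = validate_submission(
    {"contact_name": "A. Reader", "email": "", "institution_id": "INST-9999",
     "postcode": "AB1 2CD"}
)
```

Because the institution is chosen by ID rather than typed as free text, the same check can be reused on every form that feeds the same systems.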
23. Data integration
Using Institutional Identifiers to link internal systems:
Prevent duplicate account creation
Break down silos
Keep data up-to-date and systems synchronised
Enable staff to use data more effectively
Simplify data transmission
Improve overall data quality
Systems linked by Institutional Identifiers (diagram):
CRM
Electronic document storage
Financial System
Authentication
Membership system
Usage statistics
Author Database
Fulfilment system
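A simplified sketch of this kind of linking is given below, assuming each internal system can export rows that carry the same institutional identifier alongside its own account number. The sample rows, IDs and the `link_by_institution` and `duplicate_accounts` helpers are hypothetical.

```python
from collections import Counter

# Hypothetical extracts from two internal systems. Each keeps its own account
# number, but both carry the same institutional identifier.
crm = [
    {"crm_account": "C-17", "institution_id": "INST-0001", "contact": "Library Services"},
    {"crm_account": "C-42", "institution_id": "INST-0001", "contact": "Serials Dept"},
    {"crm_account": "C-58", "institution_id": "INST-0004", "contact": "Acquisitions"},
]
finance = [
    {"ledger_no": "F-901", "institution_id": "INST-0001", "revenue": 12000},
    {"ledger_no": "F-902", "institution_id": "INST-0004", "revenue": 8000},
]


def link_by_institution(*systems):
    """Merge (system_name, rows) pairs into one view keyed on institution ID."""
    merged = {}
    for system_name, rows in systems:
        for row in rows:
            by_system = merged.setdefault(row["institution_id"], {})
            by_system.setdefault(system_name, []).append(row)
    return merged


def duplicate_accounts(rows):
    """List institution IDs that appear on more than one row of a system."""
    counts = Counter(row["institution_id"] for row in rows)
    return sorted(inst_id for inst_id, n in counts.items() if n > 1)


linked = link_by_institution(("crm", crm), ("finance", finance))
crm_duplicates = duplicate_accounts(crm)
```

Here `linked["INST-0001"]` gathers the CRM and finance rows for one institution in one place, and `crm_duplicates` flags the institution that has two CRM accounts.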
24. Linking author and institution IDs
When authors and their affiliations are linked
correctly, publishers gain:
Market intelligence about authors and institutions
Author and subscriber information mapped together
Knowledge of where research funding is concentrated
Reduction in time taken calculating open access charges (APCs)
Institutions gain information about their overall research
output
Funders gain information about where authors reside and
publish
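To make the benefit concrete, the sketch below counts articles per institution once each author is linked to an institution ID. The author IDs, institution IDs, article references and the `output_by_institution` function are invented for illustration.

```python
from collections import Counter

# Hypothetical data: every ID below is invented for this example.
AFFILIATION = {"AUTH-1": "INST-0001", "AUTH-2": "INST-0001", "AUTH-3": "INST-0004"}
ARTICLES = [
    {"article_ref": "ART-1", "corresponding_author": "AUTH-1"},
    {"article_ref": "ART-2", "corresponding_author": "AUTH-2"},
    {"article_ref": "ART-3", "corresponding_author": "AUTH-3"},
    {"article_ref": "ART-4", "corresponding_author": "AUTH-9"},  # no affiliation on file
]


def output_by_institution(articles, affiliation):
    """Count articles per institution, using the corresponding author's affiliation."""
    counts = Counter()
    for article in articles:
        inst_id = affiliation.get(article["corresponding_author"], "UNLINKED")
        counts[inst_id] += 1
    return dict(counts)


research_output = output_by_institution(ARTICLES, AFFILIATION)
```

Articles whose author has no affiliation on file fall into `UNLINKED`, which is the group that would otherwise need manual checking, for example when working out APCs.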
26. The scholarly supply chain
Purpose:
Serve the author and reader
Disseminate content as widely as possible
Ensure content is easily discoverable
Provide information in an efficient and trouble-free manner
regardless of:
Content type
User requirements
Desired methods of access
27. The supply chain (simple version)
Parties shown in the diagram:
Author
Funders
Submission and Peer Review System
End User
Discovery Service
Consortium
Data Providers and Systems (multiple)
Publisher
Online Host or Technology Partner
Library
Fulfilment House or System
Subscription Agent or Sales Agent
Societies
29. What could possibly go wrong?
Records are unconnected through the supply chain, so links fail:
Between entities
Between internal systems
Between external systems
Renewals are mishandled
Journal transfers are mishandled
Access and authentication are mishandled
Authors and individuals are not linked to their institution
Open access fees have to be checked manually
Authors are not linked to their research
Funders are not linked to the research they fund
30. Where stronger links are needed
Finding a path to using standardized data, which:
Eradicates duplicate records within and between systems
Enables seamless communication between organizations
Smooths the supply chain, removing ambiguity or lack of
information for any party
Enables higher quality of service
Increases understanding of customer base and enables better
decision making for everyone involved
33. The vision
In an ideal world we would be able to utilise, provide and
obtain data that is accurate, complete and easily joined
together:
Reducing problems and errors
Providing better overall service
Creating seamless processes
Providing a better understanding of customers and our own
businesses
34. External linking – in the supply chain
Using Identifiers will:
Ensure accuracy of information
Speed up data transactions
Reduce queries
Reduce costs
Open data up to new uses
Ensure that authors receive credit for the work they produce
Ensure that end users receive uninterrupted access to the
content they need
37. Institutional Identifiers – which ones?
JISC and CASRAI (Consortia Advancing Standards in Research
Administration Information) report on Organisation IDs:
http://repository.jisc.ac.uk/5381/1/CC549D0011.0_org_ID_landscape_study.pdf
Examined the landscape of organisational identifiers in the UK
and identified 23 different IDs
Based on interviews with key individuals
Lots of detail on use cases for publishing, funders, and
institutions
38. CASRAI report
Disambiguating organisational information from multiple
sources is typically described as “a nightmare”
Benefits from effective unique identifiers are truly realised
when data is shared
Key aspects of identifiers that support the widest range of
uses:
Governance
Trust
Transparency
Temporal
Appropriate metadata
42. ISNI
Diagram: an ISNI Number links Party ID 1 and Party ID 2, each holding its own Proprietary Information and/or Metadata
ISO Standard 27729
ISNI is designed to be a “bridge identifier”
Covers any type of entity
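One practical use of a standardised identifier is that mistyped values can be caught at data entry. The sketch below assumes an ISNI is 16 characters long with a final check character calculated using ISO/IEC 7064 MOD 11-2; that detail is not stated on the slide and should be confirmed against the standard. The function names and the sample number are made up for illustration, and a well-formed number is not necessarily one that has been assigned.

```python
DIGITS = set("0123456789")


def isni_check_character(first_15_digits):
    """Compute the ISO/IEC 7064 MOD 11-2 check character for 15 digits."""
    total = 0
    for ch in first_15_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    # A result of 10 is written as the letter X.
    return "X" if result == 10 else str(result)


def is_well_formed_isni(value):
    """True if value has 16 characters (spaces ignored) and a matching check character."""
    compact = value.replace(" ", "").upper()
    if len(compact) != 16 or not set(compact[:15]) <= DIGITS:
        return False
    return compact[15] == isni_check_character(compact[:15])


# Made-up sample number, used only to exercise the check.
sample_ok = is_well_formed_isni("0000 0001 2345 6789")
sample_typo = is_well_formed_isni("0000 0001 2345 6780")
```

A check like this only catches typing errors; it says nothing about whether the number belongs to the institution it is recorded against.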
43. ISNI IDs
Ringgold is an ISNI Registration Agency for institutions
A unique ISNI institutional ID can connect any data
and any systems
ISNI IDs should be used by publishers and across the
scholarly supply chain to:
Link systems using the ID numbers
Link data sets which contain proprietary metadata
Provide clean data transmission
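The "bridge" idea can be sketched as follows: two parties keep their own account numbers and proprietary metadata, and only the shared ISNI is used to pair records. Every identifier below is an invented placeholder, and the `bridge` function is a hypothetical helper rather than part of any ISNI tooling.

```python
# All identifiers below are invented placeholders, not real ISNIs or account numbers.
publisher_records = {
    "PUB-ACC-17": {"isni": "0000 0001 2345 6789", "subscription_tier": "Gold"},
    "PUB-ACC-18": {"isni": "0000 0000 0000 001X", "subscription_tier": "Basic"},
}
agent_records = {
    "AGT-5521": {"isni": "0000 0001 2345 6789", "renewal_month": "January"},
}


def bridge(left, right):
    """Pair up records from two parties that share the same ISNI.

    Returns (left party ID, ISNI, right party ID) tuples. The proprietary
    metadata on either side is never copied across.
    """
    right_by_isni = {rec["isni"]: party_id for party_id, rec in right.items()}
    links = []
    for left_id, left_rec in left.items():
        right_id = right_by_isni.get(left_rec["isni"])
        if right_id is not None:
            links.append((left_id, left_rec["isni"], right_id))
    return links


links = bridge(publisher_records, agent_records)
```

Each party can then look up its own record from its own ID, so the link is established without exchanging the proprietary fields themselves.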
44. ISNI spans all industries, market segments, and regions
Academia
Medical
Corporate
Government
Not-for-profit
Public libraries
Schools
Publishers
Funding bodies
Intermediaries
Distributors
http://isni.org/
46. What can YOU do now?
Engage with the problems you have with data
Find some resources – think about time not just money
Consider how data could better serve your organisation
Appoint a data champion and document everything
Generate a data governance policy
Create some basic rules for data entry
Utilise universal identifiers to clean and link your data
Work with suppliers and customers to utilise institutional
identifiers to strengthen the supply chain