The document discusses the Names Project in the UK, which aims to create unique identifiers for UK researchers. It began in 2007 and pre-populated its system with data from the 2008 Research Assessment Exercise. The Names Project takes a hybrid approach, combining automated matching with manual disambiguation, and also allows researchers to contribute information directly. The project seeks to improve data quality and to integrate with other national and international identifier systems such as ISNI. The key challenges are cultural and political rather than technical: gaining national-level agreement and co-ordination on researcher identifier services.
How dinosaurs broke our system: challenges in building national researcher identifier services
1. How dinosaurs broke our system
Challenges in building national researcher identifier services
Amanda Hill
Names Project
JISC Conference, 2010
2. Hoping that…
…Simeon has explained all about the name authority problem
I'd like to talk about some of the work that we've done as part of the Names Project recently…
…and how that fits into today's researcher identification landscape
3. Gross generalisation about past approaches to author identifiers
Libraries | Publishers
Book-level data | Article-level data
Labour intensive: disambiguation first | Automatically generated: disambiguation later
Authors not involved | Authors can edit
Open | Proprietary
4. Current international activity
ISNI | ORCID
Library-instigated | Publisher-instigated
Disambiguation first | Disambiguation later
Authors not involved | Authors can submit/edit
Broad scope | Current researchers
5. Signs of convergence?
Knowledge Exchange meeting on Digital Author Identifiers in March 2012 encouraged alignment of ISNI and ORCID approaches
ISNI has reserved a block of identifiers for use by ORCID
6. Sources of information
Both ORCID and ISNI will use existing pools of information to populate their systems
ISNI: "Leveraging high confidence data from different domains"
"ORCID will link to other name identifier systems"
7. National author ID systems
2011: JISC-funded survey and report on national author/researcher identifier systems around the world
Report published November 2011
http://ie-repository.jisc.ac.uk/567/
8. Maturity of systems (late 2011)
System | In development since | Number of identities
Lattes (Brazil) | 1999 | 1,600,000
Frida/Cristin (Norway) | 2003 | 31,000 researchers at 160 institutions
VIVO | 2003 | 24,400 faculty with profiles; 150,000 total IDs including undisambiguated co-authors
Digital Author Identifier (Netherlands) | 2005 (1980s for National Thesaurus of Author Names) | 40,000 in the NTA; 15,000 researchers with Digital Author IDs
Names Project (UK) | 2007 | 46,000
New Zealand Electronic Text Centre | 2007 | 2,000
Trove People and Organisations/NLA Party Infrastructure (Australia) | 2007 | 900,000 people and organisations
AuthorClaim | 2008 | 200
Researcher Name Resolver (Japan) | 2008 | 190,000
9. Populating identifier systems
Systems were compared on three population methods: records created by cataloguers, records imported from other systems, and records generated by data subjects.
Systems compared: AuthorClaim; Digital Author Identifier (Netherlands); Frida/Cristin (Norway); Lattes (Brazil); Names Project (UK); New Zealand Electronic Text Centre; Researcher Name Resolver (Japan); Trove People and Organisations/NLA Party Infrastructure (Australia); VIVO
10. Good sources of data for some nations
National system | Existing unique identifiers
Japan | Researcher identifiers from national researcher databases
Netherlands | Number from National Thesaurus of Author Names is converted into Digital Author Identifier
Norway | Human resources data: social security numbers
Other national systems assign new identifiers as new identities are established.
11. Features of mature national identifier systems
With more mature systems:
A national organisation generally has oversight: e.g. in Brazil, Norway and the Netherlands
Integration with research funders, reporting agencies and institutional repositories
Individual institutions also have defined roles relating to managing information about their own staff
13. Work to investigate unique IDs for UK researchers
Identified in 2006 as part of the call for proposals for the JISC-funded Repositories and Preservation Programme
Mimas and the British Library proposed a two-year project to:
Investigate requirements for a UK name authority service
Build a pilot system to demonstrate potential
14. The Names Project
The Chang Project
'From the Annals of the Onomastic Society'
Ian Watson (1990)
15. Names (not an acronym…)
Name Authorities Make Everything Simpler
Names: Ambiguous, Meaningful (or Meaningless?), Essential, Symbolic
…nearly everyone has a name-related story
17. Original plan
Use data from the British Library's Zetoc service to create author IDs
Journal article information from 1993 onwards
Last names, initials, paper titles, subject classifications
But…
International in scope
Lack of information on affiliations and first names to help with making matches
Huge dataset -> processing issues
18. Revised plan
Used 2008 Research Assessment Exercise data (as cleaned up by the JISC Merit project) to pre-populate the Names system
Identify unique individuals and assign identifiers
Data quality good, included institutional information: high accuracy, despite only having initials, not full first names
Except for…
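The matching step behind the revised plan can be sketched as follows. This is an illustrative sketch only, not the project's code; the record fields and the identifier format are assumptions.

```python
from collections import defaultdict

def assign_identifiers(records):
    """Cluster author records that agree on surname, initials and
    institution, then give each cluster one identifier.

    Illustrative only: the real Names workflow also applied manual
    checks, and richer data where it was available."""
    clusters = defaultdict(list)
    for rec in records:
        # Initials-only data is ambiguous on its own; combined with
        # the institution it is usually (not always) enough.
        key = (rec["surname"].lower(),
               rec["initials"].replace(".", "").upper(),
               rec["institution"].lower())
        clusters[key].append(rec)
    return {f"names:{i:06d}": clusters[key]
            for i, key in enumerate(sorted(clusters), start=1)}

records = [
    {"surname": "Siveter", "initials": "D.J.", "institution": "University of Leicester"},
    {"surname": "Siveter", "initials": "DJ", "institution": "University of Leicester"},
    {"surname": "Siveter", "initials": "D.J.", "institution": "University of Oxford"},
]
ids = assign_identifiers(records)  # two clusters from three records
```

The "except for…" case is exactly where this breaks down: two distinct researchers who share surname, initials and institution (the "Siveter problem" mentioned in the notes) would be merged incorrectly, which is why human checks follow.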
21. Building on Merit…
Merit data covers around 20% of active UK researchers
Working to enhance records and create new ones with information from other sources:
Institutional repositories
British Library data sets (Zetoc)
Direct input from researchers
24. Quality matters
Automatic matching can only achieve so much
Dependent on data source
British Library team perform manual check of results of matching new data sources
Allows for separation/merging of records
Plan to allow people to update their own information
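One common way to combine automatic matching with manual checks of this kind is threshold triage: confident scores are merged automatically, poor ones kept separate, and borderline ones queued for a human. A minimal sketch, with arbitrary illustrative weights rather than the project's actual scoring:

```python
def match_score(a, b):
    """Crude similarity score between two author records (0.0 to 1.0).
    Weights are illustrative, not taken from the Names project."""
    score = 0.0
    if a["surname"].lower() == b["surname"].lower():
        score += 0.5
    if a["initials"].replace(".", "").upper() == b["initials"].replace(".", "").upper():
        score += 0.2
    if a["institution"].lower() == b["institution"].lower():
        score += 0.3
    return score

def triage(a, b, accept=0.9, reject=0.5):
    """Auto-accept confident matches, auto-reject poor ones, and
    queue everything in between for a manual check."""
    s = match_score(a, b)
    if s >= accept:
        return "merge"
    if s < reject:
        return "separate"
    return "manual review"

verdict = triage(
    {"surname": "Hill", "initials": "A.", "institution": "Mimas"},
    {"surname": "Hill", "initials": "A", "institution": "British Library"},
)
# 0.5 (surname) + 0.2 (initials) = 0.7: borderline, so manual review
```

Tuning the two thresholds shifts work between the automatic matcher and the review queue, which is the trade-off the slide describes.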
25. Ultimate aim
High-quality set of unique identifiers for UK researchers and research institutions
Available to other systems (national and international)
e.g. Names records exported to ISNI in 2011
Possible additional services:
Disambiguation of existing data sets
Identification of external researchers
26. Access to Names
API allows for flexible searching of Names data
EPrints plugin released in 2011: allows repository users to choose from a list of Names identities
…and to create a Names record if none exists
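Client code for such a search API might look like the sketch below; the base URL and parameter names are hypothetical, since the slides do not give the actual query syntax.

```python
from urllib.parse import urlencode

# Hypothetical base URL: the real Names API endpoint and its query
# parameters are not described in these slides.
BASE = "http://names.example.org/api/search"

def search_url(surname, initials=None, institution=None, fmt="json"):
    """Build a query URL for a researcher-identifier lookup."""
    params = {"surname": surname, "format": fmt}
    if initials:
        params["initials"] = initials
    if institution:
        params["institution"] = institution
    return BASE + "?" + urlencode(params)

url = search_url("Hill", initials="A")
# http://names.example.org/api/search?surname=Hill&format=json&initials=A
```

A repository plugin like the EPrints one would present the decoded results as a pick-list, with a fallback action to create a new record when no identity matches.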
29. Next steps…
JISC-convened Researcher ID group – final meeting in September > recommendations
Options Appraisal Report for UK national researcher identifier service > December
Improving data and adding new records
30. Summing up
Names is a hybrid of library/publisher approaches:
Automated matching/disambiguation
Human quality checks
Data immediately available for re-use in other systems
Researchers can supply information
31. An evolving area
Main challenges are cultural and political rather than technical
National author/researcher ID services can be important parts of research infrastructure
Getting agreement and co-ordination at national level is vital
…and, I would say, we are all very jealous of those countries with ready-made data sources like this…
Namey anecdote here? Dicky Moore & Robin Armstrong Viner?
Known in name authority circles as ‘the Siveter problem’
Every time we add a new data set, the quality of the data within the Names pilot improves – recently added information from the University of the West of England – QA process highlighted a previously unnoticed problem with the original Merit data.