Discovering Scholarly Orphans Using ORCID
@mart1nkle1n, @hvdsomp
JCDL 2017, 06/22/2017, Toronto, CA
Discovering Scholarly Orphans
Using ORCID
Martin Klein
@mart1nkle1n
http://orcid.org/0000-0003-0130-2097
Herbert Van de Sompel
@hvdsomp
http://orcid.org/0000-0002-0715-6126
Research Library
Los Alamos National Laboratory
Novel Archival Paradigm
• Current paradigm:
• Owner of scholarly record submits finalized and atomic record to
custodian, who takes care of long-term preservation
• E.g., Publisher uploads journals to Portico, author uploads paper
into institutional repository
• Fails, even for traditional journal articles
• Significant number of journal articles do not make it into archives
• IRs are under-utilized
• Does not account for web-based scholarship, living things with
versions, web resources related to paper
→ Argument for a novel paradigm to capture web-based scholarly
resources
Capture Flow
Algorithmic Discovery of Web Identities
James Powell et al. (2014). EgoSystem: Where are our alumni?
In: Code4Lib Journal, http://journal.code4lib.org/articles/9519
Capture Flow
Discovery of Web Identities via a Registry: ORCID
Ian Milligan
http://orcid.org/0000-0002-1470-7723
Mark Matienzo
http://orcid.org/0000-0003-3270-1306
Mark Matienzo’s ORCID
• Web Identities: 3 (homepage, ScopusID, ResearcherID)
http://orcid.org/0000-0003-3270-1306
Mark Matienzo’s Home Page
• URIs to GitHub repository and Twitter
• Could be included in ORCID profile
http://matienzo.org/
Ian Milligan’s ORCID
• Web Identities: 0
http://orcid.org/0000-0002-1470-7723
Discovery of Web Identities via a Registry: ORCID
• Evaluation of ORCID for automatic discovery of Web Identities
• How well does ORCID represent the global community of active
researchers?
• Adoption rate
• Subject coverage
• Geo-location coverage
• How well does ORCID score when it comes to listing Web Identities?
Discovery of Web Identities via a Registry: ORCID
ORCID data
ORCID - Adoption Rate
• Extract from ORCID records
• First name
• Last name
• Affiliations
• Works (publications, datasets, etc.)
• Web identities
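A minimal sketch of how the extracted fields could be turned into adoption counts. The `OrcidRecord` class and the `adoption_counts` function are hypothetical names introduced for illustration; the slides do not show the actual extraction code or the format of the ORCID data.

```python
from dataclasses import dataclass, field


@dataclass
class OrcidRecord:
    """Hypothetical, simplified view of one ORCID record (not the ORCID schema)."""
    first_name: str = ""
    last_name: str = ""
    affiliations: list = field(default_factory=list)
    works: list = field(default_factory=list)
    web_identities: list = field(default_factory=list)


def adoption_counts(records):
    """Count how many records carry each of the extracted fields."""
    counts = {
        "total": 0,
        "with_first_name": 0,
        "with_last_name": 0,
        "with_affiliations": 0,
        "with_works": 0,
        "with_web_identities": 0,
    }
    for record in records:
        counts["total"] += 1
        counts["with_first_name"] += bool(record.first_name)
        counts["with_last_name"] += bool(record.last_name)
        counts["with_affiliations"] += bool(record.affiliations)
        counts["with_works"] += bool(record.works)
        counts["with_web_identities"] += bool(record.web_identities)
    return counts
```

Running this once per yearly data snapshot would give one count per field and year, which is the kind of series an adoption-rate chart needs.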
ORCID - Adoption Rate
[Figure: number of ORCIDs per year, 2013 to 2016; y-axis from 0 to 2,500,000. Series: ORCIDs total, ORCIDs with given names, ORCIDs with first names, ORCIDs with works, ORCIDs with affiliations, ORCIDs with web identities]
ORCID - Subject Coverage
• Extract DOIs from works
• Match DOIs against CrossRef’s Metadata API
• Obtain subject terms
• Match against descriptive terms from “Classification of Instructional
Programs” (CIP) published by the Institute of Education Sciences
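The DOI-to-subject steps on this slide could be sketched roughly as follows. This is an illustration under assumptions, not the authors' pipeline: the function names and the keyword-based CIP matching are hypothetical, and the sketch assumes the Crossref works response carries a `subject` list under `message`, which may not hold for every record or API version. No network request is made here; the response is passed in as an already-parsed dictionary.

```python
from urllib.parse import quote

CROSSREF_WORKS = "https://api.crossref.org/works/"


def crossref_url(doi):
    """Build the Crossref REST API URL for a single DOI."""
    return CROSSREF_WORKS + quote(doi, safe="/")


def subjects_from_response(payload):
    """Return subject terms from a parsed Crossref response.

    Assumption: subjects sit in payload["message"]["subject"];
    an empty list is returned when the field is absent.
    """
    return list(payload.get("message", {}).get("subject", []))


def match_cip(subjects, cip_keywords):
    """Map subject terms to CIP disciplines by substring match.

    cip_keywords is a hypothetical mapping: discipline -> lowercase keywords.
    """
    matched = set()
    for subject in subjects:
        lowered = subject.lower()
        for discipline, keywords in cip_keywords.items():
            if any(keyword in lowered for keyword in keywords):
                matched.add(discipline)
    return sorted(matched)
```

Substring matching is the simplest possible choice and will miss synonyms; the slides do not say how the term matching was actually done.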
ORCID - Subject Coverage
2013
ORCID - Subject Coverage
Changes from 2013 to 2014
Ranks gained:
• Social Science
• Education
• History
Ranks lost:
• Computer Science
• Legal professions
• Journalism
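One way to derive such rank gains and losses from per-subject counts in two consecutive years is sketched below. The function names and the tie-breaking rule (alphabetical) are assumptions made for illustration; the slides do not describe how ranks were computed.

```python
def ranks(counts):
    """Rank subjects by descending count; ties are broken alphabetically."""
    ordered = sorted(counts, key=lambda subject: (-counts[subject], subject))
    return {subject: position + 1 for position, subject in enumerate(ordered)}


def rank_changes(before, after):
    """Return rank movement per subject; positive means ranks gained."""
    rank_before, rank_after = ranks(before), ranks(after)
    return {
        subject: rank_before[subject] - rank_after[subject]
        for subject in rank_before
        if subject in rank_after and rank_before[subject] != rank_after[subject]
    }
```

Subjects whose rank did not move are omitted from the result, so only gains and losses are reported.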
ORCID - Subject Coverage
Changes from 2014 to 2015
Ranks gained:
• Social Science
• Education
Ranks lost:
• Natural Resources and
Conservation
ORCID - Subject Coverage
Changes from 2015 to 2016
Ranks gained:
• Natural Resources and
Conservation
Ranks lost:
• Multi/Interdisciplinary Studies
ORCID - Subject Coverage
Comparison of ORCID subjects with:
1. Distribution of researchers’ disciplines
• Proxy: Ph.D. recipients from U.S. universities
• Obtained from NSF, 2015 data
2. Distribution of publications’ disciplines
• Obtained from UNESCO Science Report
• U.S. data from 2014
Both report disciplines aligned with CIP terms, hence they are
easily comparable.
ORCID - Subject Coverage
[Figure: comparison across discipline groups, axis from 0 to 60. Series: ORCID Subjects, Ph.D. Researchers, Publications. Discipline groups: Other, Life Sciences, Physical Sciences, Mathematics and Computer Sciences, Education, Psychology and Social Sciences, Engineering, Humanities and Arts]
ORCID - Geo-Location Coverage
• Extract affiliations from ORCID records
• Aggregate country code for associated locations
• Only available in ORCID data since 2015
• Compare against UNESCO data of researcher distribution
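The aggregation and comparison steps on this slide might look like the following sketch. The function names are hypothetical, and the reference shares stand in for the UNESCO researcher distribution, whose actual format is not shown in the slides.

```python
from collections import Counter


def country_shares(affiliation_countries):
    """Turn a list of affiliation country codes into per-country shares."""
    counts = Counter(code.upper() for code in affiliation_countries if code)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {code: n / total for code, n in counts.items()}


def share_gaps(orcid_shares, reference_shares):
    """Difference between ORCID share and reference share per country.

    Positive values mean a country is over-represented in ORCID
    relative to the reference distribution.
    """
    countries = set(orcid_shares) | set(reference_shares)
    return {
        code: orcid_shares.get(code, 0.0) - reference_shares.get(code, 0.0)
        for code in countries
    }
```

Comparing shares rather than raw counts keeps the two data sets comparable even though they differ greatly in size.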
ORCID - Geo-Location Coverage
ORCID - Web Identities
• Analyze distribution of link “Labels”
• Field lacks controlled vocabulary
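Because the label field is free text, a label distribution needs some normalization before counting. The sketch below shows one simple option (lowercasing and collapsing punctuation); the function names and the normalization rule are assumptions, not the method used for the slides.

```python
import re
from collections import Counter


def normalize_label(label):
    """Lowercase a label and collapse non-alphanumeric runs into single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", label.lower()).strip()


def top_labels(labels, n=20):
    """Return the n most frequent normalized labels with their counts."""
    counts = Counter(normalize_label(label) for label in labels)
    counts.pop("", None)  # drop labels that were empty after normalization
    return counts.most_common(n)
```

Even after this normalization, variants such as "Linked-In" and "LinkedIn" remain distinct, which illustrates why the lack of a controlled vocabulary makes the field hard to analyze.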
ORCID - Web Identities
Top 20 labels 2016
Capture Flow
Ian Milligan’s ORCID
• Artifacts?
http://orcid.org/0000-0002-1470-7723
ORCID - Scholarly Orphans
• Analyze distribution of types of “Work”, e.g.,
• “journal article” – likely not an orphan
• “data-set” – potential orphan
https://members.orcid.org/api/resources/work-types
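The work-type analysis could be sketched as a count that sets aside types expected to be archived through traditional channels. The set `ASSUMED_NON_ORPHAN_TYPES` is an illustrative assumption; the slides only name "journal article" as likely not an orphan and "data-set" as a potential orphan.

```python
from collections import Counter

# Assumption for illustration: work types treated as unlikely to be orphans.
ASSUMED_NON_ORPHAN_TYPES = {
    "journal-article",
    "book",
    "book-chapter",
    "conference-paper",
}


def normalize_type(work_type):
    """Normalize a work type to lowercase, hyphen-separated form."""
    return work_type.strip().lower().replace("_", "-").replace(" ", "-")


def orphan_candidates(work_types):
    """Count work types that are not in the assumed non-orphan set."""
    counts = Counter(normalize_type(t) for t in work_types)
    return {t: n for t, n in counts.items() if t not in ASSUMED_NON_ORPHAN_TYPES}
```

Which types count as potential orphans is a judgment call, so the set is kept as a single constant that can be adjusted.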
ORCID - Work Types
Dominated by types expected not to be orphans!
Take-Aways
• ORCID adoption rate is increasing
• Subject coverage is focused; it does not cover all disciplines equally
• Geo-Location coverage is good but not quite representative
• Web Identity coverage is poor; not usable for our purpose in its
current state
• Very few scholarly orphans directly referenced
