Seed+Expand
aggregating the scientific output of the Netherlands, 2000-2010
Linda Reijnhoudt (1), Rodrigo Costas (2), Ed Noyons (2),
Katy Börner (3), Andrea Scharnhorst (1)
(1) DANS, Royal Netherlands Academy of Arts and Sciences (KNAW), The Hague, the Netherlands
linda.reijnhoudt@dans.knaw.nl, andrea.scharnhorst@dans.knaw.nl
(2) Center for Science and Technology Studies (CWTS), Leiden University, Leiden, the Netherlands
rcostas@cwts.leidenuniv.nl, noyons@cwts.leidenuniv.nl
(3) Cyberinfrastructure for Network Science Center, School of Library and Information Science, Indiana University, Bloomington, Indiana, United States of America
katy@indiana.edu
goal
to study the dynamics of the output of Dutch professors (2001-2011)
but: a lack of data on the output of full professors!
the problem
given a Dutch professor in the NARCIS system, find all his/her publications
how can bibliographic data from CWTS be connected to the NARCIS system?
context
CWTS bibliometric publications database:
● author
● author order
● email (sometimes)
● affiliation (sometimes)
● journal
DANS NARCIS, Dutch scholars:
● name, initials
● DAI
● affiliations
● organisation
● email
CWTS author = NARCIS scholar?
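The two kinds of records being matched can be sketched as plain data classes. The class and field names below are hypothetical, chosen to mirror the lists above rather than the actual CWTS or NARCIS schemas; the optional fields reflect that CWTS email and affiliation data are only sometimes present.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Professor:
    """A NARCIS full-professor record (hypothetical field names)."""
    surname: str
    initials: str
    prefix: str = ""                  # NARCIS stores prefixes such as 'van' separately
    dai: Optional[str] = None         # Digital Author Identifier
    email: Optional[str] = None
    organisation: Optional[str] = None
    affiliations: list[str] = field(default_factory=list)


@dataclass
class Authorship:
    """One author slot on a CWTS publication (hypothetical field names)."""
    pub_id: str
    author: str                       # WoS-style name, e.g. 'Henegouwen, PMPVE'
    author_order: int                 # position in the author list
    journal: str
    email: Optional[str] = None       # only sometimes available
    affiliation: Optional[str] = None # only sometimes available
```

Matching then amounts to deciding, for each `Professor`, which `Authorship` rows refer to that person.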
non trivial I
● misspelled names
○ Van Knienberg instead of Van Knippenberg
● different initials / first names
○ Johannes and Hans
● different formats in the data across sources
○ prefixes stored separately in the NARCIS system
■ P.M.P. | van | Bergen en Henegouwen
○ prefixes turned into initials or concatenated in WoS
■ Henegouwen, PMPVE (Henegouwen, Paul M. P. van Bergen En)
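One common way to cope with such format differences is to compare a coarse "blocking key" instead of the full name, and to use a character-level similarity score for misspellings. The sketch below is an illustration under that assumption, not the method used in the study; `narcis_key`, `wos_key` and `name_similarity` are hypothetical helpers, and the key (last surname token plus first initial) is a deliberately simple choice.

```python
import re
from difflib import SequenceMatcher


def narcis_key(initials: str, prefix: str, surname: str) -> tuple[str, str]:
    """Blocking key for a NARCIS name split into initials | prefix | surname.

    Uses only the last surname token and the first initial, because WoS may
    fold prefixes and leading surname parts into the initials.
    """
    last = f"{prefix} {surname}".split()[-1].lower()
    first = re.sub(r"[^A-Za-z]", "", initials)[:1].lower()
    return last, first


def wos_key(author: str) -> tuple[str, str]:
    """Blocking key for a WoS-style 'Surname, INITIALS' author string."""
    surname, _, initials = author.partition(",")
    last = surname.split()[-1].lower()
    first = initials.strip()[:1].lower()
    return last, first


def name_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] to tolerate misspellings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


print(narcis_key("P.M.P.", "van", "Bergen en Henegouwen"))  # ('henegouwen', 'p')
print(wos_key("Henegouwen, PMPVE"))                          # ('henegouwen', 'p')
print(round(name_similarity("Van Knienberg", "Van Knippenberg"), 2))  # 0.93
```

A first-initial key cannot reconcile Johannes with Hans (J vs. H); such cases would need an additional nickname table, which is not shown here.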
non trivial II
● multiple scholars share the same author name (homonymy)
● the same scholar appears under multiple author names (synonymy)
○ names change over time, e.g., due to marriage
the raw data
NARCIS database (DANS)
○ 8376 Dutch full professors
■ affiliation to Dutch organizations
■ name, initials
■ email
■ DAI
CWTS bibliometric data system
○ close to 23 million publications in more than 12,000 journals
○ no unique author identifier for all authors
the Gold Standard
we already know the complete oeuvre (2001-2010) of 1400 Dutch full professors, thanks to publication lists manually verified by CWTS
USEFUL TO VALIDATE OUR METHODOLOGY
1400 of the 8376 full professors (17%) already appear in this list
the sources & main overview
Seed+Expand main concept
● seed creation: aim for precision
○ given a full professor {initials, name, email, affiliations},
○ find one or more publications that are most likely authored by this professor
● seed expansion: aim for recall
○ given these 'seed' publications,
○ find further publications by the same author, using
1. publication-based classifications
2. the Scopus Author Identifier
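The two-stage concept can be sketched as a small pipeline: several high-precision "seeders" each propose seed publications, and an "expander" then adds publications linked to those seeds. This is a minimal sketch under that reading of the slide; `seed_and_expand`, `Seeder` and `Expander` are hypothetical names, and publications are represented only by string IDs.

```python
from typing import Callable, Iterable

PubIds = set[str]
Seeder = Callable[[str], PubIds]       # professor id -> seed publication ids
Expander = Callable[[PubIds], PubIds]  # seed publication ids -> related ids


def seed_and_expand(prof_id: str,
                    seeders: Iterable[Seeder],
                    expander: Expander) -> tuple[PubIds, PubIds]:
    """Return (seed, oeuvre) for one professor.

    Seeding should favour precision: each seeder returns only publications
    that very likely belong to the professor. Expansion favours recall: it
    adds publications linked to the seed.
    """
    seed: PubIds = set()
    for seeder in seeders:
        seed |= seeder(prof_id)
    if not seed:
        return seed, set()  # without a seed there is nothing to expand from
    return seed, seed | expander(seed)
```

Keeping the stages separate makes it easy to compare expansion methods (for example a classification-based expander versus an author-ID-based one) on the same seeds.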
seed creation
1. Email seed (EM)
2. Author Address approaches (*)
a. Reprint Author (RP)
b. Direct linkage author-addresses (DL)
c. Approximate linkage author addresses (AL)
3. Digital Author Identifier seed (DAI)
(*) For these seeds, very common names have been excluded
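Two of the seed types can be illustrated in a few lines: an email seed (exact, case-insensitive email match) and a reprint-author style seed (same normalized name and the professor's organisation in the reprint address), skipping very common names. The matching rules, the `max_name_freq` threshold and all names below are assumptions for illustration; the direct and approximate address linkage (DL, AL) and DAI seeds are not shown.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Authorship(NamedTuple):
    """Hypothetical, simplified author slot on a publication."""
    pub_id: str
    name_key: str                   # normalized author name, e.g. 'knippenberg d'
    email: Optional[str]
    reprint_address: Optional[str]  # reprint-author address, if any


def email_seed(prof_email: str, rows: list[Authorship]) -> set[str]:
    """EM-style seed: publications listing the professor's email address."""
    target = prof_email.strip().lower()
    return {r.pub_id for r in rows
            if r.email and r.email.strip().lower() == target}


def reprint_seed(name_key: str, org: str, rows: list[Authorship],
                 max_name_freq: int = 50) -> set[str]:
    """RP-style seed: same name key and the organisation in the reprint address.

    Name keys occurring more than max_name_freq times are treated as
    'very common' and yield no seed.
    """
    freq = Counter(r.name_key for r in rows)
    if freq[name_key] > max_name_freq:
        return set()  # too common to be a reliable seed
    org = org.lower()
    return {r.pub_id for r in rows
            if r.name_key == name_key
            and r.reprint_address and org in r.reprint_address.lower()}
```

Counting name frequencies over author slots is only a rough proxy for how common a name is; in practice the counts would be precomputed once rather than on every call.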
seed expansion
1. CWTS Paper-Based Classification (2001-2011)
○ based on citation relationships between publications
○ 672 meso-level and over 20K micro-level disciplines
○ micro: +23% unique papers over seed
○ meso: +34% unique papers over seed
2. Scopus Author Identifier (1996-2011)
○ +69% unique papers over seed
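The slides do not spell out the expansion rules, so the sketch below makes two explicit assumptions: classification-based expansion adds publications that fall in the same (micro or meso) cluster as a seed paper and carry the professor's normalized name; Scopus-based expansion adds every publication sharing the author ID attached to the professor's slot on a seed paper. `gain_over_seed` shows one possible reading of "+N% unique papers over seed". All function names and input shapes are hypothetical.

```python
def expand_by_cluster(seed: set[str], name_key: str,
                      pub_cluster: dict[str, str],
                      pub_names: dict[str, set[str]]) -> set[str]:
    """Publications in a cluster containing a seed paper that also carry
    the professor's name key (seed papers themselves are included)."""
    seed_clusters = {pub_cluster[p] for p in seed if p in pub_cluster}
    return {p for p, c in pub_cluster.items()
            if c in seed_clusters and name_key in pub_names.get(p, set())}


def expand_by_author_id(seed: set[str], name_key: str,
                        pub_author_ids: dict[str, dict[str, str]]) -> set[str]:
    """Publications sharing an author ID with the professor's slot on a seed.

    pub_author_ids maps pub_id -> {name_key: author_id} for its authors.
    """
    ids = {pub_author_ids[p][name_key] for p in seed
           if name_key in pub_author_ids.get(p, {})}
    return {p for p, slots in pub_author_ids.items()
            if ids & set(slots.values())}


def gain_over_seed(seed: set[str], expanded: set[str]) -> float:
    """Unique papers added relative to the seed size (0.23 means +23%)."""
    return len(expanded - seed) / len(seed)
```

Restricting cluster-based expansion to matching names is what keeps it from simply returning whole disciplines; the author-ID route needs no name check because the ID itself already disambiguates.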
evaluation
Gold standard: 2001-2010
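Evaluation against the Gold Standard compares the publications found per professor with the manually verified lists. The sketch below pools (professor, publication) pairs and computes standard precision and recall; pooling rather than averaging per professor is an assumption, as is the function name `pooled_precision_recall`. Only professors present in the Gold Standard are evaluated, and publications are assumed to be already restricted to 2001-2010.

```python
def pooled_precision_recall(found: dict[str, set[str]],
                            gold: dict[str, set[str]]) -> tuple[float, float]:
    """Precision and recall over (professor, publication) pairs."""
    found_pairs = {(prof, pub) for prof, pubs in found.items()
                   if prof in gold for pub in pubs}
    gold_pairs = {(prof, pub) for prof, pubs in gold.items() for pub in pubs}
    hits = len(found_pairs & gold_pairs)
    precision = hits / len(found_pairs) if found_pairs else 0.0
    recall = hits / len(gold_pairs) if gold_pairs else 0.0
    return precision, recall
```

For example, finding {1, 2, 3} for professor a (gold {1, 2}) and {4} for professor b (gold {4, 5, 6}) gives 3 correct pairs out of 4 found and 5 expected, i.e. precision 0.75 and recall 0.6. If the reported results are percentages, these values would be multiplied by 100 for comparison.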
results
● 80% of Dutch full professors detected
● micro-disciplines: highest precision (88.5)
● Scopus Author ID and micro-disciplines: same recall (95.9)
● this methodology can be applied to other sets and author identity schemes (ORCID, VIVO, etc.)
● further research needed on disciplinary differences and improvements
general discussion
● more and more bibliographic data sources, but still a lack of author-disambiguated data!
● lack of research on how to connect databases:
○ repositories
○ bibliographic databases (WoS, Scopus, etc.)
○ altmetrics
● email data and DAI/ORCID-like identifiers are powerful linking elements across systems
the end ...
thank you very much for your attention!
questions?
comments?
five seeds
combined: 6753 of the 8376 full professors found
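Combining the five seeds means taking the union of professors for whom at least one seed type produced a publication. The arithmetic below shows that 6753 of 8376 corresponds to the roughly 80% coverage reported in the results; `professors_found` and `coverage` are hypothetical helper names.

```python
def professors_found(seed_hits: dict[str, set[str]]) -> set[str]:
    """Union over seed types (e.g. EM, RP, DL, AL, DAI) of professors
    with at least one seed publication."""
    found: set[str] = set()
    for profs in seed_hits.values():
        found |= profs
    return found


def coverage(n_found: int, n_total: int) -> float:
    """Share of professors with at least one seed publication."""
    return n_found / n_total


print(f"{coverage(6753, 8376):.1%}")  # 80.6%
```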
 
Skimbleshanks-The-Railway-Cat by T S Eliot
Skimbleshanks-The-Railway-Cat by T S EliotSkimbleshanks-The-Railway-Cat by T S Eliot
Skimbleshanks-The-Railway-Cat by T S Eliot
 
spot a liar (Haiqa 146).pptx Technical writhing and presentation skills
spot a liar (Haiqa 146).pptx Technical writhing and presentation skillsspot a liar (Haiqa 146).pptx Technical writhing and presentation skills
spot a liar (Haiqa 146).pptx Technical writhing and presentation skills
 
Nutrition Inc FY 2024, 4 - Hour Training
Nutrition Inc FY 2024, 4 - Hour TrainingNutrition Inc FY 2024, 4 - Hour Training
Nutrition Inc FY 2024, 4 - Hour Training
 
Electric Fetus - Record Store Scavenger Hunt
Electric Fetus - Record Store Scavenger HuntElectric Fetus - Record Store Scavenger Hunt
Electric Fetus - Record Store Scavenger Hunt
 
How to Predict Vendor Bill Product in Odoo 17
How to Predict Vendor Bill Product in Odoo 17How to Predict Vendor Bill Product in Odoo 17
How to Predict Vendor Bill Product in Odoo 17
 
REASIGNACION 2024 UGEL CHUPACA 2024 UGEL CHUPACA.pdf
REASIGNACION 2024 UGEL CHUPACA 2024 UGEL CHUPACA.pdfREASIGNACION 2024 UGEL CHUPACA 2024 UGEL CHUPACA.pdf
REASIGNACION 2024 UGEL CHUPACA 2024 UGEL CHUPACA.pdf
 
SWOT analysis in the project Keeping the Memory @live.pptx
SWOT analysis in the project Keeping the Memory @live.pptxSWOT analysis in the project Keeping the Memory @live.pptx
SWOT analysis in the project Keeping the Memory @live.pptx
 
مصحف القراءات العشر أعد أحرف الخلاف سمير بسيوني.pdf
مصحف القراءات العشر   أعد أحرف الخلاف سمير بسيوني.pdfمصحف القراءات العشر   أعد أحرف الخلاف سمير بسيوني.pdf
مصحف القراءات العشر أعد أحرف الخلاف سمير بسيوني.pdf
 
RESULTS OF THE EVALUATION QUESTIONNAIRE.pptx
RESULTS OF THE EVALUATION QUESTIONNAIRE.pptxRESULTS OF THE EVALUATION QUESTIONNAIRE.pptx
RESULTS OF THE EVALUATION QUESTIONNAIRE.pptx
 
Oliver Asks for More by Charles Dickens (9)
Oliver Asks for More by Charles Dickens (9)Oliver Asks for More by Charles Dickens (9)
Oliver Asks for More by Charles Dickens (9)
 
BIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptx
BIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptxBIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptx
BIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptx
 
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
 
How Barcodes Can Be Leveraged Within Odoo 17
How Barcodes Can Be Leveraged Within Odoo 17How Barcodes Can Be Leveraged Within Odoo 17
How Barcodes Can Be Leveraged Within Odoo 17
 
Andreas Schleicher presents PISA 2022 Volume III - Creative Thinking - 18 Jun...
Andreas Schleicher presents PISA 2022 Volume III - Creative Thinking - 18 Jun...Andreas Schleicher presents PISA 2022 Volume III - Creative Thinking - 18 Jun...
Andreas Schleicher presents PISA 2022 Volume III - Creative Thinking - 18 Jun...
 

Seed and Expand

  • 1. Seed+Expand: aggregating the scientific output of the Netherlands, 2000-2010. Linda Reijnhoudt, Rodrigo Costas, Ed Noyons, Katy Börner, Andrea Scharnhorst. (1) linda.reijnhoudt@dans.knaw.nl, andrea.scharnhorst@dans.knaw.nl: DANS, Royal Netherlands Academy of Arts and Sciences (KNAW), The Hague, the Netherlands; (2) rcostas@cwts.leidenuniv.nl, noyons@cwts.leidenuniv.nl: Center for Science and Technology Studies (CWTS), Leiden University, Leiden, the Netherlands; (3) katy@indiana.edu: Cyberinfrastructure for Network Science Center, School of Library and Information Science, Indiana University, Bloomington, Indiana, United States of America
  • 2. goal: to study the dynamics of the output of Dutch professors (2001-2011); but there is a lack of data on the output of full professors!
  • 3. the problem: given a Dutch professor in the NARCIS system, find all his/her publications. How can bibliographic data from CWTS be connected with the NARCIS system?
  • 4. context: CWTS bibliometric publications database: ● author ● author order ● email (sometimes) ● affiliation (sometimes) ● journal; DANS NARCIS, Dutch scholars: ● name, initials ● DAI ● affiliations ● organisation ● email; CWTS author = NARCIS scholar?
  • 5. non-trivial I: ● misspelled names ○ Van Knienberg instead of Van Knippenberg ● different initials / first names ○ Johannes and Hans ● different formats in the data across sources ○ prefixes kept in a separate field in the NARCIS system ■ P.M.P. | van | Bergen en Henegouwen ○ prefixes turned into initials or concatenated in WoS ■ Henegouwen, PMPVE (Henegouwen, Paul M. P. van Bergen En)
  • 6. non-trivial II: ● multiple scholars share the same author name (homonymy) ● the same scholar appears under multiple author names (synonymy) ○ e.g., names change over time due to marriage
  • 7. the raw data: NARCIS database (DANS) ○ 8378 Dutch full professors ■ affiliations to Dutch organizations ■ name, initials ■ email ■ DAI; CWTS bibliometric data system ○ close to 23 million publications in more than 12,000 journals ○ no unique author identifier for all authors
  • 8. the Gold Standard: we already know the complete oeuvre (2001-2010) of 1400 Dutch full professors from publication lists manually verified by CWTS, which makes them useful to validate our methodology; these 1400 of the 8376 full professors (17%) form the Gold Standard
  • 9. the sources & main overview
  • 10. Seed+Expand main concept: ● seed creation (precision) ○ given a full professor {initials, name, email, affiliations}, ○ find one or more publications that are most likely authored by this professor ● seed expansion (recall) ○ given these "seed" publications, ○ find other publications by the same author via 1. publication-based classifications, 2. the Scopus Author Identifier
  • 11. seed creation: 1. Email seed (EM) 2. Author-address approaches (*): a. Reprint Author (RP) b. Direct author-address linkage (DL) c. Approximate author-address linkage (AL) 3. Digital Author Identifier seed (DAI). (*) For these seeds, very common names have been excluded.
  • 12. seed expansion: 1. CWTS Paper-Based Classification (2001-2011) ○ based on citation relationships between publications ○ 672 meso-level and over 20K micro-level disciplines ○ micro: +23% unique papers over the seed ○ meso: +34% unique papers over the seed 2. Scopus Author Identifier (1996-2011) ○ +69% unique papers over the seed
  • 14. results: ● 80% of Dutch professors detected ● micro-disciplines: highest precision (88.5) ● Scopus Author ID and micro-disciplines: same recall (95.9) ● this methodology can be applied to other sets and author identity schemes (ORCID, VIVO, etc.) ● further research is needed on disciplinary differences and improvements
  • 15. general discussion: ● bibliographic data sources are increasing, but author-disambiguated data are still lacking! ● lack of research on how to connect databases: ○ repositories ○ bibliographic databases (WoS, Scopus, etc.) ○ altmetrics ● email data and DAI/ORCID-like identifiers are powerful linking elements across systems
  • 16. the end ... thank you very much for your attention! questions? comments?
  • 17. five seeds combined: 6753 of 8376 full professors found
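Slide 5 lists the name variants that make matching NARCIS scholars to WoS authors hard. The Python sketch below shows one possible way to compare the two, under hypothetical assumptions that are not from the slides: WoS files a compound surname under its last word, WoS initials begin with the given-name initials (extra letters may come from folded-in prefixes), and surname typos are tolerated through a `difflib` similarity ratio above an illustrative threshold of 0.85. All helper names are illustrative, and given-name variants such as Johannes/Hans would need a separate nickname table.

```python
import unicodedata
from difflib import SequenceMatcher


def fold(text: str) -> str:
    """Lower-case and strip diacritics, so 'Börner' and 'Borner' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


def narcis_key(initials: str, surname: str) -> tuple[str, str]:
    """(surname, initials) key from NARCIS fields such as 'P.M.P.' and
    'Bergen en Henegouwen'; the separate prefix field ('van') is ignored.
    Assumption: WoS files a compound surname under its last word."""
    last_word = surname.split()[-1]
    letters = "".join(c for c in initials if c.isalpha())
    return fold(last_word), fold(letters)


def wos_key(author: str) -> tuple[str, str]:
    """Split a WoS author string such as 'Henegouwen, PMPVE' into a key."""
    surname, _, initials = author.partition(",")
    return fold(surname.strip()), fold(initials.strip())


def names_compatible(narcis: tuple[str, str], wos: tuple[str, str],
                     min_similarity: float = 0.85) -> bool:
    """Candidate match if the surnames are similar enough (tolerating typos
    such as 'Knienberg' for 'Knippenberg') and the WoS initials begin with
    the NARCIS initials. The 0.85 threshold is an illustrative assumption."""
    similarity = SequenceMatcher(None, narcis[0], wos[0]).ratio()
    return similarity >= min_similarity and wos[1].startswith(narcis[1])


nk = narcis_key("P.M.P.", "Bergen en Henegouwen")
print(nk, names_compatible(nk, wos_key("Henegouwen, PMPVE")))
```

Fuzzy surname matching raises recall but aggravates the homonymy problem of slide 6, so a comparison like this would plausibly only propose candidates for the seed step rather than confirm matches.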
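Slide 11 names the seed strategies. The following is a hedged Python sketch of two of them over simplified, hypothetical record types (`Professor`, `Authorship`): an email seed that keeps publications whose author email equals the professor's NARCIS email, and a simplified stand-in for the author-address seeds that requires an exact (surname, initials) match plus an affiliation listed in NARCIS, skipping names that occur more often than an illustrative `max_name_count`. The actual RP, DL and AL rules are not given on the slides and are not reproduced here.

```python
from collections.abc import Mapping
from dataclasses import dataclass
from typing import Optional


@dataclass
class Professor:  # hypothetical NARCIS record
    surname: str
    initials: str
    email: str
    affiliations: set[str]


@dataclass
class Authorship:  # hypothetical author slot on a CWTS publication
    pub_id: int
    surname: str
    initials: str
    email: Optional[str] = None
    affiliation: Optional[str] = None


def email_seed(prof: Professor, authorships: list[Authorship]) -> set[int]:
    """Publications whose author email equals the NARCIS email (case-insensitive)."""
    target = prof.email.strip().lower()
    return {a.pub_id for a in authorships
            if a.email is not None and a.email.strip().lower() == target}


def address_seed(prof: Professor, authorships: list[Authorship],
                 name_counts: Mapping[tuple[str, str], int],
                 max_name_count: int = 100) -> set[int]:
    """Publications with the same (surname, initials) and a NARCIS affiliation.
    Very common names return an empty seed, mirroring the slide's exclusion;
    the threshold value itself is a hypothetical choice."""
    key = (prof.surname.lower(), prof.initials.lower())
    if name_counts.get(key, 0) > max_name_count:
        return set()
    return {a.pub_id for a in authorships
            if (a.surname.lower(), a.initials.lower()) == key
            and a.affiliation in prof.affiliations}
```

The email seed needs no name logic at all, which is consistent with slide 15 calling email data a powerful linking element.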
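Slide 12 expands each seed in two ways. The sketch below illustrates both over hypothetical in-memory mappings, since the slides do not describe the data model. Cluster-based expansion keeps name-compatible candidate papers that share a micro- (or meso-) discipline with a seed paper. Identifier-based expansion pulls in every paper carrying the same Scopus Author ID. `relative_gain` computes a figure of the kind reported as "+23% unique papers over the seed".

```python
from collections.abc import Hashable, Iterable, Mapping


def expand_by_cluster(seed: set[int], candidates: Iterable[int],
                      cluster_of: Mapping[int, Hashable]) -> set[int]:
    """Add name-compatible candidate publications that fall in the same
    discipline cluster as at least one seed publication."""
    seed_clusters = {cluster_of[p] for p in seed if p in cluster_of}
    return set(seed) | {p for p in candidates
                        if p in cluster_of and cluster_of[p] in seed_clusters}


def expand_by_author_id(seed: set[int],
                        author_ids_of: Mapping[int, set[str]],
                        pubs_of_author_id: Mapping[str, set[int]]) -> set[int]:
    """Add every publication carrying an author ID found on a seed publication.
    Assumption: author_ids_of maps each seed paper to the ID(s) of the matched
    author slot only, not to the IDs of all co-authors."""
    ids = set().union(*(author_ids_of.get(p, set()) for p in seed))
    return set(seed).union(*(pubs_of_author_id.get(i, set()) for i in ids))


def relative_gain(seed: set[int], expanded: set[int]) -> float:
    """Unique papers added by the expansion, relative to the seed size."""
    return len(expanded - seed) / len(seed) if seed else 0.0
```

Restricting cluster expansion to name-compatible candidates keeps precision high; the finer micro level would be expected to admit fewer false positives than the meso level, in line with the smaller gain reported for it.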
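Slides 8 and 14 validate the method against the 1400 manually verified publication lists. The sketch below shows how precision and recall could be computed against such a Gold Standard. The slides do not state whether the reported figures (88.5, 95.9) are pooled over publications or averaged per professor, so the pooled variant here is an assumption, as are the mapping-based inputs.

```python
from collections.abc import Mapping


def precision_recall(found: set[int], gold: set[int]) -> tuple[float, float]:
    """Precision and recall of one professor's retrieved publications."""
    hits = len(found & gold)
    precision = hits / len(found) if found else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


def pooled_precision_recall(found_by_prof: Mapping[str, set[int]],
                            gold_by_prof: Mapping[str, set[int]]) -> tuple[float, float]:
    """Precision and recall pooled over all Gold Standard professors;
    professors outside the Gold Standard are not scored."""
    hits = n_found = n_gold = 0
    for prof, gold in gold_by_prof.items():
        found = found_by_prof.get(prof, set())
        hits += len(found & gold)
        n_found += len(found)
        n_gold += len(gold)
    precision = hits / n_found if n_found else 0.0
    recall = hits / n_gold if n_gold else 0.0
    return precision, recall
```

Pooling weights prolific professors more heavily; a per-professor average would instead treat each oeuvre equally.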
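Slide 17 reports the union of all five seeds (EM, RP, DL, AL, DAI). The following minimal sketch shows that union. It assumes each seed method yields a hypothetical mapping from professor ID to seed publication IDs. The final line checks that 6753 of 8376 is about 80.6%, in line with the 80% on slide 14.

```python
from collections.abc import Mapping


def combine_seeds(seeds_by_method: Mapping[str, Mapping[str, set[int]]]) -> dict[str, set[int]]:
    """Union the seed publications per professor across seed methods
    and drop professors whose combined seed is empty."""
    combined: dict[str, set[int]] = {}
    for per_prof in seeds_by_method.values():
        for prof, pubs in per_prof.items():
            combined.setdefault(prof, set()).update(pubs)
    return {prof: pubs for prof, pubs in combined.items() if pubs}


def coverage(combined: Mapping[str, set[int]], n_professors: int) -> float:
    """Share of professors for whom at least one seed publication was found."""
    return len(combined) / n_professors


# With the slide's counts: 6753 of 8376 professors have a non-empty seed.
print(round(6753 / 8376, 3))  # 0.806
```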