     :
 ,   
   
W G, B T, S H
Centre for R&D Monitoring and Dept MSI, KU Leuven
 Introduction 
The validity of bibliometric results for research evaluation stands or
falls with the quality of the underlying data, which originate from various
sources:
• bibliographic databases,
• CVs or institutional publication lists,
• geographic information,
• funding information,
• project application, etc.
☛ These sources are usually created for purposes other than use
within the framework of bibliometrics.
G  ., Pre-bibliometric data processing, Cape Town, 2014 2/27
Background
A distinct sub-discipline of bibliometrics, which can be described as
pre-bibliometric data processing, has therefore emerged.
☞ This task is not merely a kind of traditional “bibliometric technology”, since it
requires a specific research methodology that aims at processing and
preparing highly heterogeneous data from different sources for application in
bibliometrics and research evaluation.
Objective
The objective of the paper is to describe three typical tasks representing
this type of ‘pre-bibliometrics’.
• Author identification
• Institute assignment
• Address regionalisation
G  ., Pre-bibliometric data processing, Cape Town, 2014 3/27
General issues
In principle, there are two possible ways to meet the challenge of
‘pre-bibliometrics’:
1. Using the database only (internal approach)
2. Using supplementary (external) sources such as publication lists or
CVs (extended approach)
☛ The results of the second approach are generally more valid than
those of the first, since it draws on additional and often more
reliable data sources than bibliographic databases alone can
provide.
G  ., Pre-bibliometric data processing, Cape Town, 2014 4/27
 I. The challenge of author identification 
Correct author identification is indispensable in micro-level studies, such
as studies of scientific careers, researchers’ mobility or in monitoring
constitution and performance of research teams.
• In order to meet the quality requirements that are necessary at this
level, all (co-)authors of relevant publications need to be identified
and assigned to their affiliation.
◦ The laer task is supported by corresponding features of the two large
citation databases Thomson Reuters’ W  S (since 2007) and
Elsevier’s S.
G  ., Pre-bibliometric data processing, Cape Town, 2014 5/27
Basic problems of author identification
Some basic problems of author identification in bibliographic databases
are related to their names:
Synonyms
• Spelling variants (e.g., umlauts, transliteration, name particles,
double names)
• Misprints
• Initials
• Name changes
• Database standard
G  ., Pre-bibliometric data processing, Cape Town, 2014 6/27
Intensifying co-authorship
The problem of synonyms
Type              Variant 1         Variant 2                    Variant 3
Umlaut            Glänzel           Glanzel                      Glaenzel
Transliteration   王悦              Wang, Y
Name particles    Van De Broek, I   Broek, I Vande / Broek, IV   Vandebroek, I
Initials          Wemans, Andre     Wemans, ADV                  Wemans, A
Name change       Petre, Camelia    Stanciu, Camelia             Camelia, Stanciu
Database          VANRAAN, AFJ      VanRaan, AFJ                 Van Raan, AFJ
Source: Heeffer et al., ISSI 2013
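Some of the synonym types listed above (umlauts, name particles, database capitalisation) can be collapsed by a simple normalisation step. The following is a minimal sketch, not the procedure used by the authors; the function names `fold` and `name_key` are hypothetical, and the key format (folded surname plus first initial) is an assumption made for illustration.

```python
import unicodedata


def fold(text: str) -> str:
    """Strip diacritics and upper-case the text, e.g. 'Glänzel' -> 'GLANZEL'."""
    decomposed = unicodedata.normalize("NFKD", text)
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return without_marks.upper()


def name_key(surname: str, initials: str = "") -> str:
    """Build a blocking key: folded surname without spaces, plus the first initial.

    The key format is an assumption for illustration only.
    """
    compact = "".join(ch for ch in fold(surname) if ch.isalpha())
    first_initial = fold(initials)[:1]
    return f"{compact},{first_initial}" if first_initial else compact


# Variants from the table that end up with the same key:
print(name_key("Van Raan", "AFJ"), name_key("VANRAAN", "AFJ"), name_key("VanRaan", "AFJ"))
print(name_key("Glänzel", "W"), name_key("Glanzel", "W"))
```

Such a key only handles the easy cases. The transcribed umlaut ‘Glaenzel’, inverted names, name changes and transliterations still produce different keys and need additional evidence.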
G  ., Pre-bibliometric data processing, Cape Town, 2014 7/27
Basic problems of author identification
Further problems of author identification in bibliographic databases
• Homonyms
• Incomplete records
◦ Incomplete double names or first names/initials
◦ Missing links with affiliation
◦ Missing, incomplete or incorrect corporate address
◦ Ambiguous or changing email address
• Mobility
• Institutional restructuring
G  ., Pre-bibliometric data processing, Cape Town, 2014 8/27
 Possible solutions 
The “internal” approach
Author IDs
Database providers made very early attempts to identify authors
uniquely. There are two basic approaches.
1. Identification by the database provider, for example,
◦ Mathematical Reviews Author ID (since 1940, first manually, from
1985 on as an automated process)
◦ Elsevier’s AuthorID (since 2006, automated process with author
feedback)
2. Identification by authors, for example,
◦ TR ResearcherID (authors are fully responsible for their IDs)
◦ Open Researcher & Contributor ID – ORCID (online October 2012)
Compatible with other IDs (WoS, Scopus, PubMed) and various links
☞ Both approaches have advantages and disadvantages, but ambiguity and
incorrectness cannot be completely excluded in either of them.
G  ., Pre-bibliometric data processing, Cape Town, 2014 9/27
The “internal” approach
Thomson Reuters ResearcherID
Several issues are known:
• RIDs are not only used by individual authors. Some institutes and
author groups mark their publications by an RID.
• Some RIDs claim several papers while the researcher name does not
match any of the authors.
• An RID is not always unique. Several authors use different
RIDs to claim the same papers.
G  ., Pre-bibliometric data processing, Cape Town, 2014 10/27
The “internal” approach
Some basic statistics on RIDs (WoS: 2009 – 2011) according to Heeffer et al. (2013)
Some conclusions
- The example of China shows the
necessity of name disambiguation.
- RID cannot yet be used to substitute
author names.
- Email addresses, affiliation and
co-authors can be used for further
identification with severe limitations
(e.g., mobility, institutional
restructuring, etc.).
[Figure: relative frequency of publication activity of RID authors
(bars) vs. all authors (line)]
G  ., Pre-bibliometric data processing, Cape Town, 2014 11/27
The “internal” approach
Automated identification on a large scale
Most techniques are based on a combination of name standardisation
and additional components that are considered characteristic for an
author.
• Combinations of name and affiliation/location or subject, shared
co-authors/partners (T  ., 2006)
• Some of the combinations are used to form identity records (B,
2011).
• Approximate Structural Equivalence (Tang & Walsh, 2010) uses
authors’ “bibliometric fingerprints”. In effect, a kind of bibliographic
coupling and direct citations are used.
☞ None of these methods can eliminate all Type I and II errors.
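The idea of combining a standardised name with additional characteristic components can be sketched as follows. This is an illustration under stated assumptions, not any of the cited methods: records are first grouped by a name key, and two records in the same group are linked only if they share an affiliation or at least one co-author. The record fields (`name_key`, `coauthors`, `affiliation`) and the sample data are hypothetical.

```python
from collections import defaultdict


def cluster_records(records):
    """Group records sharing a name key and an affiliation or a co-author.

    `records` is a list of dicts with the hypothetical fields 'name_key' (str),
    'coauthors' (set of str) and 'affiliation' (str).
    Returns a list of sets of record indices, one set per presumed author.
    """
    parent = list(range(len(records)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Block by name key so that only plausible pairs are compared.
    blocks = defaultdict(list)
    for idx, rec in enumerate(records):
        blocks[rec["name_key"]].append(idx)

    for members in blocks.values():
        for pos, a in enumerate(members):
            for b in members[pos + 1:]:
                ra, rb = records[a], records[b]
                same_aff = bool(ra["affiliation"]) and ra["affiliation"] == rb["affiliation"]
                if same_aff or (ra["coauthors"] & rb["coauthors"]):
                    parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for idx in range(len(records)):
        clusters[find(idx)].add(idx)
    return list(clusters.values())


# Made-up sample: three records for 'WANG,Y' and one for another name.
sample = [
    {"name_key": "WANG,Y", "coauthors": {"LI,H"}, "affiliation": "UNIV A"},
    {"name_key": "WANG,Y", "coauthors": {"LI,H", "CHEN,X"}, "affiliation": "UNIV B"},
    {"name_key": "WANG,Y", "coauthors": {"ZHAO,Q"}, "affiliation": "UNIV C"},
    {"name_key": "THIJS,B", "coauthors": {"LI,H"}, "affiliation": "UNIV A"},
]
print(cluster_records(sample))
```

Both error types remain possible in such a scheme: two homonymous authors with a common co-author are merged, and one author who changes both affiliation and collaborators is split.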
G  ., Pre-bibliometric data processing, Cape Town, 2014 12/27
The “extended” approach
External sources can be used to identify authors in bibliographic
databases.
The most convenient way to use external sources is matching the
author’s publication list with database records.
• On a large scale manual identification has become extremely
difficult.
☞ Since 1985 the process of author identification of the Mathematical
Reviews database has been automated.
Automated processes can substantially facilitate the matching of
external sources with bibliographic databases, but they also produce
false positives and false negatives.
☞ The objective is to exclude false positives and to considerably reduce
the number of false negatives by final manual correction.
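One way to organise this division of labour, assuming the automated step yields a match score between 0 and 1, is a simple triage: clear matches are accepted, clear non-matches rejected, and everything in between is passed on for manual correction. The function name and both thresholds below are hypothetical and would have to be calibrated on a validated sample.

```python
def triage(score: float, accept_at: float = 0.9, reject_below: float = 0.5) -> str:
    """Sort a candidate match into 'accept', 'manual review' or 'reject'.

    The thresholds are illustrative assumptions, not values from the source.
    """
    if score >= accept_at:
        return "accept"
    if score < reject_below:
        return "reject"
    return "manual review"


for score in (0.95, 0.70, 0.20):
    print(score, triage(score))
```

Raising `accept_at` reduces false positives at the price of more manual work, which reflects the stated priority of excluding false positives first.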
G  ., Pre-bibliometric data processing, Cape Town, 2014 13/27
The “extended” approach
CVs are reliable sources of the authors’ publication records. However,
(author) names alone are not sufficient for a correct match because of the
issues described in the previous section.
Main issues to be tackled when matching external data with database
records:
• Different standards of the two sources
• Incomplete, erroneous or censored data in the external source
• Database errors
G  ., Pre-bibliometric data processing, Cape Town, 2014 14/27
The “extended” approach
Similar errors can be found in external reference sources as well, and CVs
often do not follow their own standard consistently.
This problem can be solved by including more components such as
complete title, journal title, co-authors and by using similarity measures
and scores instead of exact matches.
• Since name sequences and texts are sensitive to errors and
misprints, similarity measures can be used to assess the
“probability” of a positive match.
• One possible solution is the use of character N-grams in conjunction
with edit (Levenshtein) distance (Abdulhayoglu & Thijs, 2013).
☞ N-gram based edit distances are sensitive to the order of
components.
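A minimal sketch of the idea of combining character n-grams with Levenshtein distance is given below. It is not the scoring used in the cited study: here each string is split into overlapping 3-grams, the edit distance is computed over the two n-gram sequences, and the result is normalised by the longer sequence. The function names and the normalisation are assumptions.

```python
def ngrams(text: str, n: int = 3) -> list[str]:
    """Split a string into overlapping character n-grams (upper-cased, single spaces)."""
    text = " ".join(text.upper().split())
    return [text[i:i + n] for i in range(len(text) - n + 1)] or [text]


def edit_distance(a: list[str], b: list[str]) -> int:
    """Levenshtein distance between two sequences, using a rolling row."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def similarity(a: str, b: str, n: int = 3) -> float:
    """Similarity in [0, 1]: 1 minus the normalised n-gram edit distance."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    longest = max(len(grams_a), len(grams_b))
    return 1.0 - edit_distance(grams_a, grams_b) / longest


cv_entry = "Thijs, B., Glanzel, W."
print(similarity(cv_entry, "THIJS, B, GLANZEL, W"))    # punctuation differs only
print(similarity(cv_entry, "Glanzel, W., Thijs, B."))  # same components, other order
```

The second comparison scores clearly lower than the first although both contain the same names, which illustrates the sensitivity to the order of components.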
G  ., Pre-bibliometric data processing, Cape Town, 2014 15/27
The “extended” approach
3-gram based similarity scores for the CV entry ‘Zhang, L., Thijs, B., Glänzel, W.,
The diffusion of H-related literature. JOI, 2011, 5 (4), 583-593’
Variable  Score  WoS record
1         0.67   ZHANG, L, THIJS, B, GLANZEL, W, The diffusion of H-related literature, JOURNAL OF INFORMETRICS, 5, 583, 2011
2         0.73   ZHANG, L, THIJS, B, GLANZEL, W, The diffusion of H-related literature, 5, 583, 2011
3         0.34   The diffusion of H-related literature, JOURNAL OF INFORMETRICS, 5, 583, 2011
4         0.65   ZHANG, L, THIJS, B, GLANZEL, W, The diffusion of H-related literature
5         0.36   The diffusion of H-related literature
6         0.41   The diffusion of H-related literature, JOURNAL OF INFORMETRICS
Source: Abdulhayoglu & Thijs, ISSI 2013
G  ., Pre-bibliometric data processing, Cape Town, 2014 16/27
The “extended” approach
The method based on 3-grams achieved 95% accuracy in a sample set.
Source: Abdulhayoglu & Thijs, ISSI 2013
Two problems remain
• False negatives are the result of an unsuccessful match. Their authors
are thus incompletely identified.
• False positives are a major issue since they indicate that the match
was incorrect. Their authors are thus incorrectly identified.
☞ Similar methods are used in spam detection (cf. Kanaris et al.,
International Journal on Artificial Intelligence Tools, 2006).
G  ., Pre-bibliometric data processing, Cape Town, 2014 17/27
 II. Institutional assignment 
In practice, the quality of the data in the address fields of bibliographic
databases does not meet the requirements of institutional assessment
with strong bibliometric components.
Typical tasks are
• Studies of institutional research performance
• University ranking
• Funding formulas with bibliometric components for universities or
research institutes
• Regional performance indicators
Tasks in Flanders that require correctness of address data at the level
of main institutions
• One major additional funding source and mechanism for
Flemish universities is the “Bijzonder Onderzoeksfonds” (BOF).
• A periodical task is the evaluation of “Strategic Research
Centres” (SOCs).
G  ., Pre-bibliometric data processing, Cape Town, 2014 18/27
Institutional assignment
Example of the variety of spelling variants in the WoS database
UNIV ERLANGEN NURNBERG
FRIEDRICH ALEXANDER UNIV
UNIV HOSP ERLANGEN
UNIV ERLANGEN
UNIV ERLANGEN NUREMBERG
FRIEDRICH ALEXANDER UNIV ERLANGEN NURNBERG
UNIV HOSP ERLANGEN NURNBERG
UNIVERSITAT ERLANGEN NURNBERG
KLINIKUM UNIV ERLANGEN NURNBERG
FRIEDRICH ALEXANDER UNIVERSITAT ERLANGEN NURNBERG
UNIV ERLANGEN NURNBERG POLIKLIN
POLIKLIN UNIV ERLANGEN NURNBERG
UNIV ERLANGEN NURNBERG KLINIKUM
FRIEDRICH ALEXANDER UNIV ERLANGEN
ERLANGEN UNIV HOSP
FAU UNIV ERLANGEN
KLINIKUM FRIEDRICH ALEXANDER UNIV ERLANGEN NURNBE
UNIV ERLANGEN NURNBERG KLIN
UNIV KLINIKUM ERLANGEN NURNBERG
ERLANGEN UNIV
UNIV ERLANGEN NURNBERG HOSP
FRIEDRICH ALEXANDER UNIV ERLANGEN NUREMBERG
FRIEDRICH ALEXANDER UNIV POLIKLIN
UNIV CLIN ERLANGEN NURNBERG
UNIV ERLANGEN NURNEMBERG
UNIV ERLANGER NURNBERG
DR REMEIS STERNWARTE UNIV ERLANGEN NURNBERG
ERLANGEN NUREMBERG UNIV
ERLANGEN NUREMBURG UNIV
FAU UNIV
FREDRICH ALEXANDER UNIV ERLANGEN NURNBERG
FRIEDRICH ALEXANDER UNIV ERLANGEN NUERNBERG
FRIEDRICH ALEXANDER UNIV NURNBERG
FRIEDRICH ALEXANDER UNIV NURNBERG ERLANGEN
HOSP UNIV ERLANGEN NURNBERG
KLIN FRIEDRICH ALEXANDER UNIV ERLANGEN NURNBERG
KOPFKLIN UNIV ERLANGEN NURNBERG
KUNIV ERLANGEN NURNBERG
UB ERLANGEN NURNBERG
UNIV ERLANDGEN NURNBERG
UNIV ERLANGEN NURNBERG KINDERKLIN
UNIV ERLANGEN NURNBERG KOPFKLINIKUM
UNIV ERLANGEN NURNBERG PSYCHIAT KLIN
UNIV ERLANGEN NURNBURG
UNIV ERLANTEN NURNBERG
UNIV EYE CLIN ERLANGEN NURNBERG
UNIV EYE HOSPITAL ERLANGEN NURNBERG
UNIV GRLANGEN NURNBERG
UNIV KLIN ERLANGEN NURNBERG
UNIV LIB ERLANGEN NURNBERG
UNIV NURNBERG ERLANGEN
FRIEDRICH ALEXANDER UNIV ERLANGEN NURNBERG (#69)
Data source: Thomson Reuters Web of Science
G  ., Pre-bibliometric data processing, Cape Town, 2014 19/27
Institutional assignment
The BOF and SOCs cleaning procedure
1. A complete set of spelling variants of university (or institute) names is
prepared in order to assign publications to each of the universities.
2. To further automate the filtering and assignment process, all street
and city addresses at which research groups of any of the
universities/institutes under study are located are added.
3. If the first two filters are not sufficient to assign papers correctly, author
names can be used in addition for a final (and often manual) check of paper
assignment.
4. Finally, all publications are validated, corrected and supplemented by the
corresponding universities (institutes). This is especially important if a
corporate address is not (completely) recorded in the database.
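The first three steps of the procedure can be read as a cascade of filters in which each filter is only consulted if the previous one fails. The sketch below illustrates that reading and is not the actual BOF/SOC implementation; the function name, the record fields and the lookup tables are hypothetical. Step 4, the validation by the institutions themselves, stays outside the code.

```python
def assign_institution(record, name_variants, known_addresses, known_authors):
    """Return (institution, step) from the first filter that fires.

    record          -- dict with hypothetical keys 'organisation', 'street',
                       'city' and 'authors'
    name_variants   -- dict: spelling variant (upper case) -> institution
    known_addresses -- dict: (street, city) in upper case -> institution
    known_authors   -- dict: author name (upper case) -> institution
    """
    # Step 1: known spelling variants of the institution name.
    organisation = (record.get("organisation") or "").upper().strip()
    if organisation in name_variants:
        return name_variants[organisation], "name variant"

    # Step 2: street and city addresses of research groups.
    street = (record.get("street") or "").upper().strip()
    city = (record.get("city") or "").upper().strip()
    if (street, city) in known_addresses:
        return known_addresses[(street, city)], "address"

    # Step 3: author names, flagged because this step needs a manual check.
    for author in record.get("authors") or []:
        if author.upper().strip() in known_authors:
            return known_authors[author.upper().strip()], "author (verify manually)"

    return None, "manual"
```

Returning the step that produced the assignment makes it easy to route author-based and unresolved cases to manual checking.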
G  ., Pre-bibliometric data processing, Cape Town, 2014 20/27
Institutional assignment
Further possibilities for automation in the “internal” and “extended” approach:
• Matching identified spelling variants with address data to unify and
code address fields.
• Matching external sources using identification keys with bibliographic components.
◦ Publication lists provided by institutions usually follow standards more strictly than authors
do in their CVs.
Example of building bibliographic identification keys and matching those with WoS data
nanophysics group
Lehrstuhl für Festkörperphysik Jörg P. Kotthaus
Ludwig-Maximilians-Universität München
K. Karrai, R. J. Warburton, C. Schulhauser, A. Högele, B. Urbaszek, E.J. McGhee, A. Govorov, J. M. Garcia, B. D. Gerardot, and
P. M. Petroff
"Hybridization of electronic states in quantum dots through photon emission"
Nature 427, 135-138 (2004) ⇒ 2004-KARR-427-135-N
E. M. Weig, R. H. Blick, T. Brandes, J. Kirschbaum, W. Wegscheider, M. Bichler, and J. P. Kotthaus
"Single-Electron-Phonon Interaction in a Suspended Quantum Dot Phonon Cavity"
Phys. Rev. Lett. 92, 046804-1-4 (2004) ⇒ 2004-WEIG-92-46804-P
S. de Haan, A. Lorke, J. P. Kotthaus, W.Wegscheider, and M. Bichler
"Rectification in Mesoscopic Systems with Broken Symmetry: Quasiclassical Ballistic Versus Classical Transport"
Phys. Rev. Lett. 92, 056806-1 -4 (2004) ⇒ 2004-DE H-92-56806-P
A. Högele, B. Alen, F. Bickel, R. J. Warburton, P. M. Petroff, and K. Karrai
"Exciton fine structure splitting of single InGaAs self-assembled quantum dots"
Physica E 21, 175 – 179 (2004) ⇒ 2004-HOGE-21-175-P
Wendy U. Dittmer and Friedrich C. Simmel
"Transcriptional Control of DNA-Based Nanomachines"
Nano Letters 4, 689 - 691 (2004) ⇒ 2004-WEND-4-689-N
Dominik V. Scheible, and Robert H. Blick
"Silicon nanopillars for mechanical single-electron transport
App. Phys. Lett. 84, 4632 - 4634 (2004) ⇒ 2004-DOMI-84-4634-A
nanophysics group
Lehrstuhl für Festkörperphysik Jörg P. Kotthaus
Ludwig-Maximilians-Universität München
K. Karrai, R. J. Warburton, C. Schulhauser, A. Högele, B. Urbaszek, E.J. McGhee, A. Govorov, J. M. Garcia, B. D. Gerardot, and
P. M. Petroff
"Hybridization of electronic states in quantum dots through photon emission"
Nature 427, 135-138 (2004) ⇒ UT ISI:000187863900030
E. M. Weig, R. H. Blick, T. Brandes, J. Kirschbaum, W. Wegscheider, M. Bichler, and J. P. Kotthaus
"Single-Electron-Phonon Interaction in a Suspended Quantum Dot Phonon Cavity"
Phys. Rev. Lett. 92, 046804-1-4 (2004) ⇒ UT ISI:000188747600042
S. de Haan, A. Lorke, J. P. Kotthaus, W.Wegscheider, and M. Bichler
"Rectification in Mesoscopic Systems with Broken Symmetry: Quasiclassical Ballistic Versus Classical Transport"
Phys. Rev. Lett. 92, 056806-1 -4 (2004) ⇒ UT ISI:000188785200045
A. Högele, B. Alen, F. Bickel, R. J. Warburton, P. M. Petroff, and K. Karrai
"Exciton fine structure splitting of single InGaAs self-assembled quantum dots"
Physica E 21, 175 – 179 (2004) ⇒ UT ISI:000220873300005
Wendy U. Dittmer and Friedrich C. Simmel
"Transcriptional Control of DNA-Based Nanomachines"
Nano Letters 4, 689 - 691 (2004) ⇒ UT ISI:000220857800030
Dominik V. Scheible, and Robert H. Blick
"Silicon nanopillars for mechanical single-electron transport
App. Phys. Lett. 84, 4632 - 4634 (2004) ⇒ UT ISI:000221656900012
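The keys shown in the first list appear to follow the pattern year, first four characters of the first author entry, volume, first page and first letter of the journal title. The sketch below reproduces that pattern as inferred from the examples; the exact rules used by the authors are not given in the slides, so the function name and every parsing detail are assumptions.

```python
import re
import unicodedata


def fold(text: str) -> str:
    """Strip diacritics and upper-case, e.g. 'Högele' -> 'HOGELE'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).upper()


def identification_key(year, first_author, volume, pages, journal):
    """Build a key such as '2004-KARR-427-135-N' (pattern inferred from the examples)."""
    # Drop leading initials such as 'K.' or 'E. M.'; a spelled-out first name stays.
    name = re.sub(r"^(?:[A-Z]\.\s*)+", "", fold(first_author).strip())
    # The first run of digits is the first page; int() drops leading zeros.
    first_page = int(re.search(r"\d+", pages).group())
    journal_letter = fold(journal).strip()[:1]
    return f"{year}-{name[:4]}-{int(volume)}-{first_page}-{journal_letter}"


print(identification_key(2004, "K. Karrai", "427", "135-138", "Nature"))
print(identification_key(2004, "S. de Haan", "92", "056806-1 -4", "Phys. Rev. Lett."))
print(identification_key(2004, "Wendy U. Dittmer", "4", "689 - 691", "Nano Letters"))
```

As in the slide, a spelled-out first name ends up in the key (‘WEND’ instead of ‘DITT’), which shows why non-standardised lists need cleaning before matching. The last key on the slide uses 4634 rather than the first page 4632, and this sketch does not reproduce that.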
G  ., Pre-bibliometric data processing, Cape Town, 2014 21/27
 III. Regional assignment 
Working with data at the regional level raises several challenging
issues:
• The definition or demarcation of a region is not always clear as
administrative, geographical and economic regions might differ.
• Also similar names might refer to different entities.
• The validity of benchmarking studies requires the definition of
regions at the same or at least a comparable level of aggregation.
• Regional information is rarely included in the bibliographic
databases. This results in the necessity of an “extended approach”
using external sources to assign addresses to a particular region.
G  ., Pre-bibliometric data processing, Cape Town, 2014 22/27
Regional assignment
First main issue:
Analogously to the case of subject classification schemes, comparability of results
might easily be endangered when using different hierarchical or aggregation
levels.
Some solutions
- Use predefined classification
schemes like NUTS (Eurostat) or
the UN Geoscheme.
- Precompile (or use existing)
classification tables, e.g.
http://en.wikipedia.org/wiki/List_of_postal_codes_in_South_Korea
Application at ECOOM
- “Specialization profiles and
diagnostic tools” within the OECD
Smart Specialization Project
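A precompiled classification table of the kind mentioned under ‘Some solutions’ can be as simple as a list of postal-code ranges. The sketch below uses the commonly cited Belgian postal-code ranges as an example; the table is an illustrative assumption that should be checked against an authoritative list before use, and the function name `region_for` is hypothetical. Storing two levels per range allows all addresses to be reported at the same level of aggregation.

```python
from bisect import bisect_right

# (first code of range, province, region); illustrative table, verify before use.
RANGES = [
    (1000, "Brussels-Capital Region", "Brussels"),
    (1300, "Walloon Brabant", "Wallonia"),
    (1500, "Flemish Brabant", "Flanders"),
    (2000, "Antwerp", "Flanders"),
    (3000, "Flemish Brabant", "Flanders"),
    (3500, "Limburg", "Flanders"),
    (4000, "Liège", "Wallonia"),
    (5000, "Namur", "Wallonia"),
    (6000, "Hainaut", "Wallonia"),
    (6600, "Luxembourg", "Wallonia"),
    (7000, "Hainaut", "Wallonia"),
    (8000, "West Flanders", "Flanders"),
    (9000, "East Flanders", "Flanders"),
]
STARTS = [start for start, _, _ in RANGES]


def region_for(postal_code: str, level: str = "province"):
    """Map a four-digit postal code to a province or region, or None if unknown."""
    digits = "".join(ch for ch in postal_code if ch.isdigit())
    if len(digits) != 4:
        return None
    position = bisect_right(STARTS, int(digits)) - 1
    if position < 0:
        return None
    _, province, region = RANGES[position]
    return province if level == "province" else region


print(region_for("B-3000"), "/", region_for("B-3000", level="region"))
```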
G  ., Pre-bibliometric data processing, Cape Town, 2014 23/27
Regional assignment
Because of the lack of regional information in bibliographic data the
“extended” approach using external sources is the most promising
solution.
Novel techniques based on web services allow the retrieval of regional
information from address information (city/postal code, street and
country). Above all, the following three services are useful for this
purpose.
• Geonames API (http://www.geonames.org)
• Google Places API (https://developers.google.com/places/)
• Wikipedia Infobox (accessible through http://dbpedia.org/)
G  ., Pre-bibliometric data processing, Cape Town, 2014 24/27
Regional assignment
The first and second services (Geonames and Google Places, respectively)
provide a bottom-up approach: they accept individual addresses as input
and return a broad range of information (coordinates, postal code,
country codes, administrative levels). Both provide output in XML
format.
Google Places is, however, limited to a small number of daily
requests. Geonames is more open, with a more generous limit of
30,000 requests per day.
The third source (Wikipedia Infobox) allows a top-down approach.
These web services can return incorrect or inadequate results. They
proved useful for requests that contain at least three different
address components, such as country, city, postal code or street name.
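The three-component rule can be applied as a filter before any request is sent. The sketch below checks an address and, if it qualifies, builds a request URL for the GeoNames full-text search service. The address field names and both function names are hypothetical, and the use of the `search` endpoint with the `q`, `country`, `maxRows` and `username` parameters is an assumption about how such a request could be formed, not a description of the authors' setup. No request is actually sent.

```python
from urllib.parse import urlencode

COMPONENTS = ("country", "city", "postal_code", "street")


def has_enough_components(address: dict, minimum: int = 3) -> bool:
    """True if at least `minimum` of the address components are non-empty."""
    filled = sum(1 for key in COMPONENTS if (address.get(key) or "").strip())
    return filled >= minimum


def geonames_search_url(address: dict, username: str, max_rows: int = 1) -> str:
    """Build a GeoNames search URL; `username` is the account registered with the service."""
    parts = [(address.get(key) or "").strip() for key in ("street", "postal_code", "city")]
    params = {"q": " ".join(p for p in parts if p), "maxRows": max_rows, "username": username}
    country = (address.get("country") or "").strip()
    if country:
        params["country"] = country  # two-letter ISO country code assumed
    return "http://api.geonames.org/search?" + urlencode(params)


address = {"country": "BE", "city": "Leuven", "postal_code": "3000"}
if has_enough_components(address):
    print(geonames_search_url(address, username="demo"))
```

Addresses that fail the check can be set aside for a lookup table or manual assignment instead of using up the daily request quota.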
G  ., Pre-bibliometric data processing, Cape Town, 2014 25/27
 Conclusions 
• All automated processes produce errors.
• Computerised techniques can considerably facilitate author
identification through name disambiguation and the use of authors’ publications.
• Significant manual effort is still necessary to resolve all remaining
issues and to complete author identification.
• The same applies, mutatis mutandis, to institutional
and regional cleaning, identification and assignment.
G  ., Pre-bibliometric data processing, Cape Town, 2014 26/27
Thank you very much for your attention.
Vielen Dank für Ihre Aufmerksamkeit!
Hartelijk dank voor uw aandacht!
¡Muchísimas gracias por su atención!
Köszönöm szépen a figyelmüket!
Molte grazie per la vostra attenzione.
Muito obrigado pela vossa atenção.

Baobab spring 2015 usability and contextual inquiry
 
RDAP13 Elizabeth Moss: The impact of data reuse
RDAP13 Elizabeth Moss: The impact of data reuseRDAP13 Elizabeth Moss: The impact of data reuse
RDAP13 Elizabeth Moss: The impact of data reuse
 
Introduction to research and its different aspects
Introduction to research and its different aspectsIntroduction to research and its different aspects
Introduction to research and its different aspects
 
Pg writing aims and proposal
Pg writing aims and proposalPg writing aims and proposal
Pg writing aims and proposal
 
Preparing for the Literature Developing an Annotated Bibliography.docx
Preparing for the Literature Developing an Annotated Bibliography.docxPreparing for the Literature Developing an Annotated Bibliography.docx
Preparing for the Literature Developing an Annotated Bibliography.docx
 
Preparing for the Literature Developing an Annotated Bibliography.docx
Preparing for the Literature Developing an Annotated Bibliography.docxPreparing for the Literature Developing an Annotated Bibliography.docx
Preparing for the Literature Developing an Annotated Bibliography.docx
 

Recently uploaded

SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Patrick Diehl
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfmuntazimhurra
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Nistarini College, Purulia (W.B) India
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfnehabiju2046
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxUmerFayaz5
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |aasikanpl
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsAArockiyaNisha
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real timeSatoshi NAKAHIRA
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptMAESTRELLAMesa2
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxAleenaTreesaSaji
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 

Recently uploaded (20)

SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...Bentham & Hooker's Classification. along with the merits and demerits of the ...
Bentham & Hooker's Classification. along with the merits and demerits of the ...
 
A relative description on Sonoporation.pdf
A relative description on Sonoporation.pdfA relative description on Sonoporation.pdf
A relative description on Sonoporation.pdf
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 
Grafana in space: Monitoring Japan's SLIM moon lander in real time
Grafana in space: Monitoring Japan's SLIM moon lander  in real timeGrafana in space: Monitoring Japan's SLIM moon lander  in real time
Grafana in space: Monitoring Japan's SLIM moon lander in real time
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
G9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.pptG9 Science Q4- Week 1-2 Projectile Motion.ppt
G9 Science Q4- Week 1-2 Projectile Motion.ppt
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 

Pre-bibliometrics Cape Town 2014

  • 1. W. Glänzel, B. Thijs, S. Heeffer. Centre for R&D Monitoring and Dept MSI, KU Leuven
  • 2. Introduction
      The validity of bibliometric results for research evaluation stands and falls with the quality of the underlying data, which originate from various sources:
      • bibliographic databases,
      • CVs or institutional publication lists,
      • geographic information,
      • funding information,
      • project applications, etc.
      ☛ These sources are usually created for purposes other than use in bibliometrics.
      Glänzel et al., Pre-bibliometric data processing, Cape Town, 2014
  • 3. Introduction
      Background
      A separate sub-discipline of bibliometrics, which can be described as pre-bibliometric data processing, has therefore emerged.
      ☞ This task is not merely a kind of traditional “bibliometric technology”: it requires a specific research methodology for processing and preparing highly heterogeneous data from different sources for use in bibliometrics and research evaluation.
      Objective
      The objective of the paper is to describe three typical tasks representing this type of ‘pre-bibliometrics’:
      • Author identification
      • Institute assignment
      • Address regionalisation
  • 4. General issues
      In principle, there are two possible ways to meet the challenge of ‘pre-bibliometrics’:
      1. Using the database only (internal approach)
      2. Using supplementary (external) sources such as publication lists or CVs (extended approach)
      ☛ The second approach generally yields more valid results than the first, since it draws on additional and often more reliable data sources than the bibliographic data alone.
  • 5. I. The challenge of author identification
      Correct author identification is indispensable in micro-level studies, such as studies of scientific careers and researchers’ mobility, or in monitoring the composition and performance of research teams.
      • To meet the quality requirements at this level, all (co-)authors of relevant publications need to be identified and assigned to their affiliations.
        ◦ The latter task is supported by corresponding features of the two large citation databases, Thomson Reuters’ Web of Science (since 2007) and Elsevier’s Scopus.
  • 6. Basic problems of author identification
      Some basic problems of author identification in bibliographic databases are related to author names:
      Synonyms
      • Spelling variants (e.g., umlauts, transliteration, name particles, double names)
      • Misprints
      • Initials
      • Name changes
      • Database standard
  • 7. Intensifying co-authorship
      The problem of synonyms
      Type: Variant 1; Variant 2; Variant 3
      • Umlaut: Glänzel; Glanzel; Glaenzel
      • Transliteration: 王悦; Wang, Y
      • Name particles: Van De Broek, I; Broek, I Vande / Broek, IV; Vandebroek, I
      • Initials: Wemans, Andre; Wemans, ADV; Wemans, A
      • Name change: Petre, Camelia; Stanciu, Camelia; Camelia, Stanciu
      • Database: VANRAAN, AFJ; VanRaan, AFJ; Van Raan, AFJ
      Source: Heeffer et al., ISSI 2013
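The synonym variants above suggest a first normalisation step before any matching. The following is a minimal sketch, not a method from the slides: the function names `fold_diacritics` and `name_key` are hypothetical, and the rule that treats "ae/oe/ue" like the bare vowel is a deliberately crude assumption that will also merge some genuinely different names.

```python
import re
import unicodedata


def fold_diacritics(text: str) -> str:
    """Strip combining marks, e.g. 'Glänzel' -> 'Glanzel'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def name_key(name: str) -> str:
    """Build a crude comparison key from a 'Surname, Given' string.

    Assumption: the input uses a comma between surname and given names,
    as in the database examples on the slide.
    """
    surname, _, given = name.partition(",")
    surname = fold_diacritics(surname).lower()
    # Treat the 'ae/oe/ue' transcription of an umlaut like the bare vowel.
    surname = re.sub(r"(?<=[aou])e", "", surname)
    # Drop spaces and punctuation so 'Van Raan' and 'VANRAAN' coincide.
    surname = re.sub(r"[^a-z]", "", surname)
    # Keep only the first initial; this merges 'Andre', 'ADV' and 'A'.
    initial = fold_diacritics(given).strip()[:1].upper()
    return f"{surname}:{initial}"


for variant in ["Glänzel, W", "Glanzel, W", "Glaenzel, W"]:
    print(variant, "->", name_key(variant))
```

Collapsing names this aggressively reduces synonyms at the price of more homonyms, so such a key is only suitable for building candidate groups that later steps (affiliation, co-authors, manual checks) have to split again.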
  • 8. Basic problems of author identification
      Further problems of author identification in bibliographic databases
      • Homonyms
      • Incomplete records
        ◦ Incomplete double names or first names/initials
        ◦ Missing links with affiliation
        ◦ Missing, incomplete or incorrect corporate address
        ◦ Ambiguous or changing email address
      • Mobility
      • Institutional restructuring
  • 9. Possible solutions
      The “internal” approach
      Author IDs
      Database providers made attempts very early on to identify authors uniquely. There are two basic approaches.
      1. Identification by the database provider, for example,
        ◦ Mathematical Reviews Author ID (since 1940, first manually, from 1985 on as an automated process)
        ◦ Elsevier’s AuthorID (since 2006, automated process with author feedback)
      2. Identification by authors, for example,
        ◦ TR ResearcherID (authors are fully responsible for their IDs)
        ◦ Open Researcher & Contributor ID – ORCID (online October 2012), compatible with other IDs (WoS, Scopus, PubMed) and various links
      ☞ Both approaches have advantages and disadvantages, but ambiguity and incorrectness cannot be completely excluded in either of them.
  • 10. The “internal” approach
      Thomson Reuters ResearcherID
      Several issues are known:
      • RIDs are not only used by individual authors. Some institutes and author groups mark their publications with an RID.
      • Some RIDs claim several papers although the researcher name does not match any of the authors.
      • An RID is not always unique. Several authors use different RIDs to claim the same papers.
  • 11. The “internal” approach
      Some basic statistics on RIDs (WoS: 2009–2011) according to Heeffer et al. (2013)
      Figure: Relative frequency of publication activity of RID authors (bars) vs. all authors (line)
      Some conclusions
      - The example of China shows the necessity of name disambiguation.
      - RID cannot yet be used as a substitute for author names.
      - Email addresses, affiliations and co-authors can be used for further identification, but with severe limitations (e.g., mobility, institutional restructuring).
  • 12. The “internal” approach
      Automated identification on a large scale
      Most techniques are based on a combination of name standardisation and additional components that are considered characteristic of an author.
      • Combinations of name and affiliation/location or subject, shared co-authors/partners (T  ., 2006)
      • Some of the combinations are used to form identity records (B, 2011).
      • Approximate Structural Equivalence (Tang & Walsh, 2010) uses authors’ “bibliometric fingerprints”. In effect, a form of bibliographic coupling and direct citations are used.
      ☞ None of these methods can eliminate all Type I and Type II errors.
  • 13. The “extended” approach
      External sources can be used to identify authors in bibliographic databases. The most convenient way to use external sources is to match the author’s publication list with database records.
      • On a large scale, manual identification has become extremely difficult.
      ☞ Since 1985 the author identification process of the Mathematical Reviews database has been automated.
      Automated processes can substantially facilitate the matching of external sources with bibliographic databases, but they also produce false positive and false negative errors.
      ☞ The objective is to exclude false positives and to considerably reduce the number of false negatives by final manual correction.
  • 14. The “extended” approach
      CVs are reliable sources of the authors’ publication records. However, (author) names alone are not sufficient for a correct match because of the issues described in the previous section.
      Main issues to be tackled when matching external data with database records:
      • Different standards of the two sources
      • Incomplete, erroneous or censored data in the external source
      • Database errors
  • 15. The “extended” approach
      Similar errors can be found in external reference sources as well, and CVs often do not consistently follow their own standard. This problem can be addressed by including more components, such as the complete title, journal title and co-authors, and by using similarity measures and scores instead of exact matches.
      • Since name sequences and texts are sensitive to errors and misprints, similarity measures can be used to assess the “probability” of a positive match.
      • One possible solution is the use of character N-grams in conjunction with edit (Levenshtein) distance (Abdulhayoglu & Thijs, 2013).
      ☞ N-gram based edit distances are sensitive to the order of components.
  • 16. The “extended” approach
      3-gram based similarity scores for the CV entry ‘Zhang, L., Thijs, B., Glänzel, W., The diffusion of H-related literature. JOI, 2011, 5 (4), 583-593’
      Variable (score): WoS record
      • 1 (0.67): ZHANG, L, THIJS, B, GLANZEL, W, The diffusion of H-related literature, JOURNAL OF INFORMETRICS, 5, 583, 2011
      • 2 (0.73): ZHANG, L, THIJS, B, GLANZEL, W, The diffusion of H-related literature, 5, 583, 2011
      • 3 (0.34): The diffusion of H-related literature, JOURNAL OF INFORMETRICS, 5, 583, 2011
      • 4 (0.65): ZHANG, L, THIJS, B, GLANZEL, W, The diffusion of H-related literature
      • 5 (0.36): The diffusion of H-related literature
      • 6 (0.41): The diffusion of H-related literature, JOURNAL OF INFORMETRICS
      Source: Abdulhayoglu & Thijs, ISSI 2013
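To illustrate the two ingredients named on the previous slides, character 3-grams and Levenshtein distance, here is a minimal sketch. It is not the scoring used in the cited study and will not reproduce the scores in the table: the Dice coefficient on 3-gram multisets and the helper names `char_ngrams`, `dice_similarity` and `levenshtein` are assumptions chosen for illustration.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Multiset of character n-grams of a lower-cased, whitespace-normalised string."""
    cleaned = " ".join(text.lower().split())
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))


def dice_similarity(a: str, b: str, n: int = 3) -> float:
    """Dice coefficient on n-gram multisets: 1.0 for identical strings, 0.0 for no overlap."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    total = sum(grams_a.values()) + sum(grams_b.values())
    if total == 0:
        return 0.0
    overlap = sum((grams_a & grams_b).values())  # multiset intersection
    return 2 * overlap / total


def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insertions, deletions, substitutions), row by row."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


cv_entry = ("Zhang, L., Thijs, B., Glänzel, W., The diffusion of H-related "
            "literature. JOI, 2011, 5 (4), 583-593")
wos_record = ("ZHANG, L, THIJS, B, GLANZEL, W, The diffusion of H-related "
              "literature, 5, 583, 2011")
print(round(dice_similarity(cv_entry, wos_record), 2))
print(levenshtein("Glanzel", "Glaenzel"))
```

In a matching pipeline, a score like this would be computed for each candidate database record, and pairs above a chosen threshold would be passed on for manual confirmation.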
  • 17. The “extended” approach
      The method based on 3-grams achieved 95% accuracy on a sample set (Abdulhayoglu & Thijs, ISSI 2013).
      Two problems remain
      • False negatives result from an unsuccessful match; the authors concerned are thus incompletely identified.
      • False positives are a major issue since they indicate an incorrect match; the authors concerned are thus incorrectly identified.
      ☞ Similar methods are used in spam detection (cf. K  ., International Journal on Artificial Intelligence Tools, 2006).
  • 18. II. Institutional assignment
      In practice, the quality of the data in the address fields of bibliographic databases does not meet the requirements of institutional assessment with strong bibliometric components.
      Typical tasks are
      • Studies of institutional research performance
      • University ranking
      • Funding formulas with bibliometric components for universities or research institutes
      • Regional performance indicators
      Tasks in Flanders that require correct address data at the level of main institutions
      • One major additional funding source and mechanism for Flemish universities is the “Bijzonder Onderzoeksfonds” (BOF).
      • A periodic task is the evaluation of “Strategic Research Centres” (SOCs).
  • 19. Institutional assignment
      Example of the variety of spelling variants in the WoS database
      (Example) UNIV ERLANGEN NURNBERG FRIEDRICH ALEXANDER UNIV UNIV HOSP ERLANGEN UNIV ERLANGEN UNIV ERLANGEN NUREMBERG FRIEDRICH ALEXANDER UNIV ERLANGEN NURNBERG UNIV HOSP ERLANGEN NURNBERG UNIVERSITAT ERLANGEN NURNBERG KLINIKUM UNIV ERLANGEN NURNBERG FRIEDRICH ALEXANDER UNIVERSITAT ERLANGEN NURNBERG UNIV ERLANGEN NURNBERG POLIKLIN POLIKLIN UNIV ERLANGEN NURNBERG UNIV ERLANGEN NURNBERG KLINIKUM FRIEDRICH ALEXANDER UNIV ERLANGEN ERLANGEN UNIV HOSP FAU UNIV ERLANGEN KLINIKUM FRIEDRICH ALEXANDER UNIV ERLANGEN NURNBE UNIV ERLANGEN NURNBERG KLIN UNIV KLINIKUM ERLANGEN NURNBERG ERLANGEN UNIV UNIV ERLANGEN NURNBERG HOSP FRIEDRICH ALEXANDER UNIV ERLANGEN NUREMBERG FRIEDRICH ALEXANDER UNIV POLIKLIN UNIV CLIN ERLANGEN NURNBERG UNIV ERLANGEN NURNEMBERG UNIV ERLANGER NURNBERG DR REMEIS STERNWARTE UNIV ERLANGEN NURNBERG ERLANGEN NUREMBERG UNIV ERLANGEN NUREMBURG UNIV FAU UNIV FREDRICH ALEXANDER UNIV ERLANGEN NURNBERG FRIEDRICH ALEXANDER UNIV ERLANGEN NUERNBERG FRIEDRICH ALEXANDER UNIV NURNBERG FRIEDRICH ALEXANDER UNIV NURNBERG ERLANGEN HOSP UNIV ERLANGEN NURNBERG KLIN FRIEDRICH ALEXANDER UNIV ERLANGEN NURNBERG KOPFKLIN UNIV ERLANGEN NURNBERG KUNIV ERLANGEN NURNBERG UB ERLANGEN NURNBERG UNIV ERLANDGEN NURNBERG UNIV ERLANGEN NURNBERG KINDERKLIN UNIV ERLANGEN NURNBERG KOPFKLINIKUM UNIV ERLANGEN NURNBERG PSYCHIAT KLIN UNIV ERLANGEN NURNBURG UNIV ERLANTEN NURNBERG UNIV EYE CLIN ERLANGEN NURNBERG UNIV EYE HOSPITAL ERLANGEN NURNBERG UNIV GRLANGEN NURNBERG UNIV KLIN ERLANGEN NURNBERG UNIV LIB ERLANGEN NURNBERG UNIV NURNBERG ERLANGEN FRIEDRICH ALEXANDER UNIV ERLANGEN NURNBERG (#69)
      Data source: Thomson Reuters Web of Science
  • 20. Institutional assignment
      The BOF and SOCs cleaning procedure
      1. A complete set of spelling variants of the university (or institute) names is prepared in order to assign publications to each of the universities.
      2. To further automate the filtering and assignment process, all street and city addresses at which research groups of any of the universities/institutes under study are located are added.
      3. If the first two filters are not sufficient to assign papers correctly, author names can additionally be used for a final (and often manual) check of the paper assignment.
      4. Finally, all publications are validated, corrected and supplemented by the corresponding universities (institutes). This is especially important if a corporate address is not (completely) recorded in the database.
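The first steps of the procedure can be sketched as a two-stage filter. This is an illustrative sketch only, not the BOF/SOC implementation: the tables `NAME_VARIANTS` and `KNOWN_CITIES`, the institution code `FAU` and the function names are hypothetical, and the decision to flag city-only hits for manual checking is an assumption modelled on step 3.

```python
import re
import unicodedata

# Hypothetical variant table (step 1); in practice compiled from the database.
NAME_VARIANTS = {
    "UNIV ERLANGEN NURNBERG": "FAU",
    "UNIV ERLANGEN NUREMBERG": "FAU",
    "FRIEDRICH ALEXANDER UNIV": "FAU",
    "UNIV HOSP ERLANGEN": "FAU",
}

# Hypothetical location filter (step 2): cities where the institution has groups.
KNOWN_CITIES = {"FAU": {"ERLANGEN", "NURNBERG", "NUREMBERG"}}


def normalise(address: str) -> str:
    """Fold diacritics, upper-case, and reduce punctuation to single spaces."""
    decomposed = unicodedata.normalize("NFKD", address)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(re.sub(r"[^A-Z0-9 ]", " ", folded.upper()).split())


def assign_institution(address: str):
    """Return (institution, status) for one address string."""
    norm = normalise(address)
    # Stage 1: a known spelling variant of the institution name.
    for variant, institution in NAME_VARIANTS.items():
        if variant in norm:
            return institution, "name"
    # Stage 2: only the location matches, so leave the decision to a manual check.
    tokens = set(norm.split())
    for institution, cities in KNOWN_CITIES.items():
        if tokens & cities:
            return institution, "manual-check"
    return None, "unassigned"


print(assign_institution("Friedrich Alexander Univ, Erlangen, Germany"))
print(assign_institution("Kopfklinikum, Nürnberg"))
```

Anything returned with the status `manual-check` or `unassigned` would still need the author-name check and the final validation by the institutions described in steps 3 and 4.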
  • 21. Institutional assignment
      Further possibilities of automation in the “internal” and “extended” approach:
      • Matching identified spelling variants with address data to unify and code address fields.
      • Matching external sources using identification keys built from bibliographic components.
        ◦ Publication lists provided by institutions usually follow standards more strictly than authors do in their CVs.
      Example of building bibliographic identification keys and matching them with WoS data (publication list: nanophysics group, Lehrstuhl für Festkörperphysik, Jörg P. Kotthaus, Ludwig-Maximilians-Universität München)
      1. K. Karrai, R. J. Warburton, C. Schulhauser, A. Högele, B. Urbaszek, E.J. McGhee, A. Govorov, J. M. Garcia, B. D. Gerardot, and P. M. Petroff "Hybridization of electronic states in quantum dots through photon emission" Nature 427, 135-138 (2004)
         ⇒ 2004-KARR-427-135-N ⇒ UT ISI:000187863900030
      2. E. M. Weig, R. H. Blick, T. Brandes, J. Kirschbaum, W. Wegscheider, M. Bichler, and J. P. Kotthaus "Single-Electron-Phonon Interaction in a Suspended Quantum Dot Phonon Cavity" Phys. Rev. Lett. 92, 046804-1-4 (2004)
         ⇒ 2004-WEIG-92-46804-P ⇒ UT ISI:000188747600042
      3. S. de Haan, A. Lorke, J. P. Kotthaus, W. Wegscheider, and M. Bichler "Rectification in Mesoscopic Systems with Broken Symmetry: Quasiclassical Ballistic Versus Classical Transport" Phys. Rev. Lett. 92, 056806-1-4 (2004)
         ⇒ 2004-DE H-92-56806-P ⇒ UT ISI:000188785200045
      4. A. Högele, B. Alen, F. Bickel, R. J. Warburton, P. M. Petroff, and K. Karrai "Exciton fine structure splitting of single InGaAs self-assembled quantum dots" Physica E 21, 175-179 (2004)
         ⇒ 2004-HOGE-21-175-P ⇒ UT ISI:000220873300005
      5. Wendy U. Dittmer and Friedrich C. Simmel "Transcriptional Control of DNA-Based Nanomachines" Nano Letters 4, 689-691 (2004)
         ⇒ 2004-WEND-4-689-N ⇒ UT ISI:000220857800030
      6. Dominik V. Scheible and Robert H. Blick "Silicon nanopillars for mechanical single-electron transport" App. Phys. Lett. 84, 4632-4634 (2004)
         ⇒ 2004-DOMI-84-4634-A ⇒ UT ISI:000221656900012
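The identification keys in this example appear to follow the pattern year, first four characters of the first author string, volume, page number and first letter of the journal. The sketch below rebuilds such keys under that assumption; the pattern is inferred from the examples rather than documented in the slides, and the function name `make_key` and its parameters are hypothetical.

```python
import re
import unicodedata


def make_key(first_author: str, year: int, volume: str,
             first_page: str, journal: str) -> str:
    """Build a key such as '2004-KARR-427-135-N' from parsed reference fields."""
    # Skip leading initials ('K. ', 'E. M. ') but keep spelled-out first names,
    # which is what the examples 'WEND' and 'DOMI' suggest.
    name = re.sub(r"^(?:[A-Z]\.\s*)+", "", first_author.strip())
    # Fold diacritics so that 'Högele' yields 'HOGE'.
    decomposed = unicodedata.normalize("NFKD", name)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    author_part = folded.upper()[:4]
    # Article numbers lose their leading zeros ('046804' -> '46804').
    page_part = str(int(first_page))
    journal_part = journal.strip()[:1].upper()
    return f"{year}-{author_part}-{volume}-{page_part}-{journal_part}"


print(make_key("K. Karrai", 2004, "427", "135", "Nature"))
print(make_key("S. de Haan", 2004, "92", "056806", "Phys. Rev. Lett."))
```

The same key would be computed from the corresponding WoS fields, and equal keys would link a list entry to its database record. In the last entry of the example the key carries 4634 although the article starts on page 4632, which shows how sensitive exact key matching is to such deviations.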
  • 22. III. Regional assignment
      Dealing with data at the regional level raises several challenging issues:
      • The definition or demarcation of a region is not always clear, as administrative, geographical and economic regions may differ.
      • Similar names may also refer to different entities.
      • Valid benchmarking studies require regions to be defined at the same, or at least a comparable, level of aggregation.
      • Regional information is rarely included in bibliographic databases. This makes an “extended approach” necessary, using external sources to assign addresses to a particular region.
  • 23. Regional assignment
      First main issue: As with subject classification schemes, the comparability of results can easily be endangered when different hierarchical or aggregation levels are used.
      Some solutions
      - Use predefined classification schemes like NUTS (OECD) or UN Geoscheme.
      - Precompile (or use existing) classification tables, e.g. http://en.wikipedia.org/wiki/List_of_postal_codes_in_South_Korea
      Application at ECOOM
      - “Specialization profiles and diagnostic tools” within the OECD Smart Specialization Project
  • 24. Regional assignment
      Because bibliographic data lack regional information, the “extended” approach using external sources is the most promising solution.
      Novel techniques based on web services allow the retrieval of regional information from address information (city/postal code, street and country). Above all, the following three services are useful for this purpose.
      • Geonames API (http://www.geonames.org)
      • Google Places API (https://developers.google.com/places/)
      • Wikipedia Infobox (accessible through http://dbpedia.org/)
  • 25. Regional assignment
      The first and second services (Geonames and Google Places, respectively) provide a bottom-up approach: they accept individual addresses as input and return a broad range of information (coordinates, postal code, country codes, administrative levels). Both provide output in XML format.
      Google Places, however, is limited to a small number of daily requests. Geonames is more open, with a more generous limit of 30,000 requests per day.
      The third source (Wikipedia Infobox) allows a top-down approach.
      These web services can return incorrect or inadequate results. They proved useful for requests that contain at least three different address components, such as country, city, postal code or street name.
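A bottom-up lookup of this kind could be sketched as follows. The sketch assumes the JSON variant of the GeoNames search service (`searchJSON` with the parameters `q`, `country`, `maxRows` and `username`) and the response fields `name`, `adminName1` and `countryCode`; the slides mention XML output, so the exact interface should be checked against the current GeoNames documentation. The helper names and the sample payload are hypothetical, and a registered GeoNames user name is needed for real requests.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the GeoNames full-text search in its JSON variant.
GEONAMES_SEARCH = "http://api.geonames.org/searchJSON"


def build_query(city: str, country_code: str, username: str, max_rows: int = 1) -> str:
    """Compose the request URL for one address (city plus ISO country code)."""
    params = {"q": city, "country": country_code,
              "maxRows": max_rows, "username": username}
    return f"{GEONAMES_SEARCH}?{urlencode(params)}"


def extract_region(payload: dict):
    """Pick place name, first-level administrative region and country from the top hit."""
    hits = payload.get("geonames", [])
    if not hits:
        return None  # nothing found: the address needs manual treatment
    top = hits[0]
    return {"name": top.get("name"),
            "region": top.get("adminName1"),
            "country": top.get("countryCode")}


def lookup_region(city: str, country_code: str, username: str):
    """Perform the web request; requires network access and a GeoNames account."""
    with urlopen(build_query(city, country_code, username), timeout=10) as response:
        return extract_region(json.load(response))


# Offline illustration with a hand-written payload in the assumed response shape.
sample = {"geonames": [{"name": "Leuven", "adminName1": "Flanders", "countryCode": "BE"}]}
print(build_query("Leuven", "BE", "my_user"))
print(extract_region(sample))
```

Since the services can return incorrect or inadequate results, the top hit should be treated as a proposal to be validated, for instance by comparing the returned country code with the country in the address.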
  • 26. Conclusions
      • All automated processes produce errors.
      • Computerised techniques can considerably facilitate author identification through name disambiguation and the use of the authors’ publications.
      • Significant manual effort is still necessary to resolve all remaining issues and to complete author identification.
      • The same applies, mutatis mutandis, to institutional and regional cleaning, identification and assignment.
  • 27. Thank you very much for your attention. Vielen Dank für Ihre Aufmerksamkeit! Hartelijk dank voor uw aandacht! ¡Muchísimas gracias por su atención! Köszönöm szépen a figyelmüket! Molte grazie per la vostra attenzione. Muito obrigado pela vossa atenção.