Can we track the geographic origin of
surnames based on bibliographic data?
Nicolas Robinson-Garcia, Ed Noyons & Rodrigo Costas
15th INTERNATIONAL CONFERENCE
ON SCIENTOMETRICS & INFORMETRICS
29 June – 3 July, 2015,
Bogazici University, Istanbul, Turkey
EC3metrics spin-off
CWTS, Leiden University
Agenda
o Background
o Bibliographic data
o Method 1. Kullback-Leibler divergence
o Method 2. Concentration index (Gini index)
o The ‘golden list’
o Next or previous steps
Background
“the use of surnames in human population biology dates back to
1875, when George Darwin used frequency of occurrences of the
same surname in married couples to study in-breeding”
Kissin, 2011
WHAT IS IN A SURNAME?
o Proxy for genetic/ethnic origin
-> Epidemiology, Biomedical research
o Proxy for country origin
-> Demographic studies, migratory movements
Background
o The representation of Jewish surnames in biomedical journals and US patents
Kissin, 2011; Kissin & Bradley, 2013
o Relation between ethnically mixed collaboration and citation impact
Freeman & Huang, 2014
… in the field of bibliometrics
Background
HOW CAN WE DETERMINE THE GEOGRAPHIC ORIGIN OF SURNAMES?
METHODS
o Manually curated lists
o Probability and Bayesian methods
o Clustering techniques
DATA SOURCES
o National census
o Dispersion of sources
o Lack of international coverage
Bibliographic data
o Scientific databases as international sources of surname data
› Regional restrictions
› Temporal restrictions
o Establishing ‘trusted’ linkages between surnames and countries
› Reprint address
› First author-first address
› One-country publications
› Author-address linkages (2008)
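The four ‘trusted’ linkage criteria listed above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the `Record` structure, its field names, and the choice to accept a surname-country pair when any single criterion supports it are all assumptions, since the slides do not describe the Web of Science record format or how the criteria are combined.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical, simplified publication record (not the actual WoS schema)."""
    authors: list[str]                      # author surnames in byline order
    addresses: list[str]                    # country of each address, in order
    reprint: tuple[str, str] | None = None  # (surname, country) of the reprint author
    author_country: dict[str, str] = field(default_factory=dict)  # explicit author-address links


def trusted_links(rec: Record) -> set[tuple[str, str]]:
    """Collect the (surname, country) pairs supported by any of the four criteria."""
    links: set[tuple[str, str]] = set()
    # 1) Reprint address
    if rec.reprint is not None:
        links.add(rec.reprint)
    # 2) First author paired with the first address
    if rec.authors and rec.addresses:
        links.add((rec.authors[0], rec.addresses[0]))
    # 3) One-country publications: every author is linked to that country
    if rec.addresses and len(set(rec.addresses)) == 1:
        links.update((surname, rec.addresses[0]) for surname in rec.authors)
    # 4) Explicit author-address linkages, marked "(2008)" on the slide
    links.update(rec.author_country.items())
    return links


def surname_country_counts(records: list[Record]) -> dict[str, Counter]:
    """Count, per surname, how many records link it to each country.

    A record contributes at most once per (surname, country) pair, even if
    several criteria point to the same pair.
    """
    counts: dict[str, Counter] = defaultdict(Counter)
    for rec in records:
        for surname, country in trusted_links(rec):
            counts[surname][country] += 1
    return counts
```

The resulting per-surname counts are the kind of input both assignment methods below could work from.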
Some figures:
-> 1,568,052 distinct surnames assigned to 119 countries
-> France 8.8%; Germany 8.0%; Russia 7.1%; Spain 4.9%
Assumptions
HYPOTHESIS 1
A surname should be assigned to the country where it occurs most frequently.
HYPOTHESIS 2
A surname should be assigned to the country where it is most concentrated.
Method 1. Kullback-Leibler
OPERATIONALIZATION
A surname is assigned to a country if 1) its frequency is highest in that country, and 2) the assignment reaches “certain levels of assurance”.
METHOD 1
The Kullback-Leibler divergence measures the (dis)similarity between the global distribution of a surname and its distribution in each country.
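A minimal sketch of Method 1, assuming that P is the distribution of one surname over countries, Q is the distribution of all linked surnames over countries, and D_KL(P‖Q) = Σ_c P(c) log(P(c)/Q(c)). The slides leave “certain levels of assurance” unspecified, so the fixed divergence threshold below (`threshold=0.5`) is a hypothetical placeholder, as are the function names and the toy counts.

```python
from __future__ import annotations

import math
from collections import Counter


def normalise(counts: Counter) -> dict[str, float]:
    """Turn raw counts per country into a probability distribution."""
    total = sum(counts.values())
    return {country: n / total for country, n in counts.items()}


def kl_divergence(p: dict[str, float], q: dict[str, float]) -> float:
    """D_KL(P || Q) in nats. Countries with p_c = 0 contribute nothing;
    every country with p_c > 0 must also have q_c > 0."""
    return sum(pc * math.log(pc / q[c]) for c, pc in p.items() if pc > 0)


def assign_kl(surname_counts: Counter, global_counts: Counter,
              threshold: float = 0.5) -> str | None:
    """Most frequent country for the surname, or None when the surname's
    distribution is too close to the global one (hypothetical threshold)."""
    p = normalise(surname_counts)
    q = normalise(global_counts)
    if kl_divergence(p, q) < threshold:
        return None  # not enough "assurance": leave the surname unassigned
    return surname_counts.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy counts of linked authors per country (invented for illustration).
    world = Counter({"USA": 600, "SPAIN": 200, "FRANCE": 150, "CUBA": 50})
    print(assign_kl(Counter({"SPAIN": 40, "CUBA": 30, "USA": 30}), world))   # SPAIN
    print(assign_kl(Counter({"USA": 60, "SPAIN": 20, "FRANCE": 20}), world))  # None
```

With these toy counts the first surname is assigned to SPAIN because its spread across countries differs clearly from the global one (D_KL ≈ 0.61), whereas the second follows the global distribution almost exactly (D_KL ≈ 0.06) and stays unassigned. A rule of this shape would explain why Method 1 does not reach full coverage.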
Method 2. Gini Index
OPERATIONALIZATION
A surname is assigned to the country with the highest concentration of that surname.
METHOD 2
The Gini index is an inequality indicator that has already been used for other purposes in bibliometrics. It measures, on a scale from 0 to 1, the concentration of a surname in a country.
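The Gini coefficient itself is standard, but the slides do not spell out how it is turned into a country assignment. The sketch below therefore assumes a hypothetical rule: compute the surname's density in each country (its count divided by that country's total), assign the surname to the densest country, and report the Gini coefficient of those densities as its concentration score. The function names and toy counts are illustrative only.

```python
from __future__ import annotations

from collections import Counter
from typing import Sequence


def gini(values: Sequence[float]) -> float:
    """Gini coefficient of non-negative values: 0 = perfectly even,
    (n - 1) / n = everything concentrated in a single entry."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the ascending values (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def assign_by_concentration(surname_counts: Counter,
                            global_counts: Counter) -> tuple[str, float]:
    """Hypothetical rule: pick the country where the surname is densest
    relative to that country's size; report the Gini of those densities."""
    # Counter returns 0 for countries where the surname never appears.
    density = {c: surname_counts[c] / size for c, size in global_counts.items()}
    best = max(density, key=density.get)
    return best, gini(list(density.values()))


if __name__ == "__main__":
    world = Counter({"USA": 600, "SPAIN": 200, "FRANCE": 150, "CUBA": 50})
    country, score = assign_by_concentration(
        Counter({"SPAIN": 40, "CUBA": 30, "USA": 30}), world)
    print(country, round(score, 3))  # CUBA 0.574
```

Under this assumed rule, the same toy surname that the frequency-based sketch sends to SPAIN lands in CUBA: it is rarer there in absolute terms but far denser relative to the country's size. Because every surname always receives a country, a rule like this is consistent with the 100% coverage reported for Method 2.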
Kullback-Leibler vs. Gini index
Top 10 countries with the highest number of surnames assigned
Method 1. Kullback-Leibler      Method 2. Gini index
Country        No. surnames     Country        No. surnames
FRANCE         138,349          USA            310,739
GERMANY        112,445          FRANCE         117,938
RUSSIA         111,716          GERMANY        111,375
SPAIN          83,529           RUSSIA         94,369
USA            76,219           ITALY          65,699
ITALY          69,637           JAPAN          52,399
ENGLAND        63,885           ENGLAND        47,521
JAPAN          56,345           CANADA         46,146
CANADA         49,775           POLAND         44,087
NETHERLANDS    41,306           INDIA          42,897
Kullback-Leibler vs. Gini index
Country assigned to selected surnames
Surname     Method 1. Kullback-Leibler    Method 2. Gini index
CLINTON     USA                           USA
EGGHE       BELGIUM                       BELGIUM
GARFIELD    USA                           USA
HERRERA     SPAIN                         CUBA
GARCIA      SPAIN                         CUBA
EINSTEIN    USA                           ISRAEL
NOYONS      NETHERLANDS                   NETHERLANDS
PEREIRA     BRAZIL                        PORTUGAL
The ‘golden list’
Validating the methods proposed
SEARCHING A ‘GOLDEN LIST’ TO VALIDATE THE RESULTS
o Coverage
o Criteria
› Language
› Ethnicity
› Historical origin
o Reliability and double assignments
The ‘golden list’
Validating the methods proposed
Unified country    Languages
Denmark            Danish
England            Celtic; Anglo-Cornish; English; Scottish; Irish
Finland            Finnish
France             Breton; French
Germany            German
Greece             Greek
Iceland            Icelandic
Italy              Italian
Japan              Japanese
Netherlands        Afrikaans; Dutch
Portugal           Portuguese
Spain              Basque; Catalan; Galician;
In search of a ‘golden list’ of surnames assigned to countries, languages or ethnicities:
http://en.wikipedia.org/wiki/Category:Surnames_by_language
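The unified-country table above can be expressed as a lookup that turns Wikipedia's surname-by-language categories into a ‘golden list’. In this sketch the input (a mapping from category language to surnames) is assumed to have been collected already; the scraping step is not shown, and the function names and the choice to discard surnames that point to more than one country (one possible answer to the double-assignment issue) are assumptions.

```python
from __future__ import annotations

# Wikipedia language categories unified into countries, as in the table above.
LANGUAGE_TO_COUNTRY = {
    "Danish": "DENMARK",
    "Celtic": "ENGLAND", "Anglo-Cornish": "ENGLAND", "English": "ENGLAND",
    "Scottish": "ENGLAND", "Irish": "ENGLAND",
    "Finnish": "FINLAND",
    "Breton": "FRANCE", "French": "FRANCE",
    "German": "GERMANY",
    "Greek": "GREECE",
    "Icelandic": "ICELAND",
    "Italian": "ITALY",
    "Japanese": "JAPAN",
    "Afrikaans": "NETHERLANDS", "Dutch": "NETHERLANDS",
    "Portuguese": "PORTUGAL",
    "Basque": "SPAIN", "Catalan": "SPAIN", "Galician": "SPAIN",
}


def build_golden_list(surnames_by_language: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each surname (upper-cased) to every unified country its languages point to."""
    golden: dict[str, set[str]] = {}
    for language, surnames in surnames_by_language.items():
        country = LANGUAGE_TO_COUNTRY.get(language)
        if country is None:
            continue  # language not covered by the unified-country table
        for surname in surnames:
            golden.setdefault(surname.upper(), set()).add(country)
    return golden


def drop_double_assignments(golden: dict[str, set[str]]) -> dict[str, str]:
    """Keep only surnames that point to exactly one country."""
    return {s: next(iter(cs)) for s, cs in golden.items() if len(cs) == 1}
```

Upper-casing the surnames matches the capitalised form used in the assignment tables; whether the original golden list was normalised this way is not stated.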
The ‘golden list’
               METHOD 1                  METHOD 2
Country        % coverage   % correct    % coverage   % correct
DENMARK        91.1%        68.75%       100%         60.16%
ENGLAND        28.8%        80.97%       100%         58.56%
FINLAND        99.11%       94.62%       100%         91.96%
FRANCE         88.08%       68.28%       100%         50.54%
GERMANY        52.24%       69.00%       100%         43.78%
GREECE         84.12%       78.32%       100%         78.57%
ICELAND        100.00%      65.52%       100%         100.00%
ITALY          87.65%       86.97%       100%         64.77%
JAPAN          98.74%       98.95%       100%         91.39%
NETHERLANDS    88.11%       60.96%       100%         41.67%
PORTUGAL       98.54%       92.59%       100%         91.91%
SPAIN          93.18%       48.74%       100%         54.74%
Total          73.22%       79.03%       100%         61.29%
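The two validation figures can be computed along the following lines. The slides do not define them, so this sketch assumes that % coverage is the share of golden-list surnames that a method assigns to any country and % correct is the share of those assigned surnames that land on the golden-list country. The pooled ‘Total’ row is likewise an assumption about how the last line of the table was obtained, and the function name is hypothetical.

```python
from __future__ import annotations

from collections import Counter


def validate(golden: dict[str, str],
             assigned: dict[str, str | None]) -> dict[str, tuple[float, float]]:
    """Per-country (% coverage, % correct), plus a pooled 'Total' row.

    golden:   surname -> golden-list country
    assigned: surname -> country chosen by a method (None or missing = unassigned)
    """
    listed, covered, correct = Counter(), Counter(), Counter()
    for surname, truth in golden.items():
        guess = assigned.get(surname)
        # Every surname counts towards its own country and towards the total.
        for key in (truth, "Total"):
            listed[key] += 1
            if guess is not None:
                covered[key] += 1
                if guess == truth:
                    correct[key] += 1
    return {
        key: (100 * covered[key] / listed[key],
              100 * correct[key] / covered[key] if covered[key] else 0.0)
        for key in listed
    }
```

Under these definitions a method that always assigns a country, as Method 2 appears to, gets 100% coverage by construction, so its % correct is measured over the whole golden list, while Method 1 is judged only on the surnames it chooses to assign.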
Next or previous steps
o Is the Web of Science a good sample of the world
population?
› Country census crossed with the WoS
o Time frames and migratory movements
› Apply methods to different periods
o Validation and comparison with other techniques
› Bayesian, probability, clustering
o Multiple assignments of countries (e.g., Lee, Santos)
Thank you! elrobin@ugr.es
